In a Zeppelin notebook I’m trying to load data from an HBase table. I’m following the description given here, but Spark fails on the HBaseConfiguration import:
%spark
import org.apache.hadoop.hbase.HBaseConfiguration
:21: error: object hbase is not a member of package org.apache.hadoop
I’ve also tried Spark SQL, since my HBase table is also exposed through Hive, and Spark fails in a similar way:
%sql select * from mytable limit 5
MetaException(message:java.lang.ClassNotFoundException Class org.apache.hadoop.hive.hbase.HBaseSerDe not found)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:346)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:281)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:631)
at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:189)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1017)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:202)
I believe Spark/Zeppelin doesn’t have the appropriate jars on its classpath. I’ve found that I can add them using the %dep interpreter. The only question is: how do I find out which jars should be added?
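For context, this is the kind of %dep paragraph I expect I’d need to run (before the first %spark paragraph, since %dep only takes effect before the Spark interpreter starts). The artifact coordinates and versions below are my guesses, not something I’ve verified against the sandbox:

```scala
%dep
z.reset()
// Guessed Maven coordinates; the versions would need to match
// the HBase/Hadoop versions shipped with the HDP stack.
z.load("org.apache.hbase:hbase-client:1.1.2")
z.load("org.apache.hbase:hbase-common:1.1.2")
z.load("org.apache.hadoop:hadoop-common:2.7.1")
```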
Is there any documentation of the jars shipped with the Hortonworks Sandbox, and/or a list of which jars need to be added to Spark so that data from HBase can be read?
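In case it helps, I can locate candidate jars on the sandbox filesystem with something like the following. The /usr/hdp root and the jar name patterns are assumptions based on typical Hortonworks layouts; hive-hbase-handler is the jar that provides the missing HBaseSerDe class from the stack trace above:

```shell
# Search the HDP install tree for the jars Spark would need for HBase access.
# /usr/hdp as the root is an assumption based on typical Hortonworks layouts.
HDP_ROOT=${HDP_ROOT:-/usr/hdp}
find "$HDP_ROOT" -name 'hbase-client*.jar' \
  -o -name 'hbase-common*.jar' \
  -o -name 'hive-hbase-handler*.jar' 2>/dev/null || true
```

Any paths this prints could then be fed to %dep (via z.load on a local file path) or to Spark’s classpath settings.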