We are currently working on HDP version 2.2.4.2-2, and there seems to be a problem running Hive queries that access the Hive metastore from Spark in cluster mode.
The "Hive From Spark" example provided with the Spark installation can be used to highlight the issue we are facing. It can be reproduced by browsing to the $SPARK_HOME folder and running:
./bin/spark-submit --class org.apache.spark.examples.sql.hive.HiveFromSpark --master yarn-cluster --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar
The output that was retrieved from the error logs is:
Exception in thread "Driver" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:237)
at org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:233)
at scala.Option.orElse(Option.scala:257)
… 26 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
… 31 more
Caused by: javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
… 36 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
… 55 more
This problem occurs with the out-of-the-box configuration of the HDP distribution. The hive-site.xml for Spark was not changed, so it looks like this:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://sandbox.hortonworks.com:9083</value>
  </property>
</configuration>
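For reference, since the ClassNotFoundException points at the DataNucleus classes, one workaround I have seen suggested (unverified on this cluster; the jar file names below are assumptions based on a typical Spark 1.x lib directory) is to ship the DataNucleus jars and the Hive config explicitly with the job:

```shell
# Unverified workaround sketch: in yarn-cluster mode the driver runs on the
# cluster, so the DataNucleus jars and hive-site.xml from the local Spark
# install are not on its classpath unless shipped explicitly.
# The exact jar versions/paths below are assumptions.
./bin/spark-submit \
  --class org.apache.spark.examples.sql.hive.HiveFromSpark \
  --master yarn-cluster \
  --num-executors 2 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-core-3.2.10.jar,lib/datanucleus-rdbms-3.2.9.jar \
  --files conf/hive-site.xml \
  lib/spark-examples*.jar
```

I have not confirmed whether this is the intended configuration for HDP, which is part of what I am asking below.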
Are there any configuration or access-control changes required to access the Hive metastore from Spark?
Thanks in advance!