We are currently working on HDP version 2.2.4.2-2, and there seems to be a problem running Hive queries that access the Hive metastore from Spark in cluster mode.
The "Hive From Spark" example provided with the Spark installation can be used to highlight the issue we are facing. It can be reproduced by browsing to the $SPARK_HOME folder and running:
./bin/spark-submit --class org.apache.spark.examples.sql.hive.HiveFromSpark --master yarn-cluster --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar
The output that was retrieved from the error logs is:
Exception in thread "Driver" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:237)
at org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:233)
at scala.Option.orElse(Option.scala:257)
… 26 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
… 31 more
Caused by: javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
… 36 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
… 55 more
This problem occurs with the out-of-the-box configuration of the HDP distribution. The hive-site.xml for Spark was not changed, so it looks like this:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://sandbox.hortonworks.com:9083</value>
  </property>
</configuration>
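For reference, since the ClassNotFoundException points at the DataNucleus classes, one workaround I have seen suggested (unverified on this cluster; the jar file names below are assumptions based on a typical Spark 1.x lib directory) is to ship the DataNucleus jars and the Hive config explicitly with the job:

```shell
# Unverified workaround sketch: in yarn-cluster mode the driver runs on the
# cluster, so the DataNucleus jars and hive-site.xml from the local Spark
# install are not on its classpath unless shipped explicitly.
# The exact jar versions/paths below are assumptions.
./bin/spark-submit \
  --class org.apache.spark.examples.sql.hive.HiveFromSpark \
  --master yarn-cluster \
  --num-executors 2 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-core-3.2.10.jar,lib/datanucleus-rdbms-3.2.9.jar \
  --files conf/hive-site.xml \
  lib/spark-examples*.jar
```

I have not confirmed whether this is the intended configuration for HDP, which is part of what I am asking below.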
Are there any configuration or access-control changes required to access the Hive metastore from Spark?
Thanks in advance!