
Spark action job tries to connect to Resource Mgr on wrong (default) address


Hi,

Environment: HDP 2.3, Oozie v4.2 (Oozie server 4.2.0.2.3.0.0-2557) on a cluster set up in EC2.

The Spark workflow job is stuck in the RUNNING state because it tries to connect to the Resource Manager at the default address, 0.0.0.0:8032.
The nameNode and jobTracker properties have been set correctly; the jobTracker points to the yarn.resourcemanager.address host and port.
As seen in the logs below, the MR launcher task connects to the ResourceManager at the correct address only once and from then on tries the default address.

2015-09-07 04:33:30,865 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2015-09-07 04:33:31,213 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ip-172-31-15-94.us-west-2.compute.internal/172.31.15.94:8032
2015-09-07 04:33:32,109 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

2015-09-07 04:33:33,120 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-09-07 04:33:34,121 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Before running the Oozie job, I also export YARN_CONF_DIR and HADOOP_CONF_DIR to /etc/hadoop/conf (where yarn-site.xml is present); see the sketch after this paragraph.
I am trying the example Spark workflow job that comes with the Oozie archive (http://archive.apache.org/dist/oozie/4.2.0/). The MapReduce, Shell and Java workflow examples all work fine.
This is a small cluster with 2 data nodes and 1 name node.
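
For reference, this is roughly how I prepare the environment and submit the job (a sketch; the Oozie server host and the job id are placeholders from my setup):

# point the launcher at the cluster config that contains yarn-site.xml
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
# submit the example Spark workflow and then check its status
oozie job -oozie http://<oozie-host>:11000/oozie -config examples/apps/spark/job.properties -run
oozie job -oozie http://<oozie-host>:11000/oozie -info <job-id>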

job.properties:
nameNode=hdfs://ip-172-31-15-93.us-west-2.compute.internal:8020
jobTracker=ip-172-31-15-94.us-west-2.compute.internal:8032
master=yarn-client
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark
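
For completeness, yarn-site.xml under /etc/hadoop/conf on the submitting node has the Resource Manager address set (sketch of the relevant property only; the value matches the jobTracker above):

<property>
  <name>yarn.resourcemanager.address</name>
  <value>ip-172-31-15-94.us-west-2.compute.internal:8032</value>
</property>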

workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.5" name="SparkFileCopy">
    <start to="spark-node"/>

    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
            </prepare>
            <master>${master}</master>
            <name>Spark-FileCopy</name>
            <class>org.apache.oozie.example.SparkFileCopy</class>
            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/oozie-examples-4.2.0.jar</jar>
            <arg>${nameNode}/user/${wf:user()}/${examplesRoot}/data/data.txt</arg>
            <arg>${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Any leads on how to fix this issue would be of immense help.

Regards,
Ram

