Running Spark on a cluster?

It's nice to set it up on a sandbox VM, but let's get real: there's a huge gap when you want to install and run on a real cluster.
Looking at the Apache Spark website:

https://spark.apache.org/docs/latest/running-on-yarn.html

There are gaps.
It seems to imply that we add the Spark properties to yarn-site.xml (is that the case?)
If we decide to push the jar to HDFS, then we have to set the shell variable:
export SPARK_JAR=hdfs:///some/path.

This would involve updating the YARN template, no?
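
To make it concrete, here's roughly what I'm picturing. This is just a sketch based on my reading of the docs, not something I've verified; the /apps/spark path is my own placeholder, and the exact assembly jar name will depend on the Spark build:

# Push the Spark assembly jar to HDFS once, instead of to every node:
hadoop fs -mkdir -p /apps/spark
hadoop fs -put $SPARK_HOME/lib/spark-assembly-*.jar /apps/spark/

# Point Spark-on-YARN at the jar in HDFS
# (fill in the actual assembly jar name for your Spark build):
export SPARK_JAR=hdfs:///apps/spark/spark-assembly-<version>.jar

# Then submit in yarn-cluster mode, as the docs describe:
$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 3 \
    $SPARK_HOME/lib/spark-examples-*.jar 10

Is that the intended workflow?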
On your site, you write:
<property>
<name>yarn.application.classpath</name>
<value>/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*</value>
</property>

Silly me, where's the Spark library? (Is this a typo?)
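
If the Spark libs are simply missing from that value, I'd have expected something like the following; the /usr/lib/spark paths are purely my guess at where an HDP install would drop them:

<property>
<name>yarn.application.classpath</name>
<value>/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/spark/*,/usr/lib/spark/lib/*</value>
</property>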

Does Hortonworks require the Spark libs to come from the HDP site, or will the release from the Apache site work?

And if we update the property, does that mean we have to push the jars to every node rather than use HDFS?
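
Just so it's clear what I'm trying to avoid, here's the difference as I understand it; cluster_hosts.txt and the paths are hypothetical:

# Option A: copy the assembly jar to every node by hand (what I'd rather not do):
for host in $(cat cluster_hosts.txt); do
    scp spark-assembly-*.jar ${host}:/usr/lib/spark/lib/
done

# Option B: upload it once to HDFS, where every node can fetch it:
hadoop fs -put spark-assembly-*.jar /apps/spark/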

Thx

-Mike

