
PySpark on HDP YARN


We have installed the latest version of HDP (2.2) on a small cluster of six machines and followed the instructions at http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/ to run Spark on the cluster. However, we are using PySpark, and when we submit our PySpark script (.py) to the cluster, it runs only on the main node, the one with the Spark installation.
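
To see where the work actually lands, we have been running a throwaway job like the one below. It is only illustrative (names are ours, not from the tutorial): each task reports the hostname it ran on, so the collected set shows whether the tasks were spread across the cluster or stayed on one machine.

    from pyspark import SparkContext
    import socket

    # Illustrative diagnostic: each task records the hostname it ran on,
    # so the collected set shows whether work was distributed across the
    # cluster or stayed on a single machine.
    sc = SparkContext(appName="where-do-tasks-run")
    hosts = (sc.parallelize(range(1000), 24)
               .map(lambda _: socket.gethostname())
               .distinct()
               .collect())
    print(sorted(hosts))
    sc.stop()

With our current setup this prints only the main node's hostname.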

What do we need to do to make it run on all the nodes?

Should we install Spark on each node? Which environment variables do we need to set, or what should we add to the PATH variable?

Or do we need to submit the Python script with special parameters?
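
From the Spark documentation, we believe the submission should look roughly like the sketch below. The script name and resource numbers are placeholders to adapt; note that on the Spark 1.2 build bundled with HDP 2.2, PySpark on YARN appears to support the yarn-client deploy mode only.

    # All names and numbers below are placeholders, not from the tutorial.
    export HADOOP_CONF_DIR=/etc/hadoop/conf   # lets Spark locate the YARN ResourceManager
    spark-submit \
      --master yarn-client \
      --num-executors 6 \
      --executor-memory 2g \
      our_script.py

As far as we can tell, in YARN mode the Spark assembly is shipped to the worker containers, so a full Spark install on every node should not be required, but each node does need a compatible Python interpreter for the PySpark executors. Is that understanding correct?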

