We have installed the latest version of HDP (2.2) on a small cluster of 6 machines and followed the instructions at http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/ to run Spark on the cluster. However, we are using PySpark, and when we submit a PySpark script (.py) to the cluster, it runs only on the main node, the one with the Spark installation.
What do we need to do to make it run on all the nodes?
Should we install Spark on each node? Which environment variables do we need to set, or what should we add to the PATH variable?
Or do we need to submit the Python script with some special parameters?
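For example, is something along these lines what is needed? This is only a guess on our part; the master setting, the resource numbers, and the script name below are placeholders, not something taken from the tutorial:

    # guessed invocation -- master and resource values are arbitrary
    spark-submit --master yarn-client \
                 --num-executors 6 \
                 --executor-memory 2g \
                 our_script.py   # placeholder name for our PySpark script

Right now we simply call spark-submit with the script and no extra options, so any pointer on which of these flags actually matter for running on all nodes would help.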