Spark jobs run on the Hadoop/YARN cluster as YARN applications and read data from HDFS, so they need the appropriate HDFS permissions.
Let’s say you want to run Spark jobs as a user called “spark”.
As root, create a user “spark” and make it part of the “hadoop” group:
> useradd -g hadoop spark
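To confirm the account was created with the right primary group, a quick check (assuming a standard Linux setup):
> id spark
The output should show “hadoop” as the user’s primary group.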
As the HDFS superuser “hdfs”, create an HDFS home directory for user “spark” and set its ownership:
> su hdfs
> hdfs dfs -mkdir /user/spark
> hdfs dfs -chown spark:hdfs /user/spark
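You can verify the directory and its ownership before submitting anything (the exact listing format may vary by HDFS version):
> hdfs dfs -ls /user
The /user/spark entry should now be owned by spark:hdfs.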
Now, as user “spark”, run a Spark job, e.g., the SparkPi example:
> su spark
> cd /usr/hdp/current/spark-client/
> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m lib/spark-examples*.jar 10
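Here lib/spark-examples*.jar is the examples jar shipped with the HDP spark-client package and the trailing 10 is the number of partitions SparkPi splits the computation into; adjust both to your installation. Because the job runs in yarn-cluster mode, the driver output (including the computed value of Pi) goes to the YARN application logs rather than to your console. Assuming log aggregation is enabled, you can fetch it with the application ID that spark-submit prints (the ID below is only a placeholder):
> yarn logs -applicationId application_1453435893000_0001 | grep "Pi is roughly"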