So, here is what I found after much debugging and little help from the documentation. It appears that when running a job remotely, yarn.application.classpath must be set in the job configuration; otherwise the job will fail. I discovered this by examining the logs with the following command:
yarn logs -applicationId application_1411128663444_0046
Container: container_1411128663444_0046_01_000001 on sandbox.hortonworks.com_45454
====================================================================================
LogType: stderr
LogLength: 88
Log Contents:
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
LogType: stdout
LogLength: 0
Log Contents:
Now, this class does exist in /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-2.4.0.2.1.1.0-385.jar. However, the container could not find it until the directories containing the MapReduce jars were added to yarn.application.classpath in the job configuration:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

Configuration configuration = new Configuration();
// Point the client at the remote cluster.
configuration.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com:8020");
configuration.set("hadoop.job.ugi", "jbrinnand");
configuration.set("fs.hdfs.impl",
        org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
configuration.set("fs.file.impl",
        org.apache.hadoop.fs.LocalFileSystem.class.getName());
// Without this classpath, the container cannot locate MRAppMaster
// and the job fails with the error shown above.
configuration.set("yarn.application.classpath",
        "/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,"
        + "/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,"
        + "/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,"
        + "/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*");
configuration.set("mapreduce.framework.name", "yarn");
configuration.set("yarn.resourcemanager.address", "192.168.195.161:8050");
configuration.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
ToolRunner.run(configuration, jobDriver, args);
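For context, ToolRunner.run expects jobDriver to be an instance of a class implementing Hadoop's Tool interface. The actual driver from this job is not shown here, but a minimal sketch of such a driver, using a hypothetical WordCountDriver name and Hadoop's built-in TokenCounterMapper and IntSumReducer classes, might look like this:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
import org.apache.hadoop.util.Tool;

// Hypothetical driver, not from the original post.
public class WordCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the Configuration passed to ToolRunner.run(),
        // so the yarn.application.classpath entries set above are picked up here.
        Job job = Job.getInstance(getConf(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenCounterMapper.class); // built-in token-counting mapper
        job.setReducerClass(IntSumReducer.class);     // built-in summing reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}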