Hello,
I followed this guide: http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.0.0/Ambari_Doc_Suite/ADS_v200.html#Installing_HDP_Using_Ambari
I'm using Ubuntu 12.04.5 and HDP-2.2.4.2. I used the "Download the Repo" method instead of setting up my own internal proxy/repo server. My cluster is not behind a firewall that limits its internet access at the moment.
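Since I went the "Download the Repo" route, I also spot-checked a couple of nodes to confirm the Ambari/HDP apt sources actually landed and are reachable. This is roughly what I ran; the exact file names are just what I remember Ambari dropping, so treat those as an assumption:

# list the apt sources Ambari/HDP added; I expect something like ambari.list and HDP.list here
ls /etc/apt/sources.list.d/
# confirm apt can refresh all of them without errors
sudo apt-get update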
Basically every node in my cluster is failing to start services. When I do "Start All Components", just about every one of them fails except for the History Server. Oddly enough, though, if I go to my NameNode host and manually start the NameNode component, it starts without issue, as do several other components. During the install I selected every component Ambari offered.
Even with the NameNode running, my DataNodes will not start. At this point I just want to get the foundation of Hadoop, HDFS, up. For instance, my DataNodes all get this same error:
2015-05-13 13:31:19,657 - Error while executing command 'start':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 214, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 63, in start
    datanode(action="start")
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_datanode.py", line 61, in datanode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 219, in service
    environment=hadoop_env_exports
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 274, in action_run
    raise ex
Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode'' returned 1. starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-hddata1a.out
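If it helps, my next step was to re-run that same start command by hand as the hdfs user, copied from the failure above but with ambari-sudo.sh swapped for a plain su, so that any stderr prints straight to my terminal:

# same daemon start Ambari runs, executed interactively as hdfs
su - hdfs -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode'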
Note: Interestingly enough, the NodeManager for YARN starts up just fine on the same DataNode host. All DataNodes have this problem.
The log that the error refers to (/var/log/hadoop/hdfs/hadoop-hdfs-datanode-hddata1a.out) only has this in it:
ulimit -a for user hdfs
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257123
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 257123
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
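Since the .out above only shows the ulimit banner, I'm assuming the actual failure reason is in the matching .log file in the same directory, so I've been tailing that as well, roughly:

# the .out only captures stdout/ulimit; the detailed stack trace normally goes to the .log
tail -n 100 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-hddata1a.log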
Things I have tried:
- Disabled IPv6 on all my nodes. From previous Hadoop installs I've been told it doesn't play nice with IPv6 (the way I verified this is sketched below the list).
- Deployed using root on all machines, with an SSH private key for passwordless login.
- Restarted all machines.
- Noticed Python 2.6 and 2.7 are both installed, but I didn't figure this would hurt anything, so I left it alone unless someone thinks otherwise.
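For the IPv6 item above, this is roughly how I verified it on each node. The sysctl names should be right; the hadoop-env.sh path and the preferIPv4Stack check are just what I remember from the HDP docs, so treat those as assumptions:

# 1 means the kernel has IPv6 disabled
sysctl net.ipv6.conf.all.disable_ipv6
sysctl net.ipv6.conf.default.disable_ipv6
# check whether the Hadoop JVMs are told to prefer IPv4 (set through Ambari's hadoop-env)
grep preferIPv4Stack /etc/hadoop/conf/hadoop-env.sh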
Please let me know if you have any advice.