
NodeManager won't start on Ambari 1.6.1

Ambari 1.6.1 building an HDP 2.1 cluster, JDK 1.7, all hosts RedHat 6.5. All checks passed, iptables and selinux off on all machines. I’ve tried several different cluster topologies and I always get the same result: the NodeManager won’t start. It initially showed up as HiveServer2 failing to start, but after paring back to the bare minimum services, it looks like it’s actually the NodeManager. It manifests in the logs as a connection refused when the slaves try to reach back to the master on 8020 (which, according to the HDFS > Config > Advanced section, is the location of the HDFS service). Now, I know it is tempting to write this off as something wrong with the way I configured my hosts or my network, but the following suggests otherwise:
The services seem to be running ok on the master:

[root@master ~]# jps
6229 QuorumPeerMain
8736 SecondaryNameNode
7515 ApplicationHistoryServer
8135 JobHistoryServer
10550 Jps
8317 ResourceManager
6530 NameNode
[root@master ~]#

The master is listening on some of the Hadoop ports:

[root@master ~]# netstat -l | grep tcp | grep ":8"
tcp master.jhuapl:8141 *:* LISTEN
tcp master.jhuapl:8050 *:* LISTEN
tcp master.jhuapl:8188 *:* LISTEN
tcp master.jhuapl:8030 *:* LISTEN
tcp *:8670 *:* LISTEN
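
One caveat about a listing like this: without -n, netstat replaces any port it finds in /etc/services with the service name (the nc output further down labels 12345 as italk in the same way), so a grep for ":8" can skip a port that is actually listening. A minimal sketch of a numeric check, assuming the net-tools netstat with its -t, -l and -n options; filter_port is a hypothetical helper name, not something shipped with Hadoop or Ambari:

```shell
#!/usr/bin/env bash
# filter_port is a hypothetical helper, not part of Hadoop or Ambari.
# It reads `netstat -tln` output on stdin and prints every local address
# that is in LISTEN state on the given port.
filter_port() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")   # the port is the last ":"-separated piece
            if (parts[n] == port) print $4
        }'
}
```

On the master this might be run as `netstat -tln | filter_port 8020`. No output would mean nothing is listening on that port; a loopback address in the output would mean the service is only reachable from the master itself.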

And here’s the kicker. If I run a little listener on the master:

[root@master ~]# nc -l 12345

I can communicate with it from the slave:

[root@slave1 ~]# nc -z master 12345
Connection to master 12345 port [tcp/italk] succeeded!

But not on the expected Hadoop service ports!

[root@slave1 ~]# nc -z master 8020
[root@slave1 ~]# nc -z master 8020
[root@slave1 ~]#
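
In the transcript above, nc -z prints nothing when the connection fails, which makes the result easy to misread. The sketch below reports the outcome explicitly. It assumes bash with /dev/tcp support and the coreutils timeout command; probe_port is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# probe_port is a hypothetical helper, not part of Hadoop or Ambari.
# It tries to open a TCP connection using bash's /dev/tcp redirection and
# gives up after 3 seconds (coreutils `timeout`).
probe_port() {
    local host="$1" port="$2"
    # Inside the child shell, $0 is the host and $1 is the port.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "failed"
    fi
}
```

From the slave, `probe_port master 12345` and `probe_port master 8020` would repeat the two checks above. Note that "failed" covers refused connections, timeouts and name-resolution errors alike.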

A wild guess might be that there is something wrong with the services on the master, but I don’t know how to prove or disprove that.

Please suggest anything I might try at this point.

Thanks,
Clark

P.S. Here’s the original stack trace from the slave that shows the communication problem with the master:

2014-08-27 17:51:34,556 - Error while executing command 'start':
Traceback (most recent call last):

File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/nodemanager.py", line 40, in start
...SNIP...
Fail: Execution of 'hadoop fs -mkdir rpm -q hadoop | grep -q "hadoop-1" || echo "-p" /app-logs /mapred /mapred/system /mr-history/tmp /mr-history/done && hadoop fs -chmod -R 777 /app-logs && hadoop fs -chmod 777 /mr-history/tmp && hadoop fs -chmod 1777 /mr-history/done && hadoop fs -chown mapred /mapred && hadoop fs -chown hdfs /mapred/system && hadoop fs -chown yarn:hadoop /app-logs && hadoop fs -chown mapred:hadoop /mr-history/tmp /mr-history/done' returned 1. mkdir: Call From slave1.jhuapl.edu/127.0.1.1 to master.jhuapl.edu:8020 failed on connection exception: java.net.ConnectException: Connection refused; see: http://wiki.apache.org/hadoop/ConnectionRefused
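
The trace reports the caller as slave1.jhuapl.edu/127.0.1.1, meaning the slave's own hostname resolved to a loopback address. The ConnectionRefused wiki page linked in the error lists a hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts as a common cause, and a similar mapping on the master may leave the NameNode bound to loopback only. A minimal sketch of a resolution check, assuming glibc's getent with the ahostsv4 database; is_loopback and check_resolution are hypothetical helper names:

```shell
#!/usr/bin/env bash
# is_loopback is a hypothetical helper: it succeeds when the given IPv4
# address lies in 127.0.0.0/8.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# check_resolution is also hypothetical: it looks up the first IPv4 address
# for a name via getent and flags loopback results.
check_resolution() {
    local name="$1" addr
    addr="$(getent ahostsv4 "$name" | awk '{print $1; exit}')"
    if [ -z "$addr" ]; then
        echo "$name: no address found"
    elif is_loopback "$addr"; then
        echo "$name -> $addr (loopback)"
    else
        echo "$name -> $addr"
    fi
}
```

Running `check_resolution "$(hostname -f)"` on the master and on each slave would show whether any host resolves its own name to loopback.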

