Channel: Hortonworks » All Replies

Data node lost heartbeat; unable to recover it in the Hadoop cluster

Hi,
I have a 4 node HDP cluster with Ambari 2.3.
The cluster was live and working fine until a couple of days ago, when another piece of software was installed on one of the data nodes.

The UI shows all the data nodes, but the heartbeat from one of them is lost.

I’ve tried a couple of things on my side but have failed to get the heartbeat back.
1. Restarted the ambari-agents and ambari-server.
2. Cross-checked ‘/etc/hosts’ and ‘hostname -f’ on all the nodes.
3. Checked the parameters in ‘/etc/ambari-agent/conf/ambari-agent.ini’.
4. The nodes are in the same time zone; even so, I have configured ‘ntp’ on them.
5. Rebooted the nodes and started the ambari-server and ambari-agent services.
6. Turned on maintenance mode on the node, but did not get any useful alert that would point me towards a solution.
7. Restarted ambari-server in debug mode.
8. Checked for clashes with any ports used by the other installation, but the ports seem to be free.
9. Stopped all services other than the Hadoop components and Ambari.
10. Stopped iptables.
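
For anyone repeating checks 2, 8 and 10 from the affected node, here is a minimal Python sketch that confirms the server hostname resolves and that the agent-facing ports accept TCP connections. The helper names (resolve, port_open, check) are made up for illustration. Port 8441 comes from the agent log in this post; 8440 is assumed to be the other agent port, so adjust both to whatever ‘ambari-agent.ini’ says.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check(server, ports=(8440, 8441)):
    """Collect the local FQDN, the server's addresses and per-port reachability.

    The default ports are an assumption; take the real ones from ambari-agent.ini.
    """
    return {
        "local_fqdn": socket.getfqdn(),
        "server_addresses": resolve(server),
        "ports": {port: port_open(server, port) for port in ports},
    }
```

Running check("master.ds.com") on slave1.ds.com and on a healthy node, then comparing the two results, would show whether the problem is in name resolution or the network path. If both look the same, the cause is more likely on the agent side.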

Following are the logs:
1. ambari-server.out

WARNING: The following warnings have been detected with resource and/or provider classes:
WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.ambari.server.api.services.HostService.getHosts(java.lang.String,javax.ws.rs.core.HttpHeaders,javax.ws.rs.core.UriInfo), should not consume any entity.
WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.ambari.server.api.services.HostService.getHost(java.lang.String,javax.ws.rs.core.HttpHeaders,javax.ws.rs.core.UriInfo,java.lang.String), should not consume any entity.

2. ambari-server.log

08 Dec 2015 15:14:47,792 ERROR [qtp-client-270] MetricsPropertyProvider:183 – Error getting timeline metrics. Can not connect to collector, socket error.
08 Dec 2015 15:15:06,676 ERROR [qtp-client-253] MetricsPropertyProvider:183 – Error getting timeline metrics. Can not connect to collector, socket error.
08 Dec 2015 15:15:25,703 ERROR [qtp-client-270] MetricsPropertyProvider:183 – Error getting timeline metrics. Can not connect to collector, socket error.

3. ambari-agent.log

INFO 2015-12-08 15:20:34,261 security.py:98 - SSL Connect being called.. connecting to the server
ERROR 2015-12-08 15:20:34,261 Controller.py:186 - Unable to connect to: https://master.ds.com:8441/agent/v1/register/slave1.ds.com
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 140, in registerWithServer
    ret = self.sendRequest(self.registerUrl, data)
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 413, in sendRequest
    raise IOError('Request to {0} failed due to {1}'.format(url, str(exception)))
IOError: Request to https://master.ds.com:8441/agent/v1/register/slave1.ds.com failed due to VerifiedHTTPSConnection instance has no attribute '_tunnel_host'
ERROR 2015-12-08 15:20:34,261 Controller.py:187 - Error:Request to https://master.ds.com:8441/agent/v1/register/slave1.ds.com failed due to VerifiedHTTPSConnection instance has no attribute '_tunnel_host'
WARNING 2015-12-08 15:20:34,262 Controller.py:188 - Sleeping for 27 seconds and then trying again
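
Since the agent retries every few seconds, the log fills up with copies of the same traceback. The rough Python sketch below counts the distinct "failed due to ..." reasons on ERROR lines, to show whether there is one failure or several. The regular expressions are inferred only from the lines shown here, and the names LINE_RE, CAUSE_RE and summarize are made up for illustration, so other log layouts may need changes.

```python
import re
from collections import Counter

# Inferred from the excerpt: LEVEL, timestamp, file:line, a dash, then the message.
# The dash may be a plain hyphen or an en dash, depending on how the log was pasted.
LINE_RE = re.compile(
    r"^(?P<level>INFO|WARNING|ERROR)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<src>\S+:\d+)\s+[-\u2013]\s+"
    r"(?P<msg>.*)$"
)
CAUSE_RE = re.compile(r"failed due to (?P<cause>.+)$")


def summarize(lines):
    """Count each distinct 'failed due to ...' reason found on ERROR lines."""
    causes = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("level") != "ERROR":
            continue  # skip traceback lines and non-error levels
        cause = CAUSE_RE.search(match.group("msg"))
        if cause:
            causes[cause.group("cause")] += 1
    return causes
```

Feeding it the lines of the agent log (for example from open(path), where path is wherever ambari-agent.log lives on the node) gives a Counter keyed by failure reason. On the excerpt above it would report a single reason, the missing ‘_tunnel_host’ attribute.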

Please let me know if I have gone wrong somewhere, and what steps I should follow to get it working.

The last option I have is to uninstall ambari-agent from the affected data node and then reinstall it.

Regards
Valent Pawar
Cheers!!!!
