Namenode HA migration with Ambari

I went through a Namenode HA migration with Ambari 1.6.1 yesterday and wanted to share my experience. I saw that Ambari has a feature for migrating an HA NN, and that several bugs in it had been fixed in earlier Ambari versions, so I thought “easy; I’m only moving my standby NN, no user downtime”.

tl;dr: HA migration is buggy; plan for downtime!

Started with: Namenodes in HA, standby NN on server A1, active NN on server A2. Goal was to end up with NNs on A2 and A3 (removing the one on A1).

1. Went through the Ambari wizard to migrate NNs. Set new NNs as A2 and A3. Everything worked up until “restarting services”, at which point it failed because it couldn’t “bind to address” when starting the NN on A2.
– I debugged it, and the config on A2 was showing nn1=A3 and nn2=A3 (both A3).
– Editing hdfs-site.xml by hand doesn’t work (Ambari redeploys its own version).
– Edited the config in the Ambari database. There aren’t a lot of docs (any docs) on this, but what I did was add a new row to the clusterconfig table with a new version number and a type of hdfs-site, containing the corrected config. I then inserted a matching entry into clusterconfigmapping to make that version live. After these changes, the NN on A2 was able to start up.
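
The nn1=A3 / nn2=A3 problem is easy to check for before restarting anything. Below is a minimal sketch, assuming the standard HA property names (dfs.nameservices, dfs.ha.namenodes.<ns>, dfs.namenode.rpc-address.<ns>.<nn>). The function names, the nameservice "mycluster" and the example.com hostnames are made up for illustration; this is not anything Ambari ships.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def load_hadoop_conf(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf

def _csv(value):
    """Split a comma-separated config value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]

def duplicate_namenode_hosts(conf):
    """Return {host: [ns.nn, ...]} for every host assigned to more than one NN id."""
    by_host = defaultdict(list)
    for ns in _csv(conf.get("dfs.nameservices", "")):
        for nn in _csv(conf.get("dfs.ha.namenodes." + ns, "")):
            addr = conf.get("dfs.namenode.rpc-address.%s.%s" % (ns, nn), "")
            if not addr:
                continue  # missing address: a different problem, not checked here
            host = addr.rsplit(":", 1)[0]
            by_host[host].append("%s.%s" % (ns, nn))
    return {host: ids for host, ids in by_host.items() if len(ids) > 1}

# Hypothetical config reproducing the broken state (both NN ids point at A3).
SAMPLE = """<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>a3.example.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>a3.example.com:8020</value></property>
</configuration>"""

if __name__ == "__main__":
    print(duplicate_namenode_hosts(load_hadoop_conf(SAMPLE)))
```

Running it against the hdfs-site.xml that Ambari actually deployed on each NN host (read the file and pass its text to load_hadoop_conf) should return an empty dict once the config is correct.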

2. Next steps were to formatZK / bootstrapStandby on the NNs.
– formatZK worked fine, but -bootstrapStandby resulted in “invalid last txid in stream” errors. I failed over the QJM but that didn’t help (all the QJMs were internally consistent; good). I didn’t find any useful information on how to fix this error (if anyone knows how, please let me know).
At this point, the NN on A3 wasn’t operational, though I could still access HDFS with the datanodes running.
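
For reference, the two steps correspond to the stock Hadoop 2.x commands hdfs zkfc -formatZK and hdfs namenode -bootstrapStandby. Here is a small sketch that lays out which command goes where. The helper names and the dry-run wrapper are my own, and I am assuming the usual convention of running formatZK on the active NN host and bootstrapStandby on the new standby, normally as the hdfs user. How the Ambari wizard itself invokes them is not something I verified.

```python
import subprocess

# Standard Hadoop 2.x CLI invocations for the two steps.
FORMAT_ZK = ["hdfs", "zkfc", "-formatZK"]
BOOTSTRAP_STANDBY = ["hdfs", "namenode", "-bootstrapStandby"]

def plan_ha_init(active_host, standby_host):
    """Return (host, command) pairs in the order they are normally run."""
    return [
        (active_host, FORMAT_ZK),           # initialise the HA znode in ZooKeeper
        (standby_host, BOOTSTRAP_STANDBY),  # copy the namespace to the new standby
    ]

def run_local(cmd, dry_run=True):
    """Run a command on this host, or just describe it when dry_run is set."""
    if dry_run:
        return "DRY RUN: " + " ".join(cmd)
    return subprocess.check_call(cmd)

if __name__ == "__main__":
    for host, cmd in plan_ha_init("A2", "A3"):
        print(host, "->", run_local(cmd))
```

Keeping dry_run on by default is deliberate: both commands change cluster state, so it is worth reading the plan before running anything for real.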

3. I decided to “turn off NN HA” (to tear down and re-initialize the QJMs). I went through the Ambari “disable HA” wizard, which worked reasonably well. I skipped the step of enabling a secondary NN since I didn’t actually want one (I was about to turn NN HA back on).
– The NN didn’t start back up: it complained that it wanted to format itself (ack!), but there were files in HDFS and it didn’t want to kill them (thank goodness).
– I edited the Ambari Python scripts to disable the automatic format (in ambari-agent on the NN, edited hdfs_namenode.py to disable the calls to format_namenode). This got the NN up and running.
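
A gentler alternative to ripping out the format call would be to guard it. The sketch below is not Ambari's code and the function names are hypothetical. It relies only on the fact that a formatted NameNode metadata directory (dfs.namenode.name.dir) contains a current/VERSION file, and refuses to format if any configured directory already has one.

```python
import os

def is_formatted(name_dir):
    """True if this NameNode metadata directory already holds a namespace."""
    return os.path.isfile(os.path.join(name_dir, "current", "VERSION"))

def should_format(name_dirs):
    """Allow a format only when none of the configured dirs has existing metadata."""
    if not name_dirs:
        raise ValueError("no NameNode directories configured")
    return not any(is_formatted(d) for d in name_dirs)
```

A patched start script could call should_format() with the values from dfs.namenode.name.dir and skip the format step whenever it returns False.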

4. Ambari decided “you have HA already enabled” and wouldn’t let me re-enable it through the HA wizard.
– I ended up going through the Ambari source to figure this one out. I was looking for a DB flag, but it’s not in the database; the check is actually “if a secondary namenode exists in the config then it is not namenode HA, otherwise it is”.
– Fixed by adding a secondary namenode (but not configuring it).
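
To spell out the rule as I understood it from the source, here it is paraphrased in Python. This is a sketch of the logic only, not the actual Ambari code. The function name is made up; the component names are the ones Ambari uses for HDFS components.

```python
def looks_like_namenode_ha(component_names):
    """Paraphrase of the check: no SECONDARY_NAMENODE component means 'HA is enabled'."""
    return "SECONDARY_NAMENODE" not in set(component_names)

if __name__ == "__main__":
    # After "disable HA" with the secondary-NN step skipped: still reported as HA.
    print(looks_like_namenode_ha(["NAMENODE", "DATANODE"]))
    # After adding an (unconfigured) secondary NN: the HA wizard becomes available.
    print(looks_like_namenode_ha(["NAMENODE", "SECONDARY_NAMENODE", "DATANODE"]))
```

This is why skipping the secondary NN step in the “disable HA” wizard left the cluster looking as if HA were still on.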

5. Ambari then allowed the NN HA wizard to run. This worked for the most part (it failed starting Nagios; there was a missing config option and I had to edit the Ambari scripts to temporarily work around it until I could add the option in Ambari).
