You describe your pair of servers as “a two-node cluster”. If they were indeed installed as a single Hadoop installation consisting of two nodes, then together they form a single instance of the HDFS file system, and only one of the servers is the Namenode (or the active Namenode, if you’ve set it up for HA). Within the HDFS file system, you don’t usually worry about which physical server a given file is stored on — in fact, a big file on a big cluster will be spread among many servers, because it gets split into “blocks”, each of which is replicated to multiple Datanodes. The HDFS directory tree is an abstraction that doesn’t tell you which server a given file (or block of a file) is actually stored on. Files in this directory tree are named with URIs like “hdfs://<namenode>:8020/directoryA/subdirectoryB/filenameX”. You can ONLY use the Namenode server in the <namenode> part of the URI.
Either “hadoop1” or “hadoop2” is your Namenode, but not both. If hadoop1 is your Namenode, then “hdfs://hadoop2:8020/users/hadoop/dir2” is invalid. The correct command would be:
hadoop distcp hdfs://hadoop1:8020/users/hadoop/dir1/filename.txt hdfs://hadoop1:8020/users/hadoop/dir2
This does not copy from one node to another node (e.g., hadoop1 to hadoop2); rather, it copies from one location in the HDFS directory tree (/users/hadoop/dir1/filename.txt) to another location in the HDFS directory tree (/users/hadoop/dir2/filename.txt), both of which are under the control of the Namenode, “hadoop1”.
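As an aside, distcp launches a MapReduce job, which is overkill for a single small file. For a simple copy within one cluster you can also use the plain “hdfs dfs -cp” command (same example paths as above):

hdfs dfs -cp hdfs://hadoop1:8020/users/hadoop/dir1/filename.txt hdfs://hadoop1:8020/users/hadoop/dir2

When both paths are on your default file system, you can drop the hdfs://hadoop1:8020 prefix entirely.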
If you don’t remember which is your Namenode, but you used Ambari to install your cluster, then Ambari can tell you which node is your Namenode. Otherwise you can check running services on each node to figure it out.
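For example, you can ask HDFS directly for the Namenode host (assuming the “hdfs” command is on your PATH on one of the nodes):

hdfs getconf -namenodes

Or run “jps” on each server — the one that lists a “NameNode” process is your Namenode.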
It might also be good to mention (sorry if you already know this) that a file in HDFS under “hdfs://hadoop1:8020/users/hadoop/dir1/filename.txt” does NOT show up on server hadoop1 as a file named /users/hadoop/dir1/filename.txt. And vice versa: if you reach in at the Linux or Windows level and put a file at /users/hadoop/dir1/filename.txt on server hadoop1, it will NOT show up in the HDFS file system. The directory tree of what we call the “local file system” on each of the nodes is completely separate from the directory tree of the HDFS file system, which is an abstraction spread across all the nodes and controlled by the Namenode. But you can use the “hdfs dfs -put” and “hdfs dfs -get” commands to move files between HDFS and local file system directories.
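For example (the local paths here are just made-up illustrations):

hdfs dfs -put /tmp/report.txt hdfs://hadoop1:8020/users/hadoop/dir1/
hdfs dfs -get hdfs://hadoop1:8020/users/hadoop/dir1/filename.txt /tmp/

The first command copies a local file into HDFS; the second copies the HDFS file back out to the local file system.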
Hope this helps,
–Matt