I have installed HDP 2.2 with Ambari 2.0.
I'm facing a big issue: I had set the DataNode directories to:
/nsr/hadoop/hdfs/data,/opt/hadoop/hdfs/data,/usr/hadoop/hdfs/data,/usr/local/hadoop/hdfs/data,/var/hadoop/hdfs/data
Now I want several disks to be mounted on each node, and I want ALL the data blocks to be stored on these disks only. There is not much data on HDFS, and it can all be deleted.
Is this possible now? Will changing only the DataNode directories be enough, or do any other paths need to be changed as well?
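For context, this is roughly the per-disk preparation I assume is needed on every DataNode before the config change (the /opt/dev/sdX mount points, the ext4 filesystem and the hdfs:hadoop ownership are assumptions based on my setup, not something I have confirmed in the docs) — please correct me if anything is missing:

# run on each DataNode, once per new disk (sdb shown as an example)
mkfs.ext4 /dev/sdb1                       # format the disk (assumes a single partition)
mkdir -p /opt/dev/sdb
mount /dev/sdb1 /opt/dev/sdb              # and add a matching entry to /etc/fstab
mkdir -p /opt/dev/sdb/hadoop/hdfs/data    # directory that will go into the DataNode dirs list
chown -R hdfs:hadoop /opt/dev/sdb/hadoop
chmod 750 /opt/dev/sdb/hadoop/hdfs/data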
UPDATE:
I went to Services -> HDFS -> Configs -> DataNode and replaced the existing DataNode directories with the 7 disk paths:
/opt/dev/sdb/hadoop/hdfs/data
/opt/dev/sdc/hadoop/hdfs/data
And so on.
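To double-check that the restarted DataNodes actually pick up the new list, I assume something like the following would work on a DataNode after Ambari has pushed the config and the service has been restarted (the /etc/hadoop/conf path is an assumption based on a standard HDP layout):

hdfs getconf -confKey dfs.datanode.data.dir
grep -A1 'dfs.datanode.data.dir' /etc/hadoop/conf/hdfs-site.xml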
As expected, I got an alert for the HDFS service/NameNode host:
This service-level alert is triggered if the number of corrupt or missing blocks exceeds the configured critical threshold. The threshold values are in blocks.
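Since the existing data is disposable anyway, my assumption is that this alert can be cleared by finding and deleting the files whose blocks are now missing, along these lines (run as the hdfs superuser) — please correct me if there is a cleaner way:

# list the files that reference missing/corrupt blocks
hdfs fsck / -list-corruptfileblocks
# delete the affected files; acceptable here because all old data can go
hdfs fsck / -delete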
I have the following queries:
1. How can I ensure that from now on only the 7 disks will be used for data storage (and NO other path like /, /var, etc.)? Are any more clean-ups or config changes required? Is there any documentation available for this (I couldn't find any)?
2. After I start loading data, how can I verify point 1? (I have sketched the checks I am planning to try below.)
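These are the checks I am planning to run once data loading starts; the old directory paths are taken from my original configuration, and I am assuming this is a reasonable way to verify it:

# capacity/usage reported per DataNode should reflect only the 7 new disks
hdfs dfsadmin -report
# on each DataNode: usage on the new mounts should grow as data is loaded
df -h /opt/dev/sdb /opt/dev/sdc
# the old locations should stay empty (and could then be removed)
du -sh /nsr/hadoop/hdfs/data /opt/hadoop/hdfs/data /usr/hadoop/hdfs/data /usr/local/hadoop/hdfs/data /var/hadoop/hdfs/data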