Hi,
Does the latest version of HDFS rebalance itself automatically? The only thing I can find about this is the FAQ entry here (https://wiki.apache.org/hadoop/FAQ#If_I_add_new_DataNodes_to_the_cluster_will_HDFS_move_the_blocks_to_the_newly_added_nodes_in_order_to_balance_disk_space_utilization_between_the_nodes.3F), which indicates that I will have to take manual action to rebalance.
I’m asking with autoscaling on EC2 in mind. If I use a CloudWatch alarm to spin up another DataNode at 75% capacity and HDFS doesn’t rebalance the data, I will be stuck in a scale-up loop until I rebalance, because my original nodes will still be at around 75% capacity.
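To make the loop concrete, here is a rough Python sketch of the scenario. It is only a toy model, not anything from HDFS or CloudWatch: the function name, the three-node starting point, and the cap on scale-up events are all made up for illustration, and it assumes the alarm looks at the fullest node and that no new data is written while scaling.

```python
def simulate_scale_up(used_per_node, capacity_per_node, threshold=0.75,
                      rebalance=False, max_events=5):
    """Toy model: count scale-up events until the alarm stops firing.

    Assumptions (hypothetical, for illustration only):
    - the alarm fires while the fullest node is at or above `threshold`
    - each event adds one empty node of the same capacity
    - no new data arrives during the simulation
    """
    nodes = list(used_per_node)
    events = 0
    while events < max_events and max(nodes) / capacity_per_node >= threshold:
        nodes.append(0.0)  # new, empty DataNode joins the cluster
        events += 1
        if rebalance:
            # Idealized rebalance: spread the data evenly over all nodes.
            average = sum(nodes) / len(nodes)
            nodes = [average] * len(nodes)
    return events, nodes


if __name__ == "__main__":
    start = [75.0, 75.0, 75.0]  # three nodes, each 75% full of 100 units
    print(simulate_scale_up(start, 100.0, rebalance=False))
    print(simulate_scale_up(start, 100.0, rebalance=True))
```

Without rebalancing, the original nodes never drop below the threshold, so the model keeps adding nodes until it hits the event cap. With an idealized rebalance, a single added node is enough to clear the alarm.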
Is this handled automatically now, or is there a flag that can be set to enable it? Either would resolve my concern. Otherwise, I will have to script the rebalancing of the data, leave a long wait period between autoscaling events, and plan for the performance impact of the rebalancing.
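For the scripted option, this is roughly what I have in mind: a minimal sketch that shells out to the stock `hdfs balancer` tool with its `-threshold` option after a new DataNode joins. It assumes the `hdfs` CLI is on the PATH of whichever host runs it; the function names and the idea of calling it from an autoscaling hook are my own placeholders.

```python
import subprocess


def balancer_command(threshold_percent=10.0):
    """Build the argument list for the HDFS balancer.

    `-threshold` is the allowed deviation, in percent, of each DataNode's
    utilization from the cluster-wide average.
    """
    return ["hdfs", "balancer", "-threshold", str(threshold_percent)]


def run_balancer(threshold_percent=10.0):
    """Run the balancer and return its exit code.

    Assumes the `hdfs` CLI is installed and configured on this host.
    This call blocks until the balancer finishes, which can take a long time.
    """
    completed = subprocess.run(balancer_command(threshold_percent), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical use: invoke after a scale-up event has added a DataNode.
    print("balancer exit code:", run_balancer(10.0))
```

Since the run blocks until the balancer is done, the autoscaling cooldown would need to be at least as long as a typical rebalance.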
Thanks for any ideas you may have.
Thanks,
Scott Edwards