Hi,
Installation of HDP 2.1.5 using Ambari 1.6.1 was successful on a cluster of
1 NN (4x1TB), 1 SecNN (4x1TB) and 3 DN (10x4TB), physical machines with a network speed of 1000Mb/s full duplex.
I am trying to run the TeraGen 50GB benchmark with the following properties set in the config files.
The values were calculated by the python script hdp.
Using cores=24 memory=126GB disks=12 hbase=False
Profile: cores=24 memory=128000MB reserved=1GB usableMem=125GB disks=12
Num Container=22
Container Ram=5120MB
Used Ram=110GB
Unused Ram=1GB
yarn.scheduler.minimum-allocation-mb=5120
yarn.scheduler.maximum-allocation-mb=112640
yarn.nodemanager.resource.memory-mb=112640
mapreduce.map.memory.mb=5120
mapreduce.map.java.opts=-Xmx4096m
mapreduce.reduce.memory.mb=5120
mapreduce.reduce.java.opts=-Xmx4096m
yarn.app.mapreduce.am.resource.mb=5120
yarn.app.mapreduce.am.command-opts=-Xmx4096m
mapreduce.task.io.sort.mb=1792
tez.am.resource.memory.mb=5120
tez.am.java.opts=-Xmx4096m
hive.tez.container.size=5120
hive.tez.java.opts=-Xmx4096m
hive.auto.convert.join.noconditionaltask.size=1342177000
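
For reference, the container arithmetic behind these values can be reproduced with a small sketch. This is not the hdp script itself: the function names are made up for illustration, and the 4/5 heap ratio is only inferred from the -Xmx4096m values listed above.

```python
# Values copied from the script output above.
NUM_CONTAINERS = 22
CONTAINER_RAM_MB = 5120


def node_memory_mb(num_containers: int, container_ram_mb: int) -> int:
    """Memory YARN can hand out on one node, in MB (containers x container size)."""
    return num_containers * container_ram_mb


def heap_mb(container_ram_mb: int) -> int:
    """JVM heap for one container, assuming heap = 4/5 of the container size."""
    return container_ram_mb * 4 // 5


if __name__ == "__main__":
    total_mb = node_memory_mb(NUM_CONTAINERS, CONTAINER_RAM_MB)
    print("yarn.nodemanager.resource.memory-mb =", total_mb)
    print("Used Ram (GB) =", total_mb // 1024)
    print("-Xmx (MB) =", heap_mb(CONTAINER_RAM_MB))
```

With the numbers above this gives 112640 MB (110GB) per node and a 4096 MB heap, which matches the properties I set.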
After submitting the job, only 3 containers are running, and TeraGen 50GB takes 10min 15secs to complete.
I would like to improve performance and reduce the execution time to ~2mins.
Please help me configure my cluster to boost performance.
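
To put the target in numbers, here is a rough sketch of the write throughput involved. It assumes 50GB means 50 x 10^9 bytes and ignores HDFS replication, so it is only an estimate of the rate TeraGen itself has to sustain; the function name is made up for illustration.

```python
DATA_BYTES = 50 * 10**9        # assumption: decimal gigabytes
CURRENT_SECONDS = 10 * 60 + 15  # 10min 15secs
TARGET_SECONDS = 2 * 60         # ~2mins
LINK_MBIT_S = 1000              # network speed of each machine


def throughput_mb_s(data_bytes: int, seconds: int) -> float:
    """Average cluster-wide write rate in MB/s (1 MB = 10^6 bytes)."""
    return data_bytes / seconds / 10**6


if __name__ == "__main__":
    print("current MB/s:", round(throughput_mb_s(DATA_BYTES, CURRENT_SECONDS), 1))
    print("target MB/s:", round(throughput_mb_s(DATA_BYTES, TARGET_SECONDS), 1))
    print("speedup needed:", CURRENT_SECONDS / TARGET_SECONDS)
    print("one link, MB/s:", LINK_MBIT_S / 8)
```

So the job currently averages about 81 MB/s across the cluster, and the ~2mins target needs about 417 MB/s, roughly a 5x speedup. A single 1000Mb/s link carries at most 125 MB/s.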
Thanks