CapacityScheduler not being elastic

I am trying to set up YARN CapacityScheduler queues so that individual MapReduce jobs can't suck up all the resources on the cluster, while still keeping the cluster 100% utilised whenever there are jobs ready to run. However, this elastic use is not working for me on HDP 2.2.0 (on CentOS 6). Can you suggest things I should do to fix it, or to diagnose it better?

To test this I added a second queue alongside the default one, giving default a capacity of 80 and the new ingestion queue a capacity of 20.

This seems to work fine: with some big MR jobs (teragen from the terasort benchmark) running in both queues, the default queue uses four times as many maps as the ingestion queue.
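For reference, this is roughly how I kick off the test jobs (the examples jar path is the usual HDP location on my nodes, and the row count and output paths are just what I happened to use; adjust as needed):

# Submit one teragen to each queue; mapreduce.job.queuename picks the target queue.
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    teragen -Dmapreduce.job.queuename=default 10000000000 /tmp/teragen-default
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    teragen -Dmapreduce.job.queuename=ingestion 10000000000 /tmp/teragen-ingestion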

I can even change the config XML file (swapping the 20 and 80), tell YARN to refresh the queues (without restarting), and I see that the new ingestion queue starts using more maps and the default one fewer.
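In case it matters, the refresh step is just the standard rmadmin call after the new capacity settings are pushed out (no ResourceManager restart):

# Re-read the capacity-scheduler settings on the ResourceManager and reload the queue definitions.
yarn rmadmin -refreshQueues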

Result!

HOWEVER

If one of the jobs finishes and its queue goes empty, I would expect the other queue to stop limiting itself to its configured capacity and instead grow all the way up to its maximum-capacity (which is currently 100).
It does not do this. The cluster carries on with the smaller-capacity queue trundling along and most of the cluster idle.

Is there anything I need to do to enable elastic use of the resources? Here is my Ambari Capacity Scheduler config:

yarn.scheduler.capacity.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.accessible-node-labels.default.capacity=-1
yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity=-1
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default-node-label-expression=
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=80
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.ingestion.acl_administer_jobs=*
yarn.scheduler.capacity.root.ingestion.acl_submit_applications=*
yarn.scheduler.capacity.root.ingestion.capacity=20
yarn.scheduler.capacity.root.ingestion.maximum-capacity=100
yarn.scheduler.capacity.root.ingestion.state=RUNNING
yarn.scheduler.capacity.root.ingestion.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default,ingestion
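These are set through Ambari; the effective values end up in capacity-scheduler.xml on the ResourceManager host (assuming the usual HDP config path), which is what I check after a refresh, e.g.:

# Confirm the maximum-capacity values the RM is actually reading (path is the typical HDP 2.x location).
grep -A1 'maximum-capacity' /etc/hadoop/conf/capacity-scheduler.xml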

Now, interestingly, here is an example of what I am talking about…

$ yarn queue -status default ; yarn queue -status ingestion ; yarn queue -status root

15/04/28 17:35:39 INFO impl.TimelineClientImpl: Timeline service address: http://rmmachine:8188/ws/v1/timeline/
15/04/28 17:35:39 INFO client.RMProxy: Connecting to ResourceManager at rmmachine/10.34.37.2:8050
Queue Information :
Queue Name : default
State : RUNNING
Capacity : 80.0%
Current Capacity : 0.0%
Maximum Capacity : 100.0%
Default Node Label expression :
Accessible Node Labels : *
15/04/28 17:35:41 INFO impl.TimelineClientImpl: Timeline service address: http://rmmachine:8188/ws/v1/timeline/
15/04/28 17:35:42 INFO client.RMProxy: Connecting to ResourceManager at rmmachine/10.34.37.2:8050
Queue Information :
Queue Name : ingestion
State : RUNNING
Capacity : 20.0%
Current Capacity : 108.7%
Maximum Capacity : 100.0%
Default Node Label expression :
Accessible Node Labels : *
15/04/28 17:35:44 INFO impl.TimelineClientImpl: Timeline service address: http://rmmachine:8188/ws/v1/timeline/
15/04/28 17:35:45 INFO client.RMProxy: Connecting to ResourceManager at rmmachine/10.34.37.2:8050
Queue Information :
Queue Name : root
State : RUNNING
Capacity : 100.0%
Current Capacity : 21.7%
Maximum Capacity : 100.0%
Default Node Label expression :
Accessible Node Labels : *

So in this example I have nothing running in the default queue (which has 80% capacity set) while the ingestion queue (20% capacity) has work, yet either queue should be allowed to elastically use 100% of its parent queue (root).

This seems to tell me that the ingestion queue considers itself full (its Current Capacity of 108.7% is relative to its own 20% share, and 0.20 × 108.7% ≈ 21.7%, which matches root's Current Capacity), yet it is only using about a fifth of the full cluster.
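For what it's worth, the ResourceManager's scheduler REST endpoint tells the same story; it reports the absolute used capacity per queue rather than the usage relative to each queue's configured share (assuming the default RM web UI port 8088):

# Dump per-queue capacities (configured, used, absolute) from the RM scheduler API.
curl -s http://rmmachine:8088/ws/v1/cluster/scheduler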

