I have a YARN application (Camus, which reads data from Kafka and writes it to HDFS) that is running out of virtual memory. It is using 4 GB of physical memory and 11 GB of virtual memory. Because I am on RHEL 6, I am worried that I might be hitting the glibc arena bug/feature, where the few mallocs that are done use up far more virtual memory than they should.
The documentation in several places suggests setting the environment variable MALLOC_ARENA_MAX (to 4 or even 1), but I am not sure where to set it.
I see that yarn.nodemanager.admin-env is set to MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
So I went to the yarn-env template in Ambari and added:
export MALLOC_ARENA_MAX=4
(I also restarted all the NodeManagers in a rolling restart.)
No change. Is this right?
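One way to check whether the export actually reached a running process is to read that process's environment from /proc. This is a minimal sketch, assuming Linux and a user allowed to read the target process's /proc entry (the same user or root); the helper name proc_env_var is hypothetical:

```shell
#!/bin/sh
# Print the value of an environment variable as seen by a running process.
# Usage: proc_env_var <pid> <variable-name>
# Prints nothing if the variable is not set in that process.
proc_env_var() {
    pid="$1"
    name="$2"
    # /proc/<pid>/environ holds NUL-separated NAME=value pairs,
    # captured when the process was started.
    tr '\0' '\n' < "/proc/$pid/environ" | sed -n "s/^$name=//p"
}
```

For example, proc_env_var <pid> MALLOC_ARENA_MAX, run once with the NodeManager's pid and once with the pid of one of the application's container JVMs, should show whether the value set in yarn-env is being forwarded to the containers.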
Perhaps I should just override yarn.nodemanager.admin-env and set it to a hard-coded value directly?
Any ideas?