Hello everyone,
Recently, when I run many MapReduce jobs in parallel on a Hadoop cluster, I often find that the NameNode is killed by the OOM killer.
Looking at the statistics captured on the control node, I see that the peak memory utilization of the NameNode is only about 10%, and the ResourceManager of YARN uses only about 8%.
The sum of these is not enough to cause memory pressure on the control node.
Apart from the NameNode and the ResourceManager, I do not run any other programs on the control node.
But when I use the “jps” command to check the JVMs, I find many processes running “RunJar”.
I know that when a job is submitted to Hadoop, Hadoop creates a JVM to read the jar file provided with the job, but I thought that JVM would be quite small. Can these processes really overcrowd the control node's memory?
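To check whether these RunJar JVMs add up to something significant, I ran roughly the following (a rough sketch; it assumes “RunJar” appears in the process command line, which is what jps reports for `hadoop jar` client processes):

```shell
# Sum the resident memory (RSS) of all processes whose command line
# mentions RunJar. The [R] trick keeps awk from matching itself.
# RSS is reported by ps in KiB, so divide by 1024 for MiB.
ps -eo rss,args | awk '/[R]unJar/ { total += $1 } END { printf "RunJar total RSS: %.0f MiB\n", total/1024 }'
```

With many jobs submitted at once, I suspect this total could be large even if each individual JVM is small, but I am not sure how to confirm that this is what triggers the OOM killer.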
I would also appreciate it if someone could explain Hadoop's job-processing flow to me, or point me to some information. I could not find it via Google…
For example, when Hadoop receives a new job, what does it do? Who creates the “RunJar” process? What are its responsibilities? What is the relationship among this process, HDFS, and YARN?
Maybe these are too many questions, but I think they are important for really understanding Hadoop's internal job processing.
Thanks very much in advance.