
Reducer questions


I am reading Eric Sammer’s Hadoop Operations. In the book, regarding MapReduce performance and tuning, it says of mapred.reduce.parallel.copies: “Each reducer task must fetch intermediate map output data from each of the task trackers where a map task from the same job ran. In other words, there are Reducers x Mappers number of total copies that must be performed.”
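To make the copy arithmetic concrete, here is a minimal driver sketch (old mapred API; the class name TuningSketch and the values are mine, purely for illustration) setting the two knobs involved: the number of reduce tasks and the number of parallel copy threads each reducer uses during the shuffle.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class TuningSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf(new Configuration(), TuningSketch.class);

        // R: the number of reduce tasks is chosen per job; it is not
        // derived from the number of distinct keys in the data.
        conf.setNumReduceTasks(4);

        // Each reducer fetches map-output segments from every tasktracker
        // that ran a map task for the job, so with M map tasks the job
        // performs R x M copies in total; this property controls how many
        // of those fetches a single reducer runs in parallel.
        conf.setInt("mapred.reduce.parallel.copies", 10);
    }
}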

Questions:
1. Does a reducer task only ever work on a single key? If so, would a job whose data has 10 distinct keys mean 10 reducer tasks for the job? (My current understanding is sketched after these questions.)
2. So a reducer working on a particular key on a particular node will get the data for that key both from the intermediate map output on its own node and from the tasktrackers on other nodes that ran map tasks for the same job? By the way, isn’t data sorted in HDFS, so that the data for a key mostly sits on one node rather than being spread across nodes?
3. Once all the reducers for the job, running on different nodes, finish aggregating their portions of the data, who finally collects/compiles the output of the various reducers and produces the final result?
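For context on questions 1 and 2, my understanding is that keys are assigned to reducers by the job’s partitioner, which by default is a hash partitioner. Paraphrased from my reading of the Hadoop source, so treat it as a sketch rather than the exact code:

import org.apache.hadoop.mapreduce.Partitioner;

public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Every record with a given key lands on the same reducer, but a
        // single reducer generally receives many distinct keys, because
        // numReduceTasks is usually far smaller than the number of keys.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

If that reading is right, the answer to question 1 would be “no”, but I’d appreciate confirmation.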

Appreciate the insights.

