Hi,
I am a newbie to Apache Hadoop.
I have made a few successful runs of basic MapReduce programs in Java using the example programs (the typical word count example too).
Then I also made a few successful runs of custom-designed programs. But I noticed that whenever the value type of the Mapper class is different from the value type of the Reducer class, it causes a runtime failure. Logically this should not happen, because the output of my reducer may not look the same as the output of the mapper.
e.g. my Mapper may produce output like:-
<Male,23>
<Female,19>
<Male,54>
Whereas I may want my Reducer to produce output like this (average age per gender):-
<Male,38.50>
<Female,19.00>
So in this case the output type for my Mapper class would be <Text,IntWritable>, whereas the output type for my Reducer would be <Text,FloatWritable>.
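For reference, here is roughly what my real classes look like (a simplified sketch; the names GenderMapper and AverageAgeReducer and the comma-separated record format are placeholders, not my exact code):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// Mapper sketch: emits <gender, age> as <Text, IntWritable>
public class GenderMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
    {
        String[] fields = value.toString().split(","); // e.g. "Male,23"
        output.collect(new Text(fields[0]), new IntWritable(Integer.parseInt(fields[1])));
    }
}

// Reducer sketch: consumes <Text, IntWritable>, emits the average as <Text, FloatWritable>
public class AverageAgeReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, FloatWritable>
{
    @Override
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, FloatWritable> output, Reporter reporter) throws IOException
    {
        int sum = 0, count = 0;
        while (values.hasNext())
        {
            sum += values.next().get();
            count++;
        }
        output.collect(key, new FloatWritable((float) sum / count));
    }
}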
So this doesn't work; I get a runtime error. I am not showing that exact code here, but a much simpler version, so that it will be easy for someone to explain what is causing the issue using the simplest possible example. In the code shown here the output type for the Mapper is <Text,IntWritable> and the output type for the Reducer is <Text,Text>. That is my requirement; there is no reason why I should not be allowed to do this.
// Driver class
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class Driver
{
    public static String inputPath = "/home/myinput/sample.data";
    public static String outputPath = "/home/myoutput";

    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(Driver.class);
        conf.setJobName("my_job_demographics");
        // Output key class (for the job output)
        conf.setOutputKeyClass(Text.class);
        // Output value class (for the job output)
        conf.setOutputValueClass(Text.class);
        // Mapper class
        conf.setMapperClass(MapperClass.class);
        // Reducer class
        conf.setReducerClass(ReducerClass.class);
        // Input path
        FileInputFormat.addInputPath(conf, new Path(inputPath));
        // Output path
        FileOutputFormat.setOutputPath(conf, new Path(outputPath));
        JobClient.runJob(conf);
    }
} // driver class ends
// Mapper Class
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class MapperClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
    {
        // Emits map output of type <Text, IntWritable>; 1 is a sample value for testing
        output.collect(new Text("TestKey"), new IntWritable(1));
    }
} // mapper class ends
// Reducer class
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class ReducerClass extends MapReduceBase implements Reducer<Text, FloatWritable, Text, Text>
{
    @Override
    public void reduce(Text key, Iterator<FloatWritable> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
    {
        // Emits reduce output of type <Text, Text>, ignoring the incoming values
        output.collect(key, new Text("TestValue"));
    }
} // reducer class ends
So when I run this, I get the following error:-
Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:401)
Caused by: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable
Can someone please point out what is causing this and how to correct it? The input file sample.data has a few lines of records, but that is immaterial, as you can see from the code (the map output is hard-coded).
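One thing I noticed while searching: the error says the framework expected Text (the class I set with conf.setOutputValueClass) but received IntWritable from the map, so it looks like the map output types default to the job output types. JobConf also seems to have setMapOutputKeyClass()/setMapOutputValueClass(); would declaring the intermediate map output types separately, roughly like below, be the fix? (I have not verified this; just guessing from the API docs.)

conf.setMapOutputKeyClass(Text.class);          // intermediate (map) output key
conf.setMapOutputValueClass(IntWritable.class); // intermediate (map) output value
conf.setOutputKeyClass(Text.class);             // final (reduce) output key
conf.setOutputValueClass(Text.class);           // final (reduce) output value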
Thanks