Hadoop one Map and multiple Reduce

A simple solution might be to run a job that has no reduce function at all, so all the mapped data is passed directly to the job's output. You do this by setting the number of reducers to zero for the job.
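A minimal driver sketch of such a map-only job, written against the 0.20-era `mapreduce` API (the class names `MapOnlyDriver` and `MyMapper` and the key/value types are assumptions, not from the original post):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "map-only");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(MyMapper.class);  // your existing map function (hypothetical)
        job.setNumReduceTasks(0);            // zero reducers: skip shuffle/sort and reduce
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With zero reduce tasks, each mapper's output is written straight to the output directory as `part-m-*` files, with no shuffle phase at all.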

Then, for each different reduce function, you would write a separate job that reads that intermediate data. The drawback is that all the mapped data has to be stored on HDFS in between.

Another alternative might be to combine all your reduce functions into a single Reducer that writes to multiple files, using a different named output for each function. Multiple outputs are mentioned in this article for Hadoop 0.19. I'm pretty sure this feature is broken in the new `mapreduce` API released with 0.20.1, but you can still use it in the older `mapred` API.
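A sketch of that approach using `MultipleOutputs` from the older `mapred` API. Here two hypothetical reduce functions (a sum and a max over the same values) are folded into one pass, each routed to its own named output; the names "sum" and "max" and the writable types are assumptions for illustration:

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class CombinedReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs mos;

    @Override
    public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        int max = Integer.MIN_VALUE;
        while (values.hasNext()) {
            int v = values.next().get();
            sum += v;               // first "reduce function"
            max = Math.max(max, v); // second "reduce function"
        }
        // Route each result to its own named output instead of the default one.
        mos.getCollector("sum", reporter).collect(key, new IntWritable(sum));
        mos.getCollector("max", reporter).collect(key, new IntWritable(max));
    }

    @Override
    public void close() throws IOException {
        mos.close(); // flushes all named outputs
    }
}
```

Each named output also has to be registered in the driver before the job runs, along the lines of `MultipleOutputs.addNamedOutput(conf, "sum", TextOutputFormat.class, Text.class, IntWritable.class)`, which produces files like `sum-r-00000` and `max-r-00000` alongside the regular output.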

