Mapreduce

MapReduce is a processing technique and a program model for distributed computing. The main goal of mapreduce is to make distributed system easy to program – failures and data movement are hidden.

Execution overview

MapReduce implements various mathematical algorithms to divide a task into small parts and assign them to multiple systems. In technical terms, MapReduce algorithm helps in sending the Map & Reduce tasks to appropriate servers in a cluster.

The MapReduce library in the user program first splits the input files into M pieces of typically 16 megabytes to 64 megabytes (MB) per piece (con- trollable by the user via an optional parameter). It then starts up many copies of the program on a clus- ter of machines.
One of the copies of the program is special – the master. The rest are workers that are assigned work by the master. There are M map tasks and R reduce tasks to assign. The master picks idle workers and assigns each one a map task or a reduce task.
A worker who is assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory.
Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers.
When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers. When a reduce worker has read all in- termediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. The sorting is needed because typically many different keys map to the same reduce task. If the amount of intermediate data is too large to fit in memory, an external sort is used.
The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user’s Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition.
When all map tasks and reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns back to the user code.

Worker Failure

The master pings every worker periodically. If no re- sponse is received from a worker in a certain amount of time, the master marks the worker as failed. Any map tasks completed by the worker are reset back to their ini- tial idle state, and therefore become eligible for schedul- ing on other workers. Similarly, any map task or reduce task in progress on a failed worker is also reset to idle and becomes eligible for rescheduling.

Example

Overview

input is (already) split into M files
Input1 -> Map -> a,1 b,1
Input2 -> Map ->     b,1
Input3 -> Map -> a,1     c,1
                  |   |   |
                  |   |   -> Reduce -> c,1
                  |   -----> Reduce -> b,2
                  ---------> Reduce -> a,2

Input is read from local disk (via GFS), not over the network (i.e. Input -> Map).
Map worker writes to local disk. Reduce workers read directly from Map workers, not via GFS. Workers are the same machine
Shuffle stage transfers the map output from Mapper to a Reducer. ( in this case <a, [1, 1]>)
In shuffle stage, it reads the same keys across different computers over the network.

Code

Map

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
	
	/*
	 * call the defined map function each time when reading one line
	 * parameters:
	 * 		key: vector of current line
	 * 		context: text value of current line
	 */
	@Override
	protected void map(LongWritable key, Text value,Context context) throws IOException, InterruptedException {
		String line = value.toString();
		String[] words=line.split(" ");
		
		// output: <word，1>
		for(String word:words){
			context.write(new Text(word), new IntWritable(1));
		}
	}
}

Reduce

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
	@Override
	/*
	 * get the key-value pairs from the shuffle stage
	 * all occurrences of the same key are grouped together. <same key, list(value)>. the value is an iterator
	 */
	protected void reduce(Text key, Iterable<IntWritable> values,Context context) throws IOException, InterruptedException {
		int count = 0;
    
		for(IntWritable value:values){
			count+=value.get();
		}
		
		context.write(key, new IntWritable(count));
	}
}

Client code

public class WordCountJobSubmitter {
	
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		Configuration conf = new Configuration();
		Job wordCountJob = Job.getInstance(conf);
		
		wordCountJob.setJarByClass(WordCountJobSubmitter.class);
		
		wordCountJob.setMapperClass(WordCountMapper.class);
		wordCountJob.setReducerClass(WordCountReducer.class);
		
		// Type of key-value in the Map stage
		wordCountJob.setMapOutputKeyClass(Text.class);
		wordCountJob.setMapOutputValueClass(IntWritable.class);
		
		// Type of key-value for the output
		wordCountJob.setOutputKeyClass(Text.class);
		wordCountJob.setOutputValueClass(IntWritable.class);
		
		FileInputFormat.setInputPaths(wordCountJob, "PATH");
		FileOutputFormat.setOutputPath(wordCountJob, new Path("PATH"));
		
		//submit job to hadoop cluster
		wordCountJob.waitForCompletion(true);
	}
}

Reference

http://nil.csail.mit.edu/6.824/2020/schedule.html

http://nil.csail.mit.edu/6.824/2020/papers/mapreduce.pdf