Monday, June 12, 2017

MapReduce - Introduction - Step-by-step analysis of Mapper and Reducer with complete execution flow

In Hadoop, every application is converted into a MapReduce job.

MapReduce is a data processing model which consists of 2 core components:

i) Mapper
ii) Reducer

Execution flow of MapReduce Application
---------------------------------------
step1:
We give a text file as input:

abc.txt
-------
My Name is Manohar        
Iam born for Teaching
I love Teaching

step2
-----
FileInputFormat receives the input and converts the text file into key/value pairs as shown below.

FileInputFormat-->Input splits

Note: the default InputFormat is TextInputFormat

key --> byte offset
value --> Text (the row/line)

split1
0,
My Name is Manohar        

split2
18,
Iam born for Teaching

split3
40,
I love Teaching
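The key of each record is just the byte offset where that line starts in the file: 0 for the first line, then each previous offset plus the previous line's length plus one byte for the newline. A minimal sketch of that arithmetic in plain Java (no Hadoop needed; the class name LineOffsets is illustrative, and the exact offsets depend on the exact bytes in the file, including any trailing spaces):

```java
import java.util.ArrayList;
import java.util.List;

class LineOffsets {
    // Byte offset of the start of each line, assuming single-byte
    // characters and '\n' line endings.
    static List<Long> offsets(String fileContents) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n", -1)) {
            result.add(offset);
            offset += line.length() + 1; // +1 for the '\n'
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "My Name is Manohar\nIam born for Teaching\nI love Teaching";
        System.out.println(offsets(text)); // prints [0, 19, 41]
    }
}
```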

step3:
-----
The input splits are given as input to the Mapper.
The Mapper receives the key/value pairs from FileInputFormat; here the key is the byte offset and the value is the Text.

The Mapper reads the value (the Text) and converts the text into key/value pairs as shown below:

My,1
Name,1
is,1
Manohar,1

Iam,1
born,1
for,1
Teaching,1

I,1
love,1
Teaching,1

In the Mapper output above, the key is the word/text and the value is initialized to one (1) for every occurrence. So the Mapper accepts a row as input and emits key/value pairs as output, as shown above.


step4:
Sorting and shuffling
Here all the key/value pairs are shuffled (grouped by key) and sorted in alphabetical order of key.

born,1
for,1
..
...
..



step5:
The output generated by the Mapper, after sorting and shuffling, is given as input to the Reducer.

The Reducer takes key/value pairs as input, performs aggregation, and generates output as key/value pairs as shown below:


born,1
for,1
..
..
Teaching,2
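The whole flow above can be simulated in plain Java without Hadoop: a TreeMap keeps its keys in sorted order, which mimics the sort/shuffle phase, and merging counts per key mimics the Reducer's aggregation. (The class and method names below are illustrative; note that real MapReduce keys are case-sensitive, so "Teaching" and "teaching" would be counted separately.)

```java
import java.util.Map;
import java.util.TreeMap;

class WordCountFlow {
    // map: emit (word, 1) per token; shuffle: group/sort by key
    // (TreeMap); reduce: sum the 1s for each key.
    static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : text.split("\n")) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "My Name is Manohar\nIam born for Teaching\nI love Teaching";
        wordCount(text).forEach((k, v) -> System.out.println(k + "," + v));
    }
}
```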

Writing the WordCount program
-----------------------------
step1:
Create the input file:

cat > file1
My Name is Manohar
I live for teaching
I die while teaching


step2:
we need 3 classes
i)Mapper class
ii)Reducer class
iii)Driver class

step2.1
-------
How to create a Mapper class
The Mapper accepts key/value pairs as input from FileInputFormat and converts the value it receives into key/value pairs as output.

file1 [FileInputFormat]
----------------------
0  -->key
My Name is Manohar-->value

20 -->key
I live for teaching-->value

40-->key
I die while teaching-->value


MapReduce
---------
Java data types [primitives] --> MapReduce data types [classes]

byte   --> ByteWritable
int    --> IntWritable
long   --> LongWritable
float  --> FloatWritable
double --> DoubleWritable
String --> Text

A Writable provides set() and get() methods:
set() --> sets a value on the variable.
get() --> gets the value from the variable.
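Hadoop's Writable wrappers are essentially mutable boxes around a primitive value (the real classes also implement serialization so values can be shipped across the network). A simplified stand-in, not the actual Hadoop IntWritable, shows the set()/get() pattern:

```java
// Simplified stand-in for Hadoop's IntWritable; the real class also
// implements readFields()/write() for serialization.
class IntBox {
    private int value;

    void set(int value) { this.value = value; } // set a value
    int get() { return value; }                 // read it back

    public static void main(String[] args) {
        IntBox one = new IntBox();
        one.set(1);
        System.out.println(one.get()); // prints 1
    }
}
```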


Mapper Logic
------------
Input-->key-->LongWritable
       value-->Text

0,
My Name is Manohar.

Mapper logic for wordcount
--------------------------
Read the Text value
-------------------
Text-->My Name is Manohar

Split the Text into key/value pairs:

My-->key
value-->1

Name->key
value-->1

is-->key
value-->1

Manohar-->key
value-->1

And finally Write key/value pair back as output.

steps to create Mapper
----------------------
step1:

Extend the Mapper class available in the org.apache.hadoop.mapreduce package.

public class MyMapper extends  Mapper<LongWritable, Text, Text, IntWritable> {




}

Mapper<LongWritable, Text, Text, IntWritable>
          k1          v1    k2       v2

k1,v1 --> input coming from FileInputFormat
k2,v2 --> output generated by the map() method


Mapper class has 4 methods
--------------------------
protected void cleanup(org.apache.hadoop.mapreduce.Mapper.Context context)

Called once at the end of the task.

protected void map(KEYIN key, VALUEIN value, Context context)

Called once for each key/value pair in the input split.

void run(org.apache.hadoop.mapreduce.Mapper.Context context)
Expert users can override this method for more complete control over the execution of the Mapper.

protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)

Called once at the beginning of the task.


step2: override the map() method in the Mapper class
----------------------------------------------------
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{

}
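For word count, the map() body boils down to: convert the Text to a String, split it on whitespace, and write each token with a count of 1. The core logic, exercised in plain Java so it can run without Hadoop (the class and method names are illustrative; in the real map() each pair would go to context.write()):

```java
import java.util.ArrayList;
import java.util.List;

class MapLogic {
    // Plain-Java stand-in for the word-count map(): emit one
    // (word, 1) pair per token in the input line.
    static List<String> mapLine(String value) {
        List<String> pairs = new ArrayList<>();
        for (String word : value.split("\\s+")) {
            if (!word.isEmpty()) {
                // in Hadoop: context.write(new Text(word), new IntWritable(1));
                pairs.add(word + ",1");
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(mapLine("My Name is Manohar"));
        // prints [My,1, Name,1, is,1, Manohar,1]
    }
}
```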


Reducer
-------
A Reducer is used to perform an aggregation on key/value pairs.

A Reducer accepts key/value pairs as input and generates key/value pairs as output.

Reducer logic
-------------
Input from Mapper (the shuffle phase groups the values by key)
--------------------------------------------------------------
My --> key
[1] --> list of values

Name --> key
[1] --> list of values

is --> key
[1] --> list of values

Manohar --> key
[1] --> list of values

I --> key
[1, 1] --> list of values

..

teaching --> key
[1, 1] --> list of values


output
------
My,1
Name,1
is,1
Manohar,1
I,2
live,1
..
teaching,2


steps to create Reducer class
-----------------------------
step1:Extend a class from Reducer

access-modifier class ClassName extends Reducer<K1, V1, K2, V2>
{
}


K1,V1-->Input from Mapper
K2,V2-->output from Reducer

step2:
Override the reduce() method of Reducer in the subclass.

public void reduce(Text key, Iterable<IntWritable> list, Context con) throws IOException, InterruptedException {
              ....
              ....
}
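The reduce() body for word count just sums the list of values (all 1s) that the shuffle phase grouped under each key, e.g. (teaching, [1, 1]) becomes (teaching, 2). The same aggregation in plain Java, outside Hadoop (names illustrative; in the real reduce() the sum would go to con.write()):

```java
import java.util.List;

class ReduceLogic {
    // Plain-Java stand-in for reduce(): sum the values grouped
    // under one key.
    static int reduceKey(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        // in Hadoop: con.write(key, new IntWritable(sum));
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("teaching," + reduceKey(List.of(1, 1))); // prints teaching,2
    }
}
```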

Implementing the Mapper logic for wordcount
-------------------------------------------
0,
My Name is Manohar

0 --> key [LongWritable]
My Name is Manohar --> value [Text]

step1:
convert Text to String
----------------------
String str = value.toString();

step2:
split the string into tokens

String[] s = str.split(" ");

s[0] --> My
s[1] --> Name
s[2] --> is
s[3] --> Manohar


step3:
Iterate over the String array and write each token/word with a count of 1:

for (int i = 0; i < s.length; i++)
{
   context.write(new Text(s[i]), new IntWritable(1));
}
