In Hadoop, every application is converted into a MapReduce job.
MapReduce is a data processing model which consists of 2 core components:
i)Mapper
ii)Reducer
Execution flow of MapReduce Application
---------------------------------------
step1:
we give a text file as input
abc.txt
-------
My Name is Manohar
Iam born for Teaching
I love Teaching
step2
-----
FileInputFormat receives the input and converts the text file into key/value pairs as shown below
FileInputFormat-->input splits
Note: the default InputFormat is TextInputFormat
key-->byte offset
value-->Text(row)
split1
0,
My Name is Manohar
split2
19,
Iam born for Teaching
split3
41,
I love Teaching
step3:
-----
The input splits are given as input to the Mapper.
The Mapper receives the key/value pairs from FileInputFormat; here the key is the byte offset and the value is the text.
The Mapper reads the value, which is Text, and converts the text into key/value pairs as shown below:
My,1
Name,1
is,1
Manohar,1
Iam,1
born,1
for,1
Teaching,1
I,1
love,1
Teaching,1
In the Mapper output above, the key is the word/text and for every word the Mapper emits the value one (1). So the Mapper accepts a row as input and converts the text into output in the form of key/value pairs, as shown above.
step4:
sorting and shuffling
Here all the key/value pairs are sorted and shuffled according to the alphabetical order of the keys.
born,1
for,1
..
...
..
step5:
The output generated by the Mapper, after sorting and shuffling, is given as input to the Reducer.
The Reducer takes key/value pairs as input, performs aggregation, and generates output as key/value pairs as shown:
born,1
for,1
..
..
Teaching,2
step1:
cat > file1
My Name is Manohar
I live for teaching
I die while teaching
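Note: before the job can run, file1 must be copied from the local file system into HDFS; assuming an input directory /input (the path is our choice), something like:
hdfs dfs -put file1 /input/file1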
step2:
we need 3 classes:
i)Mapper class
ii)Reducer class
iii)Driver class (a sketch of the Driver is shown at the end of these notes)
step2.1
-------
How to create a Mapper class
The Mapper accepts key/value pairs as input from FileInputFormat and converts the value it receives into key/value pairs as output
file1[FileInputFormat]
----------------------
0 -->key
My Name is Manohar-->value
19 -->key
I live for teaching-->value
39-->key
I die while teaching-->value
MapReduce
---------
Java datatypes[primitives]-->MapReduce datatypes[Writable classes]
byte-->ByteWritable
int-->IntWritable
long-->LongWritable
float-->FloatWritable
double-->DoubleWritable
String-->Text
A Writable consists of set() and get()
set()-->sets a value into a variable.
get()-->gets a value from a variable.
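As a quick illustration (a minimal standalone sketch, not part of the wordcount job; the class name WritableDemo is our choice), set() and get() on a Writable work like this:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WritableDemo
{
    public static void main(String[] args)
    {
        IntWritable count=new IntWritable();
        count.set(10);                       //set() stores a value into the Writable
        int n=count.get();                   //get() reads the value back
        System.out.println(n);               //prints 10
        Text word=new Text();
        word.set("Manohar");                 //Text wraps a String
        System.out.println(word.toString()); //Text exposes toString() rather than get()
    }
}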
Mapper Logic
------------
Input-->key-->LongWritable
value-->Text
0,
My Name is Manohar
Mapper logic for wordcount
--------------------------
Read the Text value
-------------------
Text-->My Name is Manohar
Split the Text into key/value pairs:
My-->key
value-->1
Name-->key
value-->1
is-->key
value-->1
Manohar-->key
value-->1
And finally write each key/value pair back as output.
steps to create Mapper
----------------------
step1:
Extend a class from the Mapper class available in the org.apache.hadoop.mapreduce package.
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
}
Mapper<LongWritable, Text, Text, IntWritable>
          k1          v1    k2    v2
k1,v1-->input coming from FileInputFormat
k2,v2-->output generated by the map() method
Mapper class has 4 methods
--------------------------
protected void cleanup(org.apache.hadoop.mapreduce.Mapper.Context context)
Called once at the end of the task.
protected void map(KEYIN key, VALUEIN value, Context context)
Called once for each key/value pair in the input split.
void run(org.apache.hadoop.mapreduce.Mapper.Context context)
Expert users can override this method for more complete control over the execution of the Mapper.
protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
Called once at the beginning of the task.
step2:override map() method in Mapper class
-------------------------------------------
public void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException
{
}
Reducer
-------
A Reducer is used to perform aggregation on key/value pairs.
A Reducer accepts key/value pairs as input and generates key/value pairs as output.
Reducer logic
-------------
Input from Mapper
-----------------
My-->key
value-->1
Name-->key
value-->1
is-->key
value-->1
..
I-->key
value-->1
..
teaching-->key
value-->1
..
I-->key
value-->1
..
teaching-->key
value-->1
Note: after sorting and shuffling, the Reducer actually receives each key together with the list of all its values, e.g. I-->[1,1] and teaching-->[1,1].
output
-------
My,1
Name,1
is,1
Manohar,1
I,2
live,1
..
teaching,2
steps to create Reducer class
-----------------------------
step1:Extend a class from Reducer
accessModifier class ClassName extends Reducer<K1,V1,K2,V2>
{
}
K1,V1-->input from the Mapper
K2,V2-->output from the Reducer
step2:
override the reduce() method of Reducer in the subclass
public void reduce(Text k, Iterable<IntWritable> list, Context con) throws IOException, InterruptedException{
....
....
}
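Putting the two steps together, a complete wordcount Reducer might look like the following sketch (the class name MyReducer is our choice; the summing loop is the usual wordcount aggregation):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text,IntWritable,Text,IntWritable>
{
    @Override
    public void reduce(Text key, Iterable<IntWritable> list, Context con) throws IOException, InterruptedException
    {
        int sum=0;
        //add up the 1s emitted by the Mapper for this key
        for(IntWritable value:list)
        {
            sum+=value.get();
        }
        //write the word and its total count, e.g. (teaching,2)
        con.write(key,new IntWritable(sum));
    }
}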
Mapper logic for wordcount
--------------------------
0,
My Name is Manohar
0-->key[LongWritable]
My Name is Manohar-->value[Text]
step1:
convert Text to String
----------------------
String str=value.toString();
step2:
split string into tokens
String s[]=str.split(" ");
My s[0]
Name s[1]
is s[2]
Manohar s[3]
step3:
Iterate the String array and write each token/word with 1
for(int i=0;i<s.length;i++)
{
context.write(new Text(s[i]),new IntWritable(1));
}
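Putting steps 1-3 together, the complete wordcount Mapper might look like the following sketch (MyMapper is the class name from step1 above):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable,Text,Text,IntWritable>
{
    @Override
    public void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException
    {
        //step1: convert Text to String
        String str=value.toString();
        //step2: split the string into tokens
        String s[]=str.split(" ");
        //step3: write each token/word with 1
        for(int i=0;i<s.length;i++)
        {
            context.write(new Text(s[i]),new IntWritable(1));
        }
    }
}

The third class we listed, the Driver class, wires the Mapper and Reducer into a job. A minimal sketch (the class name MyDriver and the job name "wordcount" are our choices):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf=new Configuration();
        Job job=Job.getInstance(conf,"wordcount");
        job.setJarByClass(MyDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);          //k2 type
        job.setOutputValueClass(IntWritable.class); //v2 type
        FileInputFormat.addInputPath(job,new Path(args[0]));   //input path in HDFS, e.g. /input
        FileOutputFormat.setOutputPath(job,new Path(args[1])); //output path; must not already exist
        System.exit(job.waitForCompletion(true)?0:1);
    }
}

A typical run (the jar name wordcount.jar is assumed):
hadoop jar wordcount.jar MyDriver /input /output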