Wednesday, May 17, 2017

Hadoop[sqoop]

How to install mysql-server
---------------------------
[cloudera@quickstart ~]$ sudo yum install mysql-server


$mysql -u root -pcloudera

mysql> create database hadoop;
mysql> use hadoop;
mysql> create table employee(eno int primary key, name varchar(20), sal double, gender varchar(10), dno int);


insert data
-----------
mysql> insert into employee values
    (1,'aaa',2200.0,'m',11),
    (2,'aaa',2200.0,'f',11),
    (3,'bbb',3200.0,'f',13),
    (4,'bbb',3200.0,'m',13),
    (5,'ccc',4200.0,'m',14),
    (6,'ccc',4200.0,'f',14);



Importing data from a database to HDFS
--------------------------------------
To import data from an RDBMS to HDFS we use the sqoop import command.

syntax
------
sqoop import  \  
--connect  jdbc:mysql://localhost/databasename  \
--username user   --password pass \
--table  tablename  -m n --target-dir  dirname

--connect --> specifies the JDBC URL of the database from which we need to migrate data.

--username
--password

database credentials (in production, prefer -P or --password-file over passing the password on the command line).

--table --> specifies the table from which we need to migrate data.

--target-dir --> specifies the HDFS directory in which to store the imported data.


example
-------
[cloudera@quickstart ~]$ sqoop import \
--connect  jdbc:mysql://localhost/hadoop \
--username root --password cloudera \
--table employee  --target-dir sqoopimp1

Note
----
Make sure the MySQL JDBC driver (the mysql-connector-java JAR) is available to Sqoop in the Hadoop distribution.

When we run a sqoop import, by default Sqoop launches 4 mappers and the dataset is written as 4 part files in HDFS.

In hdfs
-------
$hadoop fs -ls /user/cloudera/sqoopimp1

/user/cloudera/sqoopimp1/part-m-00000
/user/cloudera/sqoopimp1/part-m-00001
/user/cloudera/sqoopimp1/part-m-00002
/user/cloudera/sqoopimp1/part-m-00003
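As a rough Python sketch (not Sqoop's actual code), Sqoop finds the min and max of the split column (here the primary key eno) and gives each mapper one sub-range of keys, which is why the six rows end up spread across four part files. All names below are illustrative:

```python
# Illustrative sketch of Sqoop's range-based splitting over the
# split-by column (the primary key eno for the employee table).
def split_ranges(lo, hi, num_mappers):
    """Return closed intervals [(start, end), ...] covering lo..hi."""
    size = (hi - lo + 1) / num_mappers
    ranges = []
    start = lo
    for i in range(num_mappers):
        # Last mapper takes everything remaining up to the max key.
        end = hi if i == num_mappers - 1 else int(lo + size * (i + 1)) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# eno values from the employee table above.
enos = [1, 2, 3, 4, 5, 6]
parts = [[e for e in enos if start <= e <= end]
         for start, end in split_ranges(min(enos), max(enos), 4)]
# Each inner list corresponds to one part-m-0000N file.
```

Each mapper then runs its own SELECT with a WHERE clause over its key range, so the part files together contain every row exactly once.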

examining data
--------------
$hadoop fs -cat /user/cloudera/sqoopimp1/part-m-00000
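By default Sqoop writes each row as one comma-separated text line, so a part file line mirrors the original table row. A minimal Python sketch of parsing one such line (the sample value is taken from the employee data above):

```python
import csv
import io

# One line of a Sqoop text-format part file for the employee table.
sample = "1,aaa,2200.0,m,11"

# Parse the comma-separated fields back into typed values.
eno, name, sal, gender, dno = next(csv.reader(io.StringIO(sample)))
record = {"eno": int(eno), "name": name, "sal": float(sal),
          "gender": gender, "dno": int(dno)}
```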

We can specify the number of mappers with the -m (or --num-mappers) flag.

example
-------
[cloudera@quickstart ~]$ sqoop import \
--connect  jdbc:mysql://localhost/hadoop \
--username root --password cloudera \
--table employee -m 2  --target-dir sqoopimp2

Here we specify 2 mappers, so the output is written as 2 part files. Note that the target directory must not already exist, hence sqoopimp2.

