JavaMastar: Hive-Bucketing

Monday, June 19, 2017

Hive-Bucketing

Note:For emp dataset please refer to previous posts.

Bucketing
---------
A bucket is a file where the keys are mainatained in a Hashtable
in the form of a directory.

In bucketing all similar keys are placed into same bucket also we choose how many buckets we need based on available records so that we can keep the size of the buckets in control.

Bucket is nothing but a kind of partitioning.

To implement bucketing in hive we should set a property

hive>set hive.enforce.bucketing=true;

How to create a bucket table
----------------------------
we create a bucketed table by using command clustered by also we should specify how many buckets we should create.

syntax
------
create table tablename(col1 type,col2 type,...)
clustered by(col1,...)
into n buckets

n-->no of buckets.

example
-------
hive> create table emp_buckets(eno int,name string,sal int,gender string,dno int)
> clustered by(dno)
> into 3 buckets;

inserting data into bucketed table
----------------------------------
hive> insert overwrite table emp_buckets
select *from empdata;

hive>select *from emp_buckets;
In hdfs 3 bucket files are created
/user/hive/warehouse/emp_buckets/000000_0
000001_0
000002_0

creating hive table with multiple columns
------------------------------------------
hive> create table emp_buckets12(eno string,name string,sal double,gender string,dno int)
> clustered by(dno,gender)
> into 4 buckets;

hive>insert overwrite table emp_buckets12
select *from empdata;

In hdfs,4 data files are created
/user/hive/warehouse/emp_buckets12/000000_0
000001_0
000002_0
000003_0

hive>select *from emp_bucktes12;

creating table combining partition and bucketing
------------------------------------------------
hive> create table emp_part_buckets(eno int,name string,sal double,dno int)
> partitioned by(gender string)
> clustered by(dno)
> into 3 buckets
> row format delimited fields terminated by ',';

loading data file partition buckets
hive> insert overwrite table emp_part_buckets
> partition(gender)
> select eno,name,sal,dno,gender from empdata;

In hdfs
/user/hive/warehouse/emp_part_buckets/gender=m/000000_0
000001_0
000002_0

/user/hive/warehouse/emp_part_buckets/gender=f/000000_0
000001_0
000002_0

hive>select *from emp_part_buckets

JavaMastar

Monday, June 19, 2017

Hive-Bucketing

No comments:

Total Pageviews