JavaMastar: Hive-collection datatypes

Array
-----
Array is collection of similar type of values.

we use datatype array to declare array in a hive table

syntax
------
colname array<type>

cat > courses.txt

1,satya,hadoop$c$cpp$java
2,aaaa,hadoop$c$cpp$java
3,bbbb,photo$animation

syntax
------
create table tablename(col1 type,col2 type,...)
row format delimited fields terminated by 'symbol'
collection items terminated by 'symbol'

ex:
hive> create table courses(id int,iname string,course array<string>)
> row format delimited fields terminated by ','
> collection items terminated by '$';

hive> load data local inpath 'courses.txt' into table courses;
hive> select *from courses;
1 satya ["hadoop","c","cpp","java"]
2 aaaa ["hadoop","c","cpp","java"]
3 bbbb ["photo","animation"]

Internal representation of array
---------------------------------
------
haoop|0
-----
c |1
-----
cpp |2
-----
java |3
-----

An array always starts with index 0 and ends with size-1

query1
------
Display id,courses from courses table
Display id,name,first 2 courses from array in a courses table
Display all courses whose id is 1
Display first course from courses table whose id is 1
Display 3,4 courses from courses table whose id is 3

hive> select id,course from courses;
OK
1 ["hadoop","c","cpp","java"]
2 ["hadoop","c","cpp","java"]
3 ["photo","animation"]

hive> select id,iname,course[0],course[1] from courses;
1 satya hadoop c
2 aaaa hadoop c
3 bbbb photo animation

hive>select course from courses where id=1;
["hadoop","c","cpp","java"]

hive> select course[0] from courses where id=1;
hadoop

hive> select course[2],course[3] from courses where id=3;
NULL NULL

Note:if we try to get values from an array whose index does not exist then hive returns NULL values if the index is not found.

Map
---
A Map is a collection used to represent data in the form of key and value pairs.

key=value
key must be unique,values can be duplcated.
we use key to perform operations on a map.

ex:
1=manohar
2=sanjay

syntax
------
key1=val1<symbol>key2=val2<symbol>key3=val3......

cat>items.txt
1,more,1=soap#2=sampoo#3=perfume
2,spencer,1=choclate#2=gum#3=lays
3,reliance,1=cola#2=maggy#3=colgate

How to declare map in a hive
----------------------------
we represent map in hive table using map datatype

syntxa
------
colname map<type1,type2>

while creating a table with map datatype we should specify 3 delimiters with
i)row format
ii)collection items
iii)Map keys

syntax
-----
create table tablename(colname type1,........)
row format delimited fields termianted by 'symbol'
collection items terminated by 'symbol'
map keys terminated by 'symbol';

ex:
hive> create table items(id int,name string,products map<int,string>)
> row format delimited fields terminated by ','
> collection items terminated by '#'
> map keys terminated by '=';

hive>load data local inpath 'items.txt' into table items;

hive>select *from items;
1 more {1:"soap",2:"sampoo",3:"perfume"}
2 spencer {1:"choclate",2:"gum",3:"lays"}
3 reliance {1:"cola",2:"maggy",3:"colgate"}

How to access data from a map collection
---------------------------------
colname[key]

display a map collection whose id is 3,2
display a map collection whose key is 1,2
display a map collection whose key is 1,2 where id is 1
display a map collection whose key is 2,3 where id is 1,3
display a map collection whose key is 3 where id is 1,2,3

hive> select *from items where id=2 or id=3;
hive> select products[1],products[2] from items;
hive> select products[1],products[2] from items where id=1;
hive> select products[1],products[2] from items where id=2 or id=3;

structure
---------
A structure is collection of different types of values.
we declare a strcuture in a hive table using the datatype struct.

syntax
------
colname struct<var1:type,var2:type,.....>

ex:
address struct<city:string,pin:int,stno:int>

How to represent struct in a dataset
-------------------------------------
emp.txt
-------
1,aaa,hyd$5002$43
2,bbb,bombay$5002$43
3,ccc,delhi$5002$43

Here we should we should use 2 delimiters with
i)row format
ii)collection items

table syntax
------------
create table tablename(colname type1,........)
row format delimited fields termianted by 'symbol'
collection items terminated by 'symbol';

ex:
hive> create table emp_struct(id int,name string,addr struct<city:string,pin:int,stno:int>)
row format delimited fields termianted by ','
collection items terminated by '$';

hive>load data local inpath 'emp.txt' into table emp_struct;
hive>select *from emp_struct;

How to access data from a struct
--------------------------------
syntax
------
colname.varnamecut

ex:
address.city

display city,pin from table emp_struct
display eno,name,stno from emp
display eno,name,stno from emp where id is 1,2

hive> select addr.city,addr.pin from emp_struct;
hive> select id,name,addr.stno from emp_struct;
hive> select id,name,addr.stno from emp_struct where id=1 or id=2;

Note:hive does not support union datatype.

JavaMastar

Monday, June 12, 2017

Hive-collection datatypes

No comments:

Total Pageviews