Monday, June 12, 2017

Hive-Internal and external tables

Hive tables
-----------
Inner and external tables.
partitioned and Non partitioned tables.
Bucketd and Non bucketed tables.

inner vs external tables

when an inner table is dropped both metadata and data will be deleted.

when an external table is dropped only metadata is deleted.

From hive if we drop an inner table,the directory in hdfs is also deleted.

From hive if we drop an external table,the directory in hdfs is available.

if we drop inner table we loose both table and data where as if we drop external table we loose only table but not data and we can reuse the data if needed.

creating and testing external tables
---------------------------------
hive>cat > file1
AAAAA
AAAAA
^c

Note:By default each table is inner table.

creating external table.
--------------------
hive>create external table tab1(line string);

In hdfs /user/hive/warehouse/tab1 a directory is created.

hive>desc tab1;

loading data into table
-----------------------
syntax
------
load data local inpath 'filename' into table tablename;


hive>load data local inpath 'file1' into table tab1;

In hdfs
   /user/hive/warehouse/tab1/file1

file1 is copied into tab1 directory from local.

displaying data
---------------
hive>select *from tab1;
AAAA
AAAA
AAAA
AAAA

drop table tab1
---------------
drop table tab1;


Now hive starts reading data from all the tables.

cat>file2
BBBB
BBBB
^c

creating inner and external tables in custom hdfs location
----------------------------------------------------------
hive> create table mytab1(line string)
    > location '/guru/myloc';

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=cloudera, access=WRITE, inode="/mano":hdfs:supergroup:drwxr-xr-x

How to handle issue
-------------------
create a supergroup and assign it to cloudera.

[cloudera@quickstart ~]$ sudo groupadd supergroup
[cloudera@quickstart ~]$ usermod -a -G supergroup cloudera
bash: /usr/sbin/usermod: Permission denied
[cloudera@quickstart ~]$ sudo usermod -a -G supergroup cloudera


$cat > abc.txt
aaaaa
bbbbb
ccccc
ddddd
eeeee

hive>load data local inpath 'abc.txt' into table mytab1;

In hdfs,
   /guru/myloc/abc.txt is available
   
hive>drop table tab1;

if it is inner table /guru/myloc  will be deleted otherwise it is available

No comments: