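Anyone who has used an Oracle database knows its two classic tables, emp (employees) and dept (departments). This article uses these two tables to walk through some basic table operations in Hive.
1. Create the tables
Mapping the column types of emp and dept to the corresponding Hive data types gives the following CREATE TABLE statements:
--employee table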
create table IF NOT EXISTS default.emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
--department table
create table IF NOT EXISTS default.dept(
deptno int,
dname string,
loc string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Run the statements above in the Hive CLI, then check that both tables were created:
hive (default)> show tables;
OK
tab_name
dept
emp
Time taken: 0.079 seconds, Fetched: 2 row(s)
Both dept and emp have been created.
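If you want to confirm how Hive recorded the definitions, for example that the field delimiter really is a tab and where the table's files live in HDFS, you can ask Hive to describe the tables. This is a minimal sketch; the exact layout of the output varies between Hive versions.

```sql
-- Columns plus storage details: HDFS location, table type, SerDe and field.delim
DESCRIBE FORMATTED default.emp;

-- The full DDL Hive has stored for the table, including the ROW FORMAT clause
SHOW CREATE TABLE default.dept;
```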
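2. Load the data
Export the data of both tables from Oracle to text files, emp.txt and dept.txt, with fields separated by tabs (\t). Upload the two files to the server where the Hive client runs, and then load them: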
hive (default)> load data local inpath '/opt/datas/emp.txt' overwrite into table emp ;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, totalSize=659]
OK
Time taken: 0.843 seconds
hive (default)> load data local inpath '/opt/datas/dept.txt' overwrite into table dept ;
Loading data to table default.dept
Table default.dept stats: [numFiles=1, totalSize=82]
OK
Time taken: 0.417 seconds
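The statements above use LOCAL, which copies a file from the client machine's local filesystem, and OVERWRITE, which replaces whatever the table already holds. The other variants can be sketched as follows, assuming a hypothetical scratch table emp_tmp and a hypothetical HDFS path /user/hive/input/emp.txt. Without LOCAL, the path is read from HDFS and the file is moved (not copied) into the table's directory. Without OVERWRITE, the new rows are appended.

```sql
-- Hypothetical scratch table with the same columns and delimiter as emp
CREATE TABLE IF NOT EXISTS default.emp_tmp LIKE default.emp;

-- No LOCAL: hypothetical HDFS path; the file is MOVED into the table directory
LOAD DATA INPATH '/user/hive/input/emp.txt' INTO TABLE default.emp_tmp;

-- No OVERWRITE: these rows are appended to what emp_tmp already contains
LOAD DATA LOCAL INPATH '/opt/datas/emp.txt' INTO TABLE default.emp_tmp;
```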
然后查看加載的數據是否正確:
hive (default)> select * from emp;
OK
empno ename job mgr hiredate sal comm deptno
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
Time taken: 0.272 seconds, Fetched: 14 row(s)
hive (default)> select * from dept;
OK
deptno dname loc
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
Time taken: 0.14 seconds, Fetched: 4 row(s)
The data has been loaded correctly.
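Besides reading through SELECT *, a quick sanity check is to count rows per department and compare the numbers with Oracle, and to look for employees whose department is missing from dept. A sketch against the tables above; unlike the plain SELECT * queries, aggregations and joins like these will normally launch a MapReduce job.

```sql
-- Rows per department; with the data shown above: 10: 3, 20: 5, 30: 6
SELECT deptno, count(*) AS cnt
FROM default.emp
GROUP BY deptno;

-- Employees whose deptno has no matching row in dept (expected: no rows)
SELECT e.empno, e.deptno
FROM default.emp e
LEFT OUTER JOIN default.dept d ON e.deptno = d.deptno
WHERE d.deptno IS NULL;
```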
3. Create a table from a query (CTAS)
A new table can be created directly from the result of a query (CREATE TABLE ... AS SELECT):
create table if not exists default.dept_cats
as
select deptno, dname from dept ;
Running it in Hive produces the following output:
hive (default)> create table if not exists default.dept_cats
> as
> select deptno, dname from dept ;
Query ID = hive_20190213212727_c554cafb-6c5d-4e0c-8ad6-a19f902f3222
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1550060164760_0002, Tracking URL = http://node1:8088/proxy/application_1550060164760_0002/
Kill Command = /opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib/hadoop/bin/hadoop job -kill job_1550060164760_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-02-13 21:28:39,419 Stage-1 map = 0%, reduce = 0%
2019-02-13 21:29:14,401 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.64 sec
MapReduce Total cumulative CPU time: 1 seconds 640 msec
Ended Job = job_1550060164760_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://node1:8020/user/hive/warehouse/.hive-staging_hive_2019-02-13_21-27-40_322_2052979003044233314-1/-ext-10001
Moving data to: hdfs://node1:8020/user/hive/warehouse/dept_cats
Table default.dept_cats stats: [numFiles=1, numRows=4, totalSize=49, rawDataSize=45]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.64 sec HDFS Read: 3352 HDFS Write: 122 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 640 msec
OK
deptno dname
Time taken: 96.555 seconds
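As the output shows, creating the table launches a MapReduce job on YARN, because the SELECT has to be executed to produce the new table's data. When the job finishes, check the result: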
hive (default)> show tables;
OK
tab_name
dept
dept_cats
emp
Time taken: 0.017 seconds, Fetched: 3 row(s)
hive (default)> select * from dept_cats;
OK
deptno dname
10 ACCOUNTING
20 RESEARCH
30 SALES
40 OPERATIONS
Time taken: 0.121 seconds, Fetched: 4 row(s)
The new table dept_cats contains 4 rows, so it was created successfully.
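The SELECT in a CTAS statement can be any query, including joins and filters. Note also that dept_cats was created without a ROW FORMAT clause, so its files use Hive's default field delimiter (Ctrl-A, \001) rather than a tab. The sketch below, with a hypothetical table name emp_dept, keeps the tab delimiter and builds a table of better-paid employees with their department names.

```sql
-- Hypothetical table name; CTAS with an explicit delimiter, a join and a filter
CREATE TABLE IF NOT EXISTS default.emp_dept
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
AS
SELECT e.empno, e.ename, e.sal, d.dname
FROM default.emp e
JOIN default.dept d ON e.deptno = d.deptno
WHERE e.sal >= 2000;   -- 6 of the 14 employees above match
```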
4. Clear table data
Use TRUNCATE to delete all the data in a table while keeping the table itself. Clear the data in dept_cats:
hive (default)> truncate table dept_cats;
OK
Time taken: 0.29 seconds
hive (default)> select * from dept_cats;
OK
deptno dname
Time taken: 0.121 seconds
The data in the table we just created has been removed, but the table and its columns still exist.
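Because TRUNCATE keeps the table definition, the table can be refilled in place. A minimal sketch that reloads dept_cats from dept with INSERT OVERWRITE (this also runs a MapReduce job here). Keep in mind that Hive only allows TRUNCATE on managed tables; on an external table it fails, and the files would have to be removed from HDFS instead.

```sql
-- The schema survives TRUNCATE, so the table can be repopulated directly
INSERT OVERWRITE TABLE default.dept_cats
SELECT deptno, dname FROM default.dept;

-- Should list the 4 departments again
SELECT * FROM default.dept_cats;
```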
5. Rename a table
First, create a table with LIKE, which copies the definition of dept but none of its data:
hive (default)> create table if not exists default.dept_like
> like
> default.dept ;
OK
Time taken: 0.188 seconds
hive (default)> show tables;
OK
tab_name
dept
dept_cats
dept_like
emp
Time taken: 0.065 seconds, Fetched: 4 row(s)
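To see what LIKE actually copied, you can query the new table and print its DDL. A sketch using the table just created: the query should return no rows, and the DDL should show the same columns and tab-delimited ROW FORMAT as dept.

```sql
-- LIKE copies the column list and storage format, not the rows
SELECT * FROM default.dept_like;   -- returns no rows

-- Same columns and ROW FORMAT as dept, under the new table name
SHOW CREATE TABLE default.dept_like;
```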
然后對新建的表dept_like修改名稱:
hive (default)> alter table dept_like rename to dept_like_rn ;
OK
Time taken: 0.501 seconds
hive (default)> show tables;
OK
tab_name
dept
dept_cats
dept_like_rn
emp
Time taken: 0.052 seconds, Fetched: 4 row(s)
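Table dept_like has been renamed to dept_like_rn. Note that in Hive you generally avoid renaming or adding columns: Hive is mainly used for data analysis, and ALTER TABLE column changes only update the table's metadata, not the data files already stored. If you need different column names, create a new table from a query instead, as in step 3.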
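Following that advice, renaming columns can be sketched as a CTAS with column aliases. The table name dept_renamed and the column names dept_id, dept_name, dept_loc and city are hypothetical. For comparison, the metadata-only ALTER TABLE ... CHANGE COLUMN form is shown commented out.

```sql
-- Hypothetical names: copy dept into a new table with different column names
CREATE TABLE IF NOT EXISTS default.dept_renamed
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
AS
SELECT deptno AS dept_id,
       dname  AS dept_name,
       loc    AS dept_loc
FROM default.dept;

-- Metadata-only alternative (changes the column name, not the data files):
-- ALTER TABLE default.dept_renamed CHANGE COLUMN dept_loc city string;
```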
6. Drop a table
Use DROP to delete a table. Drop the table we just renamed:
hive (default)> drop table if exists dept_like_rn ;
OK
Time taken: 0.542 seconds
hive (default)> show tables;
OK
tab_name
dept
dept_cats
emp
Time taken: 0.049 seconds, Fetched: 3 row(s)
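What DROP removes depends on the table type. For a managed table such as the ones in this article, both the metadata and the data files in the warehouse directory are deleted (the files go to the HDFS trash if it is enabled). For an external table, only the metadata is removed. A sketch for the case where dept_cats is no longer needed: check the table type first, then drop it. PURGE (Hive 0.14 and later) bypasses the trash, so the data cannot be recovered.

```sql
-- Look for "Table Type: MANAGED_TABLE" or "EXTERNAL_TABLE" in the output
DESCRIBE FORMATTED default.dept_cats;

-- Drop the table and delete its files immediately, skipping the HDFS trash
DROP TABLE IF EXISTS default.dept_cats PURGE;
```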