1.INSERT
Official documentation:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2...)[IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2...)] select_statement1 FROM from_statement;
hive> create table ruozedata_emp4 like ruozedata_emp;
hive> INSERT OVERWRITE TABLE ruozedata_emp4 select * FROM ruozedata_emp;
hive> select * from ruozedata_emp4;
hive> INSERT INTO TABLE ruozedata_emp4 select * FROM ruozedata_emp;
hive> select * from ruozedata_emp4;
The data is appended.
Both commands above use select *, so the columns of the two tables correspond one-to-one. What happens if the insert target has a different number of columns than the source table, or the columns are in a different order?
hive> INSERT OVERWRITE TABLE ruozedata_emp4
    > SELECT empno,ename from ruozedata_emp;
hive> INSERT INTO TABLE ruozedata_emp4
    > select empno,job, ename,mgr, hiredate, salary, comm, deptno from ruozedata_emp;
(column order differs from the source table)
hive> select * from ruozedata_emp4;
2.Writing data into the filesystem from queries
Official documentation:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: only available starting with Hive 0.11.0)
  SELECT ... FROM ...
3.WHERE clause
hive> select * from ruozedata_emp where deptno=10;
hive> select * from ruozedata_emp where empno>=8000;
hive> select * from ruozedata_emp where salary between 800 and 1500; (inclusive on both ends)
hive> select * from ruozedata_emp limit 5;
hive> select * from ruozedata_emp where ename in ("SMITH","KING");
hive> select * from ruozedata_emp where ename not in ("SMITH","KING");
hive> select * from ruozedata_emp where comm is null;
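The predicates above behave like plain row filters. A minimal Python sketch of their semantics (the sample rows below are made up, not actual ruozedata_emp data):

```python
# Hypothetical rows mirroring a few ruozedata_emp columns
rows = [
    {"ename": "SMITH", "salary": 800.0,  "comm": None,  "deptno": 20},
    {"ename": "ALLEN", "salary": 1600.0, "comm": 300.0, "deptno": 30},
    {"ename": "KING",  "salary": 5000.0, "comm": None,  "deptno": 10},
]

# BETWEEN 800 AND 1500 is inclusive on both ends
between = [r["ename"] for r in rows if 800 <= r["salary"] <= 1500]

# IN ("SMITH", "KING") is a membership test; NOT IN is its negation
in_list = [r["ename"] for r in rows if r["ename"] in ("SMITH", "KING")]

# comm IS NULL corresponds to a None check
no_comm = [r["ename"] for r in rows if r["comm"] is None]

print(between, in_list, no_comm)  # ['SMITH'] ['SMITH', 'KING'] ['SMITH', 'KING']
```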
4.Aggregate functions
max/min/count/sum/avg. Characteristic: many rows in, one row out.
hive> select count(1) from ruozedata_emp where deptno=10;
hive> select max(salary), min(salary), avg(salary), sum(salary) from ruozedata_emp;
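"Many rows in, one row out" can be sketched in Python (the salary values are hypothetical):

```python
salaries = [800.0, 1600.0, 1250.0, 2975.0, 5000.0]  # hypothetical salary column

# Each aggregate consumes the whole column and returns a single value
agg = {
    "count": len(salaries),
    "max": max(salaries),
    "min": min(salaries),
    "sum": sum(salaries),
    "avg": sum(salaries) / len(salaries),
}
print(agg)  # {'count': 5, 'max': 5000.0, 'min': 800.0, 'sum': 11625.0, 'avg': 2325.0}
```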
5.GROUP BY
1)
hive> select deptno,avg(salary) from ruozedata_emp group by deptno; (average salary per department)
hive> select ename,deptno,avg(salary) from ruozedata_emp group by deptno;
FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'ename'
Reason for the error: any column in the select list that is not wrapped in an aggregate function must appear in the group by clause.
2) Find the highest salary (salary) for each department (deptno) and job (job)
hive> select deptno,job,max(salary) from ruozedata_emp group by deptno,job;
3) Find the departments whose average salary is greater than 2000
hive> select deptno,avg(salary) from ruozedata_emp group by deptno having avg(salary)>2000;
where cannot be used here: filtering results after grouping requires having. where filters individual rows, and it must be written before group by.
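The where-then-group-then-having order of operations can be sketched in Python (hypothetical (deptno, salary) rows; this illustrates the pipeline, not Hive's implementation):

```python
from collections import defaultdict

rows = [(10, 2450.0), (10, 5000.0), (20, 800.0), (20, 1100.0), (30, 1600.0)]

# 1) WHERE filters individual rows before grouping (here: keep everything)
kept = [(d, s) for d, s in rows if s > 0]

# 2) GROUP BY buckets the surviving rows by deptno
groups = defaultdict(list)
for deptno, salary in kept:
    groups[deptno].append(salary)

# 3) HAVING filters the aggregated groups, not individual rows
result = {d: sum(v) / len(v) for d, v in groups.items() if sum(v) / len(v) > 2000}
print(result)  # {10: 3725.0}
```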
4)case when then
select ename,salary,
case
when salary>1 and salary<=1000 then 'lower'
when salary>1000 and salary<=2000 then 'middle'
when salary>2000 and salary<=4000 then 'high'
else 'highest'
end
from ruozedata_emp;
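The CASE expression above is ordered range bucketing; an equivalent Python sketch (boundaries copied from the SQL, upper bounds inclusive):

```python
def salary_band(salary: float) -> str:
    # Mirrors the CASE WHEN arms above
    if 1 < salary <= 1000:
        return "lower"
    elif 1000 < salary <= 2000:
        return "middle"
    elif 2000 < salary <= 4000:
        return "high"
    else:
        return "highest"

print([salary_band(s) for s in (800, 1500, 2975, 5000)])
# ['lower', 'middle', 'high', 'highest']
```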
5)join
hive> select * from a join b;
inner join=join
outer join: left join, right join, full join
hive> select a.id,a.name,b.age from a join b on a.id=b.id;
hive> select a.id,a.name,b.age from a left join b on a.id=b.id;
hive> select a.id,a.name,b.age from a full join b on a.id=b.id;
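The difference between the three join types can be sketched in Python with two small hypothetical tables a(id, name) and b(id, age):

```python
a = [(1, "pig"), (2, "cat"), (3, "dog")]   # hypothetical table a: (id, name)
b = [(1, 20), (2, 30), (4, 40)]            # hypothetical table b: (id, age)

b_by_id = dict(b)
a_ids = {aid for aid, _ in a}

# inner join (= join): only ids present on both sides
inner = [(aid, name, b_by_id[aid]) for aid, name in a if aid in b_by_id]

# left join: every row of a; a missing b side becomes NULL (None)
left = [(aid, name, b_by_id.get(aid)) for aid, name in a]

# full join: rows from either side, None where the other side has no match
full = left + [(bid, None, age) for bid, age in b if bid not in a_ids]

print(inner)  # [(1, 'pig', 20), (2, 'cat', 30)]
print(left)   # [(1, 'pig', 20), (2, 'cat', 30), (3, 'dog', None)]
print(full)   # left plus (4, None, 40)
```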
6.Partitioned tables (partition)
1) Static partitions
hive> create table order_partition(
    > ordernumber string,
    > eventtime string
    > )
    > partitioned by (event_month string)
    > row format delimited fields terminated by '\t';
hive> LOAD DATA LOCAL INPATH '/home/hadoop/data/order.txt'
    > OVERWRITE INTO TABLE order_partition
    > PARTITION(event_month='2014-05');
hive> select * from order_partition;
OK
10703007267488  2014-05-01      2014-05
10101043505096  2014-05-01      2014-05
10103043509747  2014-05-01      2014-05
10103043501575  2014-05-01      2014-05
10104043514061  2014-05-01      2014-05
hive> desc formatted order_partition;
[hadoop@hadoop001 data]$ hdfs dfs -ls hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db/order_partition
[hadoop@hadoop001 data]$ hdfs dfs -ls hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db/order_partition/event_month=2014-05
hive> select * from order_partition where event_month='2014-05';
When querying a partitioned table, always remember to include the partition in the predicate.
Ways to add a partition
a) Above, the partition was created by loading data with a PARTITION clause, and HDFS shows a matching directory /order_partition/event_month=2014-05. Can a partition be added simply by creating a new partition directory in HDFS by hand?
[hadoop@hadoop001 data]$ hdfs dfs -mkdir /user/hive/warehouse/ruozedata.db/order_partition/event_month=2014-06
[hadoop@hadoop001 data]$ hdfs dfs -put order.txt /user/hive/warehouse/ruozedata.db/order_partition/event_month=2014-06
This creates the /event_month=2014-06 directory and puts order.txt inside it.
[hadoop@hadoop001 data]$ hdfs dfs -ls hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db/order_partition
hive> select * from order_partition where event_month='2014-06';
Querying the event_month=2014-06 partition in Hive, however, returns nothing:
Check the metadata in MySQL:
mysql> show databases;
mysql> use ruozedata_basic03;
mysql> show tables;
mysql> select * from partitions;
mysql> select * from partition_keys;
mysql> select * from partition_key_vals;
The metadata contains only the single partition event_month=2014-05. Why?
Per the official documentation:
The metastore does not notice directories created directly on HDFS; the msck command must be used to repair the partition metadata in MySQL before Hive can return the corresponding data.
hive> MSCK REPAIR TABLE order_partition;
OK
Partitions not in metastore:    order_partition:event_month=2014-06
Repair: Added partition to metastore order_partition:event_month=2014-06
Checking the metadata in MySQL again, the partition information is now there.
hive> select * from order_partition where event_month='2014-06';
However, this command refreshes all partition information at once, which is too blunt an instrument; it is not recommended.
b) Prefer ALTER TABLE ... ADD PARTITION
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
Let's test this again:
[hadoop@hadoop001 data]$ hdfs dfs -mkdir /user/hive/warehouse/ruozedata.db/order_partition/event_month=2014-07
[hadoop@hadoop001 data]$ hdfs dfs -put order.txt /user/hive/warehouse/ruozedata.db/order_partition/event_month=2014-07
[hadoop@hadoop001 data]$ hdfs dfs -ls /user/hive/warehouse/ruozedata.db/order_partition/event_month=2014-07
Found 1 items
-rw-r--r--   1 hadoop supergroup        217 2018-06-17 21:11 /user/hive/warehouse/ruozedata.db/order_partition/event_month=2014-07/order.txt
hive> select * from order_partition where event_month='2014-07';
OK (no rows)
Use the ADD PARTITION command:
hive> ALTER TABLE order_partition ADD IF NOT EXISTS
    > PARTITION (event_month='2014-07');
hive> select * from order_partition where event_month='2014-07';
c) There is one more approach, as follows:
hive> create table order_4_partition(
    > ordernumber string,
    > eventtime string
    > )
    > row format delimited fields terminated by '\t';
hive> load data local inpath '/home/hadoop/data/order.txt' overwrite into table order_4_partition;
hive> insert overwrite table order_partition
    > partition(event_month='2014-08')
    > select * from order_4_partition;
hive> select * from order_partition where event_month='2014-08';
To see the current partitions of a table in Hive:
hive> show partitions order_partition;
Static partitioning is the simplest form. With a single-level partition, the partition value has no relationship to the table's column data, and a column used for partitioning cannot also appear in the table's column list.
d) Multi-level partitions
hive> create table order_mulit_partition(
    > ordernumber string,
    > eventtime string
    > )
    > partitioned by (event_month string,event_day string)
    > row format delimited fields terminated by '\t';
hive> desc formatted order_mulit_partition;
hive> LOAD DATA LOCAL INPATH '/home/hadoop/data/order.txt'
    > OVERWRITE INTO TABLE order_mulit_partition
    > PARTITION(event_month='2014-05', event_day='01');
hive> select * from order_mulit_partition where event_month='2014-05' and event_day='01';
Check the resulting directory structure in HDFS.
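On HDFS, each partition is just a `key=value` directory under the table location, and a multi-level partition nests one directory per partition key. A small Python sketch of the path layout (the helper below is illustrative, not a Hive API):

```python
import posixpath

def partition_path(table_location: str, **spec: str) -> str:
    """Build the HDFS directory a partition spec maps to (illustrative helper)."""
    return posixpath.join(table_location, *[f"{k}={v}" for k, v in spec.items()])

base = "/user/hive/warehouse/ruozedata.db/order_partition"
print(partition_path(base, event_month="2014-05"))
# /user/hive/warehouse/ruozedata.db/order_partition/event_month=2014-05

base2 = "/user/hive/warehouse/ruozedata.db/order_mulit_partition"
print(partition_path(base2, event_month="2014-05", event_day="01"))
# /user/hive/warehouse/ruozedata.db/order_mulit_partition/event_month=2014-05/event_day=01
```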
2) Dynamic partitions
hive> create table ruozedata_static_emp
    > (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double,deptno string)
    > PARTITIONED by(deptno string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t';
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns (error: a partition column cannot also appear in the table's column list)
hive> create table ruozedata_static_emp
    > (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double)
    > PARTITIONED by(deptno string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t';
hive> select * from ruozedata_static_emp;
OK (no rows yet)
Write the deptno=10 rows of ruozedata_emp into the deptno='10' partition of ruozedata_static_emp:
hive> insert into table ruozedata_static_emp partition(deptno='10')
    > select empno,ename,job,mgr,hiredate,salary,comm from ruozedata_emp
    > where deptno=10;
hive> select * from ruozedata_static_emp;
This is still the static-partition approach; with many partitions (say 1000 of them), it takes far too much time and effort.
Hence dynamic partitioning.
hive> create table ruozedata_dynamic_emp
    > (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double)
    > PARTITIONED by(deptno string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t';
Dynamic partitioning explicitly requires the partition column to be the last column in the select list.
hive> insert into table ruozedata_dynamic_emp partition(deptno)
    > select empno,ename,job,mgr,hiredate,salary,comm,deptno from ruozedata_emp;
hive> set hive.exec.dynamic.partition.mode=nonstrict; (this only sets nonstrict for the current session; re-entering Hive reverts to strict mode)
[set hive.exec.dynamic.partition.mode=nonstrict;
This is Hive's usual key=value configuration mechanism.
Syntax:
set key=value;   set a value
set key;         show the current value]
hive> insert into table ruozedata_dynamic_emp partition(deptno)
    > select empno,ename,job,mgr,hiredate,salary,comm,deptno from ruozedata_emp;
hive> show partitions ruozedata_dynamic_emp;
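Conceptually, a dynamic-partition insert peels off the trailing select column and routes each row to the matching partition. A Python sketch with made-up rows:

```python
from collections import defaultdict

# Hypothetical rows; the *last* column (deptno) decides the target partition
rows = [
    (7369, "SMITH", 800.0, "20"),
    (7499, "ALLEN", 1600.0, "30"),
    (7839, "KING", 5000.0, "10"),
    (7844, "TURNER", 1500.0, "30"),
]

partitions = defaultdict(list)
for *data, deptno in rows:               # trailing column = partition value
    partitions[f"deptno={deptno}"].append(tuple(data))

for name in sorted(partitions):
    print(name, partitions[name])
# deptno=10 [(7839, 'KING', 5000.0)]
# deptno=20 [(7369, 'SMITH', 800.0)]
# deptno=30 [(7499, 'ALLEN', 1600.0), (7844, 'TURNER', 1500.0)]
```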