Renaming a Hive table
- ALTER TABLE oldName RENAME TO newName;
Viewing a Hive table's CREATE TABLE statement
- show create table tableName
Viewing a table's column details
- desc tableName
Inserting multiple rows at once (INSERT ... VALUES is supported from Hive 0.14 onward)
- insert into table dbName.tableName values ('bj','1234','哈哈'), ('ah','1245','菜雞')
Adding/dropping partitions
- Add a partition:
- alter table tableName add partition (dt='2019-02-12', du='0')
- Drop a partition:
- alter table ods.ods_vas_user_order_increment drop partition (prov_alias='bj', dt='20200505')
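Both forms accept IF NOT EXISTS / IF EXISTS, and several partitions can be handled in one statement; a minimal sketch (partition values are hypothetical):
```sql
-- add two partitions at once, skipping any that already exist
alter table tableName add if not exists
  partition (dt='2019-02-12', du='0')
  partition (dt='2019-02-13', du='0');

-- drop without erroring when the partition is absent
alter table tableName drop if exists partition (dt='2019-02-12', du='0');
```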
Adding columns
- alter table dwd.dwd_vas_user_order_increment_day add columns (operation string comment 'add/update/delete', operation_dt string)
Modifying a column
- ALTER TABLE dept.demo CHANGE name ename string comment '測試說明';
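CHANGE takes the old name, the new name, and the (possibly new) type, and can also reposition the column; a minimal sketch (the id column is hypothetical):
```sql
-- rename name -> ename, update the comment, and move it right after id;
-- FIRST|AFTER only rewrites the metadata, the stored data is not reordered
alter table dept.demo change name ename string comment 'test note' after id;
```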
Picking the n-th largest value among multiple columns
- Smallest of five columns (index=0)
- sort_array(array(field4,field3,field1,field0,field2))[0]
- Largest of 10 numbers (index=9)
- sort_array(array(10,1,8,3,6,5,4,7,2,9))[9]
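Since sort_array sorts in ascending order, the k-th largest of n values sits at index n-k; a minimal sketch (table and column names are hypothetical):
```sql
-- 2nd largest of four columns: ascending order puts it at index 4-2 = 2
select sort_array(array(score1, score2, score3, score4))[2] as second_largest
from exam_scores;
```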
The problem of data written into a Hive partitioned table via hdfs dfs -put or the HDFS API not being queryable from Hive
- Usage: MSCK REPAIR TABLE tableName
- How it works: adds the missing partition metadata to the Hive metastore
- Benefit: avoids running alter table add partition once per partition (see the sketch below)
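A minimal sketch of the workflow (paths and names are hypothetical):
```sql
-- files were copied straight into a partition directory, e.g. via
--   hdfs dfs -put data.txt /user/hive/warehouse/db.db/t/dt=20200505/
-- the partition stays invisible until its metadata is registered:
msck repair table db.t;
-- now the partition can be queried:
select count(*) from db.t where dt = '20200505';
```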
Things to note when inserting into a table partition
- insert overwrite table tableName1 partition (c='ah') select a,b,c from tableName2;
- When overwriting a specific partition (c='ah'), the select list must not include the partition column c again, because partition (c='ah') already fixes its value to 'ah'. The statement above is therefore wrong; the trailing c should be dropped from the select, as in the corrected version below.
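The corrected statement:
```sql
-- the partition column c no longer appears in the select list
insert overwrite table tableName1 partition (c='ah')
select a, b from tableName2;
```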
Subtracting n days from a date (Hive function)
- date_sub('2020-01-02',n)
- Note: the input must be formatted as 'yyyy-MM-dd'
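If a date is stored as 'yyyyMMdd' instead, convert it first; a minimal sketch using unix_timestamp and from_unixtime:
```sql
-- '20200102' -> '2020-01-02', then subtract 1 day; returns '2020-01-01'
select date_sub(from_unixtime(unix_timestamp('20200102', 'yyyyMMdd'), 'yyyy-MM-dd'), 1);
```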
Things to note when using JOIN ... ON
- Use '=' in the ON condition, not '<>'
- Rows where the field on either side of '=' is NULL will never match; watch out for this
- If the join key contains duplicate values, the result row count gets multiplied (see the sketch below)
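A minimal sketch guarding against both pitfalls (table and column names are hypothetical):
```sql
-- deduplicate b first so each row of a matches at most one row of b
-- (duplicate uid values in user_info would otherwise multiply the output);
-- rows of a whose uid is NULL never match and keep NULL for b.city
select a.*, b.city
from log a
left join (
  select uid, max(city) as city
  from user_info
  group by uid
) b on a.uid = b.uid;
```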
Loading local data into Hive
- Load a file from the server's local filesystem into Hive (csv2hive.csv must contain data rows only, no header line)
- load data local inpath 'csv2hive.csv' overwrite into table tableName
- Load a file already on HDFS into Hive (again, data rows only, no header line); note that this moves the source file into the table's directory rather than copying it
- load data inpath 'csv2hive.csv' overwrite into table tableName
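If the CSV does carry a header row, one workaround is the skip.header.line.count table property; a minimal sketch (the table definition is hypothetical):
```sql
create table db.csv_demo (prov string, cnt string, remark string)
row format delimited fields terminated by ','
tblproperties ("skip.header.line.count" = "1");  -- first line skipped on read

load data local inpath 'csv2hive.csv' overwrite into table db.csv_demo;
```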
Uploading files to HDFS
- hadoop fs -put <local src> <hdfs dest>
- hadoop fs -copyFromLocal <local src> <hdfs dest>
Downloading files from HDFS
- hadoop fs -get <hdfs src> <local dest>
- hadoop fs -copyToLocal <hdfs src> <local dest>
grouping sets and grouping__id
- Example:
- select id, city, type, sum(income), grouping__id from db.table group by id, city, type grouping sets (id, city, type, (id, city, type))
- The query above aggregates separately by id, by city, by type, and by (id,city,type); it is equivalent to the following:
- select id, null, null, sum(income), 1 as grouping__id from db.table group by id
  union all select null, city, null, sum(income), 2 as grouping__id from db.table group by city
  union all select null, null, type, sum(income), 4 as grouping__id from db.table group by type
  union all select id, city, type, sum(income), 7 as grouping__id from db.table group by id, city, type
- Where does the grouping__id value come from?
- Note the column order in 'group by id,city,type': each column gets one binary bit, with id in the lowest position, and grouping__id is the sum of 2^position over the columns present in the grouping, as the table below shows
| grouping | id | city | type | calculation | grouping__id |
|---|---|---|---|---|---|
| group by id | 1 | 0 | 0 | 1×2^0 + 0×2^1 + 0×2^2 | 1 |
| group by city | 0 | 1 | 0 | 0×2^0 + 1×2^1 + 0×2^2 | 2 |
| group by type | 0 | 0 | 1 | 0×2^0 + 0×2^1 + 1×2^2 | 4 |
| group by id,city,type | 1 | 1 | 1 | 1×2^0 + 1×2^1 + 1×2^2 | 7 |
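grouping__id can then be used to label each output row; a minimal sketch (the bit values follow the table above, which is the convention of older Hive versions; newer versions invert the bit meaning, so verify on your cluster):
```sql
select id, city, type, sum(income) as income,
       case grouping__id
         when 1 then 'by id'
         when 2 then 'by city'
         when 4 then 'by type'
         when 7 then 'by id,city,type'
       end as grouping_name
from db.table
group by id, city, type
grouping sets (id, city, type, (id, city, type));
```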
hive.auto.convert.join
- hive.auto.convert.join: to be updated
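For reference, a minimal sketch of how the flag is commonly used (it lets Hive convert a common join into a map join when the smaller table fits under a size threshold; the values shown are typical defaults, verify them on your version):
```sql
set hive.auto.convert.join = true;               -- auto-convert common joins to map joins
set hive.mapjoin.smalltable.filesize = 25000000; -- "small table" threshold in bytes
```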