基本命令
切換數(shù)據(jù)庫: use <db_name>;
表重命名:ALTER TABLE table1 RENAME TO table2;
增加表分區(qū)
ALTER TABLE table1 ADD IF NOT EXISTS
PARTITION (year = 2001)
PARTITION (year = 2002)
...;
刪除表分區(qū)
ALTER TABLE table1 DROP IF EXISTS PARTITION (year = 2001);
重命名列名
ALTER TABLE table1 CHANGE COLUMN <old_col> <new_col>
插入新列
ALTER TABLE table1 ADD COLUMNS(
app_name STRING
session_id LONG);
改變存儲格式
ALTER TABLE table1
[PATITION (year = 2001)]
SET FILEFORMAT SEQUENCEFILE;
本地導(dǎo)入數(shù)據(jù)
LOAD DATA LOCAL INPATH '${env: HOME}/ California/employees'
OVERWRITE INTO TABLE employees
[PARTITION (country = 'US')];
通過查詢語句插入數(shù)據(jù)
INSERT OVERWRITE TABLE employees
[PARTITION(...)]
SELECT * FROM stage_employees se
WHERE se.cnty = 'US' and se.et = 'OR';
通過單個查詢語句創(chuàng)建表
CREATE TABLE table1
AS SELECT name, salary, address
FROM employees es
WHERE es.cnty = 'US';
導(dǎo)出數(shù)據(jù)
INSERT OVERWRITE DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees es
WHERE es.cnty = 'US'
Hive 查詢
SELECT col1, col2, col3 FROM table1 WHERE ...
可以增加一些數(shù)學(xué)運算符 或者 自帶函數(shù)
聚合函數(shù)
SELECT count(*), avg(salary)
FROM employees
GROUP BY cnty;
LIMIT: LIMIT從句用于限制結(jié)果的行數(shù)
設(shè)置列別名
SELECT count(*) as sum_num , avg(salary) as avg_sal
FROM employees
GROUP BY cnty;
LIKE: 與通配符 % 一起用, %可代替任何內(nèi)容
HAVING: HAVING從句用來限制 GROUP BY產(chǎn)生的結(jié)果的輸出。
SELECT avg(salary)
FROM employees
GROUP BY cnty
HAVING avg(salary) > 500;
JOIN
Hive SQL支持JOIN操作球订,但是只支持等值連接晌区。
inner join
SELECT a.ymd, a.price, b.price
FROM stocks a inner join stocks b on a.ymd = b.ymd
WHERE a.name = "IBM" and b.name = "APPLE"
理論上一個JOIN 需要啟動一個mapreduce任務(wù),但是當(dāng)多個表做連接時ON子句的連接鍵一樣的話只啟用一個mapreduce任務(wù)。
JOIN優(yōu)化遵倦,把大表放在最后 或者 在代碼中加入/*+STREAMTABLE(小表)*/
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI-JOIN
左半邊連接 只返回左表中滿足on條件的結(jié)果梦抢。
map-side join
在JOIN時,把小表加載到內(nèi)存中能夠加快JOIN效率交洗。
SORT BY骑科、ODER BY 、DISTRIBUTE BY 构拳、CLUSTER BY
ODER BY 對數(shù)據(jù)做全局排序咆爽,最后用一個reduce做排序
SORT BY 對數(shù)據(jù)做局部排序梁棠,在每一個reduce中排序
DISTRIBUTE BY控制map的輸出在reduce中是如何劃分的。
CLUSTER BY 相當(dāng)于結(jié)合DISTRIBUTE BY 和 SORT BY斗埂,對同一列做DISTRIBUTE 和 SORT符糊,但是只支持默認(rèn)的升序排列。
數(shù)值轉(zhuǎn)化
CAST( VALUE AS TYPE)