MySQL
書寫順序:select--from--where--group by--having--order by
執(zhí)行順序:from--where--group by--having--select--order by
from:從哪個表檢索
where:過濾表中數(shù)據(jù)的條件
group by:分組
having:對已經(jīng)分組的數(shù)據(jù)進行過濾
select:查看結(jié)果集中的哪個列浑玛,或列的結(jié)果
order by:按照什么樣的順序來查看
hive與sql的區(qū)別(轉(zhuǎn)自鏈接處):
- hive不支持等值連接:
例如
sql:select * from a,b where a.id=b.id
hive必須為:select * from a join b on a.id=b.id; - 分號字符
sql:select concat(key,concat(';',key)) from dual;
hive需要進行轉(zhuǎn)義:select concat(key,concat('\073',key)) from dual; - hive不支持insert into 表 values() ,delete,update
- hive支持嵌入mapreduce程序挟冠,來處理復(fù)雜的邏輯
- Hive不支持將數(shù)據(jù)插入現(xiàn)有的表或分區(qū)中
只支持覆蓋重寫整個表 insert overwrite table t1 - hive支持將轉(zhuǎn)換后的數(shù)據(jù)直接寫入不同的表,還能寫入分區(qū)涯曲、hdfs和本地目錄
from t1
insert overwrite table t2
select t3.c2, avg(t3.c1) from t3 where
怎么防止數(shù)據(jù)傾斜
1.開窗函數(shù)
-排名開窗 ROW_NUMBER雀监、DENSE_RANK双吆、RANK、NTILE
--查詢按組累加
select id,[group],num,sum(num) over(partition by [group] order by id) from testaa
group by id,[group],num
ROW_NUMBER()為每一組的行按順序生成一個唯一的序號
RANK() 若有重復(fù)值則生成重復(fù)的序號,且下一個序號不連續(xù)
DENSE_RANK()若有重復(fù)值則生成重復(fù)的序號好乐,下一個序號是連續(xù)的
#row_number() over (partition by 分組列 order by 排序列)
SELECT WeekNo,Price,Sales_Amount,
row_number() over (partition by WeekNo order by Sales_Amount) as rk from forecasting
order by WeekNo
#DENSE_RANK() over (partition by 分組列 order by 排序列)
SELECT WeekNo,Price,Sales_Amount,
DENSE_RANK() over (partition by WeekNo order by Sales_Amount) as rk from forecasting
order by WeekNo
#RANK() over (partition by 分組列 order by 排序列)
SELECT WeekNo,Price,Sales_Amount,
RANK() over (partition by WeekNo order by Sales_Amount) as rk from forecasting
order by WeekNo
-聚合開窗函數(shù) sum() max() min()
#max(聚合列) over(partition by 排序列)
SELECT WeekNo,Price,Sales_Amount,
max(Sales_Amount) over (partition by WeekNo) as rk from forecasting
order by WeekNo
2.join,left join,right join,union all,union
-join:取交集匾竿,INNER JOIN 與 JOIN 是相同的;
-left join:以左表為主表曹宴,若左(m條)右(n條)兩表是一(多)對多的關(guān)系搂橙,則結(jié)果集顯示左表中連接字段的值分別與右表連接字段的多個值進行連接歉提,最多m*n條笛坦,最少0條,left join與left outer join是相同的苔巨;
-right join:以右表為主表版扩,結(jié)果集返回右表的行數(shù),一(多)對多與上面left join類似侄泽;
-union all:取并集礁芦,包括重復(fù)行,不進行排序悼尾,m+n條柿扣;
-union:取并集,不包括重復(fù)行闺魏,排序未状,最多m+n條;
select * from (SELECT distinct WeekNo from forecasting) a
left join forecasting b
on a.WeekNo=b.WeekNo
order by a.WeekNo
#union all
select * from (SELECT distinct WeekNo from forecasting where month_num=12) a
union all
SELECT distinct WeekNo from forecasting b
order by WeekNo
#union
select * from (SELECT distinct WeekNo from forecasting where month_num=12) a
union
SELECT distinct WeekNo from forecasting b
order by WeekNo
3.日期函數(shù)
-now() 返回當(dāng)前日期時間析桥;
-curdate()返回當(dāng)前日期司草;
-datediff(end_date,start_date)返回相差的天數(shù);
-date_sub/add(start_date,interval n day/hour/year);hive:date_sub/add(start_date,n)返回減/加天數(shù)的日期
SELECT WeekNo,NOW(),datediff(curdate(),'2020/12/31'),
date_add(now(),interval -4 day) from forecasting order by WeekNo
4.空值處理函數(shù)
-isnull(字段名,返回值)泡仗,coalesce(),ifnull()一樣的用法
SELECT ifnull(Inventory_Unit,0) from forecasting order by WeekNo
5. 去除重復(fù)值只保留一行
用row_number() over(partition by order by )編號埋虹,取第一條
6.數(shù)據(jù)庫
ODS層:粒度最細;當(dāng)前需要加載的數(shù)據(jù)娩怎,存儲處理完后的歷史數(shù)據(jù)
DW層:數(shù)據(jù)倉庫層搔课;經(jīng)過清洗了的,滿足第三范式截亦;粒度與ODS相同爬泥;只允許增加,不允許修改刪除魁巩;按照一定主題進行組織急灭;
DM層:數(shù)據(jù)集市層;以某個業(yè)務(wù)應(yīng)用為出發(fā)點谷遂,雪花狀葬馋;匯總級,不存在明細數(shù)據(jù);
7. 將字符串轉(zhuǎn)化為數(shù)組,并滿足top_category_name_new包含在rule_category 中
array_contains(split(colnameA,','),colnameB)
eg: colnameA: a,b,c colnameB:c
8. concat相關(guān)
-
連接多個字段
(1) concat(colname1,'separator',colname2,'separator',...,colnameN)
只要其中一個是NULL畴嘶,那么將返回NULL
(2) concat_ws連接多個字段蛋逾,并一次性指定分隔符
concat_ws('separator',colname1,colname2,...,colnameN)
只要有一個字符串不是NULL,就不會返回NULL,separator不能為NULL窗悯,否則返回NULL
eg:
-
實現(xiàn)行轉(zhuǎn)列,并放在一條記錄里
(1) 【Hive SQL】concat_ws('seperator',collect_set(colname))
collect_set將同一個分組中的值轉(zhuǎn)化成數(shù)組区匣,去重?zé)o序,排除NULL值
需要排序可以改成collect_list(不去重)或在外面加sort_array進行排序
concat_ws('seperator',collect_list(colname))
concat_ws('seperator',sort_array(collect_set(colname)),false)
eg:
colname
1
2
3
轉(zhuǎn)換為(1,2,3)
(2)group_concat( [distinct] colname [order by 排序字段 asc/desc ] [separator '分隔符'] )
或者group_concat(colname,'separator'),表示將group by產(chǎn)生的同一個分組中的值連接起來蒋院,返回一個字段
后面別忘了group by
eg:
9. array_contains(split( CASE WHEN rule_category IN ('服裝','女裝') THEN '服裝' ELSE rule_category END,','), category_lv1)
表示某個字段的值包含在數(shù)組中
10. 分類匯總(a,b,c,total)
grouping(col==1,'total',col)
group by cube(col)
11.json字符串解析
get_json_object(col,'$.name')