Order By syntax
Sorts the data globally; only one reducer is used.
colOrder: ( ASC | DESC )
colNullOrder: (NULLS FIRST | NULLS LAST) -- (Note: Available in Hive 2.1.0 and later)
orderBy: ORDER BY colName colOrder? colNullOrder? (',' colName colOrder? colNullOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy
Order By example
select * from emp order by empno asc;
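The grammar above also allows several sort columns, each with its own direction and null ordering. A minimal sketch, assuming the emp table has the deptno and empno columns used elsewhere in this post, and that Hive 2.1.0 or later is available for NULLS LAST:

```sql
-- Assumes emp has deptno and empno columns; NULLS LAST requires Hive 2.1.0 or later.
select empno, deptno
from emp
order by deptno asc nulls last, empno desc;
```

Since Order By runs through a single reducer, a query like this can be slow on large tables.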
Sort By
Sorts the rows inside each reducer; the overall result set is not globally sorted.
Set the number of reducers:
set mapreduce.job.reduces=<number>
Sort By example
hive> set mapreduce.job.reduces=3;
hive> insert overwrite local directory '/opt/datas/hive_exp_emp0308' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\n'
select * from emp sort by empno asc;
Distribute By
This is partitioning, similar to the partition step in MapReduce: it decides which reducer each row goes to. After partitioning the data, combine it with sort by to sort the rows inside each partition.
insert overwrite local directory '/opt/datas/hive_exp_distribute_emp0308' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\n'
select * from emp distribute by deptno sort by empno asc;
Data of the first partition: 000000_0
Data of the second partition: 000001_0
Data of the third partition: 000002_0
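When there are more distinct deptno values than reducers, several departments land in the same output file, and sorting only by empno interleaves their rows. A minimal sketch, assuming the same emp table and three reducers, that puts deptno first in sort by so each file is grouped by department:

```sql
-- Assumes the emp table from the examples above, with deptno and empno columns.
set mapreduce.job.reduces=3;
select * from emp
distribute by deptno
sort by deptno asc, empno asc;
```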
Cluster By
When sort by and distribute by use the same column, Cluster By can replace both.
insert overwrite local directory '/opt/datas/hive_exp_cluster_emp0308' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\n'
select * from emp cluster by empno;
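For comparison, a sketch of the longer form that cluster by empno stands for, assuming the same emp table:

```sql
-- Equivalent to: select * from emp cluster by empno;
select * from emp
distribute by empno
sort by empno;
```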
Summary
New select features in Hive
Order By: global sort, a single reducer
Sort By: sorts inside each reducer; the result is not globally sorted
Distribute By: similar to partition in MR; partitions the data, used together with sort by
Cluster By: used when the distribute by and sort by columns are the same