Note:
Hive version: 2.1.1
1. Overview of the Hive EXPLAIN command
Hive's EXPLAIN command shows the execution plan of a Hive SQL statement; analyzing that plan is the main way to go about optimizing the SQL.
Syntax:
EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
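The variants differ in how much extra detail they print. A few common forms, shown here against the ods_fact_sale table used in the example below (a sketch; availability of some keywords and the output itself depend on the Hive version):

```sql
-- Plain EXPLAIN: stage dependencies plus the plan of each stage
EXPLAIN
SELECT count(*) FROM ods_fact_sale;

-- EXPLAIN EXTENDED: adds low-level detail such as file paths
-- and table/partition properties
EXPLAIN EXTENDED
SELECT count(*) FROM ods_fact_sale;

-- EXPLAIN DEPENDENCY: lists the tables and partitions the query
-- reads, in JSON form
EXPLAIN DEPENDENCY
SELECT count(*) FROM ods_fact_sale;
```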
A Hive query is converted into a sequence of stages (which is really more of a directed acyclic graph). These stages may be map/reduce stages, or they may be stages that perform metastore or file system operations such as move and rename. The EXPLAIN output has three parts:
- The abstract syntax tree (AST) of the query
- The dependencies between the different stages of the plan
- The description of each stage
The description of a stage shows a sequence of operators along with the metadata associated with each operator. The metadata may include the filter expression of a FilterOperator, the select expressions of a SelectOperator, or the output file name of a FileSinkOperator.
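To make the operator metadata concrete: a query with a WHERE clause produces a Filter Operator whose metadata records the predicate. A sketch using the same ods_fact_sale table as the example below (the exact plan text varies by Hive version and statistics):

```sql
EXPLAIN
SELECT t.prod_name
FROM ods_fact_sale t
WHERE t.sale_nums > 100;

-- The map operator tree would contain a fragment roughly like:
--   Filter Operator
--     predicate: (sale_nums > 100) (type: boolean)
```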
2. A Hive EXPLAIN example
A map/reduce stage itself has two parts:
- A mapping from table alias to map operator tree: this mapping tells the mappers which operator tree to invoke in order to process the rows from a particular table, or the results of a previous map/reduce stage.
- A reduce operator tree: the operator tree that processes all the rows produced by the map side of the job; it appears only in stages that actually need a reduce phase.
Let's look at the EXPLAIN output for a simple GROUP BY statement:
hive>
> explain extended
> select t.sale_date ,
> t.prod_name ,
> sum(t.sale_nums) total_nums
> from ods_fact_sale t
> group by t.sale_date,t.prod_name ;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Spark
Edges:
Reducer 2 <- Map 1 (GROUP, 914)
DagName: root_20201230141411_24e56f4b-fdc1-461f-8ac9-81594cc3b1ad:1
Vertices:
Map 1
Map Operator Tree: // operators executed during the map phase of the job
TableScan // reads the table data
alias: t // alias of the table being read
Statistics: Num rows: 767830000 Data size: 30653263662 Basic stats: COMPLETE Column stats: NONE // table statistics
GatherStats: false
Select Operator
expressions: sale_date (type: string), prod_name (type: string), sale_nums (type: int)
outputColumnNames: sale_date, prod_name, sale_nums
Statistics: Num rows: 767830000 Data size: 30653263662 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(sale_nums)
keys: sale_date (type: string), prod_name (type: string)
mode: hash
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 767830000 Data size: 30653263662 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type: string)
null sort order: aa
sort order: ++
Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
Statistics: Num rows: 767830000 Data size: 30653263662 Basic stats: COMPLETE Column stats: NONE
tag: -1
value expressions: _col2 (type: bigint)
auto parallelism: false
Path -> Alias:
hdfs://nameservice1/user/hive/warehouse/test.db/ods_fact_sale [t]
Path -> Partition:
hdfs://nameservice1/user/hive/warehouse/test.db/ods_fact_sale
Partition
base file name: ods_fact_sale
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","prod_name":"true","sale_date":"true","sale_nums":"true"}}
bucket_count -1
columns id,sale_date,prod_name,sale_nums
columns.comments
columns.types bigint:string:string:int
comment Imported by sqoop on 2020/11/25 19:14:01
field.delim ?
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
line.delim
location hdfs://nameservice1/user/hive/warehouse/test.db/ods_fact_sale
name test.ods_fact_sale
numFiles 4
numRows 767830000
rawDataSize 30653263662
serialization.ddl struct ods_fact_sale { i64 id, string sale_date, string prod_name, i32 sale_nums}
serialization.format ?
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 31421093662
transient_lastDdlTime 1608796507
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
input format: org.apache.hadoop.mapred.TextInputFormat // input format of the table
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat // output format of the table
properties:
COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","prod_name":"true","sale_date":"true","sale_nums":"true"}}
bucket_count -1
columns id,sale_date,prod_name,sale_nums
columns.comments
columns.types bigint:string:string:int
comment Imported by sqoop on 2020/11/25 19:14:01
field.delim ?
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
line.delim
location hdfs://nameservice1/user/hive/warehouse/test.db/ods_fact_sale
name test.ods_fact_sale
numFiles 4
numRows 767830000
rawDataSize 30653263662
serialization.ddl struct ods_fact_sale { i64 id, string sale_date, string prod_name, i32 sale_nums}
serialization.format ?
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 31421093662
transient_lastDdlTime 1608796507
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: test.ods_fact_sale
name: test.ods_fact_sale
Truncated Path -> Alias:
/test.db/ods_fact_sale [t]
Reducer 2
Execution mode: vectorized
Needs Tagging: false
Reduce Operator Tree: // operators executed during the reduce phase of the job
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string), KEY._col1 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 383915000 Data size: 15326631831 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
GlobalTableId: 0
directory: hdfs://nameservice1/tmp/hive/root/2c65817a-8195-495c-9218-5435913ad279/hive_2020-12-30_14-14-11_934_7035611081945658845-1/-mr-10000/.hive-staging_hive_2020-12-30_14-14-11_934_7035611081945658845-1/-ext-10001
NumFilesPerFileSink: 1
Statistics: Num rows: 383915000 Data size: 15326631831 Basic stats: COMPLETE Column stats: NONE
Stats Publishing Key Prefix: hdfs://nameservice1/tmp/hive/root/2c65817a-8195-495c-9218-5435913ad279/hive_2020-12-30_14-14-11_934_7035611081945658845-1/-mr-10000/.hive-staging_hive_2020-12-30_14-14-11_934_7035611081945658845-1/-ext-10001/
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
properties:
columns _col0,_col1,_col2
columns.types string:string:bigint
escape.delim \
hive.serialization.extend.additional.nesting.levels true
serialization.escape.crlf true
serialization.format 1
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TotalFiles: 1
GatherStats: false
MultiFileSpray: false
Stage: Stage-0
Fetch Operator
limit: -1 // the query has no LIMIT clause
Processor Tree:
ListSink
Time taken: 0.731 seconds, Fetched: 136 row(s)
hive>
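One last note on Stage-0 above: limit: -1 in the Fetch Operator means the query carries no LIMIT clause. Adding one changes that field accordingly. A sketch on the same query (exact plan text varies by version):

```sql
EXPLAIN
SELECT t.sale_date, t.prod_name, sum(t.sale_nums) AS total_nums
FROM ods_fact_sale t
GROUP BY t.sale_date, t.prod_name
LIMIT 10;

-- In Stage-0 the Fetch Operator now shows "limit: 10", and the
-- reduce side typically gains a Limit operator after the Group By.
```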