Last time I covered MSCK REPAIR TABLE, a commonly used Hive command. This time let's look at Hive's ANALYZE TABLE
command, and in a later post I'll cover Impala's COMPUTE STATS
command. All of these commands gather statistics about tables, which are used to speed up queries.
What Hive's ANALYZE TABLE command does
Statistics such as the number of rows of a table or partition and the histograms of a particular interesting column are important in many ways. One of the key use cases of statistics is query optimization. Statistics serve as the input to the cost functions of the optimizer so that it can compare different plans and choose among them. Statistics may sometimes meet the purpose of the users' queries. Users can quickly get the answers for some of their queries by only querying stored statistics rather than firing long-running execution plans. Some examples are getting the quantile of the users' age distribution, the top 10 apps that are used by people, and the number of distinct sessions.
In short, its main purpose is query optimization: the collected statistics feed the optimizer's cost functions and make queries faster.
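To sketch how the statistics actually get used: `hive.stats.autogather` and `hive.compute.query.using.stats` are standard Hive settings, but the exact behavior depends on your Hive version, so treat this as a sketch rather than a guaranteed recipe.

```sql
-- hive.stats.autogather (on by default) collects basic stats during
-- INSERT OVERWRITE / INSERT INTO; data loaded by other means (LOAD DATA,
-- files dropped directly into HDFS) needs an explicit ANALYZE TABLE.
SET hive.stats.autogather=true;

-- With accurate stats and this flag enabled, simple aggregates such as
-- COUNT(*), MIN and MAX can be answered straight from the metastore,
-- without launching a MapReduce/Tez job.
SET hive.compute.query.using.stats=true;
SELECT COUNT(*) FROM dw_wy_srvcc_customized_drilldown_table_daily;
```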
Hive's main statistics
1. Table and partition statistics:
- Number of rows
- Number of files
- Size in bytes
2. Column statistics: added in Hive 0.10.0
3. Top K statistics
Syntax
ANALYZE TABLE [db_name.]tablename [PARTITION(partcol1[=val1], ...)] COMPUTE STATISTICS [FOR COLUMNS col1, ...] [NOSCAN];
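A few common invocations of this syntax, using a hypothetical table `t` partitioned by `dt` (all names here are placeholders, not from the examples below):

```sql
-- Full scan: collects numRows and rawDataSize as well as file counts/sizes.
ANALYZE TABLE t PARTITION(dt='20190228') COMPUTE STATISTICS;

-- NOSCAN: only gathers numFiles and totalSize, so it returns quickly
-- but leaves row counts untouched.
ANALYZE TABLE t PARTITION(dt='20190228') COMPUTE STATISTICS NOSCAN;

-- Column statistics (min/max, distinct count, null count, ...)
-- for selected columns.
ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS col1, col2;
```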
Example 1: table- and partition-level analysis
# run the statement
ANALYZE TABLE dw_wy_srvcc_customized_drilldown_table_daily PARTITION(date_time='20190228') COMPUTE STATISTICS;
# view the result
DESCRIBE extended dw_wy_srvcc_customized_drilldown_table_daily PARTITION(date_time='20190228');
The result shows numRows=12552, totalSize=1564884, rawDataSize=1552332:
Partition(values:[20190228], dbName:coordinator_test, tableName:dw_wy_srvcc_customized_drilldown_table_daily, createTime:1551344149, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:level, type:bigint, comment:null), FieldSchema(name:pi_name, type:string, comment:null), FieldSchema(name:dimension_type, type:string, comment:null), FieldSchema(name:demension_item, type:string, comment:null), FieldSchema(name:numerator_value, type:double, comment:null), FieldSchema(name:denominator_value, type:double, comment:null), FieldSchema(name:pi_value, type:double, comment:null), FieldSchema(name:degrad_weight, type:double, comment:null), FieldSchema(name:date_time, type:string, comment:null)], location:hdfs://dlbdn3:8020/user/hive/warehouse/coordinator_test.db/dw_wy_srvcc_customized_drilldown_table_daily/date_time=20190228, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{numFiles=20, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1551408396, numRows=12552, totalSize=1564884, rawDataSize=1552332})
Example 2: column-level analysis
# run the statement
analyze table dw_wy_srvcc_customized_drilldown_table_daily compute statistics for columns level;
# view the result
describe formatted dw_wy_srvcc_customized_drilldown_table_daily level;
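Column statistics can also be computed per partition rather than for the whole table. Support for this (and for viewing the result with DESCRIBE FORMATTED on a single partition) varies by Hive version, so treat the snippet below as a sketch:

```sql
-- Compute column stats for just one partition of the example table:
ANALYZE TABLE dw_wy_srvcc_customized_drilldown_table_daily
  PARTITION(date_time='20190228')
  COMPUTE STATISTICS FOR COLUMNS level;
```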