The previous post covered Hive's ANALYZE TABLE command. Impala provides a similar command called COMPUTE STATS, and this post walks through it.
What does Impala's COMPUTE STATS do?
Gathers information about volume and distribution of data in a table and all associated columns and partitions. The information is stored in the metastore database, and used by Impala to help optimize queries. For example, if Impala can determine that a table is large or small, or has many or few distinct values, it can organize and parallelize the work appropriately for a join query or insert operation. For details about the kinds of information gathered by this statement, see Table and Column Statistics.
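Like Hive's ANALYZE TABLE, this command exists mainly to help the optimizer and make queries run faster. Impala originally relied on Hive's ANALYZE TABLE. That command was awkward to use and not very stable, so Impala implemented its own command to do the same job.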
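To see what COMPUTE STATS actually records, you can inspect the stored statistics with Impala's SHOW TABLE STATS and SHOW COLUMN STATS statements. The sketch below assumes the table from the example later in this post. Impala reports statistics that have not been computed yet as -1.

```sql
-- Table-level stats: one row per partition with #Rows, #Files, Size,
-- and an "Incremental stats" flag. #Rows is -1 where no stats exist yet.
SHOW TABLE STATS dw_wy_video_kqi_cell_hourly;

-- Column-level stats: per column, values such as #Distinct Values and #Nulls.
-- Values that have not been computed are also shown as -1.
SHOW COLUMN STATS dw_wy_video_kqi_cell_hourly;
```

These row counts and distinct-value counts are what the planner consults when it decides, for example, the join order and whether to broadcast or shuffle a join input.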
Syntax
-- Full
COMPUTE STATS [db_name.]table_name
-- Incremental
COMPUTE INCREMENTAL STATS [db_name.]table_name [PARTITION (partition_spec)]
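Here is a rough sketch of how the two forms are typically combined. The table name and partition value come from the example below.

The DROP statements are Impala's companion commands for clearing statistics. They appear here only to show how you might reset a partition whose data was rewritten before recomputing it.

```sql
-- Full: scans the whole table and recomputes table and column stats for all partitions.
COMPUTE STATS dw_wy_video_kqi_cell_hourly;

-- Incremental, single partition: scans only the named partition.
COMPUTE INCREMENTAL STATS dw_wy_video_kqi_cell_hourly PARTITION (date_time='2019022817');

-- Incremental, no PARTITION clause: processes the partitions that
-- do not yet have incremental stats.
COMPUTE INCREMENTAL STATS dw_wy_video_kqi_cell_hourly;

-- Discard the incremental stats of one partition (e.g. after its data changed)...
DROP INCREMENTAL STATS dw_wy_video_kqi_cell_hourly PARTITION (date_time='2019022817');
-- ...and recompute them for that partition only.
COMPUTE INCREMENTAL STATS dw_wy_video_kqi_cell_hourly PARTITION (date_time='2019022817');

-- Remove all statistics from the table.
DROP STATS dw_wy_video_kqi_cell_hourly;
```

For an hourly-partitioned table like this one, the incremental form is usually the cheaper choice. Each new hour only requires scanning the new partition, rather than rescanning the whole table.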
Example
SHOW PARTITIONS dw_wy_video_kqi_cell_hourly;
COMPUTE INCREMENTAL STATS dw_wy_video_kqi_cell_hourly PARTITION (date_time='2019022817');
SHOW PARTITIONS dw_wy_video_kqi_cell_hourly;
In the SHOW PARTITIONS output, partitions on which COMPUTE INCREMENTAL STATS has never been run show -1 for #Rows.
[Figure: before running COMPUTE STATS dw_wy_video_kqi_cell_hourly; many partitions have not had their statistics collected]
[Figure: after running COMPUTE STATS dw_wy_video_kqi_cell_hourly]
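Another way to confirm that the planner can now see the statistics is to EXPLAIN a query against the table. When a scanned table lacks statistics, Impala's EXPLAIN output includes a warning naming the tables that are missing statistics. After COMPUTE STATS, that warning should no longer appear for this table. The query below is a hypothetical example; any query that scans the table will do.

```sql
-- Hypothetical query used only to inspect the plan.
-- Check that no missing-statistics warning is reported for this table
-- and that the scan node carries a row-count estimate.
EXPLAIN SELECT COUNT(*) FROM dw_wy_video_kqi_cell_hourly WHERE date_time = '2019022817';
```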