Metadata management is the core of a data warehouse. Metadata not only defines what the warehouse contains, but also specifies the content and location of its data, describes the rules for extracting and transforming data, and stores the business information related to the warehouse's subjects. This article introduces Hive Hooks and Metastore Listeners, which can be used to automate metadata management. It covers:
- Metadata management
- Hive Hooks and Metastore Listeners
- Basic usage of Hive Hooks
- Basic usage of Metastore Listeners
Metadata management
Definition of metadata
By the traditional definition, metadata is data about data. Metadata connects source data, the data warehouse, and data applications, and records the whole life cycle of data from production to consumption. It mainly records the definitions of the warehouse's models and the mappings between layers, and tracks the state of the warehouse's data and of ETL job runs. In a data warehouse system, metadata helps administrators and developers easily find the data they care about, guides their data management and development work, and improves their efficiency. By purpose, metadata falls into two categories: technical metadata and business metadata. Technical metadata stores the technical details of the data warehouse system and is used to develop and manage the warehouse.
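Categories of metadata
Technical metadata
- Storage metadata of distributed computing systems
For example, information about Hive tables, columns, and partitions: the table name, partition information, owner, file size, and table type, as well as each column's name, type, comment, and whether it is a partition column.
- Runtime metadata of distributed computing systems
Similar to Hive job logs: job type, instance name, inputs and outputs, SQL, runtime parameters, execution time, and so on.
- Task scheduling metadata
The dependency types and dependency relationships of tasks, as well as the run logs of the different types of scheduled tasks.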
Business metadata
Business metadata describes the data in the warehouse from a business perspective. It provides a semantic layer between users and the underlying system, so that business users without a technical background can also "read" the warehouse's data. Common business metadata includes standardized definitions of dimensions and attributes, business processes, and metrics, which help manage and use the data better, as well as data application metadata, such as the configuration and runtime metadata of reports and data products.
Applications of metadata
The real value of data lies in data-driven decision making, that is, using data to guide operations. A data-driven approach lets us identify trends and act on them, uncover problems, and drive innovation or solutions; this is data-driven operations. Metadata can likewise guide the daily work of people who deal with data, enabling data-driven "operations" of the data platform itself. For example, data consumers can use metadata to quickly find the data they need; ETL engineers can use it to guide model design, job optimization, and job decommissioning; and operations engineers can use it to guide storage, compute, and system optimization across the cluster.
Hive Hooks and Metastore Listeners
Hive Hooks
There are many open-source data governance and metadata management frameworks, such as Apache Atlas, that can meet metadata management needs in complex scenarios. In fact, Apache Atlas manages Hive metadata through Hive hooks, configured as follows:
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
The hook listens to Hive events such as creating and altering tables, pushes the collected data to Kafka in a specific format, and the metadata is then consumed from Kafka and stored.
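The hook → Kafka → consumer → store flow described above can be sketched in plain Java. This is a minimal, hypothetical model: a `BlockingQueue` stands in for the Kafka topic, the JSON message format is an assumption, and none of the class or method names come from Atlas or Kafka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of the hook -> Kafka -> consumer -> store flow.
// A BlockingQueue stands in for the Kafka topic; no Atlas or Kafka API is used.
class MetadataPipeline {
    // Stand-in for the topic: hooks produce, the metadata service consumes.
    private final BlockingQueue<String> topic = new LinkedBlockingQueue<>();
    // Stand-in for the metadata store the consumer writes to.
    private final List<String> store = new ArrayList<>();

    // Producer side: what a post-execution hook would do for one DDL event.
    void onHiveEvent(String operation, String table) {
        // Assumed message format: a small JSON object.
        topic.offer("{\"op\":\"" + operation + "\",\"table\":\"" + table + "\"}");
    }

    // Consumer side: drain everything published so far and persist it.
    int consumeAll() {
        List<String> batch = new ArrayList<>();
        topic.drainTo(batch);
        store.addAll(batch);
        return batch.size();
    }

    List<String> store() {
        return store;
    }
}
```

Decoupling producer and consumer this way means the hook only has to publish; storage and indexing happen elsewhere, at the consumer's own pace.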
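Types of Hive Hooks
So what exactly are hooks?
Hooks are an event and messaging mechanism that lets you bind custom logic into Hive's internal execution flow without recompiling Hive. They provide a way to extend Hive and integrate external components, and depending on their type they run at different stages. The main hook types are: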
- hive.exec.pre.hooks
As the name suggests, these are called before the execution engine runs the query. They are only invoked after Hive has optimized the query plan. Implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and configure it in hive-site.xml as:
<property>
<name>hive.exec.pre.hooks</name>
<value>fully qualified name of the implementation class</value>
</property>
- hive.exec.post.hooks
Called after the execution plan finishes and before the results are returned to the user. Implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and configure it in hive-site.xml as:
<property>
<name>hive.exec.post.hooks</name>
<value>fully qualified name of the implementation class</value>
</property>
- hive.exec.failure.hooks
Called after the execution plan fails. Implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and configure it in hive-site.xml as:
<property>
<name>hive.exec.failure.hooks</name>
<value>fully qualified name of the implementation class</value>
</property>
- hive.metastore.init.hooks
Called when HMSHandler is initialized. Extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreInitListener and configure it in hive-site.xml as:
<property>
<name>hive.metastore.init.hooks</name>
<value>fully qualified name of the implementation class</value>
</property>
- hive.exec.driver.run.hooks
Runs at the start and at the end of Driver.run. Implement the interface org.apache.hadoop.hive.ql.HiveDriverRunHook and configure it in hive-site.xml as:
<property>
<name>hive.exec.driver.run.hooks</name>
<value>fully qualified name of the implementation class</value>
</property>
- hive.semantic.analyzer.hook
Called when Hive performs semantic analysis of a query. Extend the abstract class org.apache.hadoop.hive.ql.parse.AbstractSemanticAnalyzerHook and configure it in hive-site.xml as:
<property>
<name>hive.semantic.analyzer.hook</name>
<value>fully qualified name of the implementation class</value>
</property>
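To see roughly where each of these hooks fires during a single query, here is a hypothetical simulation that only records the order of phases. The phase strings reuse the configuration keys above, but the control flow is a simplified assumption rather than Hive's actual `Driver` code; in particular, this sketch simply stops after the failure hooks on the failure path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of where each hook type fires for one query.
// Phase names reuse the configuration keys; the control flow is simplified.
class QueryLifecycle {
    final List<String> trace = new ArrayList<>();

    void run(boolean executionFails) {
        trace.add("hive.exec.driver.run.hooks: preDriverRun");
        // Compilation: semantic analyzer hooks wrap the analysis step.
        trace.add("hive.semantic.analyzer.hook: preAnalyze");
        trace.add("semantic analysis and plan optimization");
        trace.add("hive.semantic.analyzer.hook: postAnalyze");
        // Execution: pre hooks see the already-optimized plan.
        trace.add("hive.exec.pre.hooks");
        trace.add("execute tasks");
        if (executionFails) {
            trace.add("hive.exec.failure.hooks");
            return; // simplification: this sketch ends the failure path here
        }
        trace.add("hive.exec.post.hooks");
        trace.add("hive.exec.driver.run.hooks: postDriverRun");
    }
}
```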
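Pros and cons of Hive Hooks
- Pros
  - Custom code can easily be embedded or run at various stages of query processing
  - Can be used to update metadata
- Cons
  - The metadata obtained through hooks usually needs further parsing; otherwise it is hard to interpret
  - Hooks run in the query path, so they can affect query execution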
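One way to soften the last drawback is to keep the hook's own work cheap and push slow work (serialization, network calls) onto a background thread. The sketch below is a hypothetical design, not a Hive API: `run` stands in for the hook's entry point, and a single-thread executor keeps events in order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical wrapper: the caller (standing in for the query thread) only
// submits work, so slow metadata processing no longer adds to query latency.
class AsyncHookWrapper implements AutoCloseable {
    // One worker thread keeps events in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> slowHandler;

    AsyncHookWrapper(Consumer<String> slowHandler) {
        this.slowHandler = slowHandler;
    }

    // Called on the query path; returns as soon as the task is queued.
    void run(String event) {
        worker.submit(() -> slowHandler.accept(event));
    }

    // Stop accepting work and wait (bounded) for queued events to finish.
    @Override
    public void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is durability: events still queued when the process stops are lost unless `close()` runs first.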
For Hive Hooks, this article gives an example of hive.exec.post.hooks; these hooks run after the query executes and before the results are returned.
Metastore Listeners
Metastore Listeners listen to the Hive metastore: users can write custom code that is triggered by metadata events.
Looking at the source of HiveMetaStore, you can see that the init() method of its HMSHandler creates three kinds of listeners: MetaStorePreEventListener, MetaStoreEventListener, and MetaStoreEndFunctionListener. These listeners observe each step of metastore events.
public class HiveMetaStore extends ThriftHiveMetastore {
    // ... code omitted
    public static class HMSHandler extends FacebookBase implements IHMSHandler {
        // ... code omitted
        public void init() throws MetaException {
            // ... code omitted
            // Load the configured MetaStorePreEventListener implementations
            preListeners = MetaStoreUtils.getMetaStoreListeners(MetaStorePreEventListener.class,
                    hiveConf,
                    hiveConf.getVar(HiveConf.ConfVars.METASTORE_PRE_EVENT_LISTENERS));
            // Load the configured MetaStoreEventListener implementations
            listeners = MetaStoreUtils.getMetaStoreListeners(MetaStoreEventListener.class,
                    hiveConf,
                    hiveConf.getVar(HiveConf.ConfVars.METASTORE_EVENT_LISTENERS));
            listeners.add(new SessionPropertiesListener(hiveConf));
            // Load the configured MetaStoreEndFunctionListener implementations
            endFunctionListeners = MetaStoreUtils.getMetaStoreListeners(
                    MetaStoreEndFunctionListener.class,
                    hiveConf,
                    hiveConf.getVar(HiveConf.ConfVars.METASTORE_END_FUNCTION_LISTENERS));
            // ... code omitted
        }
    }
}
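How the three listener lists surround a single metastore call can be illustrated with a hypothetical model. It is a simplification, and the real order differs between methods and Hive versions: in this model, the pre-event step runs first, the event step is told whether the change succeeded, and the end-function step runs on every exit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of how the three listener lists surround one metastore call.
class ListenerChain {
    final List<String> log = new ArrayList<>();

    <T> T call(String function, Supplier<T> operation) {
        boolean success = false;
        try {
            // Stands in for MetaStorePreEventListener, which runs first.
            log.add("pre:" + function);
            try {
                T result = operation.get();
                success = true;
                return result;
            } finally {
                // Stands in for MetaStoreEventListener, notified with the outcome.
                log.add("event:" + function + ":" + success);
            }
        } finally {
            // Stands in for MetaStoreEndFunctionListener: runs on every exit.
            log.add("end:" + function + ":" + success);
        }
    }
}
```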
Types of Metastore Listeners
- hive.metastore.pre.event.listeners
Extend this abstract class to implement actions that must run before a particular event occurs on the metastore; these methods are called before the event occurs.
Extend the abstract class org.apache.hadoop.hive.metastore.MetaStorePreEventListener and configure it in hive-site.xml as:
<property>
<name>hive.metastore.pre.event.listeners</name>
<value>fully qualified name of the implementation class</value>
</property>
- hive.metastore.event.listeners
Extend this abstract class to implement actions to run when a particular event occurs on the metastore; these methods are called whenever an event occurs on the metastore.
Extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEventListener and configure it in hive-site.xml as:
<property>
<name>hive.metastore.event.listeners</name>
<value>fully qualified name of the implementation class</value>
</property>
- hive.metastore.end.function.listeners
These methods are called whenever a metastore function call ends.
Extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEndFunctionListener and configure it in hive-site.xml as:
<property>
<name>hive.metastore.end.function.listeners</name>
<value>fully qualified name of the implementation class</value>
</property>
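As an example of what an end-function listener might do, the following hypothetical class counts successful and failed calls per metastore function. In a real listener the two arguments would come from `onEndFunction`'s function name and its context object (assumed here to report the outcome through `isSuccess()`); the methods are synchronized because the metastore may serve calls on several threads.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical end-function listener logic: count successful and failed
// calls per metastore function name.
class EndFunctionStats {
    // counts[0] = successes, counts[1] = failures
    private final Map<String, int[]> counts = new HashMap<>();

    // Invoked once per finished metastore function call.
    synchronized void onEndFunction(String functionName, boolean success) {
        counts.computeIfAbsent(functionName, k -> new int[2])[success ? 0 : 1]++;
    }

    synchronized int successes(String functionName) {
        return counts.getOrDefault(functionName, new int[2])[0];
    }

    synchronized int failures(String functionName) {
        return counts.getOrDefault(functionName, new int[2])[1];
    }
}
```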
Pros and cons of Metastore Listeners
- Pros
  - The metadata has already been parsed and is easy to understand
  - Listeners do not affect query execution; they are read-only
- Cons
  - Less flexible: a listener can only access the objects that belong to the current event
For metastore listeners, this article gives an example of a MetaStoreEventListener that implements two methods: onCreateTable and onAlterTable.
Basic usage of Hive Hooks
Code
The implementation is as follows:
package com.jmx.hooks;

import java.util.HashSet;
import java.util.Set;

// Jackson 2 (com.fasterxml) is assumed; some older Hive builds ship org.codehaus.jackson instead
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.ql.QueryPlan;
import org.apache.hadoop.hive.ql.hooks.Entity;
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext.HookType;
import org.apache.hadoop.hive.ql.hooks.ReadEntity;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;
import org.apache.hadoop.hive.ql.plan.HiveOperation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomPostHook implements ExecuteWithHookContext {
    private static final Logger LOGGER = LoggerFactory.getLogger(CustomPostHook.class);
    // One shared ObjectMapper instead of a new one per entity
    private static final ObjectMapper MAPPER = new ObjectMapper();
    // Names of the Hive SQL operations to monitor
    private static final Set<String> OPERATION_NAMES = new HashSet<>();

    // HiveOperation is an enum that wraps Hive's SQL operation types
    static {
        // Create table
        OPERATION_NAMES.add(HiveOperation.CREATETABLE.getOperationName());
        // Alter database properties
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE.getOperationName());
        // Change database owner
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE_OWNER.getOperationName());
        // Alter table: add columns
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_ADDCOLS.getOperationName());
        // Alter table: storage location
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_LOCATION.getOperationName());
        // Alter table properties
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_PROPERTIES.getOperationName());
        // Rename table
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAME.getOperationName());
        // Rename column
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAMECOL.getOperationName());
        // Replace columns: drop the current columns, then add the new ones
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_REPLACECOLS.getOperationName());
        // Create database
        OPERATION_NAMES.add(HiveOperation.CREATEDATABASE.getOperationName());
        // Drop database
        OPERATION_NAMES.add(HiveOperation.DROPDATABASE.getOperationName());
        // Drop table
        OPERATION_NAMES.add(HiveOperation.DROPTABLE.getOperationName());
    }

    @Override
    public void run(HookContext hookContext) throws Exception {
        assert (hookContext.getHookType() == HookType.POST_EXEC_HOOK);
        // Query plan
        QueryPlan plan = hookContext.getQueryPlan();
        // Operation name
        String operationName = plan.getOperationName();
        logWithHeader("Executed SQL: " + plan.getQueryString());
        logWithHeader("Operation name: " + operationName);
        if (OPERATION_NAMES.contains(operationName) && !plan.isExplain()) {
            logWithHeader("Monitored SQL operation");
            Set<ReadEntity> inputs = hookContext.getInputs();
            Set<WriteEntity> outputs = hookContext.getOutputs();
            for (Entity entity : inputs) {
                logWithHeader("Hook metadata input: " + toJson(entity));
            }
            for (Entity entity : outputs) {
                logWithHeader("Hook metadata output: " + toJson(entity));
            }
        } else {
            logWithHeader("Not in the monitored set, ignoring this hook!");
        }
    }

    private static String toJson(Entity entity) throws Exception {
        // Entity types include:
        // DATABASE, TABLE, PARTITION, DUMMYPARTITION, DFS_DIR, LOCAL_DIR, FUNCTION
        switch (entity.getType()) {
            case DATABASE:
                Database db = entity.getDatabase();
                return MAPPER.writeValueAsString(db);
            case TABLE:
                // Serialize the underlying Thrift table object
                return MAPPER.writeValueAsString(entity.getTable().getTTable());
            default:
                // Other entity types are not serialized
                return null;
        }
    }

    /**
     * Writes a log line with a fixed header.
     *
     * @param obj the message to log
     */
    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomPostHook][Thread: " + Thread.currentThread().getName() + "] | " + obj);
    }
}
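The decision of which statements to log does not depend on Hive itself, so it can be pulled out and unit-tested without a Hive runtime. The hypothetical `HookFilter` below mirrors the `OPERATION_NAMES.contains(...) && !plan.isExplain()` check; it lists only a few operation names, which are assumed to match the strings returned by `HiveOperation.getOperationName()` (such as the `SHOWTABLES` seen in the log).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical extraction of CustomPostHook's filter so it can be tested
// without Hive. Only a few operation names are listed here.
final class HookFilter {
    private static final Set<String> MONITORED = Collections.unmodifiableSet(new HashSet<>(
            Arrays.asList("CREATETABLE", "DROPTABLE", "ALTERTABLE_ADDCOLS", "ALTERTABLE_RENAME")));

    private HookFilter() {
    }

    // Same rule as the hook: a monitored operation name, and not an EXPLAIN.
    static boolean shouldLog(String operationName, boolean isExplain) {
        return operationName != null && !isExplain && MONITORED.contains(operationName);
    }
}
```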
How to use it
First compile the code above into a jar and put it in $HIVE_HOME/lib, or add the jar from the Hive client:
0: jdbc:hive2://localhost:10000> add jar /opt/softwares/com.jmx.hive-1.0-SNAPSHOT.jar;
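Next, register the hook. It could be configured in hive-site.xml, but for convenience we set it directly from the client here (this applies to the current session only):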
0: jdbc:hive2://localhost:10000> set hive.exec.post.hooks=com.jmx.hooks.CustomPostHook;
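The hook setting is just a class name resolved at run time, which is why Hive does not need to be recompiled and why the jar must be on the classpath. The following hypothetical sketch shows the general idea: split a comma-separated value (the format these hook properties accept) and instantiate each class reflectively. `QueryHook`, `RecordingHook`, and `HookLoader` are illustrative stand-ins; Hive's own loading code differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a hook interface such as ExecuteWithHookContext.
interface QueryHook {
    void run(String statement);
}

// Example hook that records the statements it sees.
class RecordingHook implements QueryHook {
    static final List<String> SEEN = new ArrayList<>();

    @Override
    public void run(String statement) {
        SEEN.add(statement);
    }
}

final class HookLoader {
    private HookLoader() {
    }

    // Split a comma-separated list of class names and instantiate each one
    // through its no-arg constructor. No recompilation is involved: the
    // classes only need to be on the classpath at run time.
    static List<QueryHook> load(String confValue) {
        List<QueryHook> hooks = new ArrayList<>();
        if (confValue == null || confValue.trim().isEmpty()) {
            return hooks;
        }
        for (String name : confValue.split(",")) {
            try {
                Class<? extends QueryHook> cls = Class.forName(name.trim()).asSubclass(QueryHook.class);
                hooks.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot load hook " + name.trim(), e);
            }
        }
        return hooks;
    }
}
```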
Show tables
The code above monitors a set of operations and, when it detects one, runs custom code (here, writing log lines). When we run the following command in the Hive beeline client:
0: jdbc:hive2://localhost:10000> show tables;
the following appears in $HIVE_HOME/logs/hive.log:
[CustomPostHook][Thread: cab9a763-c63e-4f25-9f9a-affacb3cecdb main] | Executed SQL: show tables
[CustomPostHook][Thread: cab9a763-c63e-4f25-9f9a-affacb3cecdb main] | Operation name: SHOWTABLES
[CustomPostHook][Thread: cab9a763-c63e-4f25-9f9a-affacb3cecdb main] | Not in the monitored set, ignoring this hook!
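If these log lines are meant to feed a metadata system, the hook's messages have to be parsed back out of hive.log. Below is a hypothetical helper for the `[CustomPostHook][Thread: ...] | message` format written by `logWithHeader`; any prefix the logging framework adds (timestamp, level, logger name) is skipped.

```java
// Hypothetical parser for lines written by CustomPostHook.logWithHeader.
final class HookLogLine {
    private static final String MARKER = "[CustomPostHook][Thread: ";
    private static final String SEPARATOR = "] | ";

    final String thread;
    final String message;

    private HookLogLine(String thread, String message) {
        this.thread = thread;
        this.message = message;
    }

    // Returns null when the line was not written by the hook.
    static HookLogLine parse(String line) {
        int start = line.indexOf(MARKER);
        if (start < 0) {
            return null;
        }
        int threadStart = start + MARKER.length();
        int sep = line.indexOf(SEPARATOR, threadStart);
        if (sep < 0) {
            return null;
        }
        return new HookLogLine(line.substring(threadStart, sep), line.substring(sep + SEPARATOR.length()));
    }
}
```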
SHOW TABLES is not a monitored operation, so no metadata is logged for it.
Create a table
When we create a table in the Hive beeline client:
CREATE TABLE testposthook(
id int COMMENT "id",
name string COMMENT "full name"
) COMMENT "Table for testing Hive Hooks"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';
Check hive.log. The hook logged two output values: the first is the database metadata and the second is the table metadata.
- Database metadata
{
"name":"default",
"description":"Default Hive database",
"locationUri":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
"parameters":{
},
"privileges":null,
"ownerName":"public",
"ownerType":"ROLE",
"setParameters":true,
"parametersSize":0,
"setOwnerName":true,
"setOwnerType":true,
"setPrivileges":false,
"setName":true,
"setDescription":true,
"setLocationUri":true
}
- Table metadata
{
"tableName":"testposthook",
"dbName":"default",
"owner":"anonymous",
"createTime":1597985444,
"lastAccessTime":0,
"retention":0,
"sd":{
"cols":[
],
"location":null,
"inputFormat":"org.apache.hadoop.mapred.SequenceFileInputFormat",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat",
"compressed":false,
"numBuckets":-1,
"serdeInfo":{
"name":null,
"serializationLib":"org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe",
"parameters":{
"serialization.format":"1"
},
"setSerializationLib":true,
"setParameters":true,
"parametersSize":1,
"setName":false
},
"bucketCols":[
],
"sortCols":[
],
"parameters":{
},
"skewedInfo":{
"skewedColNames":[
],
"skewedColValues":[
],
"skewedColValueLocationMaps":{
},
"skewedColNamesIterator":[
],
"skewedColValuesSize":0,
"skewedColValuesIterator":[
],
"skewedColValueLocationMapsSize":0,
"setSkewedColNames":true,
"setSkewedColValues":true,
"setSkewedColValueLocationMaps":true,
"skewedColNamesSize":0
},
"storedAsSubDirectories":false,
"colsSize":0,
"setParameters":true,
"parametersSize":0,
"setOutputFormat":true,
"setSerdeInfo":true,
"setBucketCols":true,
"setSortCols":true,
"setSkewedInfo":true,
"colsIterator":[
],
"setCompressed":false,
"setNumBuckets":true,
"bucketColsSize":0,
"bucketColsIterator":[
],
"sortColsSize":0,
"sortColsIterator":[
],
"setStoredAsSubDirectories":false,
"setCols":true,
"setLocation":false,
"setInputFormat":true
},
"partitionKeys":[
],
"parameters":{
},
"viewOriginalText":null,
"viewExpandedText":null,
"tableType":"MANAGED_TABLE",
"privileges":null,
"temporary":false,
"rewriteEnabled":false,
"partitionKeysSize":0,
"setDbName":true,
"setSd":true,
"setParameters":true,
"setCreateTime":true,
"setLastAccessTime":false,
"parametersSize":0,
"setTableName":true,
"setPrivileges":false,
"setOwner":true,
"setPartitionKeys":true,
"setViewOriginalText":false,
"setViewExpandedText":false,
"setTableType":true,
"setRetention":false,
"partitionKeysIterator":[
],
"setTemporary":false,
"setRewriteEnabled":false
}
In the table metadata above, the cols array is empty: it contains neither the id nor the name column defined in the CREATE TABLE statement. To capture this information, run the following command:
ALTER TABLE testposthook
ADD COLUMNS (age int COMMENT 'age');
Check the log again. This time the hook logged only one input and one output, and both are the table's metadata.
- Input
{
"tableName":"testposthook",
"dbName":"default",
"owner":"anonymous",
"createTime":1597985445,
"lastAccessTime":0,
"retention":0,
"sd":{
"cols":[
{
"name":"id",
"type":"int",
"comment":"id",
"setName":true,
"setType":true,
"setComment":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setName":true,
"setType":true,
"setComment":true
}
],
"location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"compressed":false,
"numBuckets":-1,
"serdeInfo":{
"name":null,
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":" ",
"field.delim":" "
},
"setSerializationLib":true,
"setParameters":true,
"parametersSize":2,
"setName":false
},
"bucketCols":[
],
"sortCols":[
],
"parameters":{
},
"skewedInfo":{
"skewedColNames":[
],
"skewedColValues":[
],
"skewedColValueLocationMaps":{
},
"skewedColNamesIterator":[
],
"skewedColValuesSize":0,
"skewedColValuesIterator":[
],
"skewedColValueLocationMapsSize":0,
"setSkewedColNames":true,
"setSkewedColValues":true,
"setSkewedColValueLocationMaps":true,
"skewedColNamesSize":0
},
"storedAsSubDirectories":false,
"colsSize":2,
"setParameters":true,
"parametersSize":0,
"setOutputFormat":true,
"setSerdeInfo":true,
"setBucketCols":true,
"setSortCols":true,
"setSkewedInfo":true,
"colsIterator":[
{
"name":"id",
"type":"int",
"comment":"id",
"setName":true,
"setType":true,
"setComment":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setName":true,
"setType":true,
"setComment":true
}
],
"setCompressed":true,
"setNumBuckets":true,
"bucketColsSize":0,
"bucketColsIterator":[
],
"sortColsSize":0,
"sortColsIterator":[
],
"setStoredAsSubDirectories":true,
"setCols":true,
"setLocation":true,
"setInputFormat":true
},
"partitionKeys":[
],
"parameters":{
"transient_lastDdlTime":"1597985445",
"comment":"Table for testing Hive Hooks",
"totalSize":"0",
"numFiles":"0"
},
"viewOriginalText":null,
"viewExpandedText":null,
"tableType":"MANAGED_TABLE",
"privileges":null,
"temporary":false,
"rewriteEnabled":false,
"partitionKeysSize":0,
"setDbName":true,
"setSd":true,
"setParameters":true,
"setCreateTime":true,
"setLastAccessTime":true,
"parametersSize":4,
"setTableName":true,
"setPrivileges":false,
"setOwner":true,
"setPartitionKeys":true,
"setViewOriginalText":false,
"setViewExpandedText":false,
"setTableType":true,
"setRetention":true,
"partitionKeysIterator":[
],
"setTemporary":false,
"setRewriteEnabled":true
}
The "cols" field in the JSON above contains the column metadata. Now let's look at the output JSON:
- Output
{
"tableName":"testposthook",
"dbName":"default",
"owner":"anonymous",
"createTime":1597985445,
"lastAccessTime":0,
"retention":0,
"sd":{
"cols":[
{
"name":"id",
"type":"int",
"comment":"id",
"setName":true,
"setType":true,
"setComment":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setName":true,
"setType":true,
"setComment":true
}
],
"location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"compressed":false,
"numBuckets":-1,
"serdeInfo":{
"name":null,
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":" ",
"field.delim":" "
},
"setSerializationLib":true,
"setParameters":true,
"parametersSize":2,
"setName":false
},
"bucketCols":[
],
"sortCols":[
],
"parameters":{
},
"skewedInfo":{
"skewedColNames":[
],
"skewedColValues":[
],
"skewedColValueLocationMaps":{
},
"skewedColNamesIterator":[
],
"skewedColValuesSize":0,
"skewedColValuesIterator":[
],
"skewedColValueLocationMapsSize":0,
"setSkewedColNames":true,
"setSkewedColValues":true,
"setSkewedColValueLocationMaps":true,
"skewedColNamesSize":0
},
"storedAsSubDirectories":false,
"colsSize":2,
"setParameters":true,
"parametersSize":0,
"setOutputFormat":true,
"setSerdeInfo":true,
"setBucketCols":true,
"setSortCols":true,
"setSkewedInfo":true,
"colsIterator":[
{
"name":"id",
"type":"int",
"comment":"id",
"setName":true,
"setType":true,
"setComment":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setName":true,
"setType":true,
"setComment":true
}
],
"setCompressed":true,
"setNumBuckets":true,
"bucketColsSize":0,
"bucketColsIterator":[
],
"sortColsSize":0,
"sortColsIterator":[
],
"setStoredAsSubDirectories":true,
"setCols":true,
"setLocation":true,
"setInputFormat":true
},
"partitionKeys":[
],
"parameters":{
"transient_lastDdlTime":"1597985445",
"comment":"Table for testing Hive Hooks",
"totalSize":"0",
"numFiles":"0"
},
"viewOriginalText":null,
"viewExpandedText":null,
"tableType":"MANAGED_TABLE",
"privileges":null,
"temporary":false,
"rewriteEnabled":false,
"partitionKeysSize":0,
"setDbName":true,
"setSd":true,
"setParameters":true,
"setCreateTime":true,
"setLastAccessTime":true,
"parametersSize":4,
"setTableName":true,
"setPrivileges":false,
"setOwner":true,
"setPartitionKeys":true,
"setViewOriginalText":false,
"setViewExpandedText":false,
"setTableType":true,
"setRetention":true,
"partitionKeysIterator":[
],
"setTemporary":false,
"setRewriteEnabled":true
}
This output object does not contain the new column age; it represents the table's metadata before the change.
Basic usage of Metastore Listeners
Code
The implementation is as follows:
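package com.jmx.hooks;

import java.io.IOException;

// Jackson 2 (com.fasterxml) is assumed, as in CustomPostHook
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.events.AlterTableEvent;
import org.apache.hadoop.hive.metastore.events.CreateTableEvent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomListener extends MetaStoreEventListener {
    private static final Logger LOGGER = LoggerFactory.getLogger(CustomListener.class);
    private static final ObjectMapper objMapper = new ObjectMapper();

    // The metastore instantiates listeners through this Configuration constructor
    public CustomListener(Configuration config) {
        super(config);
        logWithHeader(" created ");
    }

    // Listen for CREATE TABLE
    @Override
    public void onCreateTable(CreateTableEvent event) {
        logWithHeader(event.getTable());
    }

    // Listen for ALTER TABLE: log the table before and after the change
    @Override
    public void onAlterTable(AlterTableEvent event) {
        logWithHeader(event.getOldTable());
        logWithHeader(event.getNewTable());
    }

    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomListener][Thread: " + Thread.currentThread().getName() + "] | " + objToStr(obj));
    }

    // Serialize to JSON; returns null if serialization fails
    private String objToStr(Object obj) {
        try {
            return objMapper.writeValueAsString(obj);
        } catch (IOException e) {
            LOGGER.error("Error on conversion", e);
        }
        return null;
    }
}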
How to use it
Listeners are used slightly differently from hooks: a Hive hook interacts with HiveServer, whereas a listener interacts with the metastore, i.e. it runs inside the metastore process. Usage is as follows:
First put the jar in $HIVE_HOME/lib, then add the following to hive-site.xml:
<property>
<name>hive.metastore.event.listeners</name>
<value>com.jmx.hooks.CustomListener</value>
<description/>
</property>
After configuring it, restart the metastore service:
bin/hive --service metastore &
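If the listener never logs anything after the restart, one thing worth ruling out is a malformed or incomplete hive-site.xml. The hypothetical check below uses the JDK's built-in DOM parser to read a single property; a typo such as `<value>...<value/>` instead of `</value>` makes the file fail to parse.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical check: parse hive-site.xml content and return one property's
// value, or null if the property is absent. Malformed XML is reported.
final class HiveSiteCheck {
    private HiveSiteCheck() {
    }

    static String propertyValue(String xml, String propertyName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("hive-site.xml is not well-formed: " + e.getMessage(), e);
        }
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if (propertyName.equals(childText(prop, "name"))) {
                return childText(prop, "value");
            }
        }
        return null;
    }

    // Trimmed text of the first <tag> element under parent, or null if missing.
    private static String childText(Element parent, String tag) {
        Node node = parent.getElementsByTagName(tag).item(0);
        return node == null ? null : node.getTextContent().trim();
    }
}
```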
Create a table
CREATE TABLE testlistener(
id int COMMENT "id",
name string COMMENT "full name"
) COMMENT "Table for testing Hive Listener"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';
Check hive.log:
{
"tableName":"testlistener",
"dbName":"default",
"owner":"anonymous",
"createTime":1597989316,
"lastAccessTime":0,
"retention":0,
"sd":{
"cols":[
{
"name":"id",
"type":"int",
"comment":"id",
"setComment":true,
"setType":true,
"setName":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setComment":true,
"setType":true,
"setName":true
}
],
"location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"compressed":false,
"numBuckets":-1,
"serdeInfo":{
"name":null,
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":" ",
"field.delim":" "
},
"setSerializationLib":true,
"setParameters":true,
"parametersSize":2,
"setName":false
},
"bucketCols":[
],
"sortCols":[
],
"parameters":{
},
"skewedInfo":{
"skewedColNames":[
],
"skewedColValues":[
],
"skewedColValueLocationMaps":{
},
"setSkewedColNames":true,
"setSkewedColValues":true,
"setSkewedColValueLocationMaps":true,
"skewedColNamesSize":0,
"skewedColNamesIterator":[
],
"skewedColValuesSize":0,
"skewedColValuesIterator":[
],
"skewedColValueLocationMapsSize":0
},
"storedAsSubDirectories":false,
"setCols":true,
"setOutputFormat":true,
"setSerdeInfo":true,
"setBucketCols":true,
"setSortCols":true,
"colsSize":2,
"colsIterator":[
{
"name":"id",
"type":"int",
"comment":"id",
"setComment":true,
"setType":true,
"setName":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setComment":true,
"setType":true,
"setName":true
}
],
"setCompressed":true,
"setNumBuckets":true,
"bucketColsSize":0,
"bucketColsIterator":[
],
"sortColsSize":0,
"sortColsIterator":[
],
"setStoredAsSubDirectories":true,
"setParameters":true,
"setLocation":true,
"setInputFormat":true,
"parametersSize":0,
"setSkewedInfo":true
},
"partitionKeys":[
],
"parameters":{
"transient_lastDdlTime":"1597989316",
"comment":"Table for testing Hive Listener",
"totalSize":"0",
"numFiles":"0"
},
"viewOriginalText":null,
"viewExpandedText":null,
"tableType":"MANAGED_TABLE",
"privileges":{
"userPrivileges":{
"anonymous":[
{
"privilege":"INSERT",
"createTime":-1,
"grantor":"anonymous",
"grantorType":"USER",
"grantOption":true,
"setGrantOption":true,
"setCreateTime":true,
"setGrantor":true,
"setGrantorType":true,
"setPrivilege":true
},
{
"privilege":"SELECT",
"createTime":-1,
"grantor":"anonymous",
"grantorType":"USER",
"grantOption":true,
"setGrantOption":true,
"setCreateTime":true,
"setGrantor":true,
"setGrantorType":true,
"setPrivilege":true
},
{
"privilege":"UPDATE",
"createTime":-1,
"grantor":"anonymous",
"grantorType":"USER",
"grantOption":true,
"setGrantOption":true,
"setCreateTime":true,
"setGrantor":true,
"setGrantorType":true,
"setPrivilege":true
},
{
"privilege":"DELETE",
"createTime":-1,
"grantor":"anonymous",
"grantorType":"USER",
"grantOption":true,
"setGrantOption":true,
"setCreateTime":true,
"setGrantor":true,
"setGrantorType":true,
"setPrivilege":true
}
]
},
"groupPrivileges":null,
"rolePrivileges":null,
"setUserPrivileges":true,
"setGroupPrivileges":false,
"setRolePrivileges":false,
"userPrivilegesSize":1,
"groupPrivilegesSize":0,
"rolePrivilegesSize":0
},
"temporary":false,
"rewriteEnabled":false,
"setParameters":true,
"setPartitionKeys":true,
"partitionKeysSize":0,
"setSd":true,
"setLastAccessTime":true,
"setRetention":true,
"partitionKeysIterator":[
],
"parametersSize":4,
"setTemporary":true,
"setRewriteEnabled":false,
"setTableName":true,
"setDbName":true,
"setOwner":true,
"setViewOriginalText":false,
"setViewExpandedText":false,
"setTableType":true,
"setPrivileges":true,
"setCreateTime":true
}
Now alter the table:
ALTER TABLE testlistener
ADD COLUMNS (age int COMMENT 'age');
Check the log again. There are two records: the first is the old table, and the second is the table after the change.
- old table
{
"tableName":"testlistener",
"dbName":"default",
"owner":"anonymous",
"createTime":1597989316,
"lastAccessTime":0,
"retention":0,
"sd":{
"cols":[
{
"name":"id",
"type":"int",
"comment":"id",
"setComment":true,
"setType":true,
"setName":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setComment":true,
"setType":true,
"setName":true
}
],
"location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"compressed":false,
"numBuckets":-1,
"serdeInfo":{
"name":null,
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":" ",
"field.delim":" "
},
"setSerializationLib":true,
"setParameters":true,
"parametersSize":2,
"setName":false
},
"bucketCols":[
],
"sortCols":[
],
"parameters":{
},
"skewedInfo":{
"skewedColNames":[
],
"skewedColValues":[
],
"skewedColValueLocationMaps":{
},
"setSkewedColNames":true,
"setSkewedColValues":true,
"setSkewedColValueLocationMaps":true,
"skewedColNamesSize":0,
"skewedColNamesIterator":[
],
"skewedColValuesSize":0,
"skewedColValuesIterator":[
],
"skewedColValueLocationMapsSize":0
},
"storedAsSubDirectories":false,
"setCols":true,
"setOutputFormat":true,
"setSerdeInfo":true,
"setBucketCols":true,
"setSortCols":true,
"colsSize":2,
"colsIterator":[
{
"name":"id",
"type":"int",
"comment":"id",
"setComment":true,
"setType":true,
"setName":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setComment":true,
"setType":true,
"setName":true
}
],
"setCompressed":true,
"setNumBuckets":true,
"bucketColsSize":0,
"bucketColsIterator":[
],
"sortColsSize":0,
"sortColsIterator":[
],
"setStoredAsSubDirectories":true,
"setParameters":true,
"setLocation":true,
"setInputFormat":true,
"parametersSize":0,
"setSkewedInfo":true
},
"partitionKeys":[
],
"parameters":{
"totalSize":"0",
"numFiles":"0",
"transient_lastDdlTime":"1597989316",
"comment":"Table for testing Hive Listener"
},
"viewOriginalText":null,
"viewExpandedText":null,
"tableType":"MANAGED_TABLE",
"privileges":null,
"temporary":false,
"rewriteEnabled":false,
"setParameters":true,
"setPartitionKeys":true,
"partitionKeysSize":0,
"setSd":true,
"setLastAccessTime":true,
"setRetention":true,
"partitionKeysIterator":[
],
"parametersSize":4,
"setTemporary":false,
"setRewriteEnabled":true,
"setTableName":true,
"setDbName":true,
"setOwner":true,
"setViewOriginalText":false,
"setViewExpandedText":false,
"setTableType":true,
"setPrivileges":false,
"setCreateTime":true
}
- new table
{
"tableName":"testlistener",
"dbName":"default",
"owner":"anonymous",
"createTime":1597989316,
"lastAccessTime":0,
"retention":0,
"sd":{
"cols":[
{
"name":"id",
"type":"int",
"comment":"id",
"setComment":true,
"setType":true,
"setName":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setComment":true,
"setType":true,
"setName":true
},
{
"name":"age",
"type":"int",
"comment":"age",
"setComment":true,
"setType":true,
"setName":true
}
],
"location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"compressed":false,
"numBuckets":-1,
"serdeInfo":{
"name":null,
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":" ",
"field.delim":" "
},
"setSerializationLib":true,
"setParameters":true,
"parametersSize":2,
"setName":false
},
"bucketCols":[
],
"sortCols":[
],
"parameters":{
},
"skewedInfo":{
"skewedColNames":[
],
"skewedColValues":[
],
"skewedColValueLocationMaps":{
},
"setSkewedColNames":true,
"setSkewedColValues":true,
"setSkewedColValueLocationMaps":true,
"skewedColNamesSize":0,
"skewedColNamesIterator":[
],
"skewedColValuesSize":0,
"skewedColValuesIterator":[
],
"skewedColValueLocationMapsSize":0
},
"storedAsSubDirectories":false,
"setCols":true,
"setOutputFormat":true,
"setSerdeInfo":true,
"setBucketCols":true,
"setSortCols":true,
"colsSize":3,
"colsIterator":[
{
"name":"id",
"type":"int",
"comment":"id",
"setComment":true,
"setType":true,
"setName":true
},
{
"name":"name",
"type":"string",
"comment":"full name",
"setComment":true,
"setType":true,
"setName":true
},
{
"name":"age",
"type":"int",
"comment":"age",
"setComment":true,
"setType":true,
"setName":true
}
],
"setCompressed":true,
"setNumBuckets":true,
"bucketColsSize":0,
"bucketColsIterator":[
],
"sortColsSize":0,
"sortColsIterator":[
],
"setStoredAsSubDirectories":true,
"setParameters":true,
"setLocation":true,
"setInputFormat":true,
"parametersSize":0,
"setSkewedInfo":true
},
"partitionKeys":[
],
"parameters":{
"totalSize":"0",
"last_modified_time":"1597989660",
"numFiles":"0",
"transient_lastDdlTime":"1597989660",
"comment":"Table for testing Hive Listener",
"last_modified_by":"anonymous"
},
"viewOriginalText":null,
"viewExpandedText":null,
"tableType":"MANAGED_TABLE",
"privileges":null,
"temporary":false,
"rewriteEnabled":false,
"setParameters":true,
"setPartitionKeys":true,
"partitionKeysSize":0,
"setSd":true,
"setLastAccessTime":true,
"setRetention":true,
"partitionKeysIterator":[
],
"parametersSize":6,
"setTemporary":false,
"setRewriteEnabled":true,
"setTableName":true,
"setDbName":true,
"setOwner":true,
"setViewOriginalText":false,
"setViewExpandedText":false,
"setTableType":true,
"setPrivileges":false,
"setCreateTime":true
}
As you can see, the altered table's metadata includes the newly added column age.
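Because `onAlterTable` receives both the old and the new table, the change itself can be derived by comparing them. Below is a hypothetical helper that works on plain column-name lists; in a real listener those lists could be built from `event.getOldTable().getSd().getCols()` and `event.getNewTable().getSd().getCols()`, mapping each `FieldSchema` to `getName()`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for onAlterTable: compare old and new column names.
final class ColumnDiff {
    private ColumnDiff() {
    }

    // Columns present in newCols but not in oldCols, in newCols order.
    static List<String> added(List<String> oldCols, List<String> newCols) {
        Set<String> before = new HashSet<>(oldCols);
        List<String> result = new ArrayList<>();
        for (String col : newCols) {
            if (!before.contains(col)) {
                result.add(col);
            }
        }
        return result;
    }

    // Columns present in oldCols but not in newCols.
    static List<String> removed(List<String> oldCols, List<String> newCols) {
        return added(newCols, oldCols);
    }
}
```

Since the comparison is by name only, a renamed column shows up as one removal plus one addition.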
Summary
In this article we showed how to work with Hive metadata so that metadata management can be automated, and walked through the basic usage of Hive Hooks and Metastore Listeners, both of which let us capture metadata changes. The captured metadata can also be pushed to Kafka to build your own metadata management system.