hive從0.12版本以后就開始自身支持json文件的格式了
1.文件格式
下面是測試用的文件預(yù)覽格式,我將其存儲為jsonTest.json
文件,放在了/root目錄下
{"_location":"(32.121, 45.262)","_ip":"2.12.1.211","_action":"click button","_uid":"694666ee-8918-48f8-bb92-0a756a3f1f31","_timestamp":"1496999163163"}
{"_location":"(32.121, 45.262)","_ip":"7.12.1.211","_action":"click button","_uid":"868be529-f59e-4f16-8ea5-08c4612ede9a","_timestamp":"1496999164165"}
{"_location":"(32.121, 45.262)","_ip":"2.12.1.211","_action":"click button","_uid":"49b63380-2a5a-453d-aa5b-57e74cd1a8d7","_timestamp":"1496999165170"}
{"_location":"(32.121, 45.262)","_ip":"3.12.1.211","_action":"click button","_uid":"60fd093a-9f6a-4716-81c6-72a5f07d3b4f","_timestamp":"1496999166171"}
{"_location":"(32.121, 45.262)","_ip":"3.12.1.211","_action":"click button","_uid":"056cf558-56b1-4312-aba1-0b0eb71ba78e","_timestamp":"1496999167172"}
{"_location":"(32.121, 45.262)","_ip":"8.12.1.211","_action":"click button","_uid":"a2ecd34c-5458-43a2-b851-89eb6b3985ae","_timestamp":"1496999168174"}
注意: 一行之內(nèi)必須是完整的一條json文件,否則會解析出錯誤
2.創(chuàng)建hive表
CREATE TABLE IF NOT EXISTS jsonTest
(
`_location` STRING,
`_ip` STRING,
`_action` STRING,
`_uid` STRING,
`_timestamp` STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
注意: ``符號是因為列明里面是下劃線開頭;并且列明需要與json文件中的key對應(yīng),否則也會解析不到數(shù)據(jù).
3.加載數(shù)據(jù)
LOAD DATA LOCAL INPATH "/root/jsonTest.json" OVERWRITE INTO TABLE jsonTest;
4.查詢數(shù)據(jù)進(jìn)行校驗
hive> select * from jsonTest_test;
OK
(32.121, 45.262) 2.12.1.211 click button 54490a2f-376a-43e3-9658-e9a3d6583ffd 1496999162153
(32.121, 45.262) 2.12.1.211 click button 694666ee-8918-48f8-bb92-0a756a3f1f31 1496999163163
(32.121, 45.262) 7.12.1.211 click button 868be529-f59e-4f16-8ea5-08c4612ede9a 1496999164165
(32.121, 45.262) 2.12.1.211 click button 49b63380-2a5a-453d-aa5b-57e74cd1a8d7 1496999165170
(32.121, 45.262) 3.12.1.211 click button 60fd093a-9f6a-4716-81c6-72a5f07d3b4f 1496999166171
(32.121, 45.262) 3.12.1.211 click button 056cf558-56b1-4312-aba1-0b0eb71ba78e 1496999167172
(32.121, 45.262) 8.12.1.211 click button a2ecd34c-5458-43a2-b851-89eb6b3985ae 1496999168174
Time taken: 0.439 seconds, Fetched: 7 row(s)
可以看到數(shù)據(jù)已經(jīng)成功解析出來
問題
1.出現(xiàn)下面錯誤
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
需要將支持序列化的hive-hcatalog-core.jar
添加進(jìn)來,執(zhí)行如下命令即可解決該問題.(找到自己安裝的目錄即可,我的是在這個目錄里)
hive> add jar /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
參考資料
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-JSON