數(shù)據(jù)處理中坠陈,一遇到j(luò)son就頭大,很長(zhǎng)一段時(shí)間里捐康,明知lateral view函數(shù)是個(gè)好東西仇矾,但就是很抗拒去學(xué),都是找數(shù)倉(cāng)的同事先理好字段直接用解总,順便以菜雞的身份贮匕,同情和膜拜一下埋頭洗臟數(shù)的數(shù)倉(cāng)同學(xué),大佬辛苦花枫。刻盐。
前段時(shí)間加入到一個(gè)數(shù)據(jù)建設(shè)的項(xiàng)目中作為先鋒軍打頭陣,沒(méi)辦法遇到j(luò)son還是硬著頭皮終于學(xué)會(huì)了lateral view用法乌昔,感受:困難只是心中的一座大山隙疚!也不過(guò)如此!
我肯定我過(guò)不了幾天一定會(huì)忘掉(其實(shí)已經(jīng)忘掉一點(diǎn)了磕道。。)行冰,沒(méi)有好記性拿起爛筆頭溺蕉,記錄在這里吧伶丐。
基本語(yǔ)法:
select
*
from T t
lateral view json_tuple(t.json_txt,[],[],……) q as item1,item2,……
假設(shè)T表中有個(gè)json_txt字段取值格式如下:
{
"student_no":"0001",
"student_name":'zhangxiaoxiao',
"class":"高三(1)班",
"score_detail":{
"scoreList":[{"scores":[
{"course":"語(yǔ)文","score":100,"rank":2}
,{"course":"數(shù)學(xué)","score":120,"rank":9}
,{"course":"英語(yǔ)","score":110,"rank":6}
,{"course":"化學(xué)","score":90,"rank":4}
,{"course":"物理","score":90,"rank":3}
,{"course":"生物","score":90,"rank":2}
]
}]
},
"total_score":"600"
"overal_rank":"3",
}
如果我要得到每個(gè)學(xué)生的所有信息字段,則需要將json中的信息解析出來(lái)疯特。
select
t.* ----表中其他原始字段保留
,q.student_no
,q.student_name,
,q.class,
,q.total_score,
,q.overal_rank,
,q.course,q.score,q.rank
from T t
lateral view json_tuple(t.json_txt,
"student_no",
"student_name",
"class",
"total_score",
"overal_rank",
"score_detail.scoreList.[*].scores.[*].course",
"score_detail.scoreList.[*].scores.[*].score"
"score_detail.scoreList.[*].scores.[*].rank"
) q as course,score,rank
得到結(jié)果如下:
result
但是score和rank是以數(shù)列形式存儲(chǔ)在同一行哗魂,不方便計(jì)算,用trans_array()函數(shù)可以解決啦:
select
trans_array(5,',',student_no,student_name,class,total_score,overal_rank,course,score,rank) as (student_no,student_name,class,total_score,overal_rank,course,score,rank)
from (
select
student_no,student_name,class,total_score,overal_rank
,regexp_replace(course,'(\\[)|(\\])|("))','') as course ---去掉[]"符號(hào)
,regexp_replace(score,'(\\[)|(\\])|("))','') as score ---去掉[]"符號(hào)
,regexp_replace(rank,'(\\[)|(\\])|("))','') as rank ---去掉[]"符號(hào)
from result
) t
得到的結(jié)果就是縱列的分?jǐn)?shù)明細(xì):
result2