數(shù)據(jù)透視表屬于數(shù)據(jù)分析中比較常見的功能立叛,如下面的數(shù)據(jù):
我們以姓名為行,科目為列贡茅,數(shù)據(jù)透視后的結(jié)果為:
那么hive該如何實(shí)現(xiàn)這樣的功能呢秘蛇?我們首先把數(shù)據(jù)寫入到hive表中:
val seqData = Seq(
("小A","語(yǔ)文",80),
("小A","數(shù)學(xué)",65),
("小A","英語(yǔ)",70),
("小B","語(yǔ)文",82),
("小B","數(shù)學(xué)",95),
("小B","英語(yǔ)",60),
("小C","語(yǔ)文",88),
("小C","數(shù)學(xué)",85),
("小C","英語(yǔ)",83)
)
val seq2df = seqData.toDF("name","course","score")
seq2df.write.saveAsTable("default.classinfo12")
結(jié)果如下:
hive中其實(shí)是沒有直接實(shí)現(xiàn)數(shù)據(jù)透視功能的函數(shù)的,不過我們可以通過groupby實(shí)現(xiàn)類似的功能:
select
name,
sum(if(course = '語(yǔ)文',score,0)) as chinese_score,
sum(if(course = '數(shù)學(xué)',score,0)) as math_score,
sum(if(course = '英語(yǔ)',score,0)) as english_score
from
default.classinfo12
group by name
結(jié)果如下:
可以看到顶考,通過groupby赁还,我們就實(shí)現(xiàn)了類似pivot的功能,不過這樣的寫法也僅適合類別比較少的情況驹沿。當(dāng)類別比較多的時(shí)候艘策,可以使用spark來解決:
val df = spark.sql(
"""
|select
| *
|from
| default.classinfo12
""".stripMargin)
.groupBy("name")
.pivot("course")
.agg(sum("score") as "score")
結(jié)果如下:
好了,本篇就到這里啦~~