行列轉(zhuǎn)換在做報表分析時還是經(jīng)常會遇到的,今天就說一下如何實現(xiàn)行列轉(zhuǎn)換吧萍诱。
行列轉(zhuǎn)換就是如下圖所示兩種展示形式的互相轉(zhuǎn)換
1. 行轉(zhuǎn)列
我們來看一個簡單的例子,我們要把下面這個表的數(shù)據(jù),轉(zhuǎn)換成圖二的樣式
要轉(zhuǎn)換的結(jié)果數(shù)據(jù)展示
先看看建表語句:
CREATE TABLE tb_score_01(
id INT(11) NOT NULL,
userid VARCHAR(20) NOT NULL COMMENT '用戶id',
subject VARCHAR(20) COMMENT '科目',
score DOUBLE COMMENT '成績'
)
DUPLICATE KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false"
);
INSERT INTO tb_score_01 VALUES (1,'001','語文',90);
INSERT INTO tb_score_01 VALUES (2,'001','數(shù)學',92);
INSERT INTO tb_score_01 VALUES (3,'001','英語',80);
INSERT INTO tb_score_01 VALUES (4,'002','語文',88);
INSERT INTO tb_score_01 VALUES (5,'002','數(shù)學',90);
INSERT INTO tb_score_01 VALUES (6,'002','英語',75.5);
INSERT INTO tb_score_01 VALUES (7,'003','語文',70);
INSERT INTO tb_score_01 VALUES (8,'003','數(shù)學',85);
INSERT INTO tb_score_01 VALUES (9,'003','英語',90);
INSERT INTO tb_score_01 VALUES (10,'003','政治',82);
傳統(tǒng)的做法我們大概是這樣實現(xiàn),一般是通過 case when
語句
SELECT userid,
SUM(CASE `subject` WHEN '語文' THEN score ELSE 0 END) as '語文',
SUM(CASE `subject` WHEN '數(shù)學' THEN score ELSE 0 END) as '數(shù)學',
SUM(CASE `subject` WHEN '英語' THEN score ELSE 0 END) as '英語',
SUM(CASE `subject` WHEN '政治' THEN score ELSE 0 END) as '政治'
FROM tb_score
GROUP BY userid;
或者
SELECT userid,
SUM(IF(`subject`='語文',score,0)) as '語文',
SUM(IF(`subject`='數(shù)學',score,0)) as '數(shù)學',
SUM(IF(`subject`='英語',score,0)) as '英語',
SUM(IF(`subject`='政治',score,0)) as '政治'
FROM tb_score
GROUP BY userid;
我們來看看 Doris 怎么實現(xiàn)這個行轉(zhuǎn)列呢券时,有沒有更簡單、性能更好的一種方式
我們是不是可以首先將這個科目伏伯、成績組成一個Map
然后在外層對這個 Map 進行遍歷展開
從而完成這樣一個行列轉(zhuǎn)換呢
我們來看看實現(xiàn)
select
userid,
IFNULL(map['語文'],0) as '語文',
IFNULL(map['英語'],0) as '英語',
IFNULL(map['數(shù)學'],0) as '數(shù)學',
IFNULL(map['政治'],0) as '政治'
from (
select userid ,map_agg(subject,score) as map from tb_score group by userid
) t ;
這樣實現(xiàn)上性能更好,我們來看一下效果
select
-> userid,
-> IFNULL(map['語文'],0) as '語文',
-> IFNULL(map['英語'],0) as '英語',
-> IFNULL(map['數(shù)學'],0) as '數(shù)學',
-> IFNULL(map['政治'],0) as '政治'
-> from (
-> select userid ,map_agg(subject,score) as map from tb_score group by userid
-> ) t ;
+--------+--------+--------+--------+--------+
| userid | 語文 | 英語 | 數(shù)學 | 政治 |
+--------+--------+--------+--------+--------+
| 001 | 90 | 80 | 92 | 0 |
| 002 | 88 | 75.5 | 90 | 0 |
| 003 | 70 | 90 | 85 | 82 |
+--------+--------+--------+--------+--------+
3 rows in set (0.02 sec)
2. 列轉(zhuǎn)行
實際使用中我們還有很多場景要把數(shù)據(jù)沖列轉(zhuǎn)成行,下面我們來看一個例子捌袜,這個例子中每行是一個學生的说搅,語文、數(shù)學虏等、英語弄唧、政治的成績适肠,
我們想轉(zhuǎn)換成每門成績都是獨立的一行,轉(zhuǎn)出的效果如下:
我們來看看一個寬表轉(zhuǎn)成高表我們之前的是怎么實現(xiàn)候引,一般我們是通過union all的方式侯养,每科我們都是一個單獨的SQL語句,然后將這些SQL Unoin all 在一起得到我們想要的結(jié)果澄干。
SELECT userid,'語文' AS course,cn_score AS score FROM tb_score1
UNION ALL
SELECT userid,'數(shù)學' AS course,math_score AS score FROM tb_score1
UNION ALL
SELECT userid,'英語' AS course,en_score AS score FROM tb_score1
UNION ALL
SELECT userid,'政治' AS course,po_score AS score FROM tb_score1
ORDER BY userid;
這樣做的缺點:
SQL 冗余
大量的union all 也會帶來性能問題
我們來看看 Doris 怎么實現(xiàn)逛揩,首先 Doris 提供了 Lateral view,其實就是用來和像類似explode這種UDTF函數(shù)聯(lián)用的麸俘,lateral view會將 UDTF 生成的結(jié)果放到一個虛擬表中辩稽,然后這個虛擬表
會和輸入行
進行 join
來達到連接 UDTF 外的 select 字段的目的
還是以上面的例子來看,Doris我怎么對這個寬表轉(zhuǎn)成高表从媚,實現(xiàn)就是借助Lateral view
CREATE TABLE `tb_score1` (
`id` int(11) NOT NULL,
`userid` varchar(20) NOT NULL COMMENT '用戶id',
`cn_score` double NULL COMMENT '語文成績',
`math_score` double NULL COMMENT '數(shù)學成績',
`en_score` double NULL COMMENT '英語成績',
`po_score` double NULL COMMENT '政治成績'
) ENGINE=OLAP
UNIQUE KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);;
INSERT INTO `tb_score1` (`id`, `userid`, `cn_score`, `math_score`, `en_score`, `po_score`) VALUES (1, '001', 90, 92, 80, 0);
INSERT INTO `tb_score1` (`id`, `userid`, `cn_score`, `math_score`, `en_score`, `po_score`) VALUES (2, '002', 88, 90, 75.5, 0);
INSERT INTO `tb_score1` (`id`, `userid`, `cn_score`, `math_score`, `en_score`, `po_score`) VALUES (3, '003', 70, 85, 90, 82);
- 首先我借助Lateral view 形成一個 UserID逞泄、客戶成績組成一個字符(使用逗號連接),達到下面的效果
+--------+--------------------+
| userid | arr |
+--------+--------------------+
| 001 | ["語文", "90"] |
| 001 | ["數(shù)學", "92"] |
| 001 | ["英語", "80"] |
| 001 | ["政治", "0"] |
| 002 | ["語文", "88"] |
| 002 | ["數(shù)學", "90"] |
| 002 | ["英語", "75.5"] |
| 002 | ["政治", "0"] |
| 003 | ["語文", "70"] |
| 003 | ["數(shù)學", "85"] |
| 003 | ["英語", "90"] |
| 003 | ["政治", "82"] |
+--------+--------------------+
12 rows in set (0.02 sec)
然后對這個上面的 arr 字符串,借助于 Doris 提供的 SPLIT_BY_STRING 函數(shù)完成字符串轉(zhuǎn)數(shù)組的動作
最后遍歷數(shù)組
完成列轉(zhuǎn)行的效果
SELECT
userid,
element_at ( arr, 1 ) AS SUBJECT,
element_at ( arr, 2 ) AS score
FROM
(
SELECT
userid,
SPLIT_BY_STRING ( sub, ',' ) arr
FROM
(
SELECT
userid,
array (
concat( '語文', ',', cn_score ),
concat( '數(shù)學', ',', math_score ),
concat( '英語', ',', en_score ),
concat( '政治', ',', po_score )) AS scores
FROM
tb_score1
) t LATERAL VIEW explode ( scores ) tbl1 AS sub
) aaa
最后的效果如下:
SELECT
-> userid,
-> element_at ( arr, 1 ) AS SUBJECT,
-> element_at ( arr, 2 ) AS score
-> FROM
-> (
-> SELECT
-> userid,
-> SPLIT_BY_STRING ( sub, ',' ) arr
-> FROM
-> (
-> SELECT
-> userid,
-> array (
-> concat( '語文', ',', cn_score ),
-> concat( '數(shù)學', ',', math_score ),
-> concat( '英語', ',', en_score ),
-> concat( '政治', ',', po_score )) AS scores
-> FROM
-> tb_score1
-> ) t LATERAL VIEW explode ( scores ) tbl1 AS sub
-> ) aaa;
+--------+---------+-------+
| userid | SUBJECT | score |
+--------+---------+-------+
| 001 | 語文 | 90 |
| 001 | 數(shù)學 | 92 |
| 001 | 英語 | 80 |
| 001 | 政治 | 0 |
| 002 | 語文 | 88 |
| 002 | 數(shù)學 | 90 |
| 002 | 英語 | 75.5 |
| 002 | 政治 | 0 |
| 003 | 語文 | 70 |
| 003 | 數(shù)學 | 85 |
| 003 | 英語 | 90 |
| 003 | 政治 | 82 |
+--------+---------+-------+
12 rows in set (0.02 sec)