面試官:一千萬(wàn)的數(shù)據(jù),你是怎么查詢的司恳?
1 先給結(jié)論
對(duì)于1千萬(wàn)的數(shù)據(jù)查詢途乃,主要關(guān)注分頁(yè)查詢過程中的性能
- 針對(duì)偏移量大導(dǎo)致查詢速度慢:
- 先對(duì)查詢的字段創(chuàng)建唯一索引
- 根據(jù)業(yè)務(wù)需求,先定位查詢范圍(對(duì)應(yīng)主鍵id的范圍扔傅,比如大于多少耍共、小于多少、IN)
- 查詢時(shí)猎塞,將第2步確定的范圍作為查詢條件
- 針對(duì)查詢數(shù)據(jù)量大的導(dǎo)致查詢速度慢:
- 查詢時(shí)试读,減少不需要的列,查詢效率也可以得到明顯提升
- 一次盡可能按需查詢較少的數(shù)據(jù)條數(shù)
- 借助nosql緩存數(shù)據(jù)等來(lái)減輕mysql數(shù)據(jù)庫(kù)的壓力
2 準(zhǔn)備數(shù)據(jù)
2.1 創(chuàng)建表
CREATE TABLE `user_operation_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`ip` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`op_data` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr3` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr4` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr5` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr6` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr7` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr8` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr9` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr10` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr11` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr12` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
2.2 造數(shù)據(jù)腳本
采用批量插入邢享,效率會(huì)快很多鹏往,而且每1000條數(shù)就commit,數(shù)據(jù)量太大骇塘,也會(huì)導(dǎo)致批量插入效率慢
DELIMITER ;;
CREATE DEFINER=`root`@`%` PROCEDURE `batch_insert_log`()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE userId INT DEFAULT 10000000;
set @execSql = 'INSERT INTO `big_data`.`user_operation_log`(`user_id`, `ip`, `op_data`, `attr1`, `attr2`, `attr3`, `attr4`, `attr5`, `attr6`, `attr7`, `attr8`, `attr9`, `attr10`, `attr11`, `attr12`) VALUES';
set @execData = '';
WHILE i<=10000000 DO
set @attr = "rand_string(50)";
set @execData = concat(@execData, "(", userId + i, ", '110.20.169.111', '用戶登錄操作'", ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ")");
if i % 1000 = 0
then
set @stmtSql = concat(@execSql, @execData,";");
prepare stmt from @stmtSql;
execute stmt;
DEALLOCATE prepare stmt;
commit;
set @execData = "";
else
set @execData = concat(@execData, ",");
end if;
SET i=i+1;
END WHILE;
END
DELIMITER ;
delimiter $$
create function rand_string(n INT)
returns varchar(255) #該函數(shù)會(huì)返回一個(gè)字符串
begin
#chars_str定義一個(gè)變量 chars_str,類型是 varchar(100),默認(rèn)值'abcdefghijklmnopqrstuvwxyzABCDEFJHIJKLMNOPQRSTUVWXYZ';
declare chars_str varchar(100) default
'abcdefghijklmnopqrstuvwxyzABCDEFJHIJKLMNOPQRSTUVWXYZ';
declare return_str varchar(255) default '';
declare i int default 0;
while i < n do
set return_str =concat(return_str,substring(chars_str,floor(1+rand()*52),1));
set i = i + 1;
end while;
return return_str;
end $$
2.3 執(zhí)行存儲(chǔ)過程函數(shù)
因?yàn)槟M數(shù)據(jù)流量是1000W伊履,我這電腦配置不高,耗費(fèi)了不少時(shí)間款违,應(yīng)該個(gè)把小時(shí)吧
SELECT count(1) FROM `user_operation_log`;
2.4 普通分頁(yè)查詢
MySQL 支持 LIMIT 語(yǔ)句來(lái)選取指定的條數(shù)數(shù)據(jù)唐瀑, Oracle 可以使用 ROWNUM 來(lái)選取。
MySQL分頁(yè)查詢語(yǔ)法如下:
SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset
- 第一個(gè)參數(shù)指定第一個(gè)返回記錄行的偏移量
- 第二個(gè)參數(shù)指定返回記錄行的最大數(shù)目
下面我們開始測(cè)試查詢結(jié)果:
SELECT * FROM `user_operation_log` LIMIT 10000, 10;
查詢3次時(shí)間分別為:
這樣看起來(lái)速度還行插爹,不過是本地?cái)?shù)據(jù)庫(kù)哄辣,速度自然快點(diǎn)。
換個(gè)角度來(lái)測(cè)試
相同偏移量赠尾,不同數(shù)據(jù)量
SELECT * FROM `user_operation_log` LIMIT 10000, 10;
SELECT * FROM `user_operation_log` LIMIT 10000, 100;
SELECT * FROM `user_operation_log` LIMIT 10000, 1000;
SELECT * FROM `user_operation_log` LIMIT 10000, 10000;
SELECT * FROM `user_operation_log` LIMIT 10000, 100000;
SELECT * FROM `user_operation_log` LIMIT 10000, 1000000;
從上面結(jié)果可以得出結(jié)束:數(shù)據(jù)量越大力穗,花費(fèi)時(shí)間越長(zhǎng)(這不是廢話嗎???)
相同數(shù)據(jù)量气嫁,不同偏移量
SELECT * FROM `user_operation_log` LIMIT 100, 100;
SELECT * FROM `user_operation_log` LIMIT 1000, 100;
SELECT * FROM `user_operation_log` LIMIT 10000, 100;
SELECT * FROM `user_operation_log` LIMIT 100000, 100;
SELECT * FROM `user_operation_log` LIMIT 1000000, 100;
從上面結(jié)果可以得出結(jié)束:偏移量越大当窗,花費(fèi)時(shí)間越長(zhǎng)
3 如何優(yōu)化
既然我們經(jīng)過上面一番的折騰,也得出了結(jié)論寸宵,針對(duì)上面兩個(gè)問題:偏移大崖面、數(shù)據(jù)量大元咙,我們分別著手優(yōu)化
3.1 優(yōu)化數(shù)據(jù)量大的問題
SELECT * FROM `user_operation_log` LIMIT 1, 1000000
SELECT id FROM `user_operation_log` LIMIT 1, 1000000
SELECT id, user_id, ip, op_data, attr1, attr2, attr3, attr4, attr5, attr6, attr7, attr8, attr9, attr10, attr11, attr12 FROM `user_operation_log` LIMIT 1, 1000000
查詢結(jié)果如下:
上面模擬的是從1000W條數(shù)據(jù)表中 ,一次查詢出100W條數(shù)據(jù)巫员,看起來(lái)性能不佳庶香,但是我們常規(guī)業(yè)務(wù)中,很少有一次性從mysql中查詢出這么多條數(shù)據(jù)量的場(chǎng)景简识「弦矗可以結(jié)合nosql緩存數(shù)據(jù)等等來(lái)減輕mysql數(shù)據(jù)庫(kù)的壓力。
因此财异,針對(duì)查詢數(shù)據(jù)量大的問題:
- 查詢時(shí)倘零,減少不需要的列,查詢效率也可以得到明顯提升
- 一次盡可能按需查詢較少的數(shù)據(jù)條數(shù)
- 借助nosql緩存數(shù)據(jù)等來(lái)減輕mysql數(shù)據(jù)庫(kù)的壓力
第一條和第三條查詢速度差不多戳寸,這時(shí)候你肯定會(huì)吐槽呈驶,那我還寫那么多字段干啥呢,直接 * 不就完事了
注意本人的 MySQL 服務(wù)器和客戶端是在同一臺(tái)機(jī)器上疫鹊,所以查詢數(shù)據(jù)相差不多袖瞻,有條件的同學(xué)可以測(cè)測(cè)客戶端與MySQL分開
SELECT * 它不香嗎?
在這里順便補(bǔ)充一下為什么要禁止 SELECT *拆吆。難道簡(jiǎn)單無(wú)腦聋迎,它不香嗎?
主要兩點(diǎn):
- 用 "SELECT * " 數(shù)據(jù)庫(kù)需要解析更多的對(duì)象枣耀、字段霉晕、權(quán)限、屬性等相關(guān)內(nèi)容捞奕,在 SQL 語(yǔ)句復(fù)雜牺堰,硬解析較多的情況下,會(huì)對(duì)數(shù)據(jù)庫(kù)造成沉重的負(fù)擔(dān)颅围。
- 增大網(wǎng)絡(luò)開銷伟葫,* 有時(shí)會(huì)誤帶上如log、IconMD5之類的無(wú)用且大文本字段院促,數(shù)據(jù)傳輸size會(huì)幾何增漲筏养。特別是MySQL和應(yīng)用程序不在同一臺(tái)機(jī)器,這種開銷非常明顯常拓。
3.2 優(yōu)化偏移量大的問題
3.2.1 采用子查詢方式
我們可以先定位偏移位置的 id渐溶,然后再查詢數(shù)據(jù)
SELECT id FROM `user_operation_log` LIMIT 1000000, 1;
SELECT * FROM `user_operation_log` WHERE id >= (SELECT id FROM `user_operation_log` LIMIT 1000000, 1) LIMIT 10;
查詢結(jié)果如下:
這種查詢效率不理想啊EАU泼汀!奇怪眉睹,id是主鍵荔茬,主鍵索引不應(yīng)當(dāng)查詢這么慢啊竹海?慕蔚??
先EXPLAIN分析下sql語(yǔ)句:
EXPLAIN SELECT id FROM `user_operation_log` LIMIT 1000000, 1;
EXPLAIN SELECT * FROM `user_operation_log` WHERE id >= (SELECT id FROM `user_operation_log` LIMIT 1000000, 1) LIMIT 10;
奇怪斋配,走了索引啊孔飒,而且是主鍵索引,如下
帶著十萬(wàn)個(gè)為什么和千萬(wàn)個(gè)不甘心艰争,嘗試給主鍵再加一層唯一索引
ALTER TABLE `big_data`.`user_operation_log`
ADD UNIQUE INDEX `idx_id`(`id`) USING BTREE;
由于數(shù)據(jù)量有1000W坏瞄,所以,加索引需要等待一會(huì)兒甩卓,畢竟創(chuàng)建1000W條數(shù)據(jù)的索引鸠匀,一般機(jī)器沒那么快。
然后再次執(zhí)行上面的查詢逾柿,結(jié)果如下:
天啊缀棍,這查詢效率的差距不止十倍!;怼爬范!
再次EXPLAIN分析一下:
命中的索引不一樣,命中唯一索引的查詢弱匪,效率高出不止十倍青瀑。
結(jié)論:
對(duì)于大表查詢,不要太相信主鍵索引能夠帶來(lái)多少的性能提升萧诫,老老實(shí)實(shí)根據(jù)查詢字段斥难,添加相應(yīng)索引吧!2聘椤蘸炸!
但是上面的方法只適用于==id是遞增==的情況,如果id不是遞增的尖奔,比如雪花算法生成的id搭儒,得按照下面的方式:
注意:
- 某些 mysql 版本不支持在 in 子句中使用 limit,所以采用了多個(gè)嵌套select
- 但這種缺點(diǎn)是分頁(yè)查詢只能放在子查詢里面
SELECT * FROM `user_operation_log` WHERE id IN (SELECT t.id FROM (SELECT id FROM `user_operation_log` LIMIT 1000000, 10) AS t);
查詢所花費(fèi)時(shí)間如下:
EXPLAIN一下
EXPLAIN SELECT * FROM `user_operation_log` WHERE id IN (SELECT t.id FROM (SELECT id FROM `user_operation_log` LIMIT 1000000, 10) AS t);
3.2.2 采用 id 限定方式
這種方法要求更高些提茁,==id必須是連續(xù)遞增==(注意是連續(xù)遞增淹禾,不僅僅是遞增哦),而且還得計(jì)算id的范圍茴扁,然后使用 between铃岔,sql如下
SELECT * FROM `user_operation_log` WHERE id between 1000000 AND 1000100 LIMIT 100;
SELECT * FROM `user_operation_log` WHERE id >= 1000000 LIMIT 100;
可以看出,查詢效率是相當(dāng)不錯(cuò)的
注意:這里的 LIMIT 是限制了條數(shù),沒有采用偏移量
還是EXPLAIN分析一下
EXPLAIN SELECT * FROM `user_operation_log` WHERE id between 1000000 AND 1000100 LIMIT 100;
EXPLAIN SELECT * FROM `user_operation_log` WHERE id >= 1000000 LIMIT 100;
因此毁习,針對(duì)分頁(yè)查詢智嚷,偏移量大導(dǎo)致查詢慢的問題:
- 先對(duì)查詢的字段創(chuàng)建唯一索引
- 根據(jù)業(yè)務(wù)需求,先定位查詢范圍(對(duì)應(yīng)主鍵id的范圍纺且,比如大于多少盏道、小于多少、IN)
- 查詢時(shí)载碌,將第2步確定的范圍作為查詢條件
本文由mdnice多平臺(tái)發(fā)布