OPPO Internet Technology
https://segmentfault.com/a/1190000021268588
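1. Background
One of our production clusters peaks at more than 1 million TPS (mostly write traffic; read traffic is very low). Peak TPS has almost reached the cluster's upper limit, and average latency exceeds 100ms. As read and write traffic keeps growing, latency jitter seriously affects business availability. The cluster uses MongoDB's native sharded architecture: data is evenly distributed across the shards, and once a shard key was added and sharding enabled, the load was balanced well. Per-node traffic monitoring for the cluster is shown in the figure below:
The figure above shows that cluster traffic is heavy, with the peak exceeding 1.2 million per second. Deletes caused by TTL expiration are not counted in that total (the deletes are triggered by the primary but are not displayed on the primary; they only show up when the secondaries pull the oplog). If the delete traffic on the primaries is included, total TPS exceeds 1.5 million per second.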
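2. Software optimization
Without adding server resources, we first made the following software-level optimizations and obtained a several-fold performance improvement:
Business-level optimization
MongoDB configuration optimization
Storage engine optimization
2.1 Business-level optimization
The cluster holds nearly ten billion documents. Each document is kept for three days by default, and the business spreads the data so that it expires at a random point in time after three days. Because there are so many documents, daytime off-peak monitoring showed that the secondaries frequently ran large numbers of delete operations, and at some points the number of deletes even exceeded the business read/write traffic. We therefore decided to move TTL deletes to the night. The TTL index is created as follows: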
db.collection.createIndex( { "expireAt": 1 }, { expireAfterSeconds: 0 } )
In the TTL index above, expireAfterSeconds=0 means that each document in the collection expires exactly at the time stored in its expireAt field, for example:
db.collection.insert({
    // This document will expire and be deleted at 1:00 a.m.
    "expireAt": new Date('July 22, 2019 01:00:00'),
    "logEvent": 2,
    "logMessage": "Success!"
})
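By spreading expireAt randomly over the early-morning hours three days later, we avoid the large volume of deletes that the TTL index would otherwise trigger during the daytime peak. This lowers cluster load at peak time and ultimately reduces average business latency and jitter.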
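The scheduling idea above can be sketched as a small helper that the writing application might call when it builds a document. This is only an illustration: the function name `nightExpireAt`, the 01:00-05:00 window, and the injectable `rand` parameter are assumptions, not details given in the article.

```javascript
// Hypothetical helper (not from the article): choose an expireAt inside a
// night-time window, no earlier than `retentionDays` after `now`.
// The 01:00-05:00 window is an assumption; the article only says "early morning".
function nightExpireAt(now, retentionDays = 3, startHour = 1, endHour = 5, rand = Math.random) {
  const earliest = new Date(now.getTime() + retentionDays * 24 * 3600 * 1000);
  const windowStart = new Date(earliest.getTime());
  windowStart.setHours(startHour, 0, 0, 0); // local time, same calendar day as `earliest`
  if (windowStart < earliest) {
    // The window start on that day is already in the past: use the next night.
    windowStart.setDate(windowStart.getDate() + 1);
  }
  const windowMs = (endHour - startHour) * 3600 * 1000;
  // Uniform spread over the window so that deletes do not arrive in one burst.
  return new Date(windowStart.getTime() + Math.floor(rand() * windowMs));
}

// Possible use in the mongo shell:
// db.collection.insert({ expireAt: nightExpireAt(new Date()), logEvent: 2, logMessage: "Success!" })
```

With this choice a document lives between three and roughly four days, which trades a little extra storage for keeping the deletes out of the daytime peak.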
TTL delete tip 1: the meaning of expireAfterSeconds
Expire at the absolute time specified by expireAt, that is, at 2:01 a.m. on December 22:
db.log_events.createIndex( { "expireAt": 1 }, { expireAfterSeconds: 0 } )
db.log_events.insert( { "expireAt": new Date('Dec 22, 2019 02:01:00'), "logEvent": 2, "logMessage": "Success!" } )
Expire expireAfterSeconds seconds after the time stored in the indexed date field, that is, 60 seconds after the insert time in this example (the index must be built on the same field the document carries, here createdAt):
db.log_events.insert( { "createdAt": new Date(), "logEvent": 2, "logMessage": "Success!" } )
db.log_events.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 60 } )
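The two cases in this tip follow the same rule: a document becomes eligible for deletion at the value of the indexed date field plus expireAfterSeconds. A minimal sketch of that arithmetic follows; the function name `ttlEligibleAt` is hypothetical, and the dates are written in UTC only to make the example deterministic.

```javascript
// Earliest time at which a document becomes eligible for TTL deletion:
// value of the indexed date field plus expireAfterSeconds.
function ttlEligibleAt(indexedDate, expireAfterSeconds) {
  return new Date(indexedDate.getTime() + expireAfterSeconds * 1000);
}

const fieldValue = new Date(Date.UTC(2019, 11, 22, 2, 1, 0));
const absolute = ttlEligibleAt(fieldValue, 0);   // expireAfterSeconds: 0, expire at the field value
const relative = ttlEligibleAt(fieldValue, 60);  // expireAfterSeconds: 60, expire one minute later
console.log(absolute.toISOString(), relative.toISOString());
```

Note that this is the eligibility time, not the exact deletion time: MongoDB's TTL background task runs periodically (every 60 seconds according to the MongoDB documentation), so the actual delete can lag behind.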
TTL delete tip 2: why does mongostat show delete operations only on the secondaries and not on the primary?
The reason is that the TTL index is triggered only on the primary. Once triggered, the primary deletes the documents by calling the WiredTiger storage engine interface directly, without going through the normal client connection handling path, so no delete statistics appear on the primary.
After a TTL delete, the primary generates the corresponding delete oplog entries. The secondaries pull the primary's oplog and replay it as if they were a client, so data removed on the primary is also removed on the secondaries, which guarantees eventual consistency. Because this replay goes through the normal client handling path, the delete count is recorded on the secondaries.
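Although mongostat does not show these deletes on the primary, the primary does count them in `db.serverStatus().metrics.ttl`, which reports `deletedDocuments` and `passes`. A minimal sketch of turning two such snapshots into a delete rate is shown below; the function name `ttlDeleteRate` and the sample numbers are hypothetical.

```javascript
// Compute the TTL delete rate between two snapshots of
// db.serverStatus().metrics.ttl taken `elapsedSeconds` apart.
function ttlDeleteRate(before, after, elapsedSeconds) {
  // Number() also covers NumberLong-like values returned by the shell.
  const deleted = Number(after.deletedDocuments) - Number(before.deletedDocuments);
  const passes = Number(after.passes) - Number(before.passes);
  return { deletedPerSecond: deleted / elapsedSeconds, passes };
}

// Hypothetical sample snapshots, 60 seconds apart.
const before = { deletedDocuments: 1000000, passes: 10 };
const after = { deletedDocuments: 1600000, passes: 11 };
const rate = ttlDeleteRate(before, after, 60);
console.log(rate.deletedPerSecond); // deletes per second driven by the TTL monitor
```

Adding this rate to the mongostat figures gives a closer estimate of the real write pressure on a primary.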
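2.2 MongoDB configuration optimization (network I/O multiplexing, separating network I/O from disk I/O)
Cluster TPS is high, and large batches of pushes go out on the hour, so concurrency on the hour is even higher. MongoDB's default model of one thread per connection puts a heavy burden on the system under these conditions; the default configuration is not suited to highly concurrent read/write workloads. The official description is as follows: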
2.2.1 How MongoDB's internal network thread model works
In MongoDB's default network model, MongoDB creates one thread per client connection, and that thread handles all read/write requests and disk I/O for the connection's fd.
The default network thread model is not suited to highly concurrent reads and writes for the following reasons:
Under high concurrency a large number of threads is created almost instantly. In this production cluster, for example, the connection count can jump to around 10,000, which means the operating system has to create 10,000 threads at once, and the system load becomes very high.
In addition, when the requests have been processed and traffic drops, the client connection pools release connections and the MongoDB server has to destroy the corresponding threads. This further increases system load and database jitter. It is especially visible with short-lived connections such as those from PHP, where frequent thread creation and destruction keeps the system load high.
With one thread per connection, the thread is responsible both for network send/receive and for writing data to the storage engine. Handling all network I/O and disk I/O in the same thread is an architectural weakness in itself.
2.2.2 Optimizing the network thread model
To support highly concurrent read/write workloads, MongoDB 3.6 introduced the serviceExecutor: adaptive setting. It adjusts the number of network threads dynamically according to the request volume and multiplexes network I/O where possible, which reduces the high system load caused by thread creation.
With serviceExecutor: adaptive enabled, network I/O multiplexing is implemented on top of the boost::asio network module, and network I/O is separated from disk I/O. Under high concurrency, connection multiplexing together with MongoDB's locking limits the number of threads that access disk I/O. This removes most of the load caused by creating and destroying large numbers of threads and improves highly concurrent read/write performance.
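Why multiplexing helps can be illustrated with a toy model: with one thread per connection the thread count follows the number of open connections, while with multiplexing it only needs to follow the number of requests that are in flight at the same moment. The sketch below compares the two for a small request trace. It is not MongoDB's implementation; the function name `threadDemand` and the trace are hypothetical.

```javascript
// Toy model: compare threads needed by thread-per-connection with the peak
// number of requests in flight, which bounds what a multiplexed pool needs.
// Each request is { conn, start, end } with end > start (arbitrary time units).
function threadDemand(requests) {
  const connections = new Set(requests.map((r) => r.conn)).size;
  const events = [];
  for (const r of requests) {
    events.push([r.start, +1]);
    events.push([r.end, -1]);
  }
  // Sort by time; at equal times handle request ends (-1) before starts (+1).
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  let active = 0;
  let peak = 0;
  for (const [, delta] of events) {
    active += delta;
    peak = Math.max(peak, active);
  }
  return { threadPerConnection: connections, peakInFlight: peak };
}

const demand = threadDemand([
  { conn: 1, start: 0, end: 2 },
  { conn: 2, start: 1, end: 3 },
  { conn: 3, start: 5, end: 6 },
  { conn: 4, start: 7, end: 8 },
  { conn: 1, start: 8, end: 9 },
]);
console.log(demand); // four connections, but at most two requests overlap
```

In this trace four connections would pin four threads under the default model, while two workers are enough to serve every request without queueing.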
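2.2.3 Performance before and after the network thread model optimization
After serviceExecutor: adaptive was added to this high-traffic cluster to multiplex network I/O and separate it from disk I/O, cluster latency dropped substantially, and both system load and the number of slow logs decreased considerably. Details follow.
2.2.3.1 System load before and after
The cluster has multiple shards. Below, the load on the primary of one shard with the optimized configuration is compared with the load, at the same moment, on a primary without it:
Load without the optimized configuration:
Load with the optimized configuration: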
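2.2.3.2 Slow logs before and after
The cluster has multiple shards. Below, the number of slow logs on the primary of one shard with the optimized configuration is compared with the number, over the same period, on a primary without it:
Slow-log counts for the same period:
Slow logs without the optimized configuration (19621):
Slow logs with the optimized configuration (5222):
2.2.3.3 Average latency before and after
Average latency with network I/O multiplexing enabled on all nodes of the cluster, compared with the default configuration:
The figure above shows that network I/O multiplexing reduced latency by a factor of roughly 1-2.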
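2.3 WiredTiger storage engine optimization
The previous section shows that average latency dropped from 200ms to about 80ms, which is clearly still high. How can we improve performance and reduce latency further? Looking deeper into the cluster, we found that disk I/O was sometimes 0 and sometimes pinned at 100%, with throughput dropping to zero, as shown below:
The figure shows that a single burst writes about 2G, after which I/O stays blocked for several seconds: read and write I/O drop to zero, avgqu-sz and await become very large, and util stays at 100%. While I/O is at zero, the TPS observed by the business also drops to zero.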
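The stall pattern described above (util pinned near 100% while write throughput is close to zero) can be picked out of iostat-style samples mechanically. The sketch below assumes the samples have already been parsed into objects; the field names `util` and `wMBps`, the thresholds, and the sample values are all hypothetical.

```javascript
// Find runs of consecutive samples where the disk is saturated (util near
// 100%) but almost nothing is being written, i.e. the "drop to zero" periods.
function findStalls(samples, utilThreshold = 99, writeFloorMBps = 1) {
  const stalls = [];
  let start = -1;
  samples.forEach((s, i) => {
    const stalled = s.util >= utilThreshold && s.wMBps <= writeFloorMBps;
    if (stalled && start === -1) start = i;
    if (!stalled && start !== -1) {
      stalls.push({ from: start, to: i - 1 });
      start = -1;
    }
  });
  if (start !== -1) stalls.push({ from: start, to: samples.length - 1 });
  return stalls;
}

// Hypothetical one-second samples: a 2000 MB burst followed by a stall.
const stalls = findStalls([
  { util: 100, wMBps: 2000 },
  { util: 100, wMBps: 0 },
  { util: 100, wMBps: 0 },
  { util: 40, wMBps: 300 },
  { util: 100, wMBps: 0 },
]);
console.log(stalls); // index ranges of stalled samples
```

Correlating these ranges with business-side timeout timestamps is one way to confirm that the two coincide.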
In addition, after a large burst of writes, util stays at 0% for a long time, as shown below:
The overall I/O load curve is as follows:
The figure shows that I/O stays at 0% for a long time and then jumps to 100% and stays there for a long time. When I/O util reaches 100%, the logs contain a large number of slow-log entries, and mongostat shows the following:
As shown above, when we poll a node's status with mongostat at regular intervals, the call frequently times out, and the timeouts coincide with io util=100%. At those moments I/O cannot keep up with the client write rate, which causes blocking.
These observations confirm that the problem is caused by I/O failing to keep up with client writes. Section 2.2 covered the MongoDB service-layer optimization; we now turn to the WiredTiger storage engine, in three areas:
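cacheSize adjustment
Dirty-data eviction ratio adjustment
Checkpoint optimization
2.3.1 cacheSize adjustment (why a larger cacheSize can perform worse)
The I/O analysis above shows that the timeouts coincide with the moments when I/O blocks and drops to zero, so eliminating those drops is the key to solving the problem.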
We picked an off-peak period (total TPS of 500,000/s) and looked at the TPS of the node in question. It was not very high, only about 30,000-40,000 per shard. So why was there so much flushing, with bursts reaching 10G/s, followed by I/O dropping to zero (because I/O could not keep up with the write rate)? We went on to study how WiredTiger flushes data. WiredTiger is a B+ tree storage engine. MongoDB documents are first converted to key-value pairs and written into WiredTiger. As writes continue, the cache grows, and when the share of dirty data and the total cache usage reach certain ratios, flushing to disk begins. Reaching a checkpoint limit also triggers flushing. Checking the process status of an arbitrary mongod node, we found that it consumed too much memory, as much as 110G, as shown below:
We then checked mongod.conf and found cacheSizeGB: 110G. The KV data held by the storage engine had almost reached 110G. With flushing starting at 5% dirty pages, the larger cacheSize is, the more dirty data accumulates at peak time, and disk I/O cannot keep up with the rate at which dirty data is produced. This is very likely what saturates the disk and causes I/O to drop to zero.
In addition, the machine has 190G of memory in total, of which about 110G is in use, almost all of it by the mongod storage engine. This leaves less room for the kernel page cache, and under heavy writes a shortage of kernel cache leads to page faults that hit the disk.
Solution: the analysis above suggests that in a write-heavy workload too much dirty data tends to produce very large one-off I/O bursts. We therefore reduced the storage engine cacheSize to 50G to lower the amount written in a single burst and avoid saturating and blocking the disk at peak time.
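The effect of the cache size on burst size can be made concrete with a rough calculation. The sketch below assumes that flushing starts once 5% of the cache is dirty (as stated above) and that the disk sustains about 500 MB/s, the write ceiling reported for these disks in section 3.1. The function name `flushEstimate` is hypothetical, and the model ignores new writes arriving during the flush.

```javascript
// Rough estimate of how much dirty data is pending when flushing starts and
// how long the disk needs to absorb it at a given sustained write rate.
function flushEstimate(cacheGB, dirtyPercent, diskMBps) {
  const dirtyMB = (cacheGB * 1024 * dirtyPercent) / 100;
  return { dirtyMB, secondsToFlush: dirtyMB / diskMBps };
}

const with110G = flushEstimate(110, 5, 500); // 5632 MB dirty, about 11.3 s of writes
const with50G = flushEstimate(50, 5, 500);   // 2560 MB dirty, about 5.1 s of writes
console.log(with110G, with50G);
```

Under these assumptions, shrinking the cache from 110G to 50G cuts the burst from roughly 11 seconds of saturated disk to roughly 5 seconds, which is in line with the 5s timeouts disappearing while shorter ones remain.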
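2.3.2 Storage engine dirty-data eviction optimization
Adjusting cacheSize solved the 5s request timeouts, and the corresponding alerts disappeared. The problem was not fully gone, however: 5s timeouts no longer occurred, but 1s timeouts still appeared occasionally.
The next question was therefore how to avoid large I/O bursts even further now that cacheSize had been adjusted. Looking more closely at how the storage engine works, the key is the balance between memory and I/O. The cache eviction settings of WiredTiger, MongoDB's default storage engine, are as follows:
After cacheSize was reduced from 110G to 50G, flushing still starts once dirty data reaches 5%. In extreme cases, if eviction cannot keep up with client writes, an I/O bottleneck can still form and eventually cause blocking.
Solution: the key is to reduce sustained I/O writes further, in other words to balance cache memory against disk I/O. As the table above shows, once dirty data and total cache usage reach certain ratios, background threads start selecting pages to evict and write to disk. If the ratios keep rising, user threads start evicting pages as well. This is a very dangerous blocking path that stalls user requests severely. To balance cache and I/O, we adjusted the eviction policy so that background threads evict data as early as possible, which avoids large flushes, and adjusted the thresholds at which user threads join eviction, so that user threads are not blocked doing page eviction. The tuned storage engine settings are: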
eviction_target: 75%
eviction_trigger: 97%
eviction_dirty_target: 3%
eviction_dirty_trigger: 25%
evict.threads_min: 8
evict.threads_max: 12
The overall idea is to have the background eviction threads write dirty pages to disk as early as possible, and to raise the number of eviction threads so that dirty data is evicted faster. After this change, the mongostat and client timeouts eased further.
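The article does not say how these values were applied. One possible way, shown here only as a sketch, is to build a WiredTiger configuration string and pass it through the `wiredTigerEngineRuntimeConfig` server parameter. The helper name `buildEvictionConfig` and its field names are hypothetical; the WiredTiger option names are the ones listed above.

```javascript
// Build a WiredTiger configuration string from the eviction settings above.
// Throws if the values are inconsistent with each other.
function buildEvictionConfig(s) {
  if (s.threadsMin > s.threadsMax) throw new Error("threadsMin must not exceed threadsMax");
  if (s.evictionTarget >= s.evictionTrigger) throw new Error("eviction_target must be below eviction_trigger");
  if (s.evictionDirtyTarget >= s.evictionDirtyTrigger) throw new Error("dirty target must be below dirty trigger");
  return [
    `eviction_target=${s.evictionTarget}`,
    `eviction_trigger=${s.evictionTrigger}`,
    `eviction_dirty_target=${s.evictionDirtyTarget}`,
    `eviction_dirty_trigger=${s.evictionDirtyTrigger}`,
    `eviction=(threads_min=${s.threadsMin},threads_max=${s.threadsMax})`,
  ].join(",");
}

const evictionConfig = buildEvictionConfig({
  evictionTarget: 75,
  evictionTrigger: 97,
  evictionDirtyTarget: 3,
  evictionDirtyTrigger: 25,
  threadsMin: 8,
  threadsMax: 12,
});
const evictionCommand = { setParameter: 1, wiredTigerEngineRuntimeConfig: evictionConfig };
// Assumed usage in the mongo shell: db.adminCommand(evictionCommand)
console.log(evictionConfig);
```

A runtime change of this kind does not survive a restart, so a permanent setup would also need the same string in the mongod configuration; test on a secondary first.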
2.3.3 Storage engine checkpoint tuning
A storage engine checkpoint is essentially a snapshot: all dirty data currently held by the storage engine is written to disk. By default a checkpoint is triggered by either of two conditions:
A checkpoint is taken at a fixed interval, 60s by default
The incremental redo log (that is, the journal) reaches 2G
WiredTiger triggers a checkpoint when the journal reaches 2G, or when the redo log has not reached 2G but 60s have passed since the last checkpoint. The fewer dirty pages the eviction threads write out between two checkpoints, the more dirty data accumulates, and the more dirty data has to be written at checkpoint time, which produces a large burst of disk writes.
If we shorten the checkpoint interval, less dirty data accumulates between two checkpoints, and the periods during which disk I/O sits at 100% become shorter.
The checkpoint settings after tuning are:
checkpoint=(wait=25,log_size=1GB)
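The reasoning behind the shorter interval can be sketched with a toy model that follows the trigger rule described above: a checkpoint fires when the wait time elapses or when the journal has grown by the configured size, whichever comes first. The function name `checkpointModel` and the journal and dirty-data rates are hypothetical, and the model ignores whatever the eviction threads write out in the meantime.

```javascript
// Toy model of the two checkpoint triggers and the dirty data that has
// accumulated by the time a checkpoint starts.
function checkpointModel(journalMBps, dirtyMBps, waitSeconds, logSizeMB) {
  const secondsToLogLimit = logSizeMB / journalMBps;
  const interval = Math.min(waitSeconds, secondsToLogLimit);
  return { interval, dirtyMBAtCheckpoint: interval * dirtyMBps };
}

// Hypothetical rates: 20 MB/s of journal growth, 30 MB/s of new dirty data.
const defaults = checkpointModel(20, 30, 60, 2048); // wait=60, log_size=2GB
const tuned = checkpointModel(20, 30, 25, 1024);    // wait=25, log_size=1GB
console.log(defaults, tuned);
```

With these assumed rates the time trigger fires first in both cases, and the amount of dirty data facing each checkpoint drops from 1800 MB to 750 MB.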
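2.3.4 I/O before and after the storage engine optimization
After the three storage engine optimizations above, disk I/O is spread more evenly over time. The I/O load reported by iostat after the optimization is as follows:
The I/O load figure above shows that the earlier pattern of I/O alternating between 0% and 100% has eased. The figure below summarizes the change:
2.3.5 Latency before and after the storage engine optimization
Latency before and after the optimization is compared below (note: several businesses share this cluster):
The figure above shows that after the storage engine optimization latency dropped further and became more stable, from an average of 80ms to about 20ms. It is still not ideal, however, and some jitter remains.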
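3. Solving the server disk I/O problem
3.1 Background of the server I/O hardware problem
As described in section 2.3, when WiredTiger evicts large amounts of data, we found that whenever disk writes exceed 500M/s, util stays at 100% for the next several seconds and w/s drops almost to zero. We therefore began to suspect a defect in the disk hardware.
The figure above shows that the disk is an NVMe SSD. According to its specifications the disk performs very well: it supports 2G/s of writes and 25,000 IOPS. Our production disks, however, could write at most 500M/s.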
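3.2 Performance after solving the server I/O hardware problem
We therefore migrated all primaries of the sharded cluster to another server model. These servers also use SSDs, with write throughput of 2G/s (note: only the primaries were migrated; the secondaries stayed on the servers limited to 500M/s). After the migration performance improved further, and latency dropped to 2-4ms. Latency as seen by three different businesses is shown in the figure below:
The latency figure above shows that after the primaries were moved to machines with better I/O capability, latency dropped further to an average of 2-4ms.
Although average latency is down to 2-4ms, there are still many spikes of several tens of milliseconds. For reasons of space, the cause will be covered in the next installment, which describes how all latencies were eventually kept within 5ms and the spikes of tens of milliseconds eliminated.
As for the NVMe SSD I/O bottleneck, analysis together with the vendor finally traced it to a mismatched Linux kernel version. If your NVMe SSDs show the same problem, upgrade the Linux kernel to 3.10.0-957.27.2.el7.x86_64. After the upgrade, the NVMe SSDs reached write throughput above 2G/s.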
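To check a fleet for hosts still running an older kernel, the output of `uname -r` can be compared with the release named above. The sketch below compares only the leading numeric components of the release string; the function names are hypothetical, treating "at least this release" as good is an assumption, and the comparison is only meaningful within the same distribution family (here el7).

```javascript
// Parse the leading numeric components of a kernel release string,
// e.g. "3.10.0-957.27.2.el7.x86_64" yields [3, 10, 0, 957, 27, 2].
function parseKernelRelease(release) {
  const parts = [];
  for (const token of release.split(/[.-]/)) {
    if (!/^\d+$/.test(token)) break; // stop at "el7", "x86_64", and similar
    parts.push(Number(token));
  }
  return parts;
}

// True if `release` is the same as or newer than `minimum`.
function kernelAtLeast(release, minimum) {
  const a = parseKernelRelease(release);
  const b = parseKernelRelease(minimum);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true;
}

const MIN_KERNEL = "3.10.0-957.27.2.el7.x86_64";
console.log(kernelAtLeast("3.10.0-862.el7.x86_64", MIN_KERNEL));  // older build
console.log(kernelAtLeast("3.10.0-1160.el7.x86_64", MIN_KERNEL)); // newer build
```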
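4. Summary and remaining issues
After the MongoDB service-layer configuration optimization, the storage engine optimization, and the hardware I/O improvement, the average latency of this write-heavy cluster dropped from several hundred milliseconds to 2-4ms. Overall performance improved by a factor of several tens, which is a clear gain.
However, the latency after the optimization in section 3.2 shows that the cluster still jitters occasionally. For reasons of space, the next installment will describe how to eliminate the latency jitter seen in section 3.2, so that latency stays within 2-4ms with no jitter above 10ms. Stay tuned.
Note: the optimizations in this article do not necessarily apply to every MongoDB deployment. Tune according to your actual workload and hardware capacity rather than copying these steps mechanically.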