1. First, look at the log
Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
Starting automatic rewriting of AOF on 107914% growth
Background append only file rewriting started by pid 4143
AOF rewrite child asks to stop sending diffs.
Parent agreed to stop sending diffs. Finalizing AOF...
Concatenating 0.00 MB of AOF diff received from parent.
SYNC append only file rewrite performed
AOF rewrite: 2 MB of memory used by copy-on-write
Background AOF rewrite terminated with success
Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
Background AOF rewrite finished successfully
Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
2. Check the relevant Redis configuration
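appendonly yes # enable AOF
appendfsync everysec # AOF fsync policy: fsync once per second
aof-use-rdb-preamble yes # enable hybrid RDB + AOF persistence
aof-load-truncated yes # if the AOF is truncated when Redis loads it at startup, drop the incomplete command at the end and load as many valid commands as possible
aof-rewrite-incremental-fsync yes # during a rewrite, fsync the new AOF in batches, which makes good use of sequential IO
no-appendfsync-on-rewrite no # keep data loss as small as possible: with no, at most 2 s of data is lost; with yes, up to 30 s can be lost
auto-aof-rewrite-min-size 67108864 # minimum AOF size for an automatic rewrite: 64 MB
auto-aof-rewrite-percentage 100 # (aof_current_size - aof_base_size) / aof_base_size is compared against 100%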
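The two auto-aof-rewrite-* settings above combine into a single trigger condition. The sketch below models that check in Python, assuming the integer arithmetic Redis uses in serverCron (growth = current*100/base - 100, with a base of 0 treated as 1). It leaves out the other preconditions Redis also checks, such as no child process already running, and the function names are hypothetical.

```python
# Sketch of the automatic AOF-rewrite condition (simplified; the real check in
# Redis also requires that no child process is active). Sizes are in bytes.

def aof_growth_percent(current_size: int, base_size: int) -> int:
    """Growth of the AOF since the last rewrite, in percent (integer math)."""
    base = base_size if base_size else 1  # Redis treats a zero base as 1
    return current_size * 100 // base - 100


def should_auto_rewrite(current_size: int, base_size: int,
                        min_size: int = 64 * 1024 * 1024,
                        percentage: int = 100) -> bool:
    """True when both auto-aof-rewrite-min-size and -percentage are satisfied."""
    if percentage == 0:  # auto-aof-rewrite-percentage 0 disables auto rewrite
        return False
    if current_size <= min_size:  # the AOF must be larger than the minimum size
        return False
    return aof_growth_percent(current_size, base_size) >= percentage
```

Applied to the log in step 1, "107914% growth" means current*100/base was about 108014, i.e. the AOF had grown to roughly 1080 times its size after the previous rewrite.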
3. Check the monitoring and analyze the cause
Figure 1: the monitoring shows that aof_delayed_fsync keeps increasing, which means AOF fsync is repeatedly being delayed, i.e. the AOF keeps blocking.
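To watch for the Figure 1 pattern without a dashboard, one option is to sample `INFO persistence` periodically (for example with `redis-cli INFO persistence`) and compare the aof_delayed_fsync counter between samples. The helpers below are a hypothetical sketch that only parses INFO's `field:value` text; how the samples are collected is left out.

```python
from typing import Dict, List


def parse_info(text: str) -> Dict[str, str]:
    """Parse Redis INFO output: "# Section" headers and "field:value" lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def delayed_fsync_deltas(samples: List[str]) -> List[int]:
    """Increase of aof_delayed_fsync between consecutive INFO samples."""
    counts = [int(parse_info(s).get("aof_delayed_fsync", "0")) for s in samples]
    return [b - a for a, b in zip(counts, counts[1:])]


def fsync_keeps_stalling(samples: List[str]) -> bool:
    """True if every sampling interval shows new delayed fsyncs."""
    deltas = delayed_fsync_deltas(samples)
    return bool(deltas) and all(d > 0 for d in deltas)
```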
Figure 2 shows that the AOF rewrite conditions above have been met and the AOF is being rewritten frequently.
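Causes:
1. The client uses Redis as a queue and, to avoid losing data, chose AOF persistence. The queued values are also large, mostly around 30 KB each, even though memory usage in the monitoring does not look high.
2. A large number of these big write commands pile up in the AOF file: every pushed value is appended to the AOF and stays there until the next rewrite, even after consumers have popped it. The file therefore reaches the rewrite trigger very quickly, and Redis rewrites over and over.
3. Because no-appendfsync-on-rewrite is set to no, Redis keeps issuing fsync for the appended data even while a rewrite is running. The rewrite child is writing heavily to the same disk at the same time, so these fsync calls become slow; combined with the frequent rewrites, this blocks the AOF.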
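A rough way to see why the AOF reaches the rewrite trigger so quickly: the AOF stores write commands in RESP form, so each message costs its value size plus a few dozen bytes of framing. This sketch assumes one LPUSH per message and a hypothetical key name `job:queue`, and counts only the pushes (the consumers' pop commands are appended too, but they are small).

```python
def resp_encode(*args: bytes) -> bytes:
    """Encode a command as a RESP array of bulk strings, the form kept in the AOF."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)


def pushes_until_min_size(value_size: int,
                          min_size: int = 64 * 1024 * 1024,
                          key: bytes = b"job:queue") -> int:
    """Number of LPUSH commands whose AOF entries together exceed min_size.

    Assumes a hypothetical key name and one value per LPUSH.
    """
    per_cmd = len(resp_encode(b"LPUSH", key, b"x" * value_size))
    return min_size // per_cmd + 1
```

With 30 KB (30720-byte) values each LPUSH entry is 30760 bytes, so 2182 pushes are enough to pass the 64 MB minimum size.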
no-appendfsync-on-rewrite no / appendfsync everysec
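Data is nominally flushed to disk once per second, but not exactly every 1 s. As the logic diagram shows, when the previous background fsync has not finished, the main thread checks whether 2 s have passed before writing anyway, so at most 2 s of data can be lost.
no-appendfsync-on-rewrite yes / appendfsync everysec = appendfsync no
In this combination, while a rewrite is running Redis skips fsync entirely, so buffered data only reaches the disk when Linux flushes it (sync), by default about every 30 s. Up to 30 s of data can be lost.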
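The everysec behavior described above, including the log message from step 1, can be modeled as a small state machine. This is a simplified Python sketch of the decision made in Redis's flushAppendOnlyFile; `AofState` and `flush_everysec` are hypothetical names, and the real code also handles forced flushes, write errors and the actual background fsync thread, which are left out here.

```python
from dataclasses import dataclass


@dataclass
class AofState:
    fsync_in_progress: bool = False  # previous background fsync still running (caller resets it)
    flush_postponed_start: int = 0   # unix time a write was first postponed (0 = none)
    delayed_fsync: int = 0           # what INFO reports as aof_delayed_fsync
    last_fsync: int = 0              # unix time of the last fsync decision


def flush_everysec(state: AofState, now: int, child_active: bool,
                   no_fsync_on_rewrite: bool) -> str:
    """Decide what the main thread does with the AOF buffer at time `now` (seconds)."""
    if state.fsync_in_progress:
        if state.flush_postponed_start == 0:
            state.flush_postponed_start = now  # first postponement: remember when
            return "postponed"
        if now - state.flush_postponed_start < 2:
            return "postponed"                 # under 2 s: keep waiting
        # 2 s have passed: write anyway. Redis logs "Asynchronous AOF fsync is
        # taking too long (disk is busy?)..." at this point.
        state.delayed_fsync += 1
    state.flush_postponed_start = 0  # the buffer is written (write(2)) here
    if no_fsync_on_rewrite and child_active:
        return "written; fsync skipped (rewrite child active)"
    if now > state.last_fsync:
        state.last_fsync = now
        if not state.fsync_in_progress:
            state.fsync_in_progress = True     # hand fsync to the background thread
            return "written; background fsync started"
        return "written; previous fsync still running"
    return "written"
```

Each time the 2 s wait expires, the aof_delayed_fsync counter goes up and the "Asynchronous AOF fsync is taking too long" line is logged, which matches the log in step 1 and the trend in Figure 1.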
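4. Solution
Redis is best used as a cache. Using it as a queue while relying on AOF for persistence is not recommended; a better option is to move the queue to a dedicated message queue such as Kafka, RabbitMQ or RocketMQ. If you want to keep using Redis for this, turn off AOF persistence and reduce the size of the values being pushed, so that Redis does not block. Data loss can then be handled with an additional compensation mechanism: if Redis goes down or something else unexpected happens, the producer re-pushes the data itself.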
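The compensation idea can be sketched as follows, assuming the producer records each message in its own durable store before pushing it, and consumers acknowledge by message id. `FakeQueue` and `CompensatingProducer` are hypothetical names; a plain dict and a deque stand in for the real store and the Redis list so the example runs on its own.

```python
from collections import deque


class FakeQueue:
    """In-memory stand-in for a Redis list used as a FIFO queue (LPUSH/RPOP)."""

    def __init__(self):
        self.items = deque()

    def push(self, msg_id, payload):
        self.items.appendleft((msg_id, payload))  # like LPUSH

    def pop(self):
        return self.items.pop() if self.items else None  # like RPOP

    def flush(self):
        """Simulate Redis restarting without persistence: all data is gone."""
        self.items.clear()


class CompensatingProducer:
    def __init__(self, queue):
        self.queue = queue
        self.pending = {}  # msg_id -> payload; stands in for a durable store (e.g. a DB table)
        self.next_id = 0

    def send(self, payload):
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = payload  # record the message before pushing it
        self.queue.push(msg_id, payload)
        return msg_id

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)  # consumer finished processing this message

    def repush_pending(self):
        """After Redis loses its data, push every unacknowledged message again."""
        for msg_id, payload in sorted(self.pending.items()):
            self.queue.push(msg_id, payload)
        return len(self.pending)
```

Re-pushing can deliver a message twice if it was still in the queue when the producer re-sent it, so consumers should process messages idempotently.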