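【Detecting the Problem】
Operations staff received a zabbix alert saying that on the machine hosting the codis cluster's usa-9 node, only 80 KB of the original 4G swap space remained. They immediately logged in to that machine and added about 6G of swap space.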
Lack of free swap space on USARN-H-Host-Linux-172.24.19.59: PROBLEM (Value: 80 KB) 2019.11.13 14:47:34
Next came a 500 error alert from an application. Its error stack trace mentioned the codis usa-9 node: “JedisConnectionException: Unexpected end of stream”. Logging in to usa-9 again, we retrieved the following Linux system log:
Nov 13 14:56:19 vm-centos6 kernel: codis-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Nov 13 14:56:19 vm-centos6 kernel: codis-server cpuset=/ mems_allowed=0
Nov 13 14:56:19 vm-centos6 kernel: Pid: 4492, comm: codis-server Not tainted 2.6.32-504.el6.x86_64 #1
Nov 13 14:56:19 vm-centos6 kernel: Call Trace:
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff81127300>] ? dump_header+0x90/0x1b0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8122ea2c>] ? security_real_capable_noaudit+0x3c/0x70
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811276c1>] ? select_bad_process+0xe1/0x120
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff814470e1>] ? sock_aio_read+0x1a1/0x1b0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff810a2bbb>] ? __remove_hrtimer+0x3b/0xb0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811d68e0>] ? ep_send_events_proc+0x0/0x110
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8152ffbe>] ? do_page_fault+0x3e/0xa0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8152d375>] ? page_fault+0x25/0x30
Nov 13 14:56:19 vm-centos6 kernel: Mem-Info:
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA per-cpu:
Nov 13 14:56:19 vm-centos6 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA32 per-cpu:
Nov 13 14:56:19 vm-centos6 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: Node 0 Normal per-cpu:
Nov 13 14:56:19 vm-centos6 kernel: CPU 0: hi: 186, btch: 31 usd: 35
Nov 13 14:56:19 vm-centos6 kernel: CPU 1: hi: 186, btch: 31 usd: 3
Nov 13 14:56:19 vm-centos6 kernel: CPU 2: hi: 186, btch: 31 usd: 59
Nov 13 14:56:19 vm-centos6 kernel: CPU 3: hi: 186, btch: 31 usd: 184
Nov 13 14:56:19 vm-centos6 kernel: active_anon:4040530 inactive_anon:451920 isolated_anon:0
Nov 13 14:56:19 vm-centos6 kernel: active_file:3492 inactive_file:4985 isolated_file:0
Nov 13 14:56:19 vm-centos6 kernel: unevictable:0 dirty:2037 writeback:1387 unstable:0
Nov 13 14:56:19 vm-centos6 kernel: free:35841 slab_reclaimable:2943 slab_unreclaimable:7727
Nov 13 14:56:19 vm-centos6 kernel: mapped:296 shmem:73 pagetables:13459 bounce:0
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA free:15668kB min:52kB low:64kB high:76kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15276kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 13 14:56:19 vm-centos6 kernel: lowmem_reserve[]: 0 3000 18150 18150
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA32 free:71556kB min:11160kB low:13948kB high:16740kB active_anon:2063844kB inactive_anon:519380kB active_file:656kB inactive_file:1132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:660kB writeback:0kB mapped:120kB shmem:0kB slab_reclaimable:628kB slab_unreclaimable:68kB kernel_stack:0kB pagetables:204kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2688 all_unreclaimable? yes
Nov 13 14:56:19 vm-centos6 kernel: lowmem_reserve[]: 0 0 15150 15150
Nov 13 14:56:19 vm-centos6 kernel: Node 0 Normal free:56140kB min:56364kB low:70452kB high:84544kB active_anon:14098276kB inactive_anon:1288300kB active_file:13312kB inactive_file:18808kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15513600kB mlocked:0kB dirty:7488kB writeback:5548kB mapped:1064kB shmem:292kB slab_reclaimable:11144kB slab_unreclaimable:30840kB kernel_stack:2184kB pagetables:53632kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:52256 all_unreclaimable? yes
Nov 13 14:56:19 vm-centos6 kernel: lowmem_reserve[]: 0 0 0 0
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA: 1*4kB 2*8kB 2*16kB 2*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15668kB
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA32: 2308*4kB 391*8kB 210*16kB 146*32kB 62*64kB 37*128kB 26*256kB 22*512kB 18*1024kB 3*2048kB 0*4096kB = 71592kB
Nov 13 14:56:19 vm-centos6 kernel: Node 0 Normal: 756*4kB 706*8kB 494*16kB 330*32kB 170*64kB 89*128kB 21*256kB 3*512kB 0*1024kB 0*2048kB 0*4096kB = 56320kB
Nov 13 14:56:19 vm-centos6 kernel: 65997 total pagecache pages
Nov 13 14:56:19 vm-centos6 kernel: 57354 pages in swap cache
Nov 13 14:56:19 vm-centos6 kernel: Swap cache stats: add 46466585, delete 46409231, find 15690882/21869217
Nov 13 14:56:19 vm-centos6 kernel: Free swap = 0kB
Nov 13 14:56:19 vm-centos6 kernel: Total swap = 4063228kB
Nov 13 14:56:19 vm-centos6 kernel: 4718576 pages RAM
Nov 13 14:56:19 vm-centos6 kernel: 117970 pages reserved
Nov 13 14:56:19 vm-centos6 kernel: 9305 pages shared
Nov 13 14:56:19 vm-centos6 kernel: 4551285 pages non-shared
Nov 13 14:56:19 vm-centos6 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Nov 13 14:56:19 vm-centos6 kernel: [ 514] 0 514 2729 1 1 -17 -1000 udevd
Nov 13 14:56:19 vm-centos6 kernel: [ 837] 0 837 2729 1 1 -17 -1000 udevd
Nov 13 14:56:19 vm-centos6 kernel: [ 1272] 0 1272 62838 313 3 0 0 vmtoolsd
Nov 13 14:56:19 vm-centos6 kernel: [ 1310] 0 1310 15023 6 2 0 0 VGAuthService
Nov 13 14:56:19 vm-centos6 kernel: [ 1386] 0 1386 23283 40 0 -17 -1000 auditd
Nov 13 14:56:19 vm-centos6 kernel: [ 1406] 0 1406 62464 692 2 0 0 rsyslogd
Nov 13 14:56:19 vm-centos6 kernel: [ 1436] 0 1436 4589 36 0 0 0 irqbalance
Nov 13 14:56:19 vm-centos6 kernel: [ 1452] 32 1452 4744 18 2 0 0 rpcbind
Nov 13 14:56:19 vm-centos6 kernel: [ 1472] 29 1472 5837 2 0 0 0 rpc.statd
Nov 13 14:56:19 vm-centos6 kernel: [ 1589] 81 1589 5394 47 2 0 0 dbus-daemon
Nov 13 14:56:19 vm-centos6 kernel: [ 1621] 0 1621 1020 1 0 0 0 acpid
Nov 13 14:56:19 vm-centos6 kernel: [ 1631] 68 1631 9521 162 2 0 0 hald
Nov 13 14:56:19 vm-centos6 kernel: [ 1632] 0 1632 5099 2 1 0 0 hald-runner
Nov 13 14:56:19 vm-centos6 kernel: [ 1664] 0 1664 5629 2 3 0 0 hald-addon-inpu
Nov 13 14:56:19 vm-centos6 kernel: [ 1674] 68 1674 4501 2 0 0 0 hald-addon-acpi
Nov 13 14:56:19 vm-centos6 kernel: [ 1689] 0 1689 2728 1 3 -17 -1000 udevd
Nov 13 14:56:19 vm-centos6 kernel: [ 1695] 0 1695 96534 43 1 0 0 automount
Nov 13 14:56:19 vm-centos6 kernel: [ 1823] 0 1823 20332 28 0 0 0 master
Nov 13 14:56:19 vm-centos6 kernel: [ 1846] 89 1846 20398 24 2 0 0 qmgr
Nov 13 14:56:19 vm-centos6 kernel: [ 1849] 0 1849 28661 2 3 0 0 abrtd
Nov 13 14:56:19 vm-centos6 kernel: [ 1862] 0 1862 29342 24 2 0 0 crond
Nov 13 14:56:19 vm-centos6 kernel: [ 1876] 0 1876 5394 7 0 0 0 atd
Nov 13 14:56:19 vm-centos6 kernel: [ 1889] 0 1889 19879 2 0 0 0 login
Nov 13 14:56:19 vm-centos6 kernel: [ 1891] 0 1891 1016 2 3 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1893] 0 1893 1016 2 0 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1895] 0 1895 1016 2 2 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1897] 0 1897 1016 2 0 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1899] 0 1899 1016 2 1 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1996] 0 1996 521256 57 0 0 0 console-kit-dae
Nov 13 14:56:19 vm-centos6 kernel: [ 2063] 0 2063 27076 2 1 0 0 bash
Nov 13 14:56:19 vm-centos6 kernel: [29526] 0 29526 25812 47 1 0 0 ping
Nov 13 14:56:19 vm-centos6 kernel: [ 4492] 0 4492 6354569 4432393 1 0 0 codis-server
Nov 13 14:56:19 vm-centos6 kernel: [25500] 0 25500 133214 139 0 0 0 SFTMonitor
Nov 13 14:56:19 vm-centos6 kernel: [25501] 0 25501 222155 168 1 0 0 SFTServer
Nov 13 14:56:19 vm-centos6 kernel: [19596] 0 19596 16672 22 2 -17 -1000 sshd
Nov 13 14:56:19 vm-centos6 kernel: [26159] 500 26159 4441 10 3 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26161] 500 26161 4441 132 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26162] 500 26162 4441 49 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26163] 500 26163 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26164] 500 26164 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26165] 500 26165 4441 49 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26166] 500 26166 4441 49 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26167] 500 26167 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26168] 500 26168 4441 49 3 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26169] 500 26169 4441 49 1 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26170] 500 26170 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26171] 500 26171 4441 49 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26172] 500 26172 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26174] 500 26174 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26175] 500 26175 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [23868] 38 23868 7683 44 2 0 0 ntpd
Nov 13 14:56:19 vm-centos6 kernel: [ 3221] 89 3221 20352 231 2 0 0 pickup
Nov 13 14:56:19 vm-centos6 kernel: [ 3463] 0 3463 24592 291 2 0 0 sshd
Nov 13 14:56:19 vm-centos6 kernel: [ 3466] 0 3466 27087 145 0 0 0 bash
Nov 13 14:56:19 vm-centos6 kernel: [ 3490] 0 3490 26297 51 0 0 0 dd
Nov 13 14:56:19 vm-centos6 kernel: Out of memory: Kill process 4492 (codis-server) score 941 or sacrifice child
Nov 13 14:56:19 vm-centos6 kernel: Killed process 4492, UID 0, (codis-server) total-vm:25418276kB, anon-rss:17729176kB, file-rss:396kB
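//The next line was logged when ops, responding to the alert that only 80k of the original 4G swap remained, added about 6G of swap. By then the redis process had already been killed 20 seconds earlier.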
Nov 13 14:56:39 vm-centos6 kernel: Adding 5999996k swap on /home/swap/swapfile. Priority:-2 extents:8 across:6499708k
【Analyzing the Problem】
The redis instance was killed by the system kernel. The most important parts of the system log are its first and last lines:
codis-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Killed process 4492, UID 0, (codis-server) total-vm:25418276kB, anon-rss:17729176kB, file-rss:396kB
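When the redis process requested 4 KB of memory (order=0, so 2^0 pages, i.e. 4 KB), the system was short of memory and triggered the oom-killer, and the process finally selected to be killed was the redis process itself.
According to the memory layout described at http://www.reibang.com/p/c2e7d36829af, the lowest 2 bits of the mask (0x201da), "10" = 2, mean "Allocate from ZONE_HIGHMEM". A 64-bit system has no highmem zone, however, so the memory is actually requested from the Normal zone. The log shows “Node 0 Normal free:56140kB min:56364kB”: the Normal zone's free memory, 56140kB, is below the minimum watermark of 56364kB, and this is what triggered the oom-killer.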
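The reading of the first log line can be reproduced with a few lines of Python. This is only a sketch: the zone modifier bit values are the ones assumed for 2.6.32-era kernels, the 4 KB page size is the x86_64 default, and the helper names are made up for illustration.

```python
import re

PAGE_SIZE_KB = 4  # assumed x86_64 base page size

# Zone modifier bits, assumed to match 2.6.32-era kernels.
ZONE_BITS = {0x01: "__GFP_DMA", 0x02: "__GFP_HIGHMEM", 0x04: "__GFP_DMA32"}


def request_size_kb(order):
    """Size of an allocation of 2**order contiguous pages, in KB."""
    return (2 ** order) * PAGE_SIZE_KB


def zone_modifiers(gfp_mask):
    """Names of the zone modifier bits that are set in gfp_mask."""
    return [name for bit, name in sorted(ZONE_BITS.items()) if gfp_mask & bit]


def below_min_watermark(zone_line):
    """True if the zone's free memory is below its 'min' watermark."""
    free_kb = int(re.search(r"free:(\d+)kB", zone_line).group(1))
    min_kb = int(re.search(r"min:(\d+)kB", zone_line).group(1))
    return free_kb < min_kb


print(request_size_kb(0))        # order=0 from the log
print(zone_modifiers(0x201da))   # gfp_mask from the log
print(below_min_watermark("Node 0 Normal free:56140kB min:56364kB"))
```

With the values from the log, the request is a single 4 KB page, the only zone modifier set is the highmem bit, and the Normal zone is below its minimum watermark.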
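codis-monitor's memory-usage alarm threshold for this node is 65% and maxmemory=12G, so an alarm is raised once K-V memory usage reaches 12G * 65% = 7.8G. No alarm was raised when the node was killed, which means the memory used by K-V data had not yet reached 7.8G. The machine has 18G of total memory and had 4G of swap at the time, and no other process could have consumed the memory.
The log shows “anon-rss:17729176kB”: the redis node occupied about 16.9G of memory when it was killed. So on the one hand redis occupied 16.9G and exhausted memory, causing the OOM; on the other hand redis's K-V data amounted to no more than 7.8G.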
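The two figures being compared can be checked with a little arithmetic. The sketch below uses only numbers quoted above; the function names are illustrative.

```python
def alarm_threshold_gb(maxmemory_gb, ratio):
    """K-V memory usage (GB) at which the monitor raises its alarm."""
    return maxmemory_gb * ratio


def kb_to_gib(kb):
    """Convert kilobytes, as printed by the kernel, to GiB."""
    return kb / 1024 / 1024


threshold = alarm_threshold_gb(12, 0.65)   # maxmemory=12G, 65% threshold
rss_at_kill = kb_to_gib(17729176)          # anon-rss from the kill line

print(round(threshold, 1))                 # alarm level for K-V data
print(round(rss_at_kill, 1))               # memory held by the process
print(round(rss_at_kill - threshold, 1))   # gap the K-V data cannot explain
```

The gap of roughly 9G between what the process held and what the K-V data could account for is the contradiction the rest of the analysis sets out to explain.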
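So we looked into what anon-rss means: RSS is the memory allocated to the process from the operating system's point of view. We also checked how codis-monitor's 65% is defined. It applies to used_memory as printed by the info command, the memory actually used by K-V data. The info command also has a used_memory_rss field, the memory the operating system has allocated to redis. used_memory_rss can exceed used_memory, and the ratio of the two is the memory fragmentation ratio, reported in another field, mem_fragmentation_ratio.
Once the difference between used_memory (used by the threshold alarm) and used_memory_rss (the system-level allocation) is understood, it is clear that the two observations describe different dimensions. The initial guess is that memory fragmentation grew so large that redis's total memory footprint exceeded the machine's total memory before the stored K-V data reached the alarm threshold.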
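【Verifying the Problem】
Because redis on the usa-9 node had already been restarted, the scene of the incident could not be reconstructed, so we went through the other redis nodes of the usa cluster to verify the initial guess.
1) usa-2 node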
[root@usa-idc-micen-codis-app2 ~]# top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4498 root 20 0 26.9g 17g 884 S 4.3 96.9 23255:49 /opt/xyz/codis202/bin/codis-server *:8998
[root@usa-idc-micen-codis-app2 ~]# free -m
total used free shared buffers cached
Mem: 17971 17784 187 0 8 19
-/+ buffers/cache: 17755 215
Swap: 3967 2553 1414
xxx.xxx.xxx.xxx:8998> info
# Memory
used_memory_human:6.63G
used_memory_rss_human:17.00G
mem_fragmentation_ratio:2.57
2) usa-4 node
[root@usa-idc-micen-codis-app4 log]# top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9297 root 20 0 11.2g 10g 1076 S 3.0 59.5 1780:23 /opt/xyz/codis202/bin/codis-server *:8998
[root@usa-idc-micen-codis-app4 log]# free -m
total used free shared buffers cached
Mem: 17971 17751 219 0 138 5184
-/+ buffers/cache: 12429 5542
Swap: 3967 631 3336
xxx.xxx.xxx.xxx:8998> info
# Memory
used_memory_human:7.63G
used_memory_rss_human:10.44G
mem_fragmentation_ratio:1.37
3) usa-1 node
[root@usa-idc-micen-codis-app1 ~]# top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4617 root 20 0 29.8g 15g 876 S 3.0 89.8 22864:35 /opt/xyz/codis202/bin/codis-server *:8998
[root@usa-idc-micen-codis-app1 ~]# free -m
total used free shared buffers cached
Mem: 17971 17813 158 0 30 97
-/+ buffers/cache: 17685 285
Swap: 11780 4239 7541
xxx.xxx.xxx.xxx:8998> info
# Memory
used_memory_human:6.61G
used_memory_rss_human:15.74G
mem_fragmentation_ratio:2.38
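Summary:
1) The usa-2 node is closest to the state of the OOM-killed usa-9 node. It stores 6.63G of K-V data, but its total memory usage including fragmentation is 17G. Fragmentation has reached a striking 10G or more, which amounts to storing 6G of data while wasting 10G of unusable memory, and the fragmentation ratio of 2.57 far exceeds the commonly recommended 1.5. This indirectly confirms that usa-9 hit OOM because fragmentation was excessive, total memory usage reached the physical memory limit, and a request for a new memory page failed.
2) The resident memory RES shown by top should be the same quantity as used_memory_rss shown by info, i.e. redis's memory footprint including fragmentation, and also the anon-rss that the kernel log reports as reclaimed at kill time.
3) Only usa-4 is healthy: its fragmentation ratio of 1.37 is below 1.5, it has 5G of memory left, and its swap is barely used. The other nodes have excessive fragmentation volume and ratio, almost no free memory, and heavy swap usage. usa-2 in particular is not far from OOM. It has been able to walk along the cliff edge without falling because the 65% K-V memory threshold alarm mentioned above protects redis by making it read-only, which kept total memory usage including fragmentation from exceeding physical memory. usa-9 was not so lucky.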
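The comparison in this summary can be tabulated as follows. It is a rough sketch built from the rounded `_human` values in the info output above, so the recomputed ratio may differ from the reported mem_fragmentation_ratio in the last digit; the 1.5 limit is the guideline cited in the text.

```python
# used_memory_human and used_memory_rss_human per node, in GB, as listed above.
NODES = {
    "usa-2": {"used_gb": 6.63, "rss_gb": 17.00},
    "usa-4": {"used_gb": 7.63, "rss_gb": 10.44},
    "usa-1": {"used_gb": 6.61, "rss_gb": 15.74},
}
RATIO_LIMIT = 1.5  # guideline quoted in the summary


def summarize(nodes, limit=RATIO_LIMIT):
    """Return (name, overhead_gb, ratio, over_limit) for each node."""
    rows = []
    for name, node in nodes.items():
        overhead = node["rss_gb"] - node["used_gb"]   # memory not holding K-V data
        ratio = node["rss_gb"] / node["used_gb"]      # rss / used_memory
        rows.append((name, round(overhead, 2), round(ratio, 2), ratio > limit))
    return rows


for row in summarize(NODES):
    print(row)
```

Only usa-4 stays under the limit; usa-2 and usa-1 each carry roughly 9G to 10G of memory that holds no K-V data.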
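【Solving the Problem】
1) Conservative treatment: make redis nodes less prone to OOM. First, enlarge swap to increase tolerance of physical memory exhaustion and reduce the chance of triggering the oom-killer. Second, lower the K-V storage alarm threshold from 65% to 60% so that the protection takes effect earlier, reducing the risk that total memory usage including fragmentation exceeds physical memory.
2) Effective treatment: clean up memory fragmentation. Redis 4.0 and later can defragment, but with the redis 3.2 currently in use the only way is a restart: add new machine nodes, migrate slots gradually, and once migration is complete shut down and restart the old nodes. There are three difficulties. First, automated ops tooling is lacking, and migrating slots one by one by hand is time-consuming. Second, project teams were previously given no usage constraints for redis, so it will contain large keys, and the project teams will not accept the pauses caused by migrating those slots. Third, also for lack of usage constraints, project teams are very likely using redis as a db. The nodes hosting these heavily used slots rely on master-slave for high availability (almost none have persistence enabled), and if redis went down and lost data during slot migration while no new slave backup was in place, the project teams would find that completely unacceptable.
3) Long-term treatment: reduce memory fragmentation. Require project teams to add a TTL to every key they use, whether one hour or one week, so that expired keys can be purged, reducing both memory usage and fragmentation. Keys that no project team claims can only stay in redis for now; later a script will traverse the keyspace and set a default TTL on keys that have none.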
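The follow-up script mentioned in item 3 could look roughly like the sketch below. It is an assumption-laden outline, not the script actually used: it expects a client object exposing `scan_iter`, `ttl` and `expire` as redis-py's `redis.Redis` does, the function name and defaults are illustrative, and it presumes a direct connection to each codis-server instance, since a proxy may not support SCAN.

```python
def add_default_ttl(client, default_ttl_seconds, match="*", batch=1000):
    """Set default_ttl_seconds on every key without an expiry; return the count.

    `client` is assumed to behave like redis-py's redis.Redis:
    scan_iter() yields key names, ttl() returns -1 for a key that
    exists but has no expiry, and expire() sets one.
    """
    updated = 0
    # SCAN walks the keyspace incrementally instead of blocking like KEYS.
    for key in client.scan_iter(match=match, count=batch):
        if client.ttl(key) == -1:
            client.expire(key, default_ttl_seconds)
            updated += 1
    return updated


# Hypothetical usage against a single instance (host is a placeholder):
#   import redis
#   client = redis.Redis(host="...", port=8998)
#   add_default_ttl(client, 7 * 24 * 3600)  # one week
```

The check and the update are two separate commands, so a key that receives a TTL from its owner in between would be overwritten; for a one-off cleanup of unclaimed keys that is probably acceptable.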
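【Reflecting on the Problem】
1) How does memory fragmentation arise?
What is certain is that frequently setting new values on keys causes it. Take the integer set (intset) data structure: suppose several small integers are stored in contiguous int16 slots. As soon as an integer that needs more than 2 bytes is added, all the small integers are upgraded to int32 or int64 slots. If that large integer is later deleted, the small integers are not downgraded back to int16, so half or more of the memory is wasted. Also, suppose set keyA 1m_str is followed by set keyA int_val: whether the memory freed this way can actually be released remains to be verified.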
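The intset behaviour described here can be mimicked with a toy model. The class below is purely illustrative and is not Redis's implementation: it only tracks the per-element width (2, 4 or 8 bytes) and shows that the width grows on insert but does not shrink on delete.

```python
class ToyIntSet:
    """Toy model of an integer set with a shared element width."""

    WIDTHS = (2, 4, 8)  # bytes per element: int16, int32, int64

    def __init__(self):
        self.width = 2
        self.values = []

    @classmethod
    def _required_width(cls, value):
        """Smallest width whose signed range can hold value."""
        for width in cls.WIDTHS:
            bits = 8 * width
            if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
                return width
        raise OverflowError("value does not fit in int64")

    def add(self, value):
        # A wider value upgrades the width used by every element.
        self.width = max(self.width, self._required_width(value))
        if value not in self.values:
            self.values.append(value)
            self.values.sort()

    def remove(self, value):
        if value in self.values:
            self.values.remove(value)
        # Deliberately no downgrade: the width stays as it is.

    def payload_bytes(self):
        return self.width * len(self.values)


s = ToyIntSet()
for v in (1, 2, 3):
    s.add(v)
before = s.payload_bytes()   # three int16 elements
s.add(70000)                 # needs int32, upgrades everything
s.remove(70000)
after = s.payload_bytes()    # three elements, still int32-wide
print(before, after)
```

In this model the same three small integers occupy twice the space after the large value has come and gone, which is the "half or more wasted" effect described above.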
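【Open Questions】
1) When redis has persistence enabled, the forked child process nominally needs as much memory as the redis process itself (in practice, copy-on-write means it does not really use the same amount). Giving only 45% of physical memory to the redis process and reserving the rest for the persistence child would not be worthwhile. The recommendation is therefore to set the kernel parameter vm.overcommit_memory = 1, so that the kernel always grants the memory allocation for the forked child instead of refusing it up front. The redis startup log also shows this warning: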
# WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
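vm.overcommit_memory defaults to 0, the kernel's heuristic overcommit mode. The setting governs whether memory allocation requests are granted; it does not stop the process's pages from being swapped out (the usa-2 node above has 2553 MB of swap in use). That still leaves the question of how the VIRT=26.9g that top shows for the usa-2 node is calculated.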
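For reference, the three values that vm.overcommit_memory can take are summarized in the sketch below. The descriptions are short paraphrases of the kernel documentation, and the helper names are illustrative; reading the live value only works on Linux.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit: obviously excessive requests are refused",
    1: "always overcommit: allocation requests are always granted",
    2: "strict accounting: commits are capped by swap plus a share of RAM",
}


def describe_overcommit(raw_value):
    """Map the contents of the sysctl file to a short description."""
    return OVERCOMMIT_MODES.get(int(raw_value.strip()), "unknown mode")


def read_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read the current setting (Linux only)."""
    with open(path) as handle:
        return handle.read()


print(describe_overcommit("0\n"))  # the default that redis warns about
print(describe_overcommit("1\n"))  # the value the warning recommends
```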