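Slow requests come up often when operating Ceph. Here is a brief summary of the methods and reasoning for tracking them down:
- First check the CPU load across the cluster, for example with top;
- Then check the disk load, for example with iostat or dstat;
- Then check the network load with netstat. In the netstat output, watch the sizes of the Recv-Q and Send-Q queues: a large Recv-Q means data is being received slowly, and a large Send-Q means data is being sent slowly;
- Analyze further with commands such as ceph --admin-daemon asok perf dump (asok being the daemon's admin socket file). In the output, pay particular attention to the wait entries, for example: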
    "throttle-objecter_bytes": {
        "val": 35652380,
        "max": 838860800,
        "get_started": 17898,
        "get": 2182128,
        "get_sum": 4255599802310,
        "get_or_fail_fail": 17898,
        "get_or_fail_success": 2164230,
        "take": 0,
        "take_sum": 0,
        "put": 1248942,
        "put_sum": 4255564149930,
        "wait": {
            "avgcount": 17897,
            "sum": 4899.822857281
        }
    },
    "throttle-objecter_ops": {
        "val": 15,
        "max": 1024,
        "get_started": 0,
        "get": 2182128,
        "get_sum": 2182128,
        "get_or_fail_fail": 0,
        "get_or_fail_success": 2182128,
        "take": 0,
        "take_sum": 0,
        "put": 2182113,
        "put_sum": 2182113,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000
        }
    },
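Here throttle-objecter_bytes has waited 17897 times for a total of about 4900 seconds, while throttle-objecter_ops has never waited, so the byte limit is the one being hit. In this situation you may need to increase objecter_inflight_op_bytes, but note that raising this option also increases the memory used by rgw.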
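
Reading the wait entries by eye gets tedious when a daemon exposes many throttles. The following is a minimal Python sketch, not part of any Ceph tooling, that scans a perf dump for throttle sections that have waited and reports the average wait per occurrence. The function name throttle_waits is hypothetical, the sample is trimmed from the output above, and the sketch assumes that wait.sum is the total wait time in seconds and wait.avgcount the number of waits.

```python
import json


def throttle_waits(perf_dump):
    """Return {section: (avg_wait_seconds, wait_count)} for throttles that waited.

    Assumes each "throttle-*" section has a "wait" entry with "avgcount"
    (number of waits) and "sum" (total wait time, assumed to be seconds).
    """
    result = {}
    for name, counters in perf_dump.items():
        if not name.startswith("throttle-") or not isinstance(counters, dict):
            continue
        wait = counters.get("wait", {})
        count = wait.get("avgcount", 0)
        if count > 0:
            result[name] = (wait.get("sum", 0.0) / count, count)
    return result


# Sample trimmed from the perf dump output shown in this note.
sample = json.loads("""
{
    "throttle-objecter_bytes": {
        "val": 35652380,
        "max": 838860800,
        "wait": {"avgcount": 17897, "sum": 4899.822857281}
    },
    "throttle-objecter_ops": {
        "val": 15,
        "max": 1024,
        "wait": {"avgcount": 0, "sum": 0.0}
    }
}
""")

report = throttle_waits(sample)
for name, (avg, count) in report.items():
    print(f"{name}: {count} waits, {avg:.3f} s average")
```

On a live system, the same function could be fed the JSON produced by the perf dump command mentioned above, after loading it with json.load. Throttles that never waited are left out of the report, which keeps the output focused on the limits that are actually being hit.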