1. About netstat and ss
ss is short for Socket Statistics.
You are surely familiar with netstat already, but it has not been updated since version 1.42 in 2001. Its replacement is ss, which is part of the iproute2 package.
rpm -ql iproute | grep ss
/usr/sbin/ss
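The iproute2 counterpart of netstat -s is nstat, and ss can replace most of netstat's other functions.
ss shows information similar to netstat but is much faster. netstat gets TCP socket statistics from /proc/net/tcp: if you strace netstat while it lists TCP connections, you can see it opening /proc/net/tcp.
The secret of ss's speed is that it uses TCP's tcp_diag module and reads the information directly from the kernel (over netlink). When the kernel does not support the tcp_diag module, ss falls back to /proc/net/tcp.
/proc/net/snmp holds counters accumulated since the system booted; netstat -s reads it.
/proc/net/tcp holds statistics for the currently active TCP connections, and a connection's statistics disappear once it closes. ss -it shows these per-connection statistics (through tcp_diag when available).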
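To make the /proc/net/tcp format concrete, here is a minimal Python sketch that decodes one IPv4 line roughly the way netstat does. It makes three assumptions:
- The standard column layout: sl, local_address, rem_address, st, tx_queue:rx_queue, ...
- A little-endian host, where addresses appear byte-reversed in hex.
- A made-up sample line, used only for illustration.

```python
import ipaddress

# TCP state codes as printed in the "st" column (values from include/net/tcp_states.h)
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV", 0x04: "FIN_WAIT1",
    0x05: "FIN_WAIT2", 0x06: "TIME_WAIT", 0x07: "CLOSE", 0x08: "CLOSE_WAIT",
    0x09: "LAST_ACK", 0x0A: "LISTEN", 0x0B: "CLOSING",
}

def decode_addr(field: str) -> str:
    """Turn '0100007F:1F90' into '127.0.0.1:8080' (assumes a little-endian host)."""
    hex_ip, hex_port = field.split(":")
    ip = ipaddress.IPv4Address(bytes.fromhex(hex_ip)[::-1])  # undo host byte order
    return f"{ip}:{int(hex_port, 16)}"

def parse_proc_net_tcp_line(line: str) -> dict:
    cols = line.split()
    tx_queue, rx_queue = (int(x, 16) for x in cols[4].split(":"))
    return {
        "local": decode_addr(cols[1]),
        "remote": decode_addr(cols[2]),
        "state": TCP_STATES.get(int(cols[3], 16), cols[3]),
        "tx_queue": tx_queue,  # shown by netstat as Send-Q
        "rx_queue": rx_queue,  # shown by netstat as Recv-Q
    }

# Hypothetical sample line: a socket listening on 127.0.0.1:8080
sample = ("   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 "
          "00000000  1000        0 12345 1 0000000000000000 100 0 0 10 0")
print(parse_proc_net_tcp_line(sample))
```

Because /proc/net/tcp is a text file that lists every socket, netstat has to read and parse it in full. ss instead asks the kernel for structured records, which is where much of its speed comes from.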
2. ss usage examples
Viewing buffer sizes with ss
-m, --memory    // show the buffer usage of each connection
Show socket memory usage. The output format is:
skmem:(r<rmem_alloc>,rb<rcv_buf>,t<wmem_alloc>,tb<snd_buf>,f<fwd_alloc>,w<wmem_queued>,o<opt_mem>,bl<back_log>,d<sock_drop>)
<rmem_alloc>: the memory allocated for receiving packets
<rcv_buf>: the total memory that can be allocated for receiving packets
<wmem_alloc>: the memory used for sending packets (which have been sent to layer 3)
<snd_buf>: the total memory that can be allocated for sending packets
<fwd_alloc>: the memory allocated by the socket as cache but not yet used for receiving/sending packets. If memory is needed to send/receive packets, the memory in this cache is used before additional memory is allocated.
<wmem_queued>: the memory allocated for sending packets (which have not yet been sent to layer 3)
<opt_mem>: the memory used for storing socket options, e.g. the key for TCP MD5 signature
<back_log>: the memory used for the sk backlog queue. In process context, if the process is receiving packets and a new packet arrives, the new packet is put into the sk backlog queue so that the process can receive it right away.
<sock_drop>: the number of packets dropped before they are demultiplexed into the socket
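To read these fields programmatically, here is a minimal Python sketch. It pulls the skmem:(...) group out of one ss -m line and maps each prefix to the man-page name listed above.

The parsing is regex-based, which is an assumption about the output layout. Any prefix the table does not know is kept under its short name, since newer ss versions may print extra fields.

```python
import re

# Short skmem prefixes mapped to the field names used in the ss(8) man page
SKMEM_FIELDS = {
    "r": "rmem_alloc", "rb": "rcv_buf", "t": "wmem_alloc", "tb": "snd_buf",
    "f": "fwd_alloc", "w": "wmem_queued", "o": "opt_mem", "bl": "back_log",
    "d": "sock_drop",
}

def parse_skmem(text: str) -> dict:
    """Extract the skmem:(...) group from an ss -m line into {field_name: int}."""
    group = re.search(r"skmem:\(([^)]*)\)", text)
    if not group:
        return {}
    result = {}
    for item in group.group(1).split(","):
        m = re.fullmatch(r"([a-z]+)(\d+)", item)   # e.g. "rb4619516" -> ("rb", "4619516")
        if m:
            key, value = m.groups()
            result[SKMEM_FIELDS.get(key, key)] = int(value)
    return result

line = ("tcp ESTAB 0 281 10.97.169.173:32866 10.97.170.220:3306 "
        "skmem:(r0,rb4619516,t2304,tb87552,f1792,w2304,o0,bl0)")
print(parse_skmem(line))
```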
--memory/-m: show the buffer sizes
# ss -m | xargs -L 1 | grep "ESTAB" | awk '{ if($3>0 || $4>0) print $0 }'
tcp ESTAB 0 31 10.97.137.1:7764 10.97.137.2:41019 skmem:(r0,rb7160692,t0,tb87040,f1792,w2304,o0,bl0)
tcp ESTAB 0 193 ::ffff:10.97.137.1:sdo-tls ::ffff:10.97.137.2:55545 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0)
tcp ESTAB 0 65 ::ffff:10.97.137.1:splitlock ::ffff:10.97.137.2:47796 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0)
tcp ESTAB 0 80 ::ffff:10.97.137.1:informer ::ffff:10.97.137.3:49279 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0)
tcp ESTAB 0 11 ::ffff:10.97.137.1:acp-policy ::ffff:10.97.137.2:41607 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0)
# ss -m -n | xargs -L 1 | grep "tcp EST" | grep "t[1-9]"
tcp ESTAB 0 281 10.97.169.173:32866 10.97.170.220:3306 skmem:(r0,rb4619516,t2304,tb87552,f1792,w2304,o0,bl0)
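In the output above, tb is the send buffer size that can be allocated. If it is not enough, it can be grown dynamically (unless the application has fixed its size).
w [the memory allocated for sending packets that have not yet been sent to layer 3] is the size already allocated in advance. t is [the memory used for sending packets that have already been sent to layer 3]. It seems that w is always greater than or equal to t?
After capping the bandwidth between 172.16.210.17 and 172.16.160.1 at 50MB (once bandwidth is capped, the send buffer fills up easily), the output looks like this: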
$ss -m | xargs -L 1 | grep "tcp EST" | awk '{ if($3>0 || $4>0) print $0 }'
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
tcp ESTAB 1431028 0 172.16.210.17:30082 172.16.160.1:4847 skmem:(r2066432,rb2135508,t0,tb46080,f2048,w0,o0,bl0,d72)
tcp ESTAB 1195628 0 172.16.210.17:30086 172.16.160.1:4847 skmem:(r1742848,rb1915632,t8,tb46080,f190464,w0,o0,bl0,d187)
tcp ESTAB 86416 0 172.16.210.17:40470 172.16.160.1:4847 skmem:(r127232,rb131072,t0,tb46080,f3840,w0,o0,bl0,d16)
tcp ESTAB 1909826 0 172.16.210.17:40476 172.16.160.1:4847 skmem:(r2861568,rb2933688,t2,tb46080,f26112,w0,o0,bl0,d15)
tcp ESTAB 758312 0 172.16.210.17:40286 172.16.160.1:4847 skmem:(r1124864,rb1177692,t0,tb46080,f1536,w0,o0,bl0,d17)
tcp ESTAB 2238720 0 172.16.210.17:40310 172.16.160.1:4847 skmem:(r3265280,rb3334284,t0,tb46080,f3328,w0,o0,bl0,d30)
tcp ESTAB 88172 0 172.16.210.17:40508 172.16.160.1:4847 skmem:(r128000,rb131072,t0,tb46080,f3072,w0,o0,bl0,d16)
tcp ESTAB 87700 0 172.16.210.17:41572 172.16.160.1:4847 skmem:(r130560,rb131072,t0,tb46080,f512,w0,o0,bl0,d10)
tcp ESTAB 4147293 0 172.16.210.17:40572 172.16.160.1:4847 skmem:(r6064896,rb6291456,t2,tb46080,f75008,w0,o0,bl0,d27)
tcp ESTAB 1610940 0 172.16.210.17:30100 172.16.160.1:4847 skmem:(r2358784,rb2533092,t6,tb46080,f82432,w0,o0,bl0,d304)
tcp ESTAB 4216156 0 172.16.210.17:30068 172.16.160.1:4847 skmem:(r6091008,rb6291456,t0,tb46080,f3840,w0,o0,bl0,d112)
tcp ESTAB 87468 0 172.16.210.17:40564 172.16.160.1:4847 skmem:(r127232,rb131072,t0,tb46080,f3840,w0,o0,bl0,d16)
tcp ESTAB 0 84608 172.16.210.17:3306 10.100.7.27:43114 skmem:(r0,rb65536,t8352,tb131072,f75648,w92288,o0,bl0,d0)
tcp ESTAB 4141872 0 172.16.210.17:40584 172.16.160.1:4847 skmem:(r6050560,rb6291456,t2,tb46080,f19712,w0,o0,bl0,d14)
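In the throttled output above, several sockets have r (rmem_alloc) close to rb (rcv_buf) and a non-zero d (sock_drop). Those are the receive buffers that are saturated.

A minimal Python sketch to flag such sockets is below. It assumes lines already joined to one per socket (as the xargs -L 1 commands here produce), laid out as Netid State Recv-Q Send-Q Local Peer skmem:(...). The 0.9 ratio is an arbitrary illustrative threshold, not a kernel constant.

```python
import re

def flag_rcv_pressure(lines, ratio=0.9):
    """Yield (peer, recv_q, r, rb, drops) for sockets near their receive-buffer limit or dropping packets."""
    for line in lines:
        tokens = line.split()
        group = re.search(r"skmem:\(([^)]*)\)", line)
        if len(tokens) < 6 or not group:
            continue  # header line or no memory info
        fields = {k: int(v) for k, v in re.findall(r"([a-z]+)(\d+)", group.group(1))}
        r, rb, drops = fields.get("r", 0), fields.get("rb", 0), fields.get("d", 0)
        if rb and (r >= ratio * rb or drops > 0):
            # tokens: Netid State Recv-Q Send-Q Local Peer ...
            yield tokens[5], int(tokens[2]), r, rb, drops
```

For example, `list(flag_rcv_pressure(open("ss_m.txt")))` could be run on a saved `ss -m | xargs -L 1` capture (the file name is hypothetical). In these samples Recv-Q is smaller than r. This is most likely because r counts the memory the kernel charges for the buffered packets, overhead included, not just payload bytes.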
$ss -itn
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 965824 0 172.16.210.17:19310 172.16.160.1:4847
cubic wscale:9,7 rto:215 rtt:14.405/0.346 ato:160 mss:1440 rcvmss:1460 advmss:1460 cwnd:10 bytes_acked:1324584 bytes_received:2073688144 segs_out:91806 segs_in:1461520 data_segs_out:4824 data_segs_in:1456130 send 8.0Mbps lastsnd:545583 lastrcv:545276 lastack:13173 pacing_rate 16.0Mbps delivery_rate 8.9Mbps app_limited busy:9071ms rcv_rtt:1.303 rcv_space:164245 minrtt:1.293
ESTAB 0 84371 172.16.210.17:3306 10.100.7.147:59664
cubic wscale:7,7 rto:217 rtt:16.662/0.581 ato:40 mss:1448 rcvmss:976 advmss:1448 cwnd:375 ssthresh:19 bytes_acked:5087795046 bytes_received:1647 segs_out:3589314 segs_in:358086 data_segs_out:3589313 data_segs_in:8 send 260.7Mbps lastsnd:6 lastrcv:1177745 lastack:4 pacing_rate 312.8Mbps delivery_rate 32.9Mbps busy:1176476ms rwnd_limited:1717ms(0.1%) sndbuf_limited:159867ms(13.6%) unacked:37 retrans:0/214 rcv_space:14600 notsent:32055 minrtt:7.945
ESTAB 0 83002 172.16.210.17:3306 10.100.7.28:34066
cubic wscale:7,7 rto:215 rtt:14.635/0.432 ato:40 mss:1448 rcvmss:976 advmss:1448 cwnd:144 ssthresh:144 bytes_acked:972464708 bytes_received:1466 segs_out:671667 segs_in:94369 data_segs_out:671666 data_segs_in:8 send 114.0Mbps lastsnd:1 lastrcv:453365 lastack:1 pacing_rate 136.8Mbps delivery_rate 24.0Mbps busy:453493ms sndbuf_limited:200ms(0.0%) unacked:23 rcv_space:14600 notsent:49698 minrtt:9.937
ESTAB 1239616 0 172.16.210.17:41592 172.16.160.1:4847
cubic wscale:9,7 rto:216 rtt:15.754/0.775 ato:144 mss:1440 rcvmss:1460 advmss:1460 cwnd:10 bytes_acked:20321 bytes_received:1351071 segs_out:269 segs_in:1091 data_segs_out:76 data_segs_in:988 send 7.3Mbps lastsnd:339339 lastrcv:337401 lastack:10100 pacing_rate 14.6Mbps delivery_rate 1.0Mbps app_limited busy:1214ms rcv_rtt:227.156 rcv_space:55581 minrtt:11.38
ESTAB 3415748 0 172.16.210.17:30090 172.16.160.1:4847
cubic wscale:9,7 rto:202 rtt:1.667/0.011 ato:80 mss:1440 rcvmss:1460 advmss:1460 cwnd:10 bytes_acked:398583 bytes_received:613824362 segs_out:28630 segs_in:437621 data_segs_out:1495 data_segs_in:435792 send 69.1Mbps lastsnd:1179931 lastrcv:1179306 lastack:12149 pacing_rate 138.2Mbps delivery_rate 7.2Mbps app_limited busy:2520ms rcv_rtt:1.664 rcv_space:212976 minrtt:1.601
ESTAB 86480 0 172.16.210.17:41482 172.16.160.1:4847
cubic wscale:9,7 rto:215 rtt:14.945/1.83 ato:94 mss:1440 rcvmss:1460 advmss:1460 cwnd:10 bytes_acked:3899 bytes_received:93744 segs_out:73 segs_in:136 data_segs_out:20 data_segs_in:83 send 7.7Mbps lastsnd:449541 lastrcv:449145 lastack:19314 pacing_rate 15.4Mbps delivery_rate 964.2Kbps app_limited busy:296ms rcv_rtt:8561.27 rcv_space:14600 minrtt:11.948
ESTAB 89136 0 172.16.210.17:40480 172.16.160.1:4847
cubic wscale:9,7 rto:213 rtt:12.11/0.79 ato:196 mss:1440 rcvmss:1460 advmss:1460 cwnd:10 bytes_acked:2510 bytes_received:95652 segs_out:102 segs_in:168 data_segs_out:16 data_segs_in:81 send 9.5Mbps lastsnd:1099067 lastrcv:1098659 lastack:13686 pacing_rate 19.0Mbps delivery_rate 1.0Mbps app_limited busy:199ms rcv_rtt:2438.63 rcv_space:14600 minrtt:11.178
ESTAB 0 84288 172.16.210.17:3306 10.100.7.26:51160
cubic wscale:7,7 rto:216 rtt:15.129/0.314 ato:40 mss:1448 rcvmss:976 advmss:1448 cwnd:157 ssthresh:157 bytes_acked:2954689465 bytes_received:1393 segs_out:2041403 segs_in:237797 data_segs_out:2041402 data_segs_in:8 send 120.2Mbps lastsnd:11 lastrcv:1103462 lastack:10 pacing_rate 144.2Mbps delivery_rate 31.3Mbps busy:1103503ms sndbuf_limited:3398ms(0.3%) unacked:24 retrans:0/7 rcv_space:14600 notsent:49536 minrtt:9.551
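The send figure in the -i output above matches one congestion window of data per smoothed RTT, i.e. cwnd * mss * 8 / rtt. For example, cwnd 375, mss 1448 and rtt 16.662 ms give the 260.7Mbps shown.

The Python sketch below reproduces it. The helper names are illustrative, and parse_info assumes the detail line is whitespace-separated key:value tokens.

```python
def ss_send_mbps(cwnd: int, mss: int, rtt_ms: float) -> float:
    """One congestion window (cwnd * mss bytes) sent per smoothed RTT, in Mbit/s."""
    bits_per_rtt = cwnd * mss * 8
    return bits_per_rtt / (rtt_ms / 1000) / 1e6

def parse_info(info_line: str) -> dict:
    """Collect key:value tokens (e.g. cwnd:375, rtt:16.662/0.581) from an ss -i detail line."""
    out = {}
    for token in info_line.split():
        if ":" in token:
            key, _, value = token.partition(":")  # split on the first colon only
            out[key] = value
    return out

info = parse_info("cubic wscale:7,7 rto:217 rtt:16.662/0.581 mss:1448 cwnd:375")
srtt_ms = float(info["rtt"].split("/")[0])  # rtt:<srtt>/<rttvar>
rate = ss_send_mbps(int(info["cwnd"]), int(info["mss"]), srtt_ms)
print(f"{rate:.1f} Mbps")  # 260.7 Mbps, matching "send 260.7Mbps" above
```

This also explains why the app_limited flows with cwnd:10 show only a few Mbps of send rate: a window of 10 segments per ~15 ms RTT cannot carry more.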
Using -m and -i together is recommended. For example, rcv_space is a receive-side high-water mark:
rcv_space is the high water mark of the rate of the local application reading from the receive buffer during any RTT. This is used internally within the kernel to adjust sk_rcvbuf.
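Viewing the congestion window and RTO with ss
// RTO bounds. These are fixed kernel constants and cannot be tuned. Since the RT differs for every peer IP, RTO has to be derived from the measured RTT. HZ ticks make up 1 second.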
#define TCP_RTO_MAX ((unsigned)(120*HZ))
#define TCP_RTO_MIN ((unsigned)(HZ/5)) // when RT is very small, the computed RTO is essentially TCP_RTO_MIN
The rto and rtt values below are in milliseconds. The RTO is normally at least 200 ms and at most 120 seconds:
# ss -itn | egrep "cwnd|rto"
ESTAB 0 165 [::ffff:192.168.0.174]:48074 [::ffff:192.168.0.173]:3306
cubic wscale:7,7 rto:201 rtt:0.24/0.112 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:10 bytes_acked:1910206449 bytes_received:8847784416 segs_out:11273005 segs_in:22997562 data_segs_out:9818729 data_segs_in:13341573 send 482.7Mbps lastsnd:1 lastrcv:1 pacing_rate 963.8Mbps delivery_rate 163.2Mbps app_limited busy:2676463ms retrans:0/183 rcv_rtt:1.001 rcv_space:35904 minrtt:0.135
ESTAB 0 0 [::ffff:192.168.0.174]:48082 [::ffff:192.168.0.173]:3306
cubic wscale:7,7 rto:201 rtt:0.262/0.112 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:10 bytes_acked:1852907381 bytes_received:8346503207 segs_out:10913962 segs_in:22169704 data_segs_out:9531411 data_segs_in:12796151 send 442.1Mbps lastsnd:2 lastack:2 pacing_rate 881.3Mbps delivery_rate 164.3Mbps app_limited busy:2736500ms retrans:0/260 rcv_rtt:1.042 rcv_space:31874 minrtt:0.133
-----
skmem:(r0,rb131072,t0,tb133632,f0,w0,o0,bl0,d0) cubic wscale:8,7 rto:233 rtt:32.489/2.99 ato:40 mss:1380 rcvmss:536 advmss:1460 cwnd:11 ssthresh:8 bytes_acked:99862366 bytes_received:2943 segs_out:78933 segs_in:23388 data_segs_out:78925 data_segs_in:81 send 3.7Mbps lastsnd:1735288 lastrcv:1735252 lastack:1735252 pacing_rate 4.5Mbps delivery_rate 2.9Mbps busy:370994ms retrans:0/6479 reordering:5 rcv_space:14600 minrtt:27.984
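Every rto in the samples in this article equals ceil(srtt + 200). For example, rtt 0.24 gives rto 201, rtt 14.405 gives rto 215, and rtt 32.489 gives rto 233.

This appears to be because Linux floors its variance term at TCP_RTO_MIN and sets RTO = SRTT + that term, instead of clamping the final RTO. Note that TCP_RTO_MIN = HZ/5 ticks is 200 ms at any HZ.

The Python function below is an approximation fitted to these samples, not kernel code. It assumes HZ=1000, so rounding up to whole ticks is rounding up to whole ms.

```python
import math

HZ = 1000                                   # assumed CONFIG_HZ
TCP_RTO_MIN_MS = (HZ // 5) * 1000 // HZ     # HZ/5 ticks   -> 200 ms
TCP_RTO_MAX_MS = (120 * HZ) * 1000 // HZ    # 120*HZ ticks -> 120000 ms

def linux_like_rto_ms(srtt_ms: float, rttvar_ms: float) -> int:
    """Approximate the rto ss prints: SRTT plus a variance term floored at TCP_RTO_MIN.

    srtt_ms/rttvar_ms are the two numbers in ss's rtt:<srtt>/<rttvar>.
    Rounding up mimics conversion to whole ticks. A model, not the kernel's code.
    """
    rto = srtt_ms + max(4 * rttvar_ms, TCP_RTO_MIN_MS)
    return min(math.ceil(rto), TCP_RTO_MAX_MS)

for srtt, rttvar in [(0.24, 0.112), (14.405, 0.346), (32.489, 2.99)]:
    print(srtt, linux_like_rto_ms(srtt, rttvar))
```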
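RTO calculation algorithm
RTO is computed from RTT values, or more precisely from a series of RTT samples: rto = f(rtt).
1.1. Before any RTT sample is available, RTO <- TCP_TIMEOUT_INIT (1s).
On repeated retransmissions, RTO is likewise increased using exponential backoff.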
1.2. When the first RTT sample R is obtained:
SRTT <- R
RTTVAR <- R/2
RTO <- SRTT + max(G, K * RTTVAR)
where K = 4 and G is the clock (timestamp) granularity (1 ms with CONFIG_HZ=1000).
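1.3. When a subsequent RTT sample R is obtained:
RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - R|
SRTT <- (1 - alpha) * SRTT + alpha * R
RTO <- SRTT + max(G, K * RTTVAR)
where beta = 1/4 and alpha = 1/8. RTTVAR is updated before SRTT because it uses the old SRTT.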
1.4. Whenever RTO is computed, if it is less than 1 second, then the RTO SHOULD be rounded up to 1 second. (Linux does not follow this and uses TCP_RTO_MIN = 200 ms instead.)
1.5. A maximum value MAY be placed on RTO provided it is at least 60 seconds.
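The numbered steps above translate fairly directly into code. Below is a minimal Python sketch of the RFC 6298 estimator, in seconds. The class name and constructor parameters are illustrative. G defaults to 1 ms (assuming CONFIG_HZ=1000), and min_rto/max_rto default to the RFC's 1 s floor and 60 s cap; Linux uses 200 ms and 120 s instead.

```python
class Rfc6298Rto:
    """RTO estimator following steps 1.1-1.5 (RFC 6298); all times in seconds."""

    K = 4
    ALPHA = 1 / 8
    BETA = 1 / 4

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.G = granularity   # clock granularity (assumed 1 ms)
        self.min_rto = min_rto  # step 1.4 floor
        self.max_rto = max_rto  # step 1.5: any cap must be at least 60 s
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0          # step 1.1: no RTT sample yet

    def on_rtt_sample(self, r: float) -> float:
        if self.srtt is None:   # step 1.2: first sample
            self.srtt = r
            self.rttvar = r / 2
        else:                   # step 1.3: update RTTVAR first, it needs the old SRTT
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.G, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)  # steps 1.4 and 1.5
        return self.rto

est = Rfc6298Rto(min_rto=0.0)  # floor disabled to expose the raw estimate
for sample in (0.1, 0.1, 0.12):
    print(round(est.on_rtt_sample(sample), 4))
```

With the default 1 s floor, any LAN-scale RTT simply yields RTO = 1 s, which is why RFC-style clamping and the Linux 200 ms floor give such different timeouts on low-latency links.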
RTTVAR is the smoothed mean deviation and SRTT is the smoothed RTT.
Their meaning is explained in more detail later, together with the implementation.
The above is how an RTO value is computed. When RTO timeouts occur back to back, the RTO is adjusted using a strategy called exponential backoff.
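Exponential backoff doubles the RTO after each consecutive timeout, up to TCP_RTO_MAX (120 s in the defines earlier).

A minimal Python sketch of the resulting schedule is below, assuming a starting RTO of 201 ms like the samples above. How many retries happen before the connection is aborted is a separate setting (net.ipv4.tcp_retries2 on Linux) that this sketch does not model.

```python
def backoff_schedule(initial_rto_ms: int, retries: int, rto_max_ms: int = 120_000):
    """RTO in effect before each successive retransmission: doubled each time, capped at rto_max_ms."""
    schedule = []
    rto = initial_rto_ms
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, rto_max_ms)
    return schedule

timeouts = backoff_schedule(201, 12)
print(timeouts)
print("total wait before the 12th retransmission fires:", sum(timeouts), "ms")
```

The cap is reached after about ten doublings, so later timeouts are all 120 s. This is why a dead peer can keep a connection hanging for many minutes.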
Viewing tcp_metrics entries in the system cache
$sudo ip tcp_metrics show | grep 100.118.58.7
100.118.58.7 age 1457674.290sec tw_ts 3195267888/5752641sec ago rtt 1000us rttvar 1000us ssthresh 361 cwnd 40 ---- these two values (ssthresh and cwnd) matter a lot for transfer performance
192.168.1.100 age 1051050.859sec ssthresh 4 cwnd 2 rtt 4805us rttvar 4805us source 192.168.0.174 --- this entry is a problem: the cached ssthresh 4 and cwnd 2 are both too small, so transfers to this host will certainly be slow
Flush all tcp_metrics entries: sudo ip tcp_metrics flush all
Stop saving tcp_metrics: net.ipv4.tcp_no_metrics_save = 1
Delete a single entry:
$ sudo ip tcp_metrics delete 100.118.58.7
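Each connection's ssthresh starts out effectively infinite, but the kernel caches the last ssthresh seen for the peer IP (most of the time the congestion window between two IPs does not change). A new connection will therefore very likely be close to congestion once it reaches that ssthresh, and it then enters the slow cwnd-growth phase.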
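To spot bad cached entries like the 192.168.1.100 one above across many hosts, here is a minimal Python sketch that pulls the numeric fields out of `ip tcp_metrics show` lines.

It assumes each key (ssthresh, cwnd, rtt, rttvar, source) is followed by a single value token. The floor of 10 used by looks_too_small is a hypothetical threshold, chosen because 10 segments is Linux's usual initial cwnd.

```python
def parse_tcp_metrics_line(line: str) -> dict:
    """Pick ssthresh/cwnd/rtt/rttvar/source out of one `ip tcp_metrics show` line."""
    tokens = line.split()
    entry = {"addr": tokens[0]}
    for i, tok in enumerate(tokens[1:-1], start=1):
        value = tokens[i + 1]
        if tok in ("ssthresh", "cwnd"):
            entry[tok] = int(value)
        elif tok in ("rtt", "rttvar") and value.endswith("us"):
            entry[tok + "_us"] = int(value[:-2])  # "4805us" -> 4805
        elif tok == "source":
            entry["source"] = value
    return entry

def looks_too_small(entry: dict, floor: int = 10) -> bool:
    """Hypothetical check: cached cwnd or ssthresh below `floor` segments."""
    return any(entry[k] < floor for k in ("cwnd", "ssthresh") if k in entry)
```

Entries flagged this way are candidates for `ip tcp_metrics delete <addr>`.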
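Analyzing the number of retransmitted packets with ss
By capturing ss output you can extract the number of retransmitted packets. You can then aggregate the number of retransmitting flows and retransmitted packets by peer ip:port. Reference command: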
ss -itn | grep -v "Address:Port" | xargs -L 1 | grep retrans | awk '{gsub("retrans:.*/", "",$21); print $5, $21}' | awk '{arr[$1]+=$2} END {for (i in arr) {print i,arr[i]}}' | sort -rnk 2
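xargs -L 1 processes one line at a time, but a line ending with a space or tab is treated as continued and merged with the next line. The commands above rely on this to join each socket's two lines of ss output into one.
On newer Linux kernels, systemtap or bcc can be used to capture each connection's retransmitted packets and the stage at which the retransmission happened.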
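The awk one-liner above hard-codes retrans as field $21, but the field position depends on which fields ss prints. In the -i samples earlier in this article, field 21 of a joined line is send, not retrans.

A minimal Python sketch that finds retrans:<current>/<total> by name instead is below. It counts both flows and packets per peer. It assumes `ss -itn` layout (State Recv-Q Send-Q Local Peer, followed by an indented detail line); the function name is illustrative.

```python
import re
from collections import defaultdict

TCP_STATES = {"ESTAB", "SYN-SENT", "SYN-RECV", "FIN-WAIT-1", "FIN-WAIT-2", "TIME-WAIT",
              "CLOSE-WAIT", "LAST-ACK", "CLOSING", "LISTEN", "UNCONN"}
# \b keeps "bytes_retrans:" (printed by newer ss) from matching
RETRANS = re.compile(r"\bretrans:(\d+)/(\d+)")

def retrans_by_peer(ss_itn_output: str):
    """Return [(peer, flows_with_retrans, total_retrans)] from `ss -itn` text, largest first."""
    sockets = []  # [peer, detail text] per socket
    for line in ss_itn_output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "State":
            continue  # blank line or column header
        if tokens[0] in TCP_STATES and len(tokens) >= 5:
            sockets.append([tokens[4], ""])  # State Recv-Q Send-Q Local Peer
        elif sockets:
            sockets[-1][1] += " " + line  # -i detail line of the previous socket
    flows, packets = defaultdict(int), defaultdict(int)
    for peer, detail in sockets:
        m = RETRANS.search(detail)
        if m:
            flows[peer] += 1
            packets[peer] += int(m.group(2))  # total retransmissions, like the awk gsub
    return sorted(((p, flows[p], packets[p]) for p in flows), key=lambda x: x[2], reverse=True)
```

Feed it the raw text, e.g. `subprocess.run(["ss", "-itn"], capture_output=True, text=True).stdout`. No xargs joining is needed because detail lines are attached to the preceding socket line.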
Checking the current and maximum accept (full-connection) queue
$ss -lt
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 127.0.0.1:10248 *:*
LISTEN 0 128 *:2376 *:*
LISTEN 0 128 127.0.0.1:10249 *:*
LISTEN 0 128 *:7337 *:*
LISTEN 0 128 *:10250 *:*
LISTEN 0 128 11.163.187.44:7946 *:*
LISTEN 0 128 127.0.0.1:55631 *:*
LISTEN 0 128 *:10256 *:*
LISTEN 0 10 *:6640 *:*
LISTEN 0 128 127.0.0.1:vmware-fdm *:*
LISTEN 0 128 11.163.187.44:vmware-fdm *:*
LISTEN 0 128 *:ssh *:*
LISTEN 0 10 127.0.0.1:15772 *:*
LISTEN 0 10 127.0.0.1:15776 *:*
LISTEN 0 10 127.0.0.1:19777 *:*
LISTEN 0 10 11.163.187.44:15778 *:*
LISTEN 0 128 *:tr-rsrb-p2 *:*
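For LISTEN sockets, ss reports the current accept-queue length in Recv-Q and the maximum backlog in Send-Q. A listener whose Recv-Q approaches its Send-Q is close to overflowing.

A minimal Python sketch that computes the usage from `ss -lt` output is below. The function name is illustrative, and it assumes the column layout shown above (no Netid column).

```python
def accept_queue_usage(ss_lt_output: str):
    """Return [(local, current, maximum, fraction)] for LISTEN sockets in `ss -lt` text."""
    rows = []
    for line in ss_lt_output.splitlines():
        tokens = line.split()
        if len(tokens) >= 4 and tokens[0] == "LISTEN":
            current, maximum = int(tokens[1]), int(tokens[2])  # Recv-Q, Send-Q
            rows.append((tokens[3], current, maximum, current / maximum if maximum else 0.0))
    return rows
```

Sorting the result by the last element puts the listeners closest to a full accept queue first. In the output above, all queues are empty (Recv-Q 0), with maxima of 128 or 10.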
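3. Diagnosing performance problems with netstat
netstat and ss are small tools, but for looking into network performance and anomalies they are real power tools.
In the case below, netstat quickly revealed why the system's throughput could not be pushed any higher under load. It did so mainly by pinpointing which node in a long chain of service calls had hit a bottleneck.
The netstat command
Like ss, netstat shows Send-Q and Recv-Q. For a connection that is not in LISTEN state, Recv-Q is the number of bytes that have been received and are still buffered, not yet read by the process. Send-Q is the number of bytes in the send queue that have not yet been acknowledged by the remote host.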
$netstat -tn
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 server:8182 client-1:15260 SYN_RECV
tcp 0 28 server:22 client-1:51708 ESTABLISHED
tcp 0 0 server:2376 client-1:60269 ESTABLISHED
The Recv-Q shown by netstat -tn has nothing to do with the accept (full-connection) queue or the SYN (half-open) queue. It is called out here because it is easy to confuse with the Recv-Q of ss -lnt.
Recv-Q and Send-Q explained
Recv-Q
Established: The count of bytes not copied by the user program connected to this socket.
Listening: Since Kernel 2.6.18 this column contains the current syn backlog.
Send-Q
Established: The count of bytes not acknowledged by the remote host.
Listening: Since Kernel 2.6.18 this column contains the maximum size of the syn backlog.
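Recv-Q in netstat:
If the TCP connection is ESTABLISHED, Recv-Q is the amount of data in the receive buffer that has not yet been copied to the application;
If the socket is in LISTEN state, Recv-Q is the current length of the accept (full-connection) queue;
Send-Q in netstat:
The amount of data in the send buffer that has been sent but not yet acknowledged. This meaning holds whether the socket is LISTEN or ESTABLISHED. For ss -lnt, by contrast, the Send-Q of a LISTEN socket is the maximum accept-queue size, e.g. the 128 in the ss -lt output above.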
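Cases of spotting problems with netstat
The node itself is too slow: for example, if netstat -t shows a large amount of data piling up in Recv-Q, the cause is usually that the CPU cannot keep up.
In another case the receiving side was too slow. The netstat statistics on the application machine showed that the client side was responding too slowly (this host listens on port 9108).
Send-Q shows that replies have left port 9108 but have not been ACKed by the peer, so it is reasonable to infer a bottleneck between the clients and 9108.
It turned out that the bandwidth between the front end and 9108 was indeed saturated, and the problem went away after the bandwidth was adjusted.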
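The reasoning in these cases maps onto a simple rule. Bytes piling up in Recv-Q point at the local process. Bytes piling up in Send-Q point at the peer or the network path.

A minimal Python sketch applying that rule to `netstat -tn` output is below. The 64 KiB thresholds are arbitrary illustration values, the messages are illustrative, and the parser assumes the six-column layout shown above.

```python
def triage(netstat_tn_output: str, recv_threshold: int = 1 << 16, send_threshold: int = 1 << 16):
    """Classify ESTABLISHED connections from `netstat -tn` by where bytes are piling up."""
    findings = []
    for line in netstat_tn_output.splitlines():
        tokens = line.split()
        if len(tokens) != 6 or tokens[5] != "ESTABLISHED":
            continue  # skips headers and non-established sockets
        recv_q, send_q = int(tokens[1]), int(tokens[2])
        local, foreign = tokens[3], tokens[4]
        if recv_q >= recv_threshold:
            findings.append((local, foreign, "local process not reading fast enough (Recv-Q)"))
        if send_q >= send_threshold:
            findings.append((local, foreign, "peer or network not ACKing fast enough (Send-Q)"))
    return findings
```

Running it on snapshots from each hop of a call chain and comparing where findings appear is one way to narrow down the bottleneck node, as in the 9108 case.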
4. References
Understanding network monitoring: diagnosing performance with netstat
https://plantegg.github.io/2019/04/21/netstat%E5%AE%9A%E4%BD%8D%E6%80%A7%E8%83%BD%E6%A1%88%E4%BE%8B
Understanding network monitoring: a complete guide to ss usage
https://plantegg.github.io/2016/10/12/ss%E7%94%A8%E6%B3%95%E5%A4%A7%E5%85%A8