While a client and a server were talking to each other, the client would send a SYN packet to open a connection, but the server, having received it, did not reply with a SYN+ACK:
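1.
Capture the port on both the client and the server with tcpdump, then analyze the captures in Wireshark.
The server clearly received the client's SYN but never answered with a SYN+ACK. The client waited out its configured 1 s timeout and retransmitted, and only then was the connection established.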
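To keep the capture small, it can help to record only handshake packets. The helper below is a minimal sketch that builds a pcap filter matching SYN and SYN+ACK segments on one port; the function name, the port 8080, the interface eth1 and the output file name are placeholders, not values from this incident.

```sh
# Build a pcap filter that keeps only segments with the SYN flag set
# (this covers both SYN and SYN+ACK) on the given TCP port.
syn_filter() {
    # $1 = TCP port of the service under investigation (placeholder value)
    printf 'tcp port %s and tcp[tcpflags] & tcp-syn != 0\n' "$1"
}

# Example invocation (needs root; interface, port and file name are assumptions):
#   tcpdump -i eth1 -nn -w syn.pcap "$(syn_filter 8080)"
```

Running the same command on both ends makes it easy to line up a SYN that arrived at the server with the missing SYN+ACK on the client side.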
2.
Check the network statistics with netstat -s
netstat -s | grep reject
13126873 packets rejects in established connections because of timestamp
The number of packets rejected because of their timestamp keeps growing.
cat /proc/net/netstat
This is the machine's raw counter file:
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSPassive PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPPrequeued TCPDirectCopyFromBacklog TCPDirectCopyFromPrequeue TCPPrequeueDropped TCPHPHits TCPHPHitsToUser TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLoss TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPForwardRetrans TCPSlowStartRetrans TCPTimeouts TCPRenoRecoveryFail TCPSackRecoveryFail TCPSchedulerFailed TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop TCPMinTTLDrop TCPChallengeACK TCPSYNChallenge BusyPollRxPackets TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv
TcpExt: 0 0 28417247 273595 0 0 0 0 0 0 8920817876 886076087 0 0 0 13126873 5837834196 10352662 19315573 128 128 134035850356 3564537962116 77185674789917 335133 153000568912 172198514916 99782109633 204994309264 0 150637 0 8436 455406 0 335321 78327 1591607 50432 11755659 173769 3833 0 29057 579 1551569 449839 80637 78679662 0 170 6 0 19315343 5 1899979 1163 1461621 184288 0 72304 0 309 0 0 19123 544588 209 0 0 10513441 5094638 7108594 82227 0 309118 139877 0 1239 1239 45114
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets
IpExt: 0 0 2 0 0 0 649370286806505 597235907092484 72 0 0 0
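With this many columns, matching a value to its name by eye is error-prone. The sketch below pairs the header line and the value line of one section and prints a single counter. The function name netstat_counter is made up for illustration, and the script assumes the two-lines-per-section layout shown above.

```sh
# Print one counter from a /proc/net/netstat style file.
netstat_counter() {
    # $1 = section prefix (e.g. TcpExt), $2 = counter name, $3 = file (optional)
    awk -v sec="$1:" -v name="$2" '
        $1 == sec && !have_header {      # first line of the section: column names
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        $1 == sec && have_header {       # second line of the section: values
            if (name in col) print $(col[name])
            exit
        }
    ' "${3:-/proc/net/netstat}"
}

# Example: netstat_counter TcpExt PAWSEstab
```

Calling it twice a few seconds apart and comparing the two numbers shows whether a counter is still growing.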
The counter that matches this error is PAWSEstab. Looking at the kernel source:
static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb, const struct tcphdr *th, int syn_inerr)
{
    struct tcp_sock *tp = tcp_sk(sk);
    /* RFC1323: H1. Apply PAWS check first. */
    if (tcp_fast_parse_options(sock_net(sk), skb, th, tp) &&
        tp->rx_opt.saw_tstamp &&
        tcp_paws_discard(sk, skb)) {
        if (!th->rst) {
            NET_INC_STATS(sock_net(sk), LINUX_MIB_PAWSESTABREJECTED);
            if (!tcp_oow_rate_limited(sock_net(sk), skb,
                                      LINUX_MIB_TCPACKSKIPPEDPAWS, &tp->last_oow_ack_time))
                tcp_send_dupack(sk, skb);
            goto discard;
        }
        /* Reset is accepted even if it did not pass PAWS. */
    }
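So a segment that arrives carrying a timestamp older than the one last recorded for the connection fails the PAWS check and is rejected.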
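The comparison behind that check can be illustrated with a few lines of shell arithmetic. This is a simplified sketch, not the kernel's code: it only models the wraparound-aware comparison of two 32-bit timestamps, it assumes a tolerance of 1 tick (which is what the kernel's TCP_PAWS_WINDOW is believed to be), and it ignores the kernel's additional exceptions such as long-idle connections and disordered ACKs. The function name is hypothetical.

```sh
# Return success (0) if a segment with timestamp $2 would look stale
# compared with the last accepted timestamp $1.
paws_would_reject() {
    # $1 = ts_recent (last timestamp accepted on this connection)
    # $2 = tsval carried by the incoming segment
    diff=$(( ($1 - $2) & 4294967295 ))
    # Reinterpret the low 32 bits as a signed value, like a (s32) cast in C,
    # so that the comparison still works when the timestamp wraps around.
    if [ "$diff" -ge 2147483648 ]; then
        diff=$(( diff - 4294967296 ))
    fi
    # Assumed tolerance of 1 tick; anything older counts as stale.
    [ "$diff" -gt 1 ]
}

# Example: paws_would_reject 1000 900 && echo "segment would be discarded"
```

A timestamp that has just wrapped past 2^32 is still treated as newer, which is why the signed reinterpretation matters.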
Next, check the Linux configuration:
cat /etc/sysctl.conf
kernel.printk = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_timestamps = 1
net.core.somaxconn = 4096
net.ipv4.tcp_max_tw_buckets = 30000
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_tcp_timeout_established = 300
net.netfilter.nf_conntrack_max = 524288
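net.ipv4.tcp_timestamps is set to 1, so the TCP timestamp option is enabled. If tcp_tw_recycle were also set to 1, a strict per-host check would apply: within one minute, timestamps coming from the same IP must keep increasing, otherwise the packets are dropped. Recycle is not enabled here, though.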
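When a sysctl.conf grows long, and especially when it contains repeated keys like the one above, a small helper makes it easier to see which value a key ends up with. The sketch below prints the last assignment for a key, on the assumption that later lines override earlier ones when the file is applied in order; the function name is made up for illustration.

```sh
# Print the last value assigned to a key in a sysctl.conf style file.
sysctl_conf_value() {
    # $1 = key, $2 = file (optional, defaults to /etc/sysctl.conf)
    awk -F= -v key="$1" '
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim whitespace around the value
            if (k == key) last = v
        }
        END { if (last != "") print last }
    ' "${2:-/etc/sysctl.conf}"
}

# Example: sysctl_conf_value net.ipv4.tcp_timestamps
```

The file only shows what is configured at boot. The value the kernel is actually using can be read with sysctl -n followed by the key name.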
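The rejects here are indeed PAWS rejects caused by timestamps, so before analyzing any further we manually corrected the machine's clock; the time has to be in sync.
After synchronizing, the count reported by netstat -s | grep reject stopped growing.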
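3.
The service's TCP I/O timeout count is still rising, so the problem persists; only the timestamp PAWS issue has been fixed.
The next suspicion is that the NIC queue cannot keep up, so SYN packets are already being dropped at the NIC queue.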
netstat -s|grep drop
10021478 outgoing packets dropped
This number keeps growing. Sure enough, the NIC turned out to be single-queue and could not keep up. The fix is to switch to multi-queue, or to enlarge the buffer:
ethtool -G eth1
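For reference, here is a hedged sketch of the ethtool commands involved. The helper only prints them so they can be reviewed before being run as root; the queue count 8 and ring size 4096 in the example are illustrative values, not the ones used in this incident, and the real limits depend on the NIC and driver.

```sh
# Print the ethtool commands for inspecting and changing queues and ring size.
nic_tuning_plan() {
    # $1 = interface, $2 = number of combined queues, $3 = RX ring size
    echo "ethtool -l $1"               # show current and maximum channel (queue) counts
    echo "ethtool -g $1"               # show current and maximum ring sizes
    echo "ethtool -L $1 combined $2"   # request more queues, if the NIC supports it
    echo "ethtool -G $1 rx $3"         # enlarge the RX ring buffer
}

# Example: nic_tuning_plan eth1 8 4096
```

The two lowercase options are read-only, so it is worth running them first to check the maximums the hardware reports before changing anything.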