I. Calico pod fails to start on a node
1. Symptoms:
Inspect the startup error with the following command:
# kubectl describe pod calico-node-r42fc -n kube-system
calico/node is not ready: BIRD is not ready: BGP not established with 10.51.10.4,10.51.10.5
Warning Unhealthy 24s (x196 over 29m) kubelet (combined from similar events): Readiness probe failed: 2024-03-22 02:39:47.813 [INFO][7095] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.51.10.4,10.51.10.5
2. Troubleshooting
Running kubectl exec against calico-node-r42fc from a node other than the one hosting the failing pod fails with "no route to host":
[root@rzbl-middleware01 ~]# kubectl exec -it calico-node-r42fc -n kube-system bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init), mount-bpffs (init)
Error from server: error dialing backend: dial tcp 10.51.10.6:10250: connect: no route to host
Log in to the host that runs the failing pod and exec into the pod from there:
[root@rzbl-middleware03 ~]# kubectl exec -it calico-node-r42fc -n kube-system bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init), mount-bpffs (init)
[root@rzbl-middleware03 /]#
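Inside the failing pod, open the bird configuration file. The router id is 172.21.0.1, which looks like the address of a container bridge interface; checking on the host shows that it belongs to br-e170164834a4. The expected value is the address of interface ens192, 10.51.10.6, as shown below:
## Check the "router id" parameter in /etc/calico/confd/config/bird.cfg inside the failing pod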
[root@rzbl-middleware03 /]# cat /etc/calico/confd/config/bird.cfg
function apply_communities ()
{
}
# Generated by confd
include "bird_aggr.cfg";
include "bird_ipam.cfg";
router id 172.21.0.1;
...
## Find the interface that holds IP 172.21.0.1
[root@rzbl-middleware03 ~]# ip a |grep 172.21.0.1
inet 172.21.0.1/16 brd 172.21.255.255 scope global br-e170164834a4
## Check the IP address of interface ens192
[root@rzbl-middleware03 ~]# ifconfig ens192
ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.51.10.6 netmask 255.255.255.0 broadcast 10.51.10.255
ether 00:50:56:a4:67:b6 txqueuelen 1000 (Ethernet)
RX packets 1032206644 bytes 168848840934 (157.2 GiB)
RX errors 0 dropped 418 overruns 0 frame 0
TX packets 1186182486 bytes 206549006053 (192.3 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
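The comparison above (router id in bird.cfg versus the address of the node's real interface) can be sketched as a small shell check. This is only an illustration: the helper names bird_router_id and check_router_id are hypothetical, and the sketch assumes the config contains a single line of the form "router id <ip>;" as in the output shown.

```shell
# Print the router id found in a bird.cfg read from stdin.
bird_router_id() {
  awk '$1 == "router" && $2 == "id" { sub(/;$/, "", $3); print $3; exit }'
}

# Compare the router id read from stdin against the expected node IP ($1).
# Returns non-zero on mismatch.
check_router_id() {
  local expected="$1" actual
  actual="$(bird_router_id)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: router id $actual"
  else
    echo "MISMATCH: router id $actual, expected $expected"
    return 1
  fi
}

# Usage (pod name and IP taken from the session above):
#   kubectl exec calico-node-r42fc -n kube-system -c calico-node -- \
#     cat /etc/calico/confd/config/bird.cfg | check_router_id 10.51.10.6
```

With the config shown above, the check would report a mismatch between 172.21.0.1 and 10.51.10.6.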
Run on the master node:
[root@rzbl-middleware01 ~]# calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 10.51.10.5 | node-to-node mesh | up | 11:01:56 | Established |
| 172.21.0.1 | node-to-node mesh | start | 02:09:41 | Passive |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
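On a cluster with many nodes it can help to filter the peer table down to the sessions that are not established. A minimal sketch, assuming the table layout shown above (columns separated by "|", with the INFO column last); the function name unestablished_peers is hypothetical:

```shell
# Read `calicoctl node status` output on stdin and print
# "address state info" for every peer whose INFO column is not "Established".
unestablished_peers() {
  awk -F'|' '
    NF >= 6 && $2 !~ /PEER ADDRESS/ {
      # Trim the padding around each cell.
      for (i = 2; i <= 6; i++) gsub(/^ +| +$/, "", $i)
      if ($6 != "Established") print $2, $4, $6
    }'
}

# Usage:
#   calicoctl node status | unestablished_peers
```

For the output above this would list only the 172.21.0.1 peer, which is the bridge address that the failing node advertised instead of 10.51.10.6.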
In summary, the failure is almost certainly caused by Calico on this node picking the wrong network interface for BGP.
calicoctl download page: https://github.com/projectcalico/calicoctl/releases/
cd /usr/local/src
wget https://github.com/projectcalico/calicoctl/releases/download/v3.20.6/calicoctl
chmod +x calicoctl
mv calicoctl /usr/sbin/
3. Fix
Remove the bridge interface br-e170164834a4 on the node that hosts the failing pod:
ifconfig br-e170164834a4 down
ip link delete br-e170164834a4
rm -rf /var/lib/cni
rm -f /etc/cni/net.d/*
Add an environment variable to the calico-node DaemonSet:
[root@rzbl-middleware01 ~]# kubectl edit daemonsets.apps calico-node -n kube-system
...
spec:
  template:
    spec:
      containers:
      - env:
        - name: IP_AUTODETECTION_METHOD
          value: interface=ens*
...
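The same change can also be applied without opening an editor. The sketch below uses kubectl set env and kubectl rollout status; the wrapper name set_autodetect is hypothetical, and it assumes the DaemonSet and container are both named calico-node in kube-system, as in the session above. The value is quoted so that the shell does not expand the "*".

```shell
# Set IP_AUTODETECTION_METHOD on the calico-node container and wait
# for the DaemonSet to finish rolling out the change.
set_autodetect() {
  kubectl set env daemonset/calico-node -n kube-system \
    --containers=calico-node "IP_AUTODETECTION_METHOD=$1"
  kubectl rollout status daemonset/calico-node -n kube-system
}

# Usage:
#   set_autodetect 'interface=ens*'
```

Changing the pod template this way triggers a rolling restart of the calico-node pods, just as saving the edit in kubectl edit does.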
Check that the calico-node pods start and report ready:
[root@rzbl-middleware01 ~]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-4p8tr 1/1 Running 1 (7m26s ago) 8m32s
calico-node-8tv4k 1/1 Running 1 (9m42s ago) 10m
calico-node-zpqbg 1/1 Running 0 10m
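The READY column can be checked mechanically instead of by eye. A minimal sketch, assuming the default kubectl get pod column layout shown above (NAME first, READY second, in the form ready/total); the function name not_ready_pods is hypothetical:

```shell
# Read `kubectl get pod` output on stdin and print the name of every
# calico-node pod whose READY column is not n/n.
not_ready_pods() {
  awk 'NR > 1 && $1 ~ /^calico-node-/ {
    split($2, r, "/")
    if (r[1] != r[2]) print $1
  }'
}

# Usage:
#   kubectl get pod -n kube-system | not_ready_pods
```

An empty result means every calico-node pod in the listing reports all of its containers ready.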
Check the Calico node status again with calicoctl:
[root@rzbl-middleware01 ~]# calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 10.51.10.5 | node-to-node mesh | up | 06:07:29 | Established |
| 10.51.10.6 | node-to-node mesh | up | 06:05:04 | Established |
+--------------+-------------------+-------+----------+-------------+