This is a k8s high-availability cluster built with 3 master nodes. After 2 of the master nodes went down, the one remaining master could not work properly.
Running kubectl get nodes produced the following error:
The connection to the server k8s-api:6443 was refused - did you specify the right host or port?
Note: k8s-api resolves to this master server's own IP address.
Running netstat -lntp showed that kube-apiserver was not running at all, and that etcd and kube-proxy were not running either.
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:33807         0.0.0.0:*               LISTEN      602/kubelet
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      572/rpcbind
tcp        0      0 127.0.0.1:10257         0.0.0.0:*               LISTEN      3229/kube-controlle
tcp        0      0 127.0.0.1:10259         0.0.0.0:*               LISTEN      3753/kube-scheduler
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      571/systemd-resolve
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1644/sshd
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      602/kubelet
tcp6       0      0 :::111                  :::*                    LISTEN      572/rpcbind
tcp6       0      0 :::10250                :::*                    LISTEN      602/kubelet
tcp6       0      0 :::10251                :::*                    LISTEN      3753/kube-scheduler
tcp6       0      0 :::10252                :::*                    LISTEN      3229/kube-controlle
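The same check can be scripted instead of reading the netstat listing by eye. The following is only a minimal sketch: the helper names is_listening and report are made up for illustration, port 6443 comes from the error message above, and ports 2379/2380 are the usual etcd client and peer ports, which is an assumption about this cluster's configuration.

```python
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False

# Assumed port layout (hypothetical mapping for illustration):
# 6443 = kube-apiserver, 2379 = etcd client, 2380 = etcd peer.
CONTROL_PLANE_PORTS = {
    "kube-apiserver": 6443,
    "etcd (client)": 2379,
    "etcd (peer)": 2380,
}

def report(host: str) -> dict:
    """Map each component name to whether its port accepts connections."""
    return {
        name: is_listening(host, port)
        for name, port in CONTROL_PLANE_PORTS.items()
    }
```

Calling report() with the master's own IP address on the affected node would be expected to return False for all three entries, matching the netstat output above. It only tests whether a TCP connection opens; it does not check TLS or component health.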
docker ps showed that the etcd, kube-apiserver, and kube-proxy containers were not running. The etcd container kept cycling through start -> fail -> restart -> fail again. Its container log showed the following errors:
etcdserver: publish error: etcdserver: request timed out
rafthttp: health check for peer 611e58a32a3e3ebe could not connect: dial tcp 10.0.1.252:2380: i/o timeout (prober "ROUND_TRIPPER_SNAPSHOT")
rafthttp: health check for peer 611e58a32a3e3ebe could not connect: dial tcp 10.0.1.252:2380: i/o timeout (prober "ROUND_TRIPPER_RAFT_MESSAGE")
rafthttp: health check for peer cc00b4912b6442df could not connect: dial tcp 10.0.1.82:2380: i/o timeout (prober "ROUND_TRIPPER_SNAPSHOT")
rafthttp: health check for peer cc00b4912b6442df could not connect: dial tcp 10.0.1.82:2380: i/o timeout (prober "ROUND_TRIPPER_RAFT_MESSAGE")
raft: 12637f5ec2bd02b8 is starting a new election at term 254669
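When the log is long, it can help to reduce the repeated health-check lines to a list of which peers are unreachable. This is a rough sketch that assumes the log lines have exactly the shape shown above; the function name unreachable_peers is hypothetical.

```python
import re

# Matches lines such as:
# rafthttp: health check for peer <id> could not connect: dial tcp <ip>:<port>: ...
PEER_RE = re.compile(
    r"health check for peer (?P<peer>[0-9a-f]+) could not connect: "
    r"dial tcp (?P<addr>[\d.]+:\d+)"
)

def unreachable_peers(log_text: str) -> dict:
    """Map each peer ID to the address etcd failed to dial."""
    peers = {}
    for match in PEER_RE.finditer(log_text):
        peers[match.group("peer")] = match.group("addr")
    return peers

SAMPLE = """\
rafthttp: health check for peer 611e58a32a3e3ebe could not connect: dial tcp 10.0.1.252:2380: i/o timeout (prober "ROUND_TRIPPER_SNAPSHOT")
rafthttp: health check for peer cc00b4912b6442df could not connect: dial tcp 10.0.1.82:2380: i/o timeout (prober "ROUND_TRIPPER_RAFT_MESSAGE")
raft: 12637f5ec2bd02b8 is starting a new election at term 254669
"""

for peer_id, addr in unreachable_peers(SAMPLE).items():
    print(peer_id, addr)
```

For the log above this yields two peers, 10.0.1.252:2380 and 10.0.1.82:2380, which are the peer addresses of the two masters that went down.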
etcd failed to start because it was coming up in 3-node cluster mode but could not reach the etcd instances on the other 2 master nodes. With only 1 of 3 members alive there is no majority, so it can never elect a leader. The fix is to switch etcd to single-node cluster mode. At first I did not know how to do that, but I later found two flags online: --initial-cluster-state=new and --force-new-cluster. After adding both flags to the etcd command in /etc/kubernetes/manifests/etcd.yaml and rebooting the server, the master node ran normally again.
containers:
- command:
  - etcd
  - --advertise-client-urls=https://10.0.1.81:2379
  - --cert-file=/etc/kubernetes/pki/etcd/server.crt
  - --client-cert-auth=true
  - --data-dir=/var/lib/etcd
  - --initial-advertise-peer-urls=https://10.0.1.81:2380
  - --initial-cluster=k8s-master0=https://10.0.1.81:2380
  - --initial-cluster-state=new
  ......
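Why the lone survivor could not simply carry on comes down to the majority rule in Raft, the consensus protocol etcd uses: a cluster of N members needs floor(N/2) + 1 of them to agree. The arithmetic below is a small illustration, with hypothetical function names, not anything taken from etcd's code.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1

def has_quorum(members: int, alive: int) -> bool:
    """True if the surviving members can still elect a leader."""
    return alive >= quorum(members)

# 3-member cluster with 2 masters down: 1 alive, 2 needed.
print(quorum(3), has_quorum(3, 1))
# After forcing a new single-member cluster: 1 alive, 1 needed.
print(quorum(1), has_quorum(1, 1))
```

A 3-member cluster tolerates the loss of one member but not two, which is why the remaining etcd kept starting elections it could not win until it was forced into a one-member cluster.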
Once the master is running normally again, remove the 2 etcd flags that were just added.
Reposted from dudu
Source: http://dwz.date/cAZE
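Removing the flags is a simple text edit, but it can be scripted if the recovery has to be repeated. The sketch below works on the manifest as plain text and assumes each flag sits on its own "- --flag" list line, as in the snippet above; strip_temp_flags is a hypothetical helper. Back up the file before changing it.

```python
# The two flags that were added temporarily for the recovery.
TEMP_FLAGS = ("--initial-cluster-state=new", "--force-new-cluster")

def strip_temp_flags(manifest_text: str) -> str:
    """Return the manifest text without the temporary recovery flags."""
    kept = []
    for line in manifest_text.splitlines(keepends=True):
        item = line.strip()
        # Drop YAML list items whose value is exactly one of the flags.
        if item.startswith("- ") and item[2:].strip() in TEMP_FLAGS:
            continue
        kept.append(line)
    return "".join(kept)

EXAMPLE = (
    "  - command:\n"
    "    - etcd\n"
    "    - --data-dir=/var/lib/etcd\n"
    "    - --initial-cluster-state=new\n"
    "    - --force-new-cluster\n"
)
print(strip_temp_flags(EXAMPLE), end="")
```

Applied to the contents of /etc/kubernetes/manifests/etcd.yaml, the result would be written back to the same path. The kubelet watches that directory, so it should recreate the etcd static pod with the updated command once the file changes.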