Background
kubernetes: 1.16.3
masters: 3
The cluster was deployed with kubeadm. While the certificates still had more than 30 days of validity left, we ran kubeadm alpha certs renew all
to renew them all and assumed nothing could go wrong. But when the original expiration date arrived, the API started raising alerts.
Problem
api-server logs:
E0721 08:09:28.129981 1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
E0721 08:09:28.133091 1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
E0721 08:09:28.133460 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: bad status from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: 401
E0721 08:09:28.135093 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: bad status from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: 401
E0721 08:09:28.139986 1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
E0721 08:09:28.141188 1 available_controller.go:416] v1beta1.custom.metrics.k8s.io failed with: failing or missing response from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: bad status from https://10.244.69.146:6443/apis/custom.metrics.k8s.io/v1beta1: 401
E0721 08:09:28.143084 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: bad status from https://10.244.8.165:4443/apis/metrics.k8s.io/v1beta1: 401
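Every one of these errors points at an aggregated API (metrics.k8s.io / custom.metrics.k8s.io) whose backend rejected the api-server's credentials with a 401. A small helper like the following (a triage sketch written for this article, not part of kubeadm or kubectl) pulls the distinct failing backends out of such a log:

```shell
# extract_401_backends: read api-server log lines on stdin and print the
# distinct backend URLs that answered with "bad status ... 401".
# (Triage helper written for this article, not part of any k8s tooling.)
extract_401_backends() {
  grep ': 401' | grep -oE 'bad status from https://[^ ]+' \
    | sed 's/^bad status from //; s/:$//' | sort -u
}
```

Usage: `journalctl -u kube-apiserver | extract_401_backends` (or pipe in the container logs), which for the log above would reduce the noise to the two metrics endpoints.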
Troubleshooting
1. kubectl stopped working, reporting that the certificate had expired or was invalid:
$ kubectl get node
Unable to connect to the server: x509: certificate has expired or is not yet valid
2. We then copied /etc/kubernetes/admin.conf to ~/.kube/config, but the command still returned the error from step 1, which made us realize the problem was on the api-server side.
3. Checked the expiration dates of every certificate under the pki directory (/etc/kubernetes/pki)
$ for i in $(ls *.crt); do echo "===== $i ====="; openssl x509 -in $i -text -noout | grep -A 3 'Validity' ; done
===== apiserver.crt =====
Validity
Not Before: Jul 21 08:08:33 2020 GMT
Not After : Apr 14 05:58:54 2022 GMT
Subject: CN=kube-apiserver
===== apiserver-kubelet-client.crt =====
Validity
Not Before: Jul 21 08:08:33 2020 GMT
Not After : Apr 14 06:00:03 2022 GMT
Subject: O=system:masters, CN=kube-apiserver-kubelet-client
===== ca.crt =====
Validity
Not Before: Jul 21 08:08:33 2020 GMT
Not After : Jul 19 08:08:33 2030 GMT
Subject: CN=kubernetes
===== front-proxy-ca.crt =====
Validity
Not Before: Jul 21 08:08:34 2020 GMT
Not After : Jul 19 08:08:34 2030 GMT
Subject: CN=front-proxy-ca
===== front-proxy-client.crt =====
Validity
Not Before: Jul 21 08:08:34 2020 GMT
Not After : Apr 14 06:00:40 2022 GMT
Subject: CN=front-proxy-client
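Reading those "Not After" dates by eye is error-prone, so a tiny helper (an illustration written for this article; it relies on GNU date, as found on typical Linux control-plane nodes) converts an `openssl -enddate` line into days remaining:

```shell
# days_left: read an `openssl x509 -noout -enddate` line on stdin and
# print the whole days remaining until that date (negative if expired).
# Relies on GNU date; a helper for this article, not part of openssl.
days_left() {
  local end
  end=$(sed 's/^notAfter=//')
  echo $(( ( $(date -ud "$end" +%s) - $(date -u +%s) ) / 86400 ))
}
```

Usage: `openssl x509 -in apiserver.crt -noout -enddate | days_left`.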
4. None of the certificates under pki had expired, so the problem was clearly in the running containers: the control-plane static pods were still using the certificates they loaded at startup, and restarting the kubelet did not fix it.
5. Restarting the kubelet does not recreate the containers, so they have to be stopped and removed by hand:
systemctl stop kubelet
docker ps -q | xargs docker stop                           # stop every running container
df -Th | grep "docker" | awk '{print $NF}' | xargs umount  # release leftover container mounts
df -Th | grep "kubelet" | awk '{print $NF}' | xargs umount
docker ps -a -q | xargs docker rm                          # remove the stopped containers
systemctl restart kubelet                                  # kubelet recreates the static pods
6. After the restart, every master node recovered except the primary control node (the first server on which kubeadm init was run).
7. Checking the error messages: the kubelet on the primary control node failed to start
Jul 21 17:53:03 master001 kubelet[23047]: E0721 17:53:03.545713 23047 bootstrap.go:250] unable to load TLS configuration from existing bootstrap client config: tls: private key does not match public key
Jul 21 17:53:03 master001 kubelet[23047]: F0721 17:53:03.545749 23047 server.go:271] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
8. Comparing with other clusters showed that none of them had an /etc/kubernetes/bootstrap-kubelet.conf file either.
9. So we inspected the kubelet configuration
$ cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
On startup, the kubelet reads kubelet.conf first and falls back to bootstrap-kubelet.conf only when kubelet.conf is unusable.
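That fallback order can be sketched as follows (a simplification written for this article: the real kubelet also validates the credentials inside kubelet.conf, not just the file's existence):

```shell
# pick_kubeconfig: emulate the lookup order described above.
# Takes the config directory (default /etc/kubernetes) and prints the
# kubeconfig the kubelet would try first. Simplified illustration only.
pick_kubeconfig() {
  local dir=${1:-/etc/kubernetes}
  if [ -s "$dir/kubelet.conf" ]; then
    echo "$dir/kubelet.conf"
  else
    echo "$dir/bootstrap-kubelet.conf"
  fi
}
```

This is why the missing bootstrap-kubelet.conf in step 7 only becomes fatal once kubelet.conf itself is rejected ("private key does not match public key").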
10. Comparing kubelet.conf on the primary control node against the others revealed the difference.
Primary control node:
users:
- name: system:node:master001
user:
client-certificate-data: *****
client-key-data: *****
(***** is embedded base64-encoded certificate data)
Other master nodes:
users:
- name: default-auth
user:
client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
As you can see, the other nodes read the certificate directly from a file. When the first master initializes the cluster, the kubelet has not joined it yet, so /var/lib/kubelet/pki/kubelet-client-current.pem cannot exist and kubeadm embeds the certificate data in kubelet.conf instead; we only need to change this by hand to point at the file.
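A hedged sketch of that manual edit (back up first; the sed expressions assume the two `-data` keys sit on their own lines, as in the snippet above):

```shell
# fix_kubelet_conf: replace the embedded client cert/key data in a
# kubeadm kubelet.conf with references to the kubelet's rotating cert.
# Illustrative only -- review the result before restarting the kubelet.
fix_kubelet_conf() {
  local conf=${1:-/etc/kubernetes/kubelet.conf}
  local pem=${2:-/var/lib/kubelet/pki/kubelet-client-current.pem}
  cp "$conf" "$conf.bak"   # keep a backup of the original
  sed -i \
    -e "s|client-certificate-data: .*|client-certificate: $pem|" \
    -e "s|client-key-data: .*|client-key: $pem|" \
    "$conf"
}
```

After the edit, restart the kubelet so it picks up the rewritten kubeconfig.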
11. After restarting the first master, the whole cluster was back to normal.
Summary
1. After kubeadm renew updates the certificates, the api-server, controller-manager and scheduler do not reload the certificate files; their containers must be recreated.
2. kubelet.conf on the primary control node uses embedded certificate data rather than a file path, and kubeadm renew does not update this file, so it has to be fixed manually.