chalon
Using K8s: Troubleshooting
Problem 1: K8s cluster service access fails?
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
Cause analysis: the certificate is not recognized — typically because it is a custom (self-signed) certificate, has expired, etc.
Solution: update the certificate.
Problem 2: K8s cluster service access fails?
curl: (7) Failed connect to 10.103.22.158:3000; Connection refused
Cause analysis: the port mapping is wrong — the workload itself is running, but the Service cannot serve traffic.
Solution: delete the Service and expose the port again.
kubectl delete svc nginx-deployment
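The re-mapping step can be sketched as follows; the deployment name and the ports (3000 from the error above, 80 as the container port) are this example's values — substitute your own:

```shell
# Delete the broken Service, then expose the Deployment again.
kubectl delete svc nginx-deployment
kubectl expose deployment nginx-deployment --port=3000 --target-port=80
# Verify the new ClusterIP and port mapping:
kubectl get svc nginx-deployment
```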
Problem 3: Exposing a K8s service fails?
Error from server (AlreadyExists): services "nginx-deployment" already exists
Cause analysis: the workload is already exposed as a Service.
Solution: delete the existing Service and expose the port again.
Problem 4: The service cannot be reached from outside the cluster?
Cause analysis: the Service type is ClusterIP, so it is not exposed outside the cluster.
Solution: change the Service type to NodePort; the service then becomes reachable through any cluster node.
kubectl edit svc nginx-deployment
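Inside the editor, the change amounts to the following Service fragment (a sketch; the service name and ports are this example's values):

```yaml
# Fragment of the spec as seen in `kubectl edit svc nginx-deployment`.
# Changing ClusterIP to NodePort exposes the port on every node.
apiVersion: v1
kind: Service
metadata:
  name: nginx-deployment
spec:
  type: NodePort        # was: ClusterIP
  ports:
    - port: 3000
      targetPort: 80
      # nodePort: 30080 # optional; auto-allocated from 30000-32767 if omitted
```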
Problem 5: Pod status is ErrImagePull?
readiness-httpget-pod 0/1 ErrImagePull 0 10s
Cause analysis: the image cannot be pulled;
Warning Failed 59m (x4 over 61m) kubelet, k8s-node01 Error: ErrImagePull
Solution: switch to a valid, pullable image.
Problem 6: After creating a pod with init containers, its status stays abnormal?
NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:0/2 0 20s
Cause analysis: the logs show the pod stuck in initialization; the pod's detailed description then pinpoints why creation stalls — the init containers have not finished running.
Error from server (BadRequest): container "myapp-container" in pod "myapp-pod" is waiting to start: PodInitializing
waiting for myservice
Server: 10.96.0.10
Address: 10.96.0.10:53
** server can't find myservice.default.svc.cluster.local: NXDOMAIN
*** Can't find myservice.svc.cluster.local: No answer
*** Can't find myservice.cluster.local: No answer
*** Can't find myservice.default.svc.cluster.local: No answer
*** Can't find myservice.svc.cluster.local: No answer
*** Can't find myservice.cluster.local: No answer
Solution: create the corresponding Service so that its name is registered with the cluster's CoreDNS; CoreDNS can then resolve the domain name that the init container is waiting on.
kubectl apply -f myservice.yaml
NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:1/2 0 27m
myapp-pod 0/1 PodInitializing 0 28m
myapp-pod 1/1 Running 0 28m
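A minimal `myservice.yaml` consistent with the DNS lookup above might look like the following sketch; the selector labels and target port are hypothetical — match them to your backend pods:

```yaml
# Creating this Service registers myservice.default.svc.cluster.local
# in CoreDNS, which unblocks the init container's lookup loop.
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  selector:
    app: myservice       # hypothetical label; match your backend pods
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376   # hypothetical container port
```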
Problem 7: A probed pod's status is CrashLoopBackOff?
readiness-httpget-pod 0/1 CrashLoopBackOff 1 13s
readiness-httpget-pod 0/1 Completed 2 20s
readiness-httpget-pod 0/1 CrashLoopBackOff 2 31s
readiness-httpget-pod 0/1 Completed 3 42s
readiness-httpget-pod 0/1 CrashLoopBackOff 3 53s
Cause analysis: an image problem causes the container restarts to fail.
Events:
Type Reason Age From Message
Normal Pulling 56m kubelet, k8s-node01 Pulling image "hub.atguigu.com/library/mylandmarktech/myapp:v1"
Normal Pulled 56m kubelet, k8s-node01 Successfully pulled image "hub.atguigu.com/library/mylandmarktech/myapp:v1"
Normal Created 56m (x3 over 56m) kubelet, k8s-node01 Created container readiness-httpget-container
Normal Started 56m (x3 over 56m) kubelet, k8s-node01 Started container readiness-httpget-container
Normal Pulled 56m (x2 over 56m) kubelet, k8s-node01 Container image "hub.atguigu.com/library/mylandmarktech/myapp:v1" already present on machine
Warning Unhealthy 56m kubelet, k8s-node01 Readiness probe failed: Get http://10.244.2.22:80/index1.html: dial tcp 10.244.2.22:80: connect: connection refused
Warning BackOff 56m (x4 over 56m) kubelet, k8s-node01 Back-off restarting failed container
Normal Scheduled 50s default-scheduler Successfully assigned default/readiness-httpget-pod to k8s-node01
Solution: switch to a working image.
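For reference, a readiness probe of the kind shown in the events looks like the following pod-spec fragment; the path `/index1.html` is the one the probe is checking above, so the image must actually serve it:

```yaml
# Pod spec fragment: the readiness probe only passes if the
# container actually serves the probed path.
containers:
  - name: readiness-httpget-container
    image: hub.atguigu.com/library/mylandmarktech/myapp:v1
    readinessProbe:
      httpGet:
        path: /index1.html   # must exist in the image, or the probe fails
        port: 80
      initialDelaySeconds: 1
      periodSeconds: 3
```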
Problem 8: Pod creation fails?
readiness-httpget-pod 0/1 Pending 0 0s
readiness-httpget-pod 0/1 Pending 0 0s
readiness-httpget-pod 0/1 ContainerCreating 0 0s
readiness-httpget-pod 0/1 Error 0 2s
readiness-httpget-pod 0/1 Error 1 3s
readiness-httpget-pod 0/1 CrashLoopBackOff 1 4s
readiness-httpget-pod 0/1 Error 2 15s
readiness-httpget-pod 0/1 CrashLoopBackOff 2 26s
readiness-httpget-pod 0/1 Error 3 37s
readiness-httpget-pod 0/1 CrashLoopBackOff 3 52s
readiness-httpget-pod 0/1 Error 4 82s
Cause analysis: an image problem prevents the container from starting — the application inside crashes on startup.
[root@k8s-master01 ~]# kubectl logs readiness-httpget-pod
url.js:106
throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'url', 'string', url);
^
TypeError [ERR_INVALID_ARG_TYPE]: The "url" argument must be of type string. Received type undefined
at Url.parse (url.js:106:11)
at Object.urlParse [as parse] (url.js:100:13)
at module.exports (/myapp/node_modules/mongodb/lib/url_parser.js:17:23)
at connect (/myapp/node_modules/mongodb/lib/mongo_client.js:159:16)
at Function.MongoClient.connect (/myapp/node_modules/mongodb/lib/mongo_client.js:110:3)
at Object.<anonymous> (/myapp/app.js:12:13)
at Module._compile (module.js:641:30)
at Object.Module._extensions..js (module.js:652:10)
at Module.load (module.js:560:32)
at tryModuleLoad (module.js:503:12)
at Function.Module._load (module.js:495:3)
at Function.Module.runMain (module.js:682:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:613:3
Events:
Type Reason Age From Message
Normal Pulled 58m (x5 over 59m) kubelet, k8s-node01 Container image "hub.atguigu.com/library/myapp:v1" already present on machine
Normal Created 58m (x5 over 59m) kubelet, k8s-node01 Created container readiness-httpget-container
Normal Started 58m (x5 over 59m) kubelet, k8s-node01 Started container readiness-httpget-container
Warning BackOff 57m (x10 over 59m) kubelet, k8s-node01 Back-off restarting failed container
Normal Scheduled 3m35s default-scheduler Successfully assigned default/readiness-httpget-pod to k8s-node01
Solution: switch to a working image.
Problem 9: The pod never reaches the Ready state?
readiness-httpget-pod 0/1 Running 0 116s
Cause analysis: the request run against the pod fails — the probed resource cannot be retrieved.
Error from server (NotFound): pods "pod" not found
2021/06/11 07:10:14 [error] 30#30: *1 open() "/usr/share/nginx/html/index1.html" failed (2: No such file or directory), client: 10.244.2.1, server: localhost, request: "GET /index1.html HTTP/1.1", host: "10.244.2.25:80"
10.244.2.1 - - [11/Jun/2021:07:10:14 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
10.244.2.1 - - [11/Jun/2021:07:10:17 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
Events:
Type Reason Age From Message
Normal Pulled 64m kubelet, k8s-node01 Container image "hub.atguigu.com/library/nginx" already present on machine
Normal Created 64m kubelet, k8s-node01 Created container readiness-httpget-container
Normal Started 64m kubelet, k8s-node01 Started container readiness-httpget-container
Warning Unhealthy 59m (x101 over 64m) kubelet, k8s-node01 Readiness probe failed: HTTP probe failed with statuscode: 404
Normal Scheduled 8m16s default-scheduler Successfully assigned default/readiness-httpget-pod to k8s-node01
Solution: enter the container and create the resource that the probe defined in the YAML expects.
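A minimal sketch, using the pod name and probe path from the logs above:

```shell
# Create the file the readiness probe checks for, inside the running container.
kubectl exec -it readiness-httpget-pod -- /bin/sh -c \
  'echo ok > /usr/share/nginx/html/index1.html'
# READY should flip to 1/1 once the probe starts returning 200:
kubectl get pod readiness-httpget-pod
```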
Problem 10: Pod creation fails?
error: error validating "myregistry-secret.yml": error validating data: ValidationError(Pod.spec.imagePullSecrets[0]): invalid type for io.k8s.api.core.v1.LocalObjectReference: got "string", expected "map"; if you choose to ignore these errors, turn validation off with --validate=false
Cause analysis: the YAML content is wrong (e.g. full-width Chinese characters); the validation error shows that imagePullSecrets[0] was given a string where a map was expected.
Solution: correct the myregistrykey entry.
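The error message is explicit about the shape it wants: `imagePullSecrets` must be a list of maps with a `name` key, not a list of bare strings:

```yaml
# Wrong (a bare string):
#   imagePullSecrets:
#     - myregistrykey
# Right (a list of maps):
spec:
  imagePullSecrets:
    - name: myregistrykey
```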
11添坊、kube-flannel-ds-amd64-ndsf7插件pod的status為Init:0/1?
排查思路:kubectl -n kube-system describe pod kube-flannel-ds-amd64-ndsf7 #查詢pod描述信息箫锤;
原因分析:k8s-slave1節(jié)點(diǎn)拉取鏡像失敗贬蛙。
解決方法:登錄k8s-slave1,重啟docker服務(wù)谚攒,手動拉取鏡像阳准。
k8s-master節(jié)點(diǎn),重新安裝插件即可馏臭。
kubectl create -f kube-flannel.yml; kubectl get nodes
12野蝇、K8S創(chuàng)建服務(wù)status為ErrImagePull?
排查思路:kubectl describe pod test-nginx
原因分析:拉取鏡像名稱問題。
解決方法:刪除錯誤pod绕沈;重新拉取鏡像锐想;
kubectl delete pod test-nginx; kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine
Problem 13: Cannot exec into the specified container?
Error from server (BadRequest): container volume-test-container is not valid for pod volume-test-pod
Cause analysis: the YAML has a duplicated containers field, so the pod does not actually contain that container.
Solution: remove the redundant containers field from the YAML and recreate the pod.
Problem 14: PV creation fails?
persistentvolume/nfspv1 unchanged
persistentvolume/nfspv01 created
Error from server (Invalid): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{"apiVersion":"v1","kind":"PersistentVolume","metadata":{"annotations":{},"name":"nfspv01"},"spec":{"accessModes":["ReadWriteOnce"],"capacity":{"storage":"5Gi"},"nfs":{"path":"/nfs2","server":"192.168.66.100"},"persistentVolumeReclaimPolicy":"Retain","storageClassName":"nfs"}}\n"}},"spec":{"nfs":{"path":"/nfs2"}}}
to:
Resource: "/v1, Resource=persistentvolumes", GroupVersionKind: "/v1, Kind=PersistentVolume"
Name: "nfspv01", Namespace: ""
Object: &{map["apiVersion":"v1" "kind":"PersistentVolume" "metadata":map["annotations":map["kubectl.kubernetes.io/last-applied-configuration":"{"apiVersion":"v1","kind":"PersistentVolume","metadata":{"annotations":{},"name":"nfspv01"},"spec":{"accessModes":["ReadWriteOnce"],"capacity":{"storage":"5Gi"},"nfs":{"path":"/nfs1","server":"192.168.66.100"},"persistentVolumeReclaimPolicy":"Retain","storageClassName":"nfs"}}\n"] "creationTimestamp":"2021-06-25T01:54:24Z" "finalizers":["kubernetes.io/pv-protection"] "name":"nfspv01" "resourceVersion":"325674" "selfLink":"/api/v1/persistentvolumes/nfspv01" "uid":"89cb1d15-8012-47f0-aee6-6507bb624387"] "spec":map["accessModes":["ReadWriteOnce"] "capacity":map["storage":"5Gi"] "nfs":map["path":"/nfs1" "server":"192.168.66.100"] "persistentVolumeReclaimPolicy":"Retain" "storageClassName":"nfs" "volumeMode":"Filesystem"] "status":map["phase":"Available"]]}
for: "PV.yml": PersistentVolume "nfspv01" is invalid: spec.persistentvolumesource: Forbidden: is immutable after creation
Cause analysis: the PV's name field is reused — as the error says, a PersistentVolume's source spec is immutable after creation, so applying a changed spec under an existing name fails.
Solution: change the PV's name field.
15收津、pod無法掛載PVC?
原因分析:pod無法掛載PVC浊伙。
Events:
Type Reason Age From Message
Warning FailedScheduling 60s default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
The PVC's accessModes do not match the available PVs, so the claim cannot be bound. Since only PVs larger than 1Gi with accessModes RWO can be matched here, only one pod is created successfully; the second pod stays Pending, and when pods are created in order the third is never created.
Solution: adjust the accessModes in the YAML, or the PV's accessModes, so they match.
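A PVC fragment illustrating the two fields that must line up with an available PV (the claim name `nfs-pvc` is hypothetical):

```yaml
# The claim stays unbound (and the pod Pending) unless some PV
# satisfies both the accessModes and the requested size.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc          # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce      # must match a PV's accessModes
  storageClassName: nfs
  resources:
    requests:
      storage: 1Gi       # must fit within a PV's capacity
```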
Problem 16: After the pod uses a PV, its content cannot be accessed?
Cause analysis: the NFS export contains no files, or its permissions are wrong.
Solution: create the files on the NFS export and grant the correct permissions.
17榨惠、查看節(jié)點(diǎn)狀態(tài)失敗盛霎?
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
Cause analysis: there is no heapster service.
Solution: install the Prometheus monitoring component.
18愤炸、pod一直處于pending'狀態(tài)期揪?
原因分析:由于已使用同樣鏡像發(fā)布了pod,導(dǎo)致無節(jié)點(diǎn)可調(diào)度摇幻。
Events:
Type Reason Age From Message
Warning FailedScheduling 9s (x13 over 14m) default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
Solution: delete all the old pods, then deploy the pod again.
Problem 19: Installing a component with helm fails?
[root@k8s-master01 hello-world]# helm install
Error: This command needs 1 argument: chart nam
[root@k8s-master01 hello-world]# helm install ./
Error: no Chart.yaml exists in directory "/root/hello-world"
Cause analysis: the file name is wrong — Helm requires the chart metadata file to be named Chart.yaml (capitalized).
Solution: mv chart.yaml Chart.yaml
20狂芋、helm更新release失斦ツ佟?
[root@k8s-master01 hello-world]#
[root@k8s-master01 hello-world]# helm upgrade joyous-wasp ./
UPGRADE FAILED
ROLLING BACK
Error: render error in "hello-world/templates/deployment.yaml": template: hello-world/templates/deployment.yaml:14:35: executing "hello-world/templates/deployment.yaml" at <.values.image.reposi...>: can't evaluate field image in type interface {}
Error: UPGRADE FAILED: render error in "hello-world/templates/deployment.yaml": template: hello-world/templates/deployment.yaml:14:35: executing "hello-world/templates/deployment.yaml" at <.values.image.reposi...>: can't evaluate field image in type interface {}
Cause analysis: a template error in the chart's YAML — the render error shows the template referencing lowercase `.values`, while Helm's built-in values object is `.Values`.
Solution: fix the template and rerun the upgrade.
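The fix at deployment.yaml:14 presumably looks like the following sketch (the `image.tag` value is an assumption — use whatever keys your values.yaml defines):

```yaml
# hello-world/templates/deployment.yaml (fragment)
# Wrong: {{ .values.image.repository }}   <- lowercase .values is undefined
# Right: Helm's built-in values object is capitalized .Values
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```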
Problem 21: etcd fails to start?
[root@k8s-master01 ~]# systemctl enable --now etcd
Created symlink from /etc/systemd/system/etcd3.service to /usr/lib/systemd/system/etcd.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /usr/lib/systemd/system/etcd.service.
Job for etcd.service failed because a timeout was exceeded. See "systemctl status etcd.service" and "journalctl -xe" for details.
Cause analysis: the authentication failure could stem from certificates, configuration, or ports. The configuration was checked against the etcd version's requirements and the certificate generation was valid; in the end, the root cause was that the port was already occupied, which made authentication fail.
[root@k8s-master01 ~]# systemctl status etcd
● etcd.service - Etcd.service
   Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: activating (start) since Wed 2021-07-14 09:53:03 CST; 1min 6s ago
     Docs: https://coreos.com/etcd/docs/latest/
 Main PID: 39692 (etcd)
   CGroup: /system.slice/etcd.service
           └─39692 /usr/local/bin/etcd --config-file=/etc/etcd/etcd.config.yml
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46168" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46166" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46170" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46172" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46176" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46174" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46178" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46180" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:10 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46182" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:10 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46186" (error "remote error: tls: bad certificate", ServerName "")
Solution: kill the process occupying port 2379, then restart etcd.
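A sketch of that cleanup, assuming `ss` and `lsof` are available on the node:

```shell
# Identify the process holding etcd's client port.
ss -lntp | grep 2379
# Kill it (lsof -t prints just the PID), then restart etcd.
kill $(lsof -t -i:2379)
systemctl restart etcd
systemctl status etcd   # should now be active (running)
```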