Problems encountered when deploying kubeadm, and how to solve them
http://www.reibang.com/p/7ccf7769c3a9
k8s cluster - CNI network plugin
Link: http://www.reibang.com/p/1b1d6ab82e2e
1. Initial server setup (all three machines)
Environment: CentOS Linux 7.6
For easier management, rename the server instances to k8s-master01-15 / k8s-node01-16 / k8s-node02-17 (15/16/17 are the last octet of each private IP; the naming convention is up to you). Then verify that the three servers can reach each other over the private network via ping (a quick check is sketched after the hosts file below).
Set the hostnames
# On k8s-master01-15
hostnamectl set-hostname k8s-master01-15
# On k8s-node01-16
hostnamectl set-hostname k8s-node01-16
# On k8s-node02-17
hostnamectl set-hostname k8s-node02-17
Configure the /etc/hosts file
A real cluster should run its own DNS server to map IPs to hostnames; for simplicity, hosts-file entries are used here instead. Add the same three lines to /etc/hosts on all three servers:
cat << EOF >> /etc/hosts
172.23.199.15 k8s-master01-15
172.23.199.16 k8s-node01-16
172.23.199.17 k8s-node02-17
EOF
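A minimal connectivity check you can run from any one of the machines, assuming the hosts entries above are in place:
for h in k8s-master01-15 k8s-node01-16 k8s-node02-17; do
  ping -c 1 -W 2 "$h" >/dev/null && echo "$h reachable" || echo "$h UNREACHABLE"
done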
Prerequisites (all nodes)
1) Install dependency packages
yum install -y conntrack ipvsadm ipset jq iptables curl sysstat libseccomp wget vim net-tools git epel-release telnet tree nmap lrzsz dos2unix bind-utils
2) Disable SELinux, firewalld, and NetworkManager
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0
systemctl disable firewalld && systemctl stop firewalld
systemctl stop NetworkManager
systemctl disable NetworkManager
chkconfig NetworkManager off
3) Install iptables and flush its rules
yum -y install iptables-services && systemctl start iptables && systemctl enable iptables && iptables -F && service iptables save
4) Disable the swap partition (if left enabled, pod containers may run in swap (virtual memory), hurting performance)
swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
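A quick check that swap really is off (swapon --show prints nothing when no swap device is active):
swapon --show
free -h | grep -i swap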
5) Tune kernel parameters for Kubernetes
# Write the configuration file (sysctl does not accept trailing comments on a setting, so comments go on their own lines)
cat > /data/kubernetes.conf <<EOF
# Pass bridged traffic through iptables/ip6tables
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
# Avoid swap; only use it when the system would otherwise OOM
vm.swappiness=0
# Do not check whether physical memory is sufficient on allocation
vm.overcommit_memory=1
# Do not panic on OOM (let the OOM killer handle it)
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
# Disable IPv6
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720
EOF
6) Configure the yum repositories
curl -o /etc/yum.repos.d/centos-7.repo http://mirrors.aliyun.com/repo/Centos-7.repo
curl -o /etc/yum.repos.d/docker-ce.repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum clean all && yum makecache
7) Apply the configuration
# Note: the net.bridge.* keys only exist once the br_netfilter module is loaded (see step 11); load it first or sysctl will report unknown keys
modprobe br_netfilter
cp /data/kubernetes.conf /etc/sysctl.d/kubernetes.conf
sysctl -p /etc/sysctl.d/kubernetes.conf
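Spot-check that the values took effect (if sysctl reports unknown net.bridge.* keys, br_netfilter is not loaded yet):
sysctl net.bridge.bridge-nf-call-iptables vm.swappiness
# Expected: net.bridge.bridge-nf-call-iptables = 1, vm.swappiness = 0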
8) Adjust the system time zone (skip if yours is already correct)
# Set the system time zone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai
# Keep the hardware clock in UTC
timedatectl set-local-rtc 0
# Restart the services that depend on the system time
systemctl restart rsyslog
systemctl restart crond
9) Stop system services you do not need (if present)
systemctl stop postfix && systemctl disable postfix
10) Configure the logging system
Use systemd-journald for logging instead of rsyslogd.
Create the log directories
# Directory for persistent log storage
mkdir -p /var/log/journal
mkdir -p /etc/systemd/journald.conf.d
Write the configuration file
cat > /etc/systemd/journald.conf.d/99-prophet.conf <<EOF
[Journal]
# Persist logs to disk
Storage=persistent
# Compress archived logs
Compress=yes
SyncIntervalSec=5m
RateLimitInterval=30s
RateLimitBurst=1000
# Cap total disk usage at 10G
SystemMaxUse=10G
# Cap each journal file at 200M
SystemMaxFileSize=200M
# Keep logs for 2 weeks
MaxRetentionSec=2week
# Do not forward logs to syslog
ForwardToSyslog=no
EOF
Restart the logging service
systemctl restart systemd-journald
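After the restart, journal data should land under /var/log/journal; a quick way to confirm persistence took effect:
journalctl --disk-usage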
11) Prerequisites for enabling IPVS in kube-proxy
# Load the br_netfilter module
modprobe br_netfilter
# Write the module-loading script
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
# Make it executable, run it, and verify the modules are loaded
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
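Once the cluster is running (after the init and join steps below), you can confirm kube-proxy actually switched to IPVS: ipvsadm lists the virtual servers, and the kube-proxy pods (labelled k8s-app=kube-proxy by kubeadm) log which proxier is in use:
ipvsadm -Ln
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep -i ipvs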
12) Install Docker
# Install dependencies
yum install -y yum-utils device-mapper-persistent-data lvm2
# Add the Aliyun Docker repo
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install containerd.io directly from Docker's download site
yum install dnf -y
dnf install -y https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.2.6-3.3.el7.x86_64.rpm
# List the available docker-ce-cli versions
yum list docker-ce-cli --showduplicates|sort -r
# Install docker
yum install -y docker-ce-19.03.8 docker-ce-cli-19.03.8
# Check the docker version (confirms the install succeeded)
docker --version
# Create the /etc/docker directory
mkdir -p /etc/docker
# Configure daemon.json; to find your own accelerator address, open "Container Registry" in the Aliyun console and select the "Image Accelerator" sidebar
cat > /etc/docker/daemon.json <<EOF
{
"registry-mirrors": ["https://q4xjzq29.mirror.aliyuncs.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
# Create the systemd drop-in directory
mkdir -p /etc/systemd/system/docker.service.d
# Restart docker
systemctl daemon-reload
systemctl enable docker && systemctl restart docker && systemctl status docker
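The kubelet also uses the systemd cgroup driver, so Docker must report the same; a quick check that the daemon.json above took effect:
docker info 2>/dev/null | grep -i 'cgroup driver'
# Expected: Cgroup Driver: systemd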
Install kubeadm (master/worker setup)
Install kubeadm (all three servers)
# Add the Aliyun Kubernetes repo
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
       http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Install kubelet, kubeadm, and kubectl
yum install -y kubelet-1.20.11 kubectl-1.20.11 kubeadm-1.20.11
# systemctl's enable, disable, and mask subcommands accept --now, which also starts (or stops) the service in the same step
systemctl enable --now kubelet
# Check the installed version (kubelet will crash-loop until kubeadm init or join provides its config; that is expected at this point)
kubelet --version
How to uninstall the K8s components
# Before uninstalling, run kubeadm reset to wipe the cluster state
echo y|kubeadm reset
# Remove the management components
yum erase -y kubelet kubectl kubeadm kubernetes-cni
Download the required images (all three servers)
Normally the next step would be kubeadm init, which downloads the required component images itself. Those images live on k8s.gcr.io, which is blocked in mainland China and cannot be reached directly, so the images must be pulled in advance by other means; the convenient way is to pull them from a mirror registry.
kubeadm init mainly performs the following steps (each can also be re-run on its own; see the sketch after this list):
[init]: initialize with the specified version
[preflight]: pre-init checks and download of the required Docker images
[kubelet-start]: generate the kubelet config file "/var/lib/kubelet/config.yaml"; without it kubelet cannot start, so kubelet actually fails to run before init.
[certificates]: generate the certificates Kubernetes uses, stored under /etc/kubernetes/pki.
[kubeconfig]: generate the kubeconfig files under /etc/kubernetes; components need them to talk to each other.
[control-plane]: install the master components from the YAML files under /etc/kubernetes/manifests.
[etcd]: install the etcd service from /etc/kubernetes/manifests/etcd.yaml.
[wait-control-plane]: wait for the master components deployed by the control plane to start.
[apiclient]: check the status of the master component services.
[uploadconfig]: upload the configuration.
[kubelet]: configure kubelet via a ConfigMap.
[patchnode]: record CNI information on the Node via annotations.
[mark-control-plane]: label the current node with the master role and the unschedulable taint, so the master is not used to run pods by default.
[bootstrap-token]: generate the token; note it down, it is used later by kubeadm join to add nodes to the cluster.
[addons]: install the CoreDNS and kube-proxy add-ons.
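Each phase can also be run individually, which is handy when only one step of a failed init needs repeating; a sketch against the kubeadm-config.yaml generated below:
# Re-run only the preflight checks
kubeadm init phase preflight --config=kubeadm-config.yaml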
List the images that need to be downloaded
kubeadm config images list
# Output: these are all required K8s components, but because the registry is blocked they cannot be pulled directly with docker pull
k8s.gcr.io/kube-apiserver:v1.20.15
k8s.gcr.io/kube-controller-manager:v1.20.15
k8s.gcr.io/kube-scheduler:v1.20.15
k8s.gcr.io/kube-proxy:v1.20.15
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns:1.7.0
Write the pull script
mkdir -p /data/script
cat >/data/script/pull_k8s_images.sh << "EOF"
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail
## The versions to download
KUBE_VERSION=v1.20.15
KUBE_PAUSE_VERSION=3.2
ETCD_VERSION=3.4.13-0
DNS_VERSION=1.7.0
## The original (blocked) registry
GCR_URL=k8s.gcr.io
## The mirror registry to pull from instead (gotok8s also works)
DOCKERHUB_URL=registry.cn-hangzhou.aliyuncs.com/google_containers
## The image list
images=(
kube-proxy:${KUBE_VERSION}
kube-scheduler:${KUBE_VERSION}
kube-controller-manager:${KUBE_VERSION}
kube-apiserver:${KUBE_VERSION}
pause:${KUBE_PAUSE_VERSION}
etcd:${ETCD_VERSION}
coredns:${DNS_VERSION}
)
## Pull each image, re-tag it to the expected k8s.gcr.io name, then remove the mirror tag
for imageName in "${images[@]}" ; do
docker pull $DOCKERHUB_URL/$imageName
docker tag $DOCKERHUB_URL/$imageName $GCR_URL/$imageName
docker rmi $DOCKERHUB_URL/$imageName
done
EOF
Copy the script to node[1:2] (create /data/script on the nodes first)
scp /data/script/pull_k8s_images.sh root@<node IP>:/data/script/
Make it executable
chmod +x /data/script/pull_k8s_images.sh
Run the script
bash /data/script/pull_k8s_images.sh
Check the downloaded images
[root@k8s-master01-15 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.20.1 e3f6fcd87756 14 months ago 118MB
k8s.gcr.io/kube-controller-manager v1.20.1 2893d78e47dc 14 months ago 116MB
k8s.gcr.io/kube-apiserver v1.20.1 75c7f7112080 14 months ago 122MB
k8s.gcr.io/kube-scheduler v1.20.1 4aa0b4397bbb 14 months ago 46.4MB
k8s.gcr.io/etcd 3.4.13-0 0369cf4303ff 18 months ago 253MB
k8s.gcr.io/coredns 1.7.0 bfe3a36ebd25 20 months ago 45.2MB
k8s.gcr.io/pause 3.2 80d28bedfe5d 2 years ago 683kB
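As an alternative to the script, kubeadm itself can pre-pull everything from a mirror in one step (these flags exist in kubeadm 1.20; images pulled this way keep the mirror's repository name, which matches the imageRepository set in the init config below, so no re-tagging is needed):
kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.20.15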
Pulling directly from k8s.gcr.io fails with a timeout like the one below (ignore this if you never hit it)
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.18.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
# The fix is to pull from a mirror repo (e.g. gotok8s) first, then re-tag to the expected name; the script above automates the whole process. For example, for the coredns image path used by newer releases:
docker tag k8s.gcr.io/coredns:1.8.0 k8s.gcr.io/coredns/coredns:v1.8.0
docker rmi k8s.gcr.io/coredns:1.8.0
Initialize the master node (only the master server needs initializing)
# Generate the default init configuration
kubeadm config print init-defaults > kubeadm-config.yaml
Edit the file
vim kubeadm-config.yaml
# The fields to change are called out below
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.23.199.15   # this host's IP
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-master01-15             # this host's name
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}   # the virtual IP and haproxy port would come in here for an HA cluster; leave as-is for a single master
dns:
  type: CoreDNS
etcd:
  # For a plain single-node etcd, keep this local block and drop the external one:
  # local:
  #   dataDir: /var/lib/etcd
  # Here an external etcd cluster is used instead (optional);
  # deployment guide: http://www.reibang.com/p/fbec19c20454
  external:
    # etcd server addresses
    endpoints:
    - https://172.23.199.15:2379
    - https://172.23.199.16:2379
    - https://172.23.199.17:2379
    # CA certificate generated when building the etcd cluster
    caFile: /root/TLS/etcd/ca.pem
    # client certificate generated when building the etcd cluster
    certFile: /root/TLS/etcd/server.pem
    # client key generated when building the etcd cluster
    keyFile: /root/TLS/etcd/server-key.pem
imageRepository: registry.aliyuncs.com/google_containers   # change the image repository to suit your setup
kind: ClusterConfiguration
kubernetesVersion: v1.20.15   # set this to match the version installed earlier; check with kubeadm version
networking:
  dnsDomain: cluster.local
  podSubnet: "10.244.0.0/16"  # add the pod subnet; this fixed value is fine
  serviceSubnet: 10.96.0.0/12
scheduler: {}
# Append the block below as-is to switch kube-proxy to IPVS mode
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
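Before the real run, the file can be sanity-checked with a dry run (kubeadm init supports --dry-run, which runs the preflight checks and prints the would-be manifests without making persistent changes):
kubeadm init --config=kubeadm-config.yaml --dry-run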
k8s external etcd cluster deployment (optional)
Link: http://www.reibang.com/p/fbec19c20454
Run the init command
kubeadm init --config=kubeadm-config.yaml | tee kubeadm-init.log
# Expected output
....
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join <master IP>:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:873f80617875dc39a23eced3464c7069689236d460b60692586e7898bf8a254a
If init fails
Debug from the error message; the usual causes are problems in kubeadm-config.yaml: a version number that does not match, an IP address left unchanged, stray whitespace, and so on.
After fixing the config, re-running init may still fail with ports already in use or files that already exist:
[root@k8s-node01-15 ~]# kubeadm init --config=kubeadm-config.yaml | tee kubeadm-init.log
W0801 20:05:00.768809 44882 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.6
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING FileExisting-tc]: tc not found in system path
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
The likely cause is a previous init that got partway through and was never rolled back after the error; run kubeadm reset first to return the machine to its pre-init state.
[root@k8s-node01-15 ~]# kubeadm reset # or: echo y|kubeadm reset to skip the [y/N] prompt
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0801 20:15:00.630170 52554 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get https://<master IP>:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s: context deadline exceeded
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0801 20:15:00.534409 52554 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
After the reset finishes, re-run the init command above and check its output to confirm it succeeded.
After init succeeds
Look at the end of the output, or at the kubeadm-init.log run log; it tells you to perform the following steps:
# Run on the master
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=/etc/kubernetes/admin.conf
# Copy to the node{1..X} machines; create the /root/.kube directory on the node by hand if it does not exist
scp /etc/kubernetes/admin.conf root@$nodeX:/root/.kube/config
Check the current nodes; the status shows NotReady
[root@k8s-master01-15 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01-15 NotReady master 20m v1.20.11
Note: the status is still NotReady because no CNI network plugin has been installed yet.
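Flannel is one common choice whose default pod CIDR matches the 10.244.0.0/16 podSubnet set earlier; the manifest URL below was the usual one at the time of writing (see the CNI article linked at the top for details):
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml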
Join the worker nodes to the master (run on the worker servers)
The join command for workers is also in the master's init output log; run it on both worker servers:
kubeadm join <master IP>:6443 --token XXXXXXXXXXXXXXXXXXX --discovery-token-ca-cert-hash sha256:XXXXXXXXXXXXXXXXXXX
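The token in the join command expires after 24 hours (the ttl in the init config); if it has lapsed, generate a fresh join command on the master:
kubeadm token create --print-join-command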