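Kubeadm Installation
Note: unless stated otherwise, all commands below must be run with root privileges.
Step 1. Configure package sources
First, set up apt or yum according to your distribution.
1.1 Configure the Kubernetes source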
https://developer.aliyun.com/mirror/kubernetes?spm=a2c6h.13651102.0.0.3e221b11XNMVQJ
Debian / Ubuntu
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
CentOS / RHEL / Fedora
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
setenforce 0
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet
1.2 Configure the Docker CE source
https://developer.aliyun.com/mirror/docker-ce?spm=a2c6h.13651102.0.0.3e221b11MjnYsW
Ubuntu 14.04/16.04 (install with apt-get)
# Step 1: install the required system tools
sudo apt-get update
sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common
# Step 2: install the GPG key
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
# Step 3: add the package repository
sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
# Step 4: update the package index and install Docker CE
sudo apt-get -y update
sudo apt-get -y install docker-ce
# To install a specific version of Docker CE:
# Step 1: list the available Docker CE versions:
# apt-cache madison docker-ce
#   docker-ce | 17.03.1~ce-0~ubuntu-xenial | https://mirrors.aliyun.com/docker-ce/linux/ubuntu xenial/stable amd64 Packages
#   docker-ce | 17.03.0~ce-0~ubuntu-xenial | https://mirrors.aliyun.com/docker-ce/linux/ubuntu xenial/stable amd64 Packages
# Step 2: install the chosen version (VERSION is, for example, 17.03.1~ce-0~ubuntu-xenial from the list above):
# sudo apt-get -y install docker-ce=[VERSION]
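The version lookup above can be scripted by extracting the second column of the `apt-cache madison` output. This is a minimal sketch: the function name `list_versions` is hypothetical, and it assumes the pipe-separated "name | version | source" layout shown in the sample output.

```shell
# list_versions: print the version column (second pipe-separated field)
# of `apt-cache madison <package>` output read from stdin.
# Hypothetical helper; assumes the "name | version | source" layout.
list_versions() {
  awk -F'|' 'NF >= 3 { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

# Usage on a real host (as root), choosing a version explicitly:
#   apt-cache madison docker-ce | list_versions
#   apt-get -y install "docker-ce=17.03.1~ce-0~ubuntu-xenial"
```

Picking the version by hand from the printed list avoids relying on the order in which `apt-cache madison` happens to list entries.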
CentOS 7 (install with yum)
# Step 1: install the required system tools
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# Step 2: add the package repository
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Step 3: refresh the cache and install Docker CE
sudo yum makecache fast
sudo yum -y install docker-ce
# Step 4: start the Docker service
sudo service docker start
# Note:
# The official repository enables only the latest stable packages by default. You can edit the repository file to get other versions. For example, the test repository is disabled by default; enable it as shown below. Other test channels can be enabled the same way.
# vim /etc/yum.repos.d/docker-ce.repo
#   Under [docker-ce-test], change enabled=0 to enabled=1
#
# To install a specific version of Docker CE:
# Step 1: list the available Docker CE versions:
# yum list docker-ce.x86_64 --showduplicates | sort -r
#   Loading mirror speeds from cached hostfile
#   Loaded plugins: branch, fastestmirror, langpacks
#   docker-ce.x86_64            17.03.1.ce-1.el7.centos            docker-ce-stable
#   docker-ce.x86_64            17.03.1.ce-1.el7.centos            @docker-ce-stable
#   docker-ce.x86_64            17.03.0.ce-1.el7.centos            docker-ce-stable
#   Available Packages
# Step 2: install the chosen version (VERSION is, for example, 17.03.0.ce-1.el7.centos from the list above):
# sudo yum -y install docker-ce-[VERSION]
Verify the Docker installation
root@iZbp12adskpuoxodbkqzjfZ:$ docker version
Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   3a232c8
 Built:        Tue Feb 28 07:52:04 2017
 OS/Arch:      linux/amd64
Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   3a232c8
 Built:        Tue Feb 28 07:52:04 2017
 OS/Arch:      linux/amd64
 Experimental: false
Step 2. Configure the system environment
2.1 Disable swap
sudo swapoff -a
Then find the swap entry in /etc/fstab and comment it out with #, so that swap stays off after a reboot.
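Commenting out the swap entry can also be done non-interactively. The following is a sketch only: `comment_out_swap` is a hypothetical helper name, it assumes GNU sed (for `-i -E`), and it treats any uncommented line whose third field is `swap` as a swap entry.

```shell
# comment_out_swap: prefix every active swap entry in an fstab-style file
# with '#'. Hypothetical helper; lines that are already commented out or
# that describe other filesystems are left unchanged.
comment_out_swap() {
  # $1 = path to the fstab-style file.
  # Match uncommented lines whose third field (filesystem type) is "swap".
  sed -i -E 's/^([^#[:space:]][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]].*)$/#\1/' "$1"
}

# Usage on a real host (as root), keeping a backup first:
#   cp /etc/fstab /etc/fstab.bak
#   comment_out_swap /etc/fstab
```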
2.2 Disable SELinux enforcement
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
2.3 Add the following settings to /etc/sysctl.d/k8s.conf
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-iptables=1
net.ipv4.ip_forward=1
EOF
sysctl --system
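Before running kubeadm, it can help to confirm that the file really contains all three settings. This is a minimal sketch under the assumption that each setting is written as `key=1` (spaces around `=` are tolerated); `missing_sysctl_keys` is a hypothetical helper name.

```shell
# missing_sysctl_keys: print each required key that is not set to 1 in the
# given sysctl-style file. Hypothetical helper; prints nothing when all
# three settings are present.
missing_sysctl_keys() {
  # $1 = path to the sysctl configuration file.
  for key in net.bridge.bridge-nf-call-ip6tables \
             net.bridge.bridge-nf-call-iptables \
             net.ipv4.ip_forward; do
    # Strip spaces and tabs, then look for an exact "key=1" line.
    tr -d ' \t' < "$1" | grep -Fxq "${key}=1" || echo "$key"
  done
}

# Usage on a real host:
#   missing_sysctl_keys /etc/sysctl.d/k8s.conf
```

On many systems the `net.bridge.*` keys are only available once the `br_netfilter` kernel module is loaded, so `modprobe br_netfilter` may be needed before `sysctl --system` can apply them.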
Step 3. Install the master
References:
http://www.reibang.com/p/cdf5db4653bf
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
On the master node:
kubeadm init --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16
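If this step reports no errors, the master was initialized successfully. Otherwise, diagnose the problem from the error message; a common cause is that swap has not been disabled.
Install the Flannel network plugin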
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
If the URL has changed, see https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
On the master, run:
export KUBECONFIG=/etc/kubernetes/admin.conf
On the worker, run:
export KUBECONFIG=/etc/kubernetes/kubelet.conf
Append the corresponding export line to the end of /etc/profile so that it persists across logins.
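If the setup is run more than once, a plain `echo >>` would add the export line repeatedly. The sketch below appends a line only when it is not already present; `append_once` is a hypothetical helper name.

```shell
# append_once: append a line to a file only if that exact line is not
# already there, so repeated runs do not create duplicates.
# Hypothetical helper; $1 = line to add, $2 = target file.
append_once() {
  grep -Fxq -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Usage on the master (as root):
#   append_once 'export KUBECONFIG=/etc/kubernetes/admin.conf' /etc/profile
```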
Step 4. Add nodes
On the master node:
Check whether a token already exists:
kubeadm token list
If there is no output, create one:
kubeadm token create
If the output looks like:
8ewj1p.9r9hcjoqgajrj4gi
then this is the token.
Use the following command to compute the hash of the CA certificate's public key:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
The output looks like:
8cb2de97839780a412b93877f8507ad6c94f73add17d5d7058e91741c9d5ec78
This is the hash.
On the worker node:
kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash>
Use the token and hash obtained on the master above, together with the master's IP address and port.
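Putting the pieces together by hand is error-prone, so the sketch below checks the token and hash formats before printing the join command. `build_join_cmd` is a hypothetical helper; it only prints the command and does not run it. The endpoint `192.168.60.16:6443` in the usage line is an example value (6443 is the default API server port), not something you must use.

```shell
# build_join_cmd: validate the token and hash, then print the kubeadm join
# command. Hypothetical helper.
#   $1 = token, $2 = hash (hex digits only, without "sha256:"), $3 = host:port
build_join_cmd() {
  # Bootstrap tokens have the form [a-z0-9]{6}.[a-z0-9]{16}.
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$' \
    || { echo "invalid token: $1" >&2; return 1; }
  # The hash is a SHA-256 digest: 64 lowercase hex characters.
  printf '%s\n' "$2" | grep -Eq '^[0-9a-f]{64}$' \
    || { echo "invalid hash: $2" >&2; return 1; }
  echo "kubeadm join --token $1 $3 --discovery-token-ca-cert-hash sha256:$2"
}

# Usage with the sample values from this section:
#   build_join_cmd 8ewj1p.9r9hcjoqgajrj4gi \
#     8cb2de97839780a412b93877f8507ad6c94f73add17d5d7058e91741c9d5ec78 \
#     192.168.60.16:6443
```

As an alternative to assembling the command manually, `kubeadm token create --print-join-command` on the master prints a complete join command for a freshly created token.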
Verification
If the worker node reports no errors, the end of the output looks like:
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
On the master node, run:
kubectl get nodes
If the worker appears in the list, it was added successfully.
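A node that has just joined may be listed but not yet usable. The sketch below filters `kubectl get nodes` output for nodes whose STATUS is not exactly `Ready`; `not_ready_nodes` is a hypothetical helper, and it assumes the default column layout (NAME first, STATUS second).

```shell
# not_ready_nodes: read `kubectl get nodes` output from stdin and print the
# names of nodes whose STATUS column is not exactly "Ready".
# Hypothetical helper; a status such as "Ready,SchedulingDisabled" is also
# reported, because it differs from plain "Ready".
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage on the master:
#   kubectl get nodes | not_ready_nodes
```

Empty output means every listed node reports `Ready`.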
Step 5. Use NVIDIA GPUs
5.1 Install the GPU driver (Worker)
To run GPU computation inside pods, first install the NVIDIA driver.
Ubuntu
add-apt-repository ppa:graphics-drivers
apt-get update
Ubuntu also provides a handy command that lists the recommended drivers for each device:
ubuntu-drivers devices
It prints output like the following:
== /sys/devices/pci0000:ae/0000:ae:00.0/0000:af:00.0 ==
vendor   : NVIDIA Corporation
modalias : pci:v000010DEd00001E04sv000010DEsd000012AEbc03sc00i00
driver   : nvidia-415 - third-party free
driver   : nvidia-430 - third-party free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
driver   : nvidia-418 - third-party free
driver   : nvidia-410 - third-party free
Here we choose the recommended nvidia-430:
apt install nvidia-430
After the installation finishes, run:
lsmod | grep nvidia
If there is no output, the installation failed.
If it succeeded, reboot:
reboot
After the reboot, run:
lsmod | grep nouveau
No output means the nouveau driver is no longer loaded, which is the expected result.
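The two `lsmod` checks can be combined into one. The sketch below reads `lsmod` output and reports which driver is loaded; `driver_state` and its three result strings are hypothetical names chosen for this example.

```shell
# driver_state: read `lsmod` output from stdin and report the GPU driver
# state. Hypothetical helper that prints one of:
#   nouveau-loaded  (the open-source driver is still active)
#   nvidia-loaded   (the nvidia module is loaded and nouveau is not)
#   no-driver       (neither module is loaded)
driver_state() {
  awk 'NR > 1 && $1 == "nvidia"  { nv = 1 }
       NR > 1 && $1 == "nouveau" { no = 1 }
       END {
         if (no)      print "nouveau-loaded"
         else if (nv) print "nvidia-loaded"
         else         print "no-driver"
       }'
}

# Usage on the worker:
#   lsmod | driver_state
```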
5.2 Install nvidia-docker2 (Worker)
https://nvidia.github.io/nvidia-docker/
Debian-based distributions
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
RHEL-based distributions
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-container-toolkit
sudo systemctl restart docker
https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(version-2.0)
Ubuntu
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
CentOS
sudo yum install nvidia-docker2
sudo systemctl restart docker
Test nvidia-docker2
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
5.3 Install k8s-device-plugin
https://github.com/NVIDIA/k8s-device-plugin
On the worker node:
Edit /etc/docker/daemon.json:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
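A quick way to confirm the edit is to check that the file names `nvidia` as the default runtime. This is a sketch only: `default_runtime_is_nvidia` is a hypothetical helper, and it performs a plain text match rather than parsing the JSON.

```shell
# default_runtime_is_nvidia: succeed (exit status 0) if the given file sets
# "default-runtime" to "nvidia". Hypothetical helper; a text match only,
# so it does not detect malformed JSON.
default_runtime_is_nvidia() {
  # $1 = path to the Docker daemon configuration file.
  grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1"
}

# Usage on the worker:
#   default_runtime_is_nvidia /etc/docker/daemon.json && echo "ok"
```

Docker reads this file at startup, so the change normally takes effect only after the daemon is restarted, for example with `sudo systemctl restart docker`.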
On the master node:
https://github.com/NVIDIA/k8s-device-plugin/releases
Download the source:
wget https://github.com/NVIDIA/k8s-device-plugin/archive/1.0.0-beta5.tar.gz
Extract it and deploy the plugin:
tar -zxvf 1.0.0-beta5.tar.gz
cd k8s-device-plugin-1.0.0-beta5/
kubectl create -f ./nvidia-device-plugin.yml
View the kubelet startup arguments:
cat /var/lib/kubelet/kubeadm-flags.env
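* Modify the kubelet startup arguments to enable device plugins
Method 1: add --feature-gates=DevicePlugins=true to the arguments in /var/lib/kubelet/kubeadm-flags.env, for example: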
KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs --network-plugin=cni --feature-gates=DevicePlugins=true --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2"
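Editing the flags file by hand works, but the change can also be scripted. The sketch below inserts a flag just before the closing quote of `KUBELET_KUBEADM_ARGS`, unless the flag is already present. `add_kubelet_flag` is a hypothetical helper; it assumes GNU sed and a flag that contains no `|`, `&` or backslash characters.

```shell
# add_kubelet_flag: append a flag inside the quoted KUBELET_KUBEADM_ARGS
# value of a kubeadm-flags.env style file, unless it is already there.
# Hypothetical helper; $1 = flag, $2 = path to the env file.
add_kubelet_flag() {
  # Nothing to do if the flag is already present.
  grep -Fq -- "$1" "$2" && return 0
  # Insert the flag just before the closing double quote of the value.
  sed -i -E "s|^(KUBELET_KUBEADM_ARGS=\".*)\"[[:space:]]*\$|\1 $1\"|" "$2"
}

# Usage on the node (as root), followed by a kubelet restart:
#   add_kubelet_flag --feature-gates=DevicePlugins=true /var/lib/kubelet/kubeadm-flags.env
#   systemctl restart kubelet
```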
Method 2:
References:
https://cloud.tencent.com/developer/article/1475552
https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/
Edit the Kubernetes manifests under /etc/kubernetes/manifests (Master). In the three files
kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
add the --feature-gates flag, that is, the --feature-gates line in this excerpt from kube-apiserver.yaml:
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.60.16
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --feature-gates=TTLAfterFinished=true,DevicePlugins=true
    - --enable-admission-plugins=NodeRestriction
Restart kubelet:
systemctl daemon-reload
systemctl restart kubelet
To uninstall k8s-device-plugin:
kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system
5.4 Test GPU task scheduling
Create a test manifest named test_gpu_pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
      command: ["nvidia-smi","-a"]
  restartPolicy: Never
On the master node, run:
kubectl apply -f test_gpu_pod.yaml
List the pods:
kubectl get pods
Because the pod runs only the single nvidia-smi command, it has already reached the Completed state. The pod is named gpu-pod:
NAME      READY   STATUS      RESTARTS   AGE
gpu-pod   0/1     Completed   0          2m43s
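To check the result from a script instead of reading the table, the STATUS column can be extracted for a given pod. A minimal sketch: `pod_status` is a hypothetical helper that assumes the default `kubectl get pods` column layout shown above (NAME, READY, STATUS).

```shell
# pod_status: read `kubectl get pods` output from stdin and print the STATUS
# column for the pod named in $1. Hypothetical helper; prints nothing if
# the pod is not listed.
pod_status() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $3 }'
}

# Usage on the master:
#   [ "$(kubectl get pods | pod_status gpu-pod)" = "Completed" ] && echo "GPU test pod finished"
```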
View the pod's output and logs:
kubectl logs gpu-pod
If the GPU information is printed, the test succeeded.
TODO: scheduler, GPU allocation