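Video tutorial link: Kubernetes Quick Start
Preface
The previous article, Kubernetes Tutorial Series (6): Kubernetes Resource Management and Quality of Service, gave a first look at resource scheduling and QoS in Kubernetes: how to define a pod's resources, how those resources affect scheduling, and which QoS class a pod receives once its resources are set. This installment of the series covers the pod scheduling mechanism.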
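1. Pod Scheduling
1.1 Pod scheduling overview
Kubernetes is a container orchestration engine, and one of its core functions is scheduling containers. kube-scheduler schedules pods fully automatically. The process is split into a scheduling cycle and a binding cycle. The scheduling cycle is further divided into filtering and scoring (weighting): following the configured scheduling policy, the nodes able to run the pod are filtered out and then ranked. In the binding cycle, once kube-scheduler has selected a node for the pod, the kubelet on that node picks up the assignment through its watch and runs the pod.
The scheduling cycle consists of predicates (filtering) and priorities (scoring). Predicates select the nodes that satisfy the pod's requirements; priorities score those nodes and rank them so that the best match comes first. The predicate algorithms include: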
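- CheckNodeConditionPred: whether the node is Ready
- MemoryPressure: whether the node is under memory pressure (is there enough memory)
- DiskPressure: whether the node is under disk pressure (is there enough disk space)
- PIDPressure: whether the node is under PID pressure (are there enough process IDs)
- GeneralPred: general checks, such as matching the host requested in pod.spec.nodeName
- MatchNodeSelector: matches the labels in pod.spec.nodeSelector
- PodFitsResources: whether the node can satisfy the resources the pod requests
- PodToleratesNodeTaints: whether the pod's pod.spec.tolerations tolerate the node's taints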
- CheckNodeLabelPresence
- CheckServiceAffinity
- CheckVolumeBinding
- NoVolumeZoneConflict
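Filtering checks conditions that each node reports. They can be viewed with kubectl describe node node-id, as shown in the figure below:
The priority (scoring) algorithms include:
- least_requested: favors the node with the lowest resource consumption
- balanced_resource_allocation: favors the node whose consumption is most balanced across resource types
- node_prefer_avoid_pods: node preference, for nodes marked as preferring to avoid certain pods
- taint_toleration: taint check; nodes carrying taints that the pod does not tolerate score lower
- selector_spreading: spreads pods matched by the same selector across nodes
- interpod_affinity: iterates over the inter-pod affinity terms
- most_requested: favors the node with the highest resource consumption
- node_label: scores by node labels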
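The two steps of the scheduling cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not kube-scheduler's implementation: the Node and Pod classes, the capacity numbers, and the single "least requested" style score are all hypothetical, and the real scheduler combines many predicates and weighted priorities.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ready: bool
    cpu_free_m: int    # free CPU in millicores (illustrative numbers)
    mem_free_mi: int   # free memory in MiB (illustrative numbers)


@dataclass
class Pod:
    name: str
    cpu_request_m: int
    mem_request_mi: int


def filter_nodes(pod, nodes):
    """Predicate step: keep nodes that are Ready and have room for the pod."""
    return [
        n for n in nodes
        if n.ready
        and n.cpu_free_m >= pod.cpu_request_m
        and n.mem_free_mi >= pod.mem_request_mi
    ]


def score_node(pod, node):
    """Priority step: a crude 'least requested' score.

    The more capacity is left after placing the pod, the higher the score.
    Adding millicores and MiB is a simplification made for this sketch.
    """
    return (node.cpu_free_m - pod.cpu_request_m) + (node.mem_free_mi - pod.mem_request_mi)


def schedule(pod, nodes):
    """Return the name of the best-scoring feasible node, or None."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score_node(pod, n)).name


nodes = [
    Node("node-1", ready=True, cpu_free_m=500, mem_free_mi=512),
    Node("node-2", ready=True, cpu_free_m=2000, mem_free_mi=2048),
    Node("node-3", ready=False, cpu_free_m=4000, mem_free_mi=4096),
]
pod = Pod("nginx", cpu_request_m=1000, mem_request_mi=1024)
chosen = schedule(pod, nodes)
print(chosen)
```

In this sample data node-1 is filtered out for lack of CPU and node-3 because it is not Ready, so only node-2 reaches the scoring step.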
1.2 Scheduling with nodeName
nodeName is a field of PodSpec. Setting pod.spec.nodeName places the pod on one specific node. The field is special and is normally left empty; when it is set, kube-scheduler skips scheduling for that pod, and the kubelet on the named node starts it directly. Scheduling by nodeName bypasses the cluster's intelligent scheduling, so pinning pods this way can leave resources unevenly distributed. Giving such pods the Guaranteed QoS class is recommended so that they are not evicted when resources become unbalanced. The following example creates a pod that runs on node-3:
- Write a YAML manifest that pins the pod to node-3
[root@node-1 demo]# cat nginx-nodeName.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-run-on-nodename
  annotations:
    kubernetes.io/description: "Running the Pod on specific nodeName"
spec:
  containers:
  - name: nginx-run-on-nodename
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
  nodeName: node-3    # nodeName pins nginx-run-on-nodename to the specific node node-3
- Apply the YAML manifest so that it takes effect
[root@node-1 demo]# kubectl apply -f nginx-nodeName.yaml
pod/nginx-run-on-nodename created
- Confirm that the pod is running; it is on node-3
[root@node-1 demo]# kubectl get pods nginx-run-on-nodename -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-run-on-nodename 1/1 Running 0 6m52s 10.244.2.15 node-3 <none> <none>
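Because this section recommends the Guaranteed QoS class for pods pinned with nodeName, the following sketch shows how a QoS class follows from the requests and limits of a pod's containers. It is a simplified illustration: qos_class is a hypothetical helper, it considers only cpu and memory, and it compares quantities as plain strings, whereas Kubernetes compares parsed quantities.

```python
def qos_class(containers):
    """Classify a pod from a list of per-container resource settings.

    Each container is a dict with optional 'requests' and 'limits' maps,
    mirroring the resources block of a container spec.
    """
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # A request that is omitted defaults to the corresponding limit.
        requests = {**limits, **c.get("requests", {})}
        if requests or limits:
            any_set = True
        for res in ("cpu", "memory"):
            if res not in limits or requests.get(res) != limits[res]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


pinned = [{
    "requests": {"cpu": "500m", "memory": "128Mi"},
    "limits": {"cpu": "500m", "memory": "128Mi"},
}]
print(qos_class(pinned))
```

The nginx-run-on-nodename manifest above sets no resources at all, so under these rules it would be classified as BestEffort; adding equal requests and limits for cpu and memory is what moves it to Guaranteed.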
1.3 Scheduling with nodeSelector
nodeSelector is a field of PodSpec and the simplest way to run a pod on particular nodes. It works with key/value pairs: the node must carry matching labels, and the pod names those labels when it is scheduled. The following example adds the label app=web to node-2 and then selects that label through nodeSelector when scheduling a pod:
- Add the label to node-2
[root@node-1 demo]# kubectl label node node-2 app=web
node/node-2 labeled
- Verify the labels; node-2 now has the additional label app=web
[root@node-1 demo]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node-1 Ready master 15d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=
node-2 Ready <none> 15d v1.15.3 app=web,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-2,kubernetes.io/os=linux
node-3 Ready <none> 15d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-3,kubernetes.io/os=linux
- Use nodeSelector to schedule the pod onto the node labeled app=web
[root@node-1 demo]# cat nginx-nodeselector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-run-on-nodeselector
  annotations:
    kubernetes.io/description: "Running the Pod on specific node by nodeSelector"
spec:
  containers:
  - name: nginx-run-on-nodeselector
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
  nodeSelector:    # nodeSelector schedules the pod onto nodes carrying these labels
    app: web
- Apply the YAML file to create the pod
[root@node-1 demo]# kubectl apply -f nginx-nodeselector.yaml
pod/nginx-run-on-nodeselector created
- Verify the pod; it is running on node-2
[root@node-1 demo]# kubectl get pods nginx-run-on-nodeselector -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-run-on-nodeselector 1/1 Running 0 51s 10.244.1.24 node-2 <none> <none>
Kubernetes predefines a number of built-in labels that describe node attributes such as CPU architecture, operating system type, and host name:
- beta.kubernetes.io/arch=amd64
- beta.kubernetes.io/os=linux
- kubernetes.io/arch=amd64
- kubernetes.io/hostname=node-3
- kubernetes.io/os=linux
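The matching rule behind nodeSelector can be sketched as follows: a node qualifies only when every key/value pair in the selector is present in the node's labels. The label sets are abridged from the kubectl output earlier in this section; matches_node_selector and eligible_nodes are hypothetical helper names, and the sketch ignores everything else the scheduler checks, such as taints and free resources.

```python
# Abridged node labels, taken from the kubectl get nodes --show-labels output.
NODE_LABELS = {
    "node-1": {
        "kubernetes.io/arch": "amd64",
        "kubernetes.io/hostname": "node-1",
        "kubernetes.io/os": "linux",
        "node-role.kubernetes.io/master": "",
    },
    "node-2": {
        "app": "web",
        "kubernetes.io/arch": "amd64",
        "kubernetes.io/hostname": "node-2",
        "kubernetes.io/os": "linux",
    },
    "node-3": {
        "kubernetes.io/arch": "amd64",
        "kubernetes.io/hostname": "node-3",
        "kubernetes.io/os": "linux",
    },
}


def matches_node_selector(node_selector, node_labels):
    """True when every selector key/value pair is present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def eligible_nodes(node_selector, nodes):
    """Names of the nodes whose labels satisfy the selector, sorted."""
    return sorted(
        name for name, labels in nodes.items()
        if matches_node_selector(node_selector, labels)
    )


print(eligible_nodes({"app": "web"}, NODE_LABELS))
print(eligible_nodes({"kubernetes.io/os": "linux"}, NODE_LABELS))
```

Pairs in a selector are combined with AND, so a selector such as app=web together with kubernetes.io/hostname=node-3 matches no node in this cluster, and a pod using it would stay Pending.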
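1.4 Node affinity and anti-affinity
affinity/anti-affinity serves a similar purpose to nodeSelector but is more expressive, and it is expected to replace nodeSelector over time. Affinity adds the following enhancements:
- Richer expressions with several matching operators: In, NotIn, Exists, DoesNotExist, Gt, and Lt;
- Both hard (required) and soft (preferred) rules. A hard rule is a condition that must be met and is set with requiredDuringSchedulingIgnoredDuringExecution; a soft rule is a preference and is set with preferredDuringSchedulingIgnoredDuringExecution;
- Two levels of affinity and anti-affinity: node affinity, which is based on nodes, and inter-pod affinity/anti-affinity, which is based on pods. Node affinity schedules according to the labels on nodes, while pod affinity schedules according to the labels on pods, so the two differ in scope.
The following example demonstrates node affinity. requiredDuringSchedulingIgnoredDuringExecution specifies the conditions that must be met and preferredDuringSchedulingIgnoredDuringExecution specifies the preferred conditions. The two are applied together: the required rule narrows down the candidate nodes, and the preferred rule ranks the nodes that remain.
- List the node labels; each node carries several labels by default, such as kubernetes.io/hostname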
[root@node-1 ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node-1 Ready master 15d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=
node-2 Ready <none> 15d v1.15.3 app=web,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-2,kubernetes.io/os=linux
node-3 Ready <none> 15d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-3,kubernetes.io/os=linux
- Schedule with node affinity: requiredDuringSchedulingIgnoredDuringExecution requires kubernetes.io/hostname to be one of the listed nodes (node-1, node-2, and node-3 in the manifest below), and preferredDuringSchedulingIgnoredDuringExecution prefers nodes that carry the label app=web
[root@node-1 demo]# cat nginx-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-run-node-affinity
  annotations:
    kubernetes.io/description: "Running the Pod on specific node by node affinity"
spec:
  containers:
  - name: nginx-run-node-affinity
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - node-1
            - node-2
            - node-3
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: app
            operator: In
            values: ["web"]
- Apply the YAML file to create the pod
[root@node-1 demo]# kubectl apply -f nginx-node-affinity.yaml
pod/nginx-run-node-affinity created
- Confirm which node the pod landed on; the node that satisfies both the required and the preferred conditions is node-2
[root@node-1 demo]# kubectl get pods --show-labels nginx-run-node-affinity -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
nginx-run-node-affinity 1/1 Running 0 106s 10.244.1.25 node-2 <none> <none> <none>
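The way the required and preferred rules of this example combine can be sketched in Python. This is an approximation for illustration, not the scheduler's code: the helper names are hypothetical, only matchExpressions is handled, the node labels are abridged from the output above, and the real scheduler also weighs its other priorities and checks taints.

```python
def match_expression(expr, labels):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        # Gt/Lt compare the label value with a single integer value.
        try:
            have, want = int(labels[key]), int(values[0])
        except (KeyError, IndexError, ValueError):
            return False
        return have > want if op == "Gt" else have < want
    raise ValueError(f"unknown operator: {op}")


def term_matches(term, labels):
    """Expressions inside a term are ANDed; an empty term matches nothing."""
    exprs = term.get("matchExpressions", [])
    return bool(exprs) and all(match_expression(e, labels) for e in exprs)


def pick_node(affinity, nodes):
    """Filter by the required rule, then rank by summed preferred weights."""
    required = affinity.get("requiredDuringSchedulingIgnoredDuringExecution")
    preferred = affinity.get("preferredDuringSchedulingIgnoredDuringExecution", [])
    scores = {}
    for name, labels in nodes.items():
        # nodeSelectorTerms are ORed: one matching term is enough.
        if required and not any(
            term_matches(t, labels) for t in required["nodeSelectorTerms"]
        ):
            continue
        scores[name] = sum(
            p["weight"] for p in preferred if term_matches(p["preference"], labels)
        )
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


NODES = {
    "node-1": {"kubernetes.io/hostname": "node-1"},
    "node-2": {"kubernetes.io/hostname": "node-2", "app": "web"},
    "node-3": {"kubernetes.io/hostname": "node-3"},
}
AFFINITY = {
    "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [{"matchExpressions": [{
            "key": "kubernetes.io/hostname",
            "operator": "In",
            "values": ["node-1", "node-2", "node-3"],
        }]}],
    },
    "preferredDuringSchedulingIgnoredDuringExecution": [{
        "weight": 1,
        "preference": {"matchExpressions": [
            {"key": "app", "operator": "In", "values": ["web"]},
        ]},
    }],
}
best, scores = pick_node(AFFINITY, NODES)
print(best, scores)
```

All three nodes pass the required rule here, and only node-2 earns the preferred weight, which is consistent with the pod landing on node-2.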
Closing remarks
This article introduced the scheduling mechanism in Kubernetes. By default, pods are scheduled fully automatically by kube-scheduler in two phases: the scheduling phase (filtering, then scoring and ranking) and the binding phase (running the pod on the node). There are four ways to influence placement:
- Specify nodeName
- Use nodeSelector
- Use node affinity and anti-affinity
- Use pod affinity and anti-affinity
Appendix
Scheduling framework: https://kubernetes.io/docs/concepts/configuration/scheduling-framework/
Assigning pods to nodes: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
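When your talent cannot yet support your ambition, it is time to settle down and study.
About the author: 劉海平 (HappyLau), senior cloud computing consultant. He currently works on public cloud at Tencent Cloud and previously worked at Kugou and EasyStack. He has many years of experience in architecture design, operations, and delivery for public and private clouds, took part in building large private cloud platforms for Kugou, China Southern Power Grid, Guotai Junan, and others, is proficient in open-source technologies such as Linux, Kubernetes, OpenStack, and Ceph, and has extensive hands-on experience in cloud computing as well as experience teaching RHCA/OpenStack/Linux courses.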