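Preface
Earlier we covered node affinity scheduling, but that only makes it easier to express the relationship between Pods and Nodes. Real production environments have further scheduling needs. Pods that depend on each other and call each other frequently should be deployed in the same data center where possible, or even on the same node. Conversely, two unrelated Pods may compete for resources and affect other Pods on the same node, so we want to keep them apart. For this reason Kubernetes introduced Pod affinity and anti-affinity scheduling in version 1.4.
Affinity
If two applications interact frequently, it makes sense to place them as close together as possible, which reduces the performance cost of network communication.
Affinity is determined mainly by three sets of conditions.
The first is the namespace.
The second is the topology domain. A topology domain can be understood as a group of Nodes that usually share the same physical location, such as the same rack, data center, or region. In the extreme case a single Node can be a topology domain of its own. Kubernetes also provides some built-in topology keys: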
- kubernetes.io/hostname
- topology.kubernetes.io/zone
- topology.kubernetes.io/region
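A zone usually denotes a failure domain such as a data center or availability zone. A region spans more and usually denotes a geographic area that contains one or more zones. kubernetes.io/hostname is set to the hostname of the Node; the other two are normally populated by the public cloud provider.
The third is the label of the target Pod: the scheduler looks for the nodes that run Pods carrying the given label and schedules relative to them (this is what we usually rely on in practice).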
You can view the labels a Node carries by default with kubectl describe:
[root@master ~]# kubectl describe node node01
Name:               node01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node01
                    kubernetes.io/os=linux
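To make the idea of a topology domain concrete, here is a minimal Python sketch (a model for illustration, not Kubernetes code) that groups nodes by the value of a topology key. The third node and the zone values `zone-a` and `zone-b` are invented for the example; only the label keys come from the text above.

```python
from collections import defaultdict

# Hypothetical nodes and labels; node03 and the zone values are invented for illustration.
NODES = {
    "node01": {"kubernetes.io/hostname": "node01", "topology.kubernetes.io/zone": "zone-a"},
    "node02": {"kubernetes.io/hostname": "node02", "topology.kubernetes.io/zone": "zone-a"},
    "node03": {"kubernetes.io/hostname": "node03", "topology.kubernetes.io/zone": "zone-b"},
}


def topology_domains(nodes, topology_key):
    """Group node names by the value of topology_key; nodes lacking the label are skipped."""
    domains = defaultdict(list)
    for name, labels in sorted(nodes.items()):
        if topology_key in labels:
            domains[labels[topology_key]].append(name)
    return dict(domains)


def same_domain(nodes, topology_key, a, b):
    """Return True if nodes a and b carry the same value for topology_key."""
    va = nodes[a].get(topology_key)
    vb = nodes[b].get(topology_key)
    return va is not None and va == vb


if __name__ == "__main__":
    print(topology_domains(NODES, "kubernetes.io/hostname"))
    print(topology_domains(NODES, "topology.kubernetes.io/zone"))
```

With kubernetes.io/hostname every node forms its own domain, while with the zone key node01 and node02 fall into the same domain. This is why the choice of topologyKey decides how "close" two Pods have to be.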
As with node affinity, there are hard constraints and soft constraints.
Run kubectl explain pods.spec.affinity.podAffinity to see all the configuration fields.
The fields are as follows:
requiredDuringSchedulingIgnoredDuringExecution    # hard constraint
  - namespaces              # namespaces of the reference Pods
    topologyKey             # topology key, required
    labelSelector           # label selector
      matchExpressions      # list of label selector requirements on Pod labels (recommended)
        - key               # label key
          operator          # operator: In, NotIn, Exists, DoesNotExist
          values            # label values
      matchLabels           # a {key: value} map; each pair is equivalent to a matchExpressions In with a single value
    namespaceSelector       # still beta at the time of writing, not covered here
preferredDuringSchedulingIgnoredDuringExecution   # soft constraint
  - podAffinityTerm         # the affinity term
      namespaces            # namespaces of the reference Pods
      topologyKey           # topology key, required
      labelSelector
        matchExpressions
          - key             # label key
            operator        # operator: In, NotIn, Exists, DoesNotExist
            values          # label values
        matchLabels         # a {key: value} map; each pair is equivalent to a matchExpressions In with a single value
    weight                  # weight, range 1-100
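The operators listed above can be read as simple set tests on a Pod's labels. The following Python sketch is a simplified model rather than the Kubernetes implementation; the function names `expression_matches` and `selector_matches` are made up for this example. It shows how a labelSelector with matchLabels and matchExpressions would be evaluated against one Pod.

```python
def expression_matches(labels, expr):
    """Evaluate one matchExpressions entry against a Pod's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A Pod that does not carry the key at all also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def selector_matches(labels, selector):
    """All matchLabels pairs and all matchExpressions entries must hold (logical AND)."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    return all(expression_matches(labels, e) for e in selector.get("matchExpressions", []))


if __name__ == "__main__":
    selector = {"matchExpressions": [{"key": "env", "operator": "In", "values": ["pro", "test"]}]}
    print(selector_matches({"env": "pro"}, selector))
    print(selector_matches({"env": "dev"}, selector))
```

Note that matchLabels and matchExpressions are combined with AND, so a Pod has to satisfy every entry to be selected.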
Hard constraints
Pod affinity needs an already running Pod as a reference, so that the new Pod can be placed in the same topology domain as the reference Pod.
Write the reference Pods in podaffinity-target.yaml. The file defines two Pods, pinned to node01 and node02, with the labels env=pro and env=test respectively.
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target1
  labels:
    env: pro
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  nodeName: node01 # pin target1 to node01
---
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target2
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  nodeName: node02 # pin target2 to node02
Write podaffinity-required.yaml as follows. The affinity rule selects Pods labeled env=pro.
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-required
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAffinity: # use Pod affinity
      requiredDuringSchedulingIgnoredDuringExecution: # hard constraint
      - labelSelector:
          matchExpressions:
          - key: env
            operator: In
            values: ["pro"]
        topologyKey: kubernetes.io/hostname
Create podaffinity-required directly, without creating podaffinity-target first, and observe the Pod.
# Create the Pod
[root@master pod-affinity]# kubectl create -f podaffinity-required.yaml
pod/podaffinity-required created
# Check the Pod: it stays Pending, scheduling failed
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-required 0/1 Pending 0 7s <none> <none> <none> <none>
# The familiar error: one node has an untolerated taint, two nodes do not match the pod affinity rules
[root@master pod-affinity]# kubectl describe pod podaffinity-required |grep -A 100 Event
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 20s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity rules.
Delete the Pending podaffinity-required from the previous step, create podaffinity-target, then create podaffinity-required again. Now node01 runs a Pod labeled env=pro, so podaffinity-required should be scheduled to node01.
# Create podaffinity-target
[root@master pod-affinity]# kubectl create -f podaffinity-target.yaml
pod/podaffinity-target1 created
pod/podaffinity-target2 created
# Create podaffinity-required
[root@master pod-affinity]# kubectl create -f podaffinity-required.yaml
pod/podaffinity-required created
# Check the Pods: podaffinity-required was scheduled to node01
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-required 1/1 Running 0 5s 10.244.1.82 node01 <none> <none>
podaffinity-target1 1/1 Running 0 12s 10.244.1.81 node01 <none> <none>
podaffinity-target2 1/1 Running 0 12s 10.244.2.28 node02 <none> <none>
Edit podaffinity-required.yaml, changing values: ["pro"] to values: ["test"], and recreate podaffinity-required. node02 runs a Pod labeled env=test, so podaffinity-required should be scheduled to node02.
# Delete the previous Pod
[root@master pod-affinity]# kubectl delete -f podaffinity-required.yaml
pod "podaffinity-required" deleted
# Recreate the Pod after editing the YAML
[root@master pod-affinity]# kubectl create -f podaffinity-required.yaml
pod/podaffinity-required created
# Check the Pods: podaffinity-required was scheduled to node02
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-required 1/1 Running 0 4s 10.244.2.29 node02 <none> <none>
podaffinity-target1 1/1 Running 0 2m6s 10.244.1.81 node01 <none> <none>
podaffinity-target2 1/1 Running 0 2m6s 10.244.2.28 node02 <none> <none>
Edit podaffinity-required.yaml again, this time changing the values to ["dev"]. No Pod carries that label, so scheduling should fail.
# Delete the previous Pod
[root@master pod-affinity]# kubectl delete -f podaffinity-required.yaml
pod "podaffinity-required" deleted
# Recreate the Pod after editing the YAML
[root@master pod-affinity]# kubectl create -f podaffinity-required.yaml
pod/podaffinity-required created
# Check the Pods: podaffinity-required stays Pending, scheduling failed
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-required 0/1 Pending 0 17s <none> <none> <none> <none>
podaffinity-target1 1/1 Running 0 4m26s 10.244.1.81 node01 <none> <none>
podaffinity-target2 1/1 Running 0 4m26s 10.244.2.28 node02 <none> <none>
# The same error again
[root@master pod-affinity]# kubectl describe pod podaffinity-required |grep -A 100 Event
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity rules.
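The three experiments above can be reproduced with a small Python model of the hard affinity filter. This is a simplified sketch, not the scheduler's code: the function name `feasible_nodes` is invented, the selector is reduced to a single In expression, and the tainted master node is left out because it is excluded before affinity is even considered.

```python
# Cluster state mirroring the walkthrough; the master node is omitted (excluded by its taint).
NODES = {
    "node01": {"kubernetes.io/hostname": "node01"},
    "node02": {"kubernetes.io/hostname": "node02"},
}
RUNNING_PODS = [
    {"name": "podaffinity-target1", "node": "node01", "labels": {"env": "pro"}},
    {"name": "podaffinity-target2", "node": "node02", "labels": {"env": "test"}},
]


def feasible_nodes(nodes, pods, key, values, topology_key):
    """Return the nodes whose topology domain hosts a Pod with labels[key] in values."""
    # Collect the topology domains that already contain a matching Pod.
    domains = {
        nodes[p["node"]].get(topology_key)
        for p in pods
        if p["labels"].get(key) in values
    }
    domains.discard(None)  # Pods on nodes without the topology label define no domain
    return sorted(n for n, labels in nodes.items() if labels.get(topology_key) in domains)


if __name__ == "__main__":
    for wanted in (["pro"], ["test"], ["dev"]):
        print(wanted, feasible_nodes(NODES, RUNNING_PODS, "env", wanted, "kubernetes.io/hostname"))
```

An empty result corresponds to the Pending state seen above: with a hard constraint there is simply no node left to choose from.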
Soft constraints
As the results above show, with a hard constraint the Pod cannot run when nothing matches. A soft constraint behaves differently: when nothing matches, the scheduler settles for second best and simply picks a Node with enough resources.
Write podaffinity-preferred.yaml as follows. The label match is env=dev, so no matching Pod can be found.
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAffinity: # use Pod affinity
      preferredDuringSchedulingIgnoredDuringExecution: # soft constraint
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: env
              operator: In
              values: ["dev"]
          topologyKey: kubernetes.io/hostname
        weight: 1
Create podaffinity-preferred and check whether the Pod runs normally.
# Create podaffinity-preferred
[root@master pod-affinity]# kubectl create -f podaffinity-preferred.yaml
pod/podaffinity-preferred created
# Check the Pods: podaffinity-preferred runs normally
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-preferred 1/1 Running 0 12s 10.244.2.31 node02 <none> <none>
podaffinity-target1 1/1 Running 0 19s 10.244.1.84 node01 <none> <none>
podaffinity-target2 1/1 Running 0 19s 10.244.2.30 node02 <none> <none>
Edit podaffinity-preferred.yaml to define two affinity rules with different weights.
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAffinity: # use Pod affinity
      preferredDuringSchedulingIgnoredDuringExecution: # soft constraint
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: env
              operator: In
              values: ["pro"]
          topologyKey: kubernetes.io/hostname
        weight: 1
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: env
              operator: In
              values: ["test"]
          topologyKey: kubernetes.io/hostname
        weight: 2
After editing the YAML, delete the existing Pod and recreate podaffinity-preferred. Because the env=test rule has the larger weight, the Pod should land on node02.
# Create podaffinity-preferred
[root@master pod-affinity]# kubectl create -f podaffinity-preferred.yaml
pod/podaffinity-preferred created
# Check the Pods: podaffinity-preferred was scheduled to node02
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-preferred 1/1 Running 0 4s 10.244.2.32 node02 <none> <none>
podaffinity-target1 1/1 Running 0 2m10s 10.244.1.84 node01 <none> <none>
podaffinity-target2 1/1 Running 0 2m10s 10.244.2.30 node02 <none> <none>
Edit podaffinity-preferred.yaml, setting the weight of the env=pro rule to 3, and recreate podaffinity-preferred. This time it should be scheduled to node01.
[root@master pod-affinity]# kubectl delete -f podaffinity-preferred.yaml
pod "podaffinity-preferred" deleted
[root@master pod-affinity]# kubectl create -f podaffinity-preferred.yaml
pod/podaffinity-preferred created
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-preferred 1/1 Running 0 5s 10.244.1.85 node01 <none> <none>
podaffinity-target1 1/1 Running 0 2m55s 10.244.1.84 node01 <none> <none>
podaffinity-target2 1/1 Running 0 2m55s 10.244.2.30 node02 <none> <none>
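The effect of the weights can be sketched as a per-node sum. The Python model below is a deliberate simplification: the names `affinity_scores` and `preferred_node` are invented, each rule is reduced to one In expression, and the real scheduler additionally normalizes this score and combines it with other scoring plugins, so treat it as an illustration of the idea only.

```python
NODES = {
    "node01": {"kubernetes.io/hostname": "node01"},
    "node02": {"kubernetes.io/hostname": "node02"},
}
RUNNING_PODS = [
    {"name": "podaffinity-target1", "node": "node01", "labels": {"env": "pro"}},
    {"name": "podaffinity-target2", "node": "node02", "labels": {"env": "test"}},
]


def affinity_scores(nodes, pods, preferred_terms):
    """Sum, per node, the weights of the preferred terms matched by Pods in that node's domain."""
    scores = {name: 0 for name in nodes}
    for term in preferred_terms:
        tk = term["topologyKey"]
        for pod in pods:
            if pod["labels"].get(term["key"]) not in term["values"]:
                continue  # this running Pod does not match the term
            domain = nodes[pod["node"]].get(tk)
            if domain is None:
                continue
            # Every node in the same topology domain as the matching Pod gets the weight.
            for name, labels in nodes.items():
                if labels.get(tk) == domain:
                    scores[name] += term["weight"]
    return scores


def preferred_node(scores):
    """Pick the node with the highest score (ties are broken arbitrarily in this sketch)."""
    return max(scores, key=scores.get)


def make_terms(pro_weight, test_weight):
    """Build the two rules from the walkthrough with the given weights."""
    tk = "kubernetes.io/hostname"
    return [
        {"key": "env", "values": ["pro"], "topologyKey": tk, "weight": pro_weight},
        {"key": "env", "values": ["test"], "topologyKey": tk, "weight": test_weight},
    ]


if __name__ == "__main__":
    print(preferred_node(affinity_scores(NODES, RUNNING_PODS, make_terms(1, 2))))
    print(preferred_node(affinity_scores(NODES, RUNNING_PODS, make_terms(3, 2))))
```

With weights 1 and 2 the model favors node02, and after raising the env=pro weight to 3 it favors node01, which matches the two runs above.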
Reciprocity
In fact, affinity also works in both directions. Let's verify that.
Write another file, podaffinity-target345.yaml, whose Pods also carry the label env=pro.
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target3
  labels:
    env: pro
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
---
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target4
  labels:
    env: pro
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
---
apiVersion: v1
kind: Pod
metadata:
  name: podaffinity-target5
  labels:
    env: pro
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
Create podaffinity-target345. If affinity is reciprocal, these Pods should also be scheduled to node01, even though they declare no affinity rules themselves.
# Create podaffinity-target345
[root@master pod-affinity]# kubectl create -f podaffinity-target345.yaml
pod/podaffinity-target3 created
pod/podaffinity-target4 created
pod/podaffinity-target5 created
# All three started successfully and were scheduled to node01
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-preferred 1/1 Running 0 31m 10.244.1.85 node01 <none> <none>
podaffinity-target1 1/1 Running 0 34m 10.244.1.84 node01 <none> <none>
podaffinity-target2 1/1 Running 0 34m 10.244.2.30 node02 <none> <none>
podaffinity-target3 1/1 Running 0 10s 10.244.1.87 node01 <none> <none>
podaffinity-target4 1/1 Running 0 10s 10.244.1.88 node01 <none> <none>
podaffinity-target5 1/1 Running 0 10s 10.244.1.89 node01 <none> <none>
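One way to picture this result: when scoring nodes for a new Pod, the scheduler also looks at the preferred affinity rules of Pods that are already running and checks whether the new Pod matches them. That description of scheduler internals is my understanding rather than something shown in the output above, so treat the following Python sketch as an assumption-laden model; the name `reciprocal_scores` and the `preferred_terms` field are invented for the example.

```python
NODES = {
    "node01": {"kubernetes.io/hostname": "node01"},
    "node02": {"kubernetes.io/hostname": "node02"},
}
TK = "kubernetes.io/hostname"
# State before target3-5 are created: podaffinity-preferred runs on node01 with its two rules.
EXISTING_PODS = [
    {"name": "podaffinity-target1", "node": "node01", "preferred_terms": []},
    {"name": "podaffinity-target2", "node": "node02", "preferred_terms": []},
    {
        "name": "podaffinity-preferred",
        "node": "node01",
        "preferred_terms": [
            {"key": "env", "values": ["pro"], "topologyKey": TK, "weight": 3},
            {"key": "env", "values": ["test"], "topologyKey": TK, "weight": 2},
        ],
    },
]


def reciprocal_scores(nodes, existing_pods, incoming_labels):
    """Score nodes by the preferred rules of running Pods that the incoming Pod matches.

    Assumption: this mirrors how the scheduler treats existing Pods' soft affinity;
    it is a model, not the actual implementation.
    """
    scores = {name: 0 for name in nodes}
    for pod in existing_pods:
        for term in pod["preferred_terms"]:
            if incoming_labels.get(term["key"]) not in term["values"]:
                continue  # the incoming Pod is not what this running Pod wants nearby
            domain = nodes[pod["node"]].get(term["topologyKey"])
            if domain is None:
                continue
            for name, labels in nodes.items():
                if labels.get(term["topologyKey"]) == domain:
                    scores[name] += term["weight"]
    return scores


if __name__ == "__main__":
    print(reciprocal_scores(NODES, EXISTING_PODS, {"env": "pro"}))
```

Under this model node01 gets a higher score for a new env=pro Pod. Because this is only a scoring preference, it is not a guarantee: other factors such as resource usage can still lead to a different node.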
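Anti-affinity
Anti-affinity has plenty of uses as well. For example, when deploying several replicas of an application, we want them spread across different Nodes, which improves the availability of the service.
The configuration fields for anti-affinity are the same as for affinity; just change podAffinity to podAntiAffinity. It likewise comes with hard and soft constraints.
Delete the Pods created in the tests above and keep only the two original targets, one on node01 and one on node02.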
[root@master pod-affinity]# kubectl delete -f podaffinity-target345.yaml
pod "podaffinity-target3" deleted
pod "podaffinity-target4" deleted
pod "podaffinity-target5" deleted
[root@master pod-affinity]# kubectl delete -f podaffinity-preferred.yaml
pod "podaffinity-preferred" deleted
# Only these two targets remain
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-target1 1/1 Running 0 42m 10.244.1.84 node01 <none> <none>
podaffinity-target2 1/1 Running 0 42m 10.244.2.30 node02 <none> <none>
Hard constraints
Write podantiaffinity-required.yaml as follows. Anti-affinity works the other way round: the YAML below says that any node running a Pod labeled env=pro or env=test must not be used, so neither node01 nor node02 is eligible. Because this is a hard constraint, the Pod should fail to run.
apiVersion: v1
kind: Pod
metadata:
  name: podantiaffinity-required
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAntiAffinity: # use Pod anti-affinity
      requiredDuringSchedulingIgnoredDuringExecution: # hard constraint
      - labelSelector:
          matchExpressions:
          - key: env
            operator: In
            values: ["pro","test"]
        topologyKey: kubernetes.io/hostname
Create podantiaffinity-required and check whether the Pod runs normally.
# Create podantiaffinity-required
[root@master pod-affinity]# kubectl create -f podantiaffinity-required.yaml
pod/podantiaffinity-required created
# The Pod is Pending, scheduling failed
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-target1 1/1 Running 0 53m 10.244.1.84 node01 <none> <none>
podaffinity-target2 1/1 Running 0 53m 10.244.2.30 node02 <none> <none>
podantiaffinity-required 0/1 Pending 0 6s <none> <none> <none> <none>
# The familiar error, this time for the anti-affinity rules
[root@master pod-affinity]# kubectl describe pod podantiaffinity-required |grep -A 100 Event
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 40s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod anti-affinity rules.
In podantiaffinity-required.yaml, change values: ["pro","test"] to values: ["pro"]. The Pod now must not be scheduled to a node that runs a Pod labeled env=pro, which is node01, so it can only go to node02. Create podantiaffinity-required again.
# Delete the previous podantiaffinity-required
[root@master pod-affinity]# kubectl delete -f podantiaffinity-required.yaml
pod "podantiaffinity-required" deleted
# Create podantiaffinity-required
[root@master pod-affinity]# kubectl create -f podantiaffinity-required.yaml
pod/podantiaffinity-required created
# The Pod runs normally and was scheduled to node02
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-target1 1/1 Running 0 54m 10.244.1.84 node01 <none> <none>
podaffinity-target2 1/1 Running 0 54m 10.244.2.30 node02 <none> <none>
podantiaffinity-required 1/1 Running 0 4s 10.244.2.33 node02 <none> <none>
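Hard anti-affinity is the mirror image of the hard affinity filter: a node is ruled out when its topology domain already hosts a matching Pod. The Python sketch below models the two runs above under the same simplifications as before (a single In expression, master node omitted); the name `allowed_nodes` is invented for this example and it is not the scheduler's actual code.

```python
NODES = {
    "node01": {"kubernetes.io/hostname": "node01"},
    "node02": {"kubernetes.io/hostname": "node02"},
}
RUNNING_PODS = [
    {"name": "podaffinity-target1", "node": "node01", "labels": {"env": "pro"}},
    {"name": "podaffinity-target2", "node": "node02", "labels": {"env": "test"}},
]


def allowed_nodes(nodes, pods, key, values, topology_key):
    """Return the nodes whose topology domain hosts no Pod with labels[key] in values."""
    # Topology domains that already contain a Pod the new Pod must avoid.
    blocked = {
        nodes[p["node"]].get(topology_key)
        for p in pods
        if p["labels"].get(key) in values
    }
    blocked.discard(None)
    return sorted(n for n, labels in nodes.items() if labels.get(topology_key) not in blocked)


if __name__ == "__main__":
    tk = "kubernetes.io/hostname"
    print(allowed_nodes(NODES, RUNNING_PODS, "env", ["pro", "test"], tk))
    print(allowed_nodes(NODES, RUNNING_PODS, "env", ["pro"], tk))
```

With both values the list of allowed nodes is empty, which corresponds to the Pending Pod; with only "pro" node02 remains.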
Soft constraints
Likewise, if no Node satisfies the rule, the scheduler simply picks a Node with sufficient resources.
Write podantiaffinity-preferred.yaml as follows. It asks the scheduler to avoid node01 and node02, but there are no other schedulable nodes. Because this is only a soft constraint, the Pod still ends up on one of the two nodes.
apiVersion: v1
kind: Pod
metadata:
  name: podantiaffinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent # do not pull if the image exists locally
  affinity:
    podAntiAffinity: # use Pod anti-affinity
      preferredDuringSchedulingIgnoredDuringExecution: # soft constraint
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: env
              operator: In
              values: ["pro","test"]
          topologyKey: kubernetes.io/hostname
        weight: 1
Create podantiaffinity-preferred and check whether the Pod runs normally.
# Create podantiaffinity-preferred
[root@master pod-affinity]# kubectl create -f podantiaffinity-preferred.yaml
pod/podantiaffinity-preferred created
# The Pod runs normally
[root@master pod-affinity]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
podaffinity-target1 1/1 Running 0 58m 10.244.1.84 node01 <none> <none>
podaffinity-target2 1/1 Running 0 58m 10.244.2.30 node02 <none> <none>
podantiaffinity-preferred 1/1 Running 0 7s 10.244.1.90 node01 <none> <none>
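Notes
- Affinity rules do not allow an empty topologyKey.
- Hard anti-affinity rules do not allow an empty topologyKey.
- In older Kubernetes releases, an empty topologyKey in a soft anti-affinity rule was interpreted as the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone, and failure-domain.beta.kubernetes.io/region. Current releases do not allow an empty topologyKey in any affinity or anti-affinity rule, and the failure-domain.beta labels have been superseded by topology.kubernetes.io/zone and topology.kubernetes.io/region.
- If the LimitPodHardAntiAffinityTopology admission controller is enabled, hard anti-affinity is restricted to the topologyKey kubernetes.io/hostname. To use a custom topologyKey you need to modify or disable that admission controller.
Apart from the cases above, any valid label key can be used as the topologyKey.
That wraps up Pod affinity and anti-affinity scheduling. Next we will look at taints and tolerations.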