1. Introduction
Draino automatically drains Kubernetes nodes based on labels and node conditions. A node that matches all of the specified labels and any of the specified node conditions is cordoned immediately, and its pods are drained after the drain-buffer time has elapsed.
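The matching rule above (all labels, any condition) can be sketched as a small predicate. This is only an illustration of the documented rule, not Draino's actual code: the function name, its parameters, and the `KernelDeadlock` condition are assumptions chosen for the example.

```python
def is_drain_candidate(node_labels, true_conditions, required_labels, watched_conditions):
    """Return True when the node carries ALL required labels and ANY watched condition.

    Hypothetical helper: it models the documented matching rule only.
    """
    has_all_labels = all(node_labels.get(k) == v for k, v in required_labels.items())
    has_any_condition = any(c in true_conditions for c in watched_conditions)
    return has_all_labels and has_any_condition


node_labels = {"region": "us-west-1", "app": "nginx"}
true_conditions = {"Ready", "ec2-host-retirement"}

# Matches: the single required label is present and one watched condition is true.
print(is_drain_candidate(node_labels, true_conditions,
                         {"region": "us-west-1"},
                         ["KernelDeadlock", "ec2-host-retirement"]))

# Does not match: the node lacks app=redis even though a condition is true.
print(is_drain_candidate(node_labels, true_conditions,
                         {"region": "us-west-1", "app": "redis"},
                         ["ec2-host-retirement"]))
```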
Draino is typically used together with Node Problem Detector (NPD) and Cluster Autoscaler. NPD detects node health by watching node logs or by running a script; when it finds a problem on a node, it sets a node condition on that node. Cluster Autoscaler can be configured to delete underutilised nodes. Combined with Draino, the two enable automatic remediation in some scenarios:
NPD detects a permanent problem on a node and sets the corresponding node condition on it.
Draino notices the node condition and immediately cordons the node so that no new pods are scheduled onto the faulty node, then schedules a drain of the node.
Once the faulty node has been drained, Cluster Autoscaler considers it underutilised and, after waiting for a while, scales it down.
2. Usage
Startup command
$ docker run planetlabs/draino /draino --help
usage: draino [<flags>] <node-conditions>...
Automatically cordons and drains nodes that match the supplied conditions.
Flags:
--help Show context-sensitive help (also try --help-long and --help-man).
-d, --debug Run with debug logging.
--listen=":10002" Address at which to expose /metrics and /healthz.
--kubeconfig=KUBECONFIG Path to kubeconfig file. Leave unset to use in-cluster config.
--master=MASTER Address of Kubernetes API server. Leave unset to use in-cluster config.
--dry-run Emit an event without cordoning or draining matching nodes.
--max-grace-period=8m0s Maximum time evicted pods will be given to terminate gracefully.
--eviction-headroom=30s Additional time to wait after a pod's termination grace period for it to have been deleted.
--drain-buffer=10m0s Minimum time between starting each drain. Nodes are always cordoned immediately.
--node-label="foo=bar" (DEPRECATED) Only nodes with this label will be eligible for cordoning and draining. May be specified multiple times.
--node-label-expr="metadata.labels.foo == 'bar'"
This is an expr string https://github.com/antonmedv/expr that must return true or false. See `nodefilters_test.go` for examples
--namespace="kube-system" Namespace used to create leader election lock object.
--leader-election-lease-duration=15s
Lease duration for leader election.
--leader-election-renew-deadline=10s
Leader election renew deadline.
--leader-election-retry-period=2s
Leader election retry period.
--skip-drain Whether to skip draining nodes after cordoning.
--evict-daemonset-pods Evict pods that were created by an extant DaemonSet.
--evict-emptydir-pods Evict pods with local storage, i.e. with emptyDir volumes.
--evict-unreplicated-pods Evict pods that were not created by a replication controller.
--protected-pod-annotation=KEY[=VALUE] ...
Protect pods with this annotation from eviction. May be specified multiple times.
Args:
<node-conditions> Nodes for which any of these conditions are true will be cordoned and drained.
Labels and label expressions
Draino lets you filter the set of eligible nodes with the --node-label and --node-label-expr flags. --node-label can only AND the specified labels together. To express more complex predicates, the newer --node-label-expr flag supports mixing OR/AND/NOT logic. See https://github.com/antonmedv/expr for details.
--node-label-expr example:
(metadata.labels.region == 'us-west-1' && metadata.labels.app == 'nginx') || (metadata.labels.region == 'us-west-2' && metadata.labels.app == 'nginx')
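To make the logic of the example expression easier to follow, here is the same predicate written as a plain Python function. It is only a reading aid: Draino evaluates the expression with the expr library, and details such as how expr treats a missing label are not modelled here (this sketch assumes a missing label is simply `None`). The function name is hypothetical.

```python
def matches_example_expr(labels):
    """Mirror of the example --node-label-expr: nginx nodes in us-west-1 or us-west-2."""
    region = labels.get("region")  # assumption: a missing label compares as None
    app = labels.get("app")
    return (region == "us-west-1" and app == "nginx") or \
           (region == "us-west-2" and app == "nginx")


for labels in (
    {"region": "us-west-1", "app": "nginx"},
    {"region": "us-west-2", "app": "nginx"},
    {"region": "us-east-1", "app": "nginx"},
    {"region": "us-west-1", "app": "redis"},
):
    print(labels, matches_example_expr(labels))
```

The same set of nodes cannot be selected with --node-label alone, because that flag can only AND labels together and here the region is an OR.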
3. Considerations
Keep the following in mind before deploying Draino:
Run Draino in --dry-run mode first to verify that it would drain the nodes you expect. In dry-run mode Draino only emits logs, metrics, and events; it does not actually cordon or drain nodes.
Draino immediately cordons nodes that match its configured labels and node conditions, but after draining one node it waits for a period (set with --drain-buffer, 10 minutes by default) before draining the next. That is, if two nodes begin exhibiting a node condition at the same time, one is drained immediately and the other 10 minutes later.
Draino considers a drain to have failed if any pod eviction triggered by that drain fails. If 2 of the 5 pods to be evicted fail, Draino considers the drain failed, but it still evicts the other 3 pods.
Pods that cannot be evicted by cluster-autoscaler will not be evicted by Draino either.
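The timing and failure rules above can be modelled in a few lines. This is a sketch of the documented behaviour only, not Draino's scheduler: the function names and node and pod names are made up for the example, and the 10-minute default is the --drain-buffer value from the help output.

```python
from datetime import datetime, timedelta


def schedule_drains(node_names, now, drain_buffer=timedelta(minutes=10)):
    """Every node is cordoned at `now`; drains start one drain_buffer apart."""
    return {name: now + i * drain_buffer for i, name in enumerate(node_names)}


def drain_outcome(eviction_results):
    """A drain counts as failed if any eviction failed; all pods are still attempted."""
    evicted = [pod for pod, ok in eviction_results.items() if ok]
    failed = [pod for pod, ok in eviction_results.items() if not ok]
    status = "failed" if failed else "succeeded"
    return status, evicted, failed


now = datetime(2020, 3, 20, 15, 0, 0)
for node, when in schedule_drains(["node-a", "node-b"], now).items():
    print(node, "cordoned at", now.time(), "drain starts at", when.time())

# Five pods, two evictions fail: the drain is reported as failed,
# yet the other three pods are evicted.
results = {"pod-1": True, "pod-2": False, "pod-3": True, "pod-4": False, "pod-5": True}
print(drain_outcome(results))
```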
4. Deployment
Draino is built automatically from the master branch and pushed to Docker Hub. Images are tagged planetlabs/draino:$(git rev-parse --short HEAD).
Draino can be deployed with the example Kubernetes deployment manifest.
5. Monitoring
Metrics
Draino exposes a simple health check endpoint at /healthz and Prometheus metrics at /metrics. The following metrics are reported:
$ kubectl -n kube-system exec -it ${DRAINO_POD} -- apk add curl
$ kubectl -n kube-system exec -it ${DRAINO_POD} -- curl http://localhost:10002/metrics
# HELP draino_cordoned_nodes_total Number of nodes cordoned.
# TYPE draino_cordoned_nodes_total counter
draino_cordoned_nodes_total{result="succeeded"} 2
draino_cordoned_nodes_total{result="failed"} 1
# HELP draino_drained_nodes_total Number of nodes drained.
# TYPE draino_drained_nodes_total counter
draino_drained_nodes_total{result="succeeded"} 1
draino_drained_nodes_total{result="failed"} 1
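As a quick way to work with this output outside Prometheus, the counters can be parsed from the text and turned into a failure ratio. The sketch below uses only the sample output shown above; the regular expression handles just this simple `name{result="..."} value` shape and is not a full parser for the Prometheus exposition format. The helper names are hypothetical.

```python
import re

SAMPLE = '''\
# HELP draino_cordoned_nodes_total Number of nodes cordoned.
# TYPE draino_cordoned_nodes_total counter
draino_cordoned_nodes_total{result="succeeded"} 2
draino_cordoned_nodes_total{result="failed"} 1
# HELP draino_drained_nodes_total Number of nodes drained.
# TYPE draino_drained_nodes_total counter
draino_drained_nodes_total{result="succeeded"} 1
draino_drained_nodes_total{result="failed"} 1
'''

# Matches lines such as: draino_drained_nodes_total{result="failed"} 1
LINE = re.compile(r'^(\w+)\{result="(\w+)"\}\s+(\d+(?:\.\d+)?)$')


def parse_counters(text):
    """Return {(metric_name, result): value} for every matching sample line."""
    counters = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            name, result, value = m.groups()
            counters[(name, result)] = float(value)
    return counters


def failure_ratio(counters, name):
    """Share of failed operations for one counter family (0.0 when there is no data)."""
    ok = counters.get((name, "succeeded"), 0.0)
    bad = counters.get((name, "failed"), 0.0)
    total = ok + bad
    return bad / total if total else 0.0


counters = parse_counters(SAMPLE)
print(failure_ratio(counters, "draino_drained_nodes_total"))
```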
Events
Draino generates an event at each key step of the drain process. Below is an example that ends with DrainFailed. When every step goes well, the last event is DrainSucceeded.
> kubectl get events -n default | grep -E '(^LAST|draino)'
LAST SEEN FIRST SEEN COUNT NAME KIND TYPE REASON SOURCE MESSAGE
5m 5m 1 node-demo.15fe0c35f0b4bd10 Node Warning CordonStarting draino Cordoning node
5m 5m 1 node-demo.15fe0c35fe3386d8 Node Warning CordonSucceeded draino Cordoned node
5m 5m 1 node-demo.15fe0c360bd516f8 Node Warning DrainScheduled draino Will drain node after 2020-03-20T16:19:14.91905+01:00
5m 5m 1 node-demo.15fe0c3852986fe8 Node Warning DrainStarting draino Draining node
4m 4m 1 node-demo.15fe0c48d010ecb0 Node Warning DrainFailed draino Draining failed: timed out waiting for evictions to complete: timed out
Conditions
When a drain is scheduled, Draino adds a condition of type DrainScheduled to the status of the target node. This condition records when the drain begins and ends.
> kubectl describe node {node-name}
......
Unschedulable: true
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:01:59 +0100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:01:59 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:01:59 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:01:59 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:02:09 +0100 KubeletReady kubelet is posting ready status. AppArmor enabled
ec2-host-retirement True Fri, 20 Mar 2020 15:23:26 +0100 Fri, 20 Mar 2020 15:23:26 +0100 NodeProblemDetector Condition added with tooling
DrainScheduled True Fri, 20 Mar 2020 15:50:50 +0100 Fri, 20 Mar 2020 15:23:26 +0100 Draino Drain activity scheduled 2020-03-20T15:50:34+01:00
Later, once the drain has finished, Draino appends the outcome to the condition so that you can tell whether it succeeded or failed:
> kubectl describe node {node-name}
......
Unschedulable: true
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:01:59 +0100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:01:59 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:01:59 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:01:59 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 20 Mar 2020 15:52:41 +0100 Fri, 20 Mar 2020 14:02:09 +0100 KubeletReady kubelet is posting ready status. AppArmor enabled
ec2-host-retirement True Fri, 20 Mar 2020 15:23:26 +0100 Fri, 20 Mar 2020 15:23:26 +0100 NodeProblemDetector Condition added with tooling
DrainScheduled True Fri, 20 Mar 2020 15:50:50 +0100 Fri, 20 Mar 2020 15:23:26 +0100 Draino Drain activity scheduled 2020-03-20T15:50:34+01:00 | Completed: 2020-03-20T15:50:50+01:00
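If you want to read these timestamps programmatically, the condition message can be split into its parts. The sketch below assumes the message always has the shape seen in the two outputs above ("Drain activity scheduled <time>", optionally followed by " | Completed: <time>"). The wording Draino uses for a failed drain is not shown here, so the helper (a hypothetical name) leaves any other suffix unparsed.

```python
from datetime import datetime

PREFIX = "Drain activity scheduled "
COMPLETED = "Completed: "


def parse_drain_message(message):
    """Return (scheduled, completed) datetimes; completed is None if not present."""
    if not message.startswith(PREFIX):
        raise ValueError("unexpected DrainScheduled message: %r" % message)
    scheduled_part, _, rest = message[len(PREFIX):].partition(" | ")
    scheduled = datetime.fromisoformat(scheduled_part.strip())
    completed = None
    if rest.startswith(COMPLETED):
        completed = datetime.fromisoformat(rest[len(COMPLETED):].strip())
    return scheduled, completed


msg = ("Drain activity scheduled 2020-03-20T15:50:34+01:00"
       " | Completed: 2020-03-20T15:50:50+01:00")
scheduled, completed = parse_drain_message(msg)
print("drain took", (completed - scheduled).total_seconds(), "seconds")
```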
6. Retrying a drain
A drain sometimes fails because of a Pod Disruption Budget restriction or for other reasons outside Draino's control. In that case the target node stays cordoned and the drain condition is marked as Failed. To attempt the drain again, add the annotation draino/drain-retry: true to the node, and Draino will try to drain it again.
Note: if the retried drain also fails, the draino/drain-retry: true annotation on the node is not modified or removed, so the node will wait for another retry.
kubectl annotate node {node-name} draino/drain-retry=true
7. Modes
Dry Run: in this mode Draino only emits events for the faulty nodes it matches; it does not cordon or drain them. Enable it with the --dry-run flag.
Cordon Only: in this mode Draino only cordons the faulty nodes it matches and does not drain the pods on them. Enable it with the --skip-drain flag.