Overview
Once workloads run on a Kubernetes cluster, one of the jobs container orchestration should do for us is elastic scaling of services. But how large a node pool should we keep to satisfy application demand? This is where cluster-autoscaler comes in: it adds and removes nodes dynamically, resizing the container resource pool to absorb peak traffic. Kubernetes has three autoscaling mechanisms: HPA (Horizontal Pod Autoscaler), VPA (Vertical Pod Autoscaler), and CA (Cluster Autoscaler). HPA and VPA scale containers (Pods), while CA scales nodes.
Recently a transcoding workload has been spiking in load from about 4 PM every day, falling back around 10 PM. With static capacity planning, the servers cannot be used efficiently and sit idle outside the peak, so this post records how cluster-autoscaler can dynamically size the node pool backing those Pods.
Deploying aliyun-cluster-autoscaler
Prerequisites:
- aliyun-cloud-provider
1. Access credentials
Because the autoscaler calls the Alibaba Cloud ESS API to resize the cluster, it needs an AccessKey (AK) for API access.
The custom RAM policy is as follows:
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ess:Describe*",
        "ess:CreateScalingRule",
        "ess:ModifyScalingGroup",
        "ess:RemoveInstances",
        "ess:ExecuteScalingRule",
        "ess:ModifyScalingRule",
        "ess:DeleteScalingRule",
        "ess:DetachInstances",
        "ecs:DescribeInstanceTypes"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
Create a RAM user named k8s-cluster-autoscaler with programmatic access, attach the custom policy above to it, and create an AK.
2. ASG setup
- Create a scaling group (ESS): https://essnew.console.aliyun.com
Auto-scaling a Kubernetes cluster relies on Alibaba Cloud ESS (Elastic Scaling) scaling groups, so one must be created first.
Open the ESS console and select the Beijing region (the same region as the Kubernetes cluster), then click "Create Scaling Group". In the dialog, fill in the fields, taking care to choose VPC as the network type, select the VPC the Kubernetes cluster from prerequisite 1 lives in, pick the vswitch the Kubernetes nodes use, and submit. As shown in the figure below:
The scaling configuration is created separately. Choose the instance types (pick several types with identical resources, so that a shortage of one instance type does not make scale-up fail), the security group (the same one the Kubernetes nodes are in), a bandwidth peak of 0 (no public IP), user data, and so on. Note that user data must be supplied as text: paste the cluster's add-node command into the text box and prefix it with #!/bin/bash. Below is an example that registers the new node into the cluster:
#!/bin/bash
curl https://file.xx.com/kubernetes-stage/attach_node.sh | bash -s -- --kubeconfig [kubectl.kubeconfig | base64] --cluster-dns 172.19.0.10 --docker-version 18.06.2-ce-3 --labels type=autoscaler
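The `--kubeconfig [kubectl.kubeconfig | base64]` placeholder above stands for the cluster kubeconfig encoded as base64. A minimal sketch of producing that value, using a stand-in file and path so the round trip can be demonstrated (on Linux; `-w0` is a GNU coreutils option):

```shell
# Create a stand-in kubeconfig (in practice this is your real kubectl.kubeconfig)
printf 'apiVersion: v1\nkind: Config\n' > /tmp/kubectl.kubeconfig

# -w0 disables line wrapping so the encoded value stays a single shell token
KUBECONFIG_B64=$(base64 -w0 /tmp/kubectl.kubeconfig)

# Round trip: decoding must restore the original file content
echo "$KUBECONFIG_B64" | base64 -d
```

The resulting `$KUBECONFIG_B64` string is what gets pasted into the user-data command in place of the placeholder.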
Then finish creation and enable the configuration.
3. Deploy the autoscaler into the Kubernetes cluster
You must manually specify the ID of the scaling group just created, plus the minimum and maximum node counts, e.g. --nodes=1:3:asg-2ze9hse7u4udb6y4kd25
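The value follows the shape `min:max:asg-id`. A small illustrative helper to sanity-check it before templating it into the manifest (the regex for the ASG ID is an assumption for this sketch, not an official format):

```shell
NODES_SPEC="1:3:asg-2ze9hse7u4udb6y4kd25"

# min:max:asg-id — min/max are integers, the ASG ID starts with "asg-"
if [[ "$NODES_SPEC" =~ ^([0-9]+):([0-9]+):(asg-[a-z0-9]+)$ ]]; then
  MIN="${BASH_REMATCH[1]}"; MAX="${BASH_REMATCH[2]}"; ASG_ID="${BASH_REMATCH[3]}"
  echo "min=$MIN max=$MAX asg=$ASG_ID"
fi
```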
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-autoscaler-config
  namespace: kube-system
data:
  access-key-id: "xxxx"
  access-key-secret: "xxxxx"
  region-id: "cn-beijing"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources: ["pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["extensions"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["watch", "list", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cluster-autoscaler-status", "cluster-autoscaler-priority-expander"]
    verbs: ["delete", "get", "update", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/acs/autoscaler:v1.3.1-567fb17
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=alicloud
            - --nodes={MIN_NODE}:{MAX_NODE}:{ASG_ID}
            - --skip-nodes-with-system-pods=false
            - --skip-nodes-with-local-storage=false
          imagePullPolicy: "Always"
          env:
            - name: ACCESS_KEY_ID
              valueFrom:
                configMapKeyRef:
                  name: cloud-autoscaler-config
                  key: access-key-id
            - name: ACCESS_KEY_SECRET
              valueFrom:
                configMapKeyRef:
                  name: cloud-autoscaler-config
                  key: access-key-secret
            - name: REGION_ID
              valueFrom:
                configMapKeyRef:
                  name: cloud-autoscaler-config
                  key: region-id
Testing automatic node scale-out
The autoscaler decides whether to grow the cluster based on the static resource requests of your workloads, so make sure resource requests are set on the applications.
Node count before the test is shown below; both are 2-core/4G ECS instances and both nodes are schedulable.
[root@iZ2ze190o505f86pvk8oisZ cluster-autoscaler]# kubectl get node
NAME STATUS ROLES AGE VERSION
cn-beijing.i-2ze190o505f86pvk8ois Ready master,node 46h v1.12.3
cn-beijing.i-2zeef9b1nhauqusbmn4z Ready node 46h v1.12.3
Next we create an nginx deployment with two replicas, each replica requesting 2G of memory.
[root@iZ2ze190o505f86pvk8oisZ cluster-autoscaler]# cat <<EOF | kubectl apply -f -
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-example
spec:
  replicas: 2
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
        - image: nginx:latest
          name: nginx
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: 2G
EOF
[root@iZ2ze190o505f86pvk8oisZ cluster-autoscaler]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-example-6669fc6b48-ndclg 1/1 Running 0 15s
nginx-example-6669fc6b48-tn5wp 1/1 Running 0 15s
Both pods are scheduled normally because there is enough memory available. Next we use kubectl scale to raise the replica count to 3.
[root@iZ2ze190o505f86pvk8oisZ cluster-autoscaler]# kubectl scale deploy nginx-example --replicas 3
deployment.extensions/nginx-example scaled
[root@iZ2ze190o505f86pvk8oisZ ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-example-584bdb467-2s226 1/1 Running 0 13m
nginx-example-584bdb467-lz2jt 0/1 Pending 0 4s
nginx-example-584bdb467-r7fcc 1/1 Running 0 4s
[root@iZ2ze190o505f86pvk8oisZ cluster-autoscaler]# kubectl describe pod nginx-example-584bdb467-lz2jt | grep -A 4 Event
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 1s (x5 over 19s) default-scheduler 0/2 nodes are available: 2 Insufficient memory.
The new pod cannot be scheduled (it stays Pending) because no node has enough free memory. At this point the autoscaler steps in and tries to create a new node so the pod can be scheduled; checking the scaling group shows that a new instance has already been launched.
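The arithmetic behind the Pending pod can be sketched as follows; the per-node allocatable figure is an assumption for illustration (real allocatable is node capacity minus kubelet/system reservations and DaemonSet pods):

```shell
ALLOCATABLE_MI=3200   # assumed allocatable memory on a 2-core/4G node, in Mi
REQUEST_MI=1908       # the 2G (2*10^9 bytes) request, converted to Mi

# Integer division: how many 2G replicas fit on one node
PODS_PER_NODE=$(( ALLOCATABLE_MI / REQUEST_MI ))
SCHEDULABLE_NODES=2

echo "fits $(( PODS_PER_NODE * SCHEDULABLE_NODES )) replicas"   # prints: fits 2 replicas
```

So two schedulable nodes hold at most two replicas, and the third stays Pending until a node is added.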
Next, run watch kubectl get node to watch nodes being added. After a few minutes, a new node has joined the cluster:
[root@iZ2ze190o505f86pvk8oisZ ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
cn-beijing.i-2ze190o505f86pvk8ois Ready master,node 17m v1.12.3
cn-beijing.i-2zedqvw2bewvk0l2mk9x Ready autoscaler,node 2m30s v1.12.3
cn-beijing.i-2zeef9b1nhauqusbmn4z Ready node 2d17h v1.12.3
[root@iZ2ze190o505f86pvk8oisZ ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-example-584bdb467-2s226 1/1 Running 0 19m
nginx-example-584bdb467-lz2jt 1/1 Running 0 5m47s
nginx-example-584bdb467-r7fcc 1/1 Running 0 5m47s
Compared with before the test, one node has been added and the Pending pod is now scheduled.
Testing automatic node scale-in
When the autoscaler finds that rearranging Pods would leave a node idle, it removes that node. This does not happen immediately: a cooldown is applied (roughly 300s in this setup) before scale-down executes.
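The cooldown behaviour is governed by cluster-autoscaler's scale-down flags. A hedged sketch of extra container args (the flag names are upstream cluster-autoscaler flags; the values here are illustrative, and upstream defaults are around 10 minutes rather than 300s):

```yaml
# Illustrative additions to the cluster-autoscaler container command
- --scale-down-delay-after-add=5m         # wait this long after a scale-up before considering scale-down
- --scale-down-unneeded-time=5m           # a node must be unneeded this long before removal
- --scale-down-utilization-threshold=0.5  # below this requested/allocatable ratio a node counts as unneeded
```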
Use kubectl scale to drop the nginx replica count back to 1 and observe how the cluster nodes change.
[root@iZ2ze190o505f86pvk8oisZ cluster-autoscaler]# kubectl scale deploy nginx-example --replicas 1
deployment.extensions/nginx-example scaled
[root@iZ2ze190o505f86pvk8oisZ cluster-autoscaler]# kubectl get node
NAME STATUS ROLES AGE VERSION
cn-beijing.i-2ze190o505f86pvk8ois Ready master,node 46h v1.12.3
cn-beijing.i-2zeef9b1nhauqusbmn4z Ready node 46h v1.12.3
TODO:
- Fuzzy scheduling
Create multiple scaling groups, each backed by a different instance profile (e.g. high-IO or high-memory), so that each application's elastic scaling targets the matching group.
- CronHPA + autoscaler
Node autoscaling has inherent latency: metrics collection (minutes) + decision (minutes) + scale-up (minutes). Combined with the business's daily pattern, scale the workload on a schedule (CronHPA) to handle the peak, while the underlying node autoscaler grows the Kubernetes node pool backing the containers.
References:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/alicloud