Target version: Kubernetes 1.17.5
Cluster OS: CentOS 7
Note: performance testing differs between Kubernetes versions; this guide is not guaranteed to work on other versions.
Preface: kubemark is a node-simulation tool used to test Kubernetes API latency and scheduler performance on large clusters.
We need two clusters, A and B. B is the cluster under test, and A is the cluster that runs the kubemark workload.
The kubemark containers are deployed in A as a deployment, and every replica registers itself with B as a node,
so once the deployment is complete, B has one node for each kubemark replica running in A.
Assume two usable clusters A and B already exist.
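Throughout the rest of this guide it helps to keep one kubeconfig per cluster. A minimal sketch, assuming the two files are saved as cluster-a.kubeconfig and cluster-b.kubeconfig (hypothetical names, not produced by any step below):
# Commands that manage the kubemark workload run against cluster A
kubectl --kubeconfig=cluster-a.kubeconfig get nodes
# Commands that verify the simulated nodes run against cluster B (the cluster under test)
kubectl --kubeconfig=cluster-b.kubeconfig get nodes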
Build and package the kubemark image
Download the Kubernetes source code; the version must match cluster B.
Compile the kubemark binary:
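For example, assuming cluster B runs v1.17.5 and the source is placed under $GOPATH where the commands below expect it (kubeconfig name as assumed above):
# Check cluster B's server version first
kubectl --kubeconfig=cluster-b.kubeconfig version --short
# Fetch the matching source tree into $GOPATH
mkdir -p $GOPATH/src/k8s.io && cd $GOPATH/src/k8s.io
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes && git checkout v1.17.5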
./hack/build-go.sh cmd/kubemark/
cp $GOPATH/src/k8s.io/kubernetes/_output/bin/kubemark $GOPATH/src/k8s.io/kubernetes/cluster/images/kubemark/
- Build the kubemark image
cd $GOPATH/src/k8s.io/kubernetes/cluster/images/kubemark/
make build
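The reference manifest below pulls the image as test.cargo.io/release/kubemark:latest, so the freshly built image has to be pushed to a registry that cluster A's nodes can reach. A sketch, assuming docker is the local runtime and the default image name produced by make build (the name and tag may differ between versions):
# Find the locally built kubemark image
docker images | grep kubemark
# Re-tag it for your own registry and push it
docker tag staging-k8s.gcr.io/kubemark:latest test.cargo.io/release/kubemark:latest
docker push test.cargo.io/release/kubemark:latest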
- Create the namespace, configmap, secret, and RBAC objects
kubectl create ns kubemark
kubectl create cm node-configmap --from-literal=content.type="" --from-file=kernel.monitor="kernel-monitor.json" -n kubemark  # kernel-monitor.json comes from ./test/kubemark/resources/kernel-monitor.json
kubectl create secret generic kubeconfig --type=Opaque --from-file=kubelet.kubeconfig=kubemark.kubeconfig --from-file=kubeproxy.kubeconfig=kubemark.kubeconfig --from-file=npd.kubeconfig=kubemark.kubeconfig --from-file=heapster.kubeconfig=kubemark.kubeconfig --from-file=cluster_autoscaler.kubeconfig=kubemark.kubeconfig --from-file=dns.kubeconfig=kubemark.kubeconfig -n kubemark  # kubemark.kubeconfig is cluster B's kubeconfig (/root/.kube/config); see the sketch after this list
kubectl apply -f addons/ -n kubemark  # addons/ is ./test/kubemark/resources/manifests/addons
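kubemark.kubeconfig in the secret above is simply a copy of cluster B's kubeconfig. A sketch, assuming SSH access to B's master at 192.168.0.16 (the address used in the manifests below):
# Copy cluster B's kubeconfig to the machine where the kubectl commands above are run
scp root@192.168.0.16:/root/.kube/config ./kubemark.kubeconfig
# Sanity check: the file must be able to reach cluster B's apiserver
kubectl --kubeconfig=kubemark.kubeconfig cluster-info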
- Create the kubemark workload; a quick verification sketch follows the reference configuration below.
kubectl apply -f hollow-node_template.yaml -n kubemark
Reference configuration:
apiVersion: v1
kind: ReplicationController
metadata:
  name: hollow-node
  labels:
    name: hollow-node
spec:
  replicas: 200
  selector:
    name: hollow-node
  template:
    metadata:
      labels:
        name: hollow-node
    spec:
      initContainers:
      - name: init-inotify-limit
        image: busybox
        command: ['sysctl', '-w', 'fs.inotify.max_user_instances=1000']
        securityContext:
          privileged: true
      volumes:
      - name: kubeconfig-volume
        secret:
          secretName: kubeconfig
      - name: kernelmonitorconfig-volume
        configMap:
          name: node-configmap
      - name: logs-volume
        hostPath:
          path: /var/log
      - name: no-serviceaccount-access-to-real-master
        emptyDir: {}
      containers:
      - name: hollow-kubelet
        image: test.cargo.io/release/kubemark:latest
        ports:
        - containerPort: 4194
        - containerPort: 10250
        - containerPort: 10255
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=kubelet --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubelet.kubeconfig $(CONTENT_TYPE) --alsologtostderr 1>>/var/log/kubelet-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
        securityContext:
          privileged: true
      - name: hollow-proxy
        image: test.cargo.io/release/kubemark:latest
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=proxy --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubeproxy.kubeconfig $(CONTENT_TYPE) --alsologtostderr 1>>/var/log/kubeproxy-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
      - name: hollow-node-problem-detector
        image: test.cargo.io/release/node-problem-detector:v0.8.0
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /node-problem-detector --system-log-monitors=/config/kernel.monitor --apiserver-override="https://192.168.0.16:6443?inClusterConfig=false&auth=/kubeconfig/npd.kubeconfig" --alsologtostderr 1>>/var/log/npd-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: kernelmonitorconfig-volume
          mountPath: /config
          readOnly: true
        - name: no-serviceaccount-access-to-real-master
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
        securityContext:
          privileged: true
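Once the hollow-node pods in cluster A are Running, the simulated nodes should show up in cluster B. A quick check (kubeconfig names as assumed earlier):
# On cluster A: all hollow-node pods should reach Running
kubectl --kubeconfig=cluster-a.kubeconfig -n kubemark get pods | grep hollow-node | head
# On cluster B: expect one Ready node per replica (200 with the config above)
kubectl --kubeconfig=cluster-b.kubeconfig get nodes | grep -c hollow-node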
- Run the e2e performance test cases
There is a big pitfall here: most articles you find online compile the e2e binary and run it directly:
make WHAT="test/e2e/e2e.test"
./e2e.test --kube-master=192.168.0.16 --host=https://192.168.0.16:6443 --ginkgo.focus="\[Performance\]" --provider=local --kubeconfig=kubemark.kubeconfig --num-nodes=10 --v=3 --ginkgo.failFast --e2e-output-dir=. --report-dir=.
However, the e2e performance tests have been moved out of the main repository (https://github.com/kubernetes/kubernetes/pull/83322), so on versions released after 2019-10-01 the commands above can no longer run the performance tests.
Run the performance tests with the perf-tests repository (run this on cluster B's master node)
Download the perf-tests source that matches your Kubernetes version: https://github.com/kubernetes/perf-tests
Run the test command (the node count must be >= 100; otherwise you need to tweak the parameters in job.yaml, see the sketch after the command):
./run-e2e.sh --testconfig=job.yaml --kubeconfig=config.yaml --provider=local --masterip=192.168.0.16,192.168.0.23,192.168.0.40 --mastername=kube-master-1,kube-master-2,kube-master-3 --master-internal-ip=192.168.0.16,192.168.0.23,192.168.0.40 --enable-prometheus-server --tear-down-prometheus-server=false
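The 100-node expectation comes from defaults such as NODES_PER_NAMESPACE in the upstream density/load configs; on a smaller cluster you can either edit job.yaml or pass an overrides file. A sketch, assuming job.yaml was derived from the upstream density config (variable names may differ in your copy):
# overrides.yaml: scale the test down to fit a cluster with fewer than 100 hollow nodes
cat > overrides.yaml <<EOF
NODES_PER_NAMESPACE: 10
PODS_PER_NODE: 30
EOF
# clusterloader2 accepts --testoverrides, and run-e2e.sh should forward it,
# so append --testoverrides=overrides.yaml to the command above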
Troubleshooting:
- If a target's port needs to be changed, edit the Service directly and update the corresponding Endpoints.
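For example (service and endpoints names are hypothetical; substitute the target you are fixing):
# Fix the port on the Service the ServiceMonitor scrapes through
kubectl -n monitoring edit svc kube-scheduler-monitor
# If the matching Endpoints object is maintained by hand (no selector), fix it too
kubectl -n monitoring edit endpoints kube-scheduler-monitor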
- When collecting etcd metrics the run fails with "etcdmetrics: failed to collect etcd database size". This happens because the script cannot pull the data directly over port 2379, so the source has to be patched:
perf-tests/clusterloader2/pkg/measurement/common/etcd_metrics.go: replace "curl http://localhost:2379/metrics" with "curl -L https://localhost:2379/metrics --key /etc/kubernetes/etcd/etcd.key --cert /etc/kubernetes/etcd/etcd.crt --insecure"
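You can confirm the replacement command works by running it on cluster B's master before patching (certificate paths as above; adjust them if your etcd certs live elsewhere):
# Should print etcd metrics, including the database size gauges
curl -sL https://localhost:2379/metrics --key /etc/kubernetes/etcd/etcd.key --cert /etc/kubernetes/etcd/etcd.crt --insecure | grep db_total_size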
- A few targets in Prometheus's target list may not be scrapable directly and return 401 authorization errors. Fix them as follows:
Edit the corresponding ServiceMonitor and add bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token to the endpoints entries:
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 5s
    port: apiserver
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
Add --authentication-token-webhook=true and --authorization-mode=Webhook to the kubelet flags and restart the kubelet.
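On a kubeadm-provisioned CentOS node these flags usually go through KUBELET_EXTRA_ARGS; a sketch, assuming /etc/sysconfig/kubelet is the file your install reads (merge with any flags already set there):
# Make sure KUBELET_EXTRA_ARGS in /etc/sysconfig/kubelet contains, e.g.:
#   KUBELET_EXTRA_ARGS="--authentication-token-webhook=true --authorization-mode=Webhook"
# then restart the kubelet
systemctl restart kubelet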
Bind the system:kubelet-api-admin ClusterRole to the prometheus-k8s ServiceAccount:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s-1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubelet-api-admin
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
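Save the binding to a file and apply it; the kubelet targets in Prometheus should turn healthy shortly afterwards (the file name is arbitrary):
kubectl apply -f prometheus-kubelet-api-admin-binding.yaml
# Confirm the binding exists
kubectl get clusterrolebinding prometheus-k8s-1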