Document notes
Test environment: Kubernetes v1.10.9
Network CNI: Flannel
Storage CSI: NFS dynamic StorageClass
DNS: CoreDNS
Background
While studying how to deploy Prometheus Operator: Prometheus already has native Kubernetes support in its code and can automatically monitor the cluster through service discovery, so we can deploy Prometheus in a more advanced way, using the Operator framework.
Prometheus-Operator architecture diagram:
The figure above is the official Prometheus-Operator architecture diagram. The Operator is the core piece: acting as a controller, it creates four CRD resource objects — Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule — and then continuously watches these four resources and maintains their state.
A Prometheus resource object represents the Prometheus Server itself, while a ServiceMonitor is an abstraction over exporters. As covered earlier, an exporter is a tool dedicated to exposing a metrics endpoint; Prometheus pulls its data from the metrics endpoints that ServiceMonitors describe. Likewise, an Alertmanager resource object is the abstraction of the AlertManager component, and a PrometheusRule holds the alerting-rule files consumed by Prometheus instances.
With this model, deciding what to monitor in the cluster becomes a matter of operating directly on Kubernetes resource objects, which is far more convenient. In the figure, both Service and ServiceMonitor are Kubernetes resources: a ServiceMonitor matches a class of Services via a labelSelector, and a Prometheus can in turn match multiple ServiceMonitors via a labelSelector.
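That two-level selector chain can be sketched as follows (the names and label values here are illustrative, not taken from the manifests used later — they only show how the resources link together):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example            # illustrative name
  namespace: monitoring
spec:
  # Pick up every ServiceMonitor carrying this label (illustrative value):
  serviceMonitorSelector:
    matchLabels:
      team: frontend
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app        # illustrative name
  namespace: monitoring
  labels:
    team: frontend         # matched by the Prometheus above
spec:
  # In turn match Services labeled app=example-app (illustrative):
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web              # named port on the matched Service
```

So scrape targets are never configured by hand in prometheus.yml: the Operator follows the label chain Prometheus → ServiceMonitor → Service and regenerates the scrape configuration automatically.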
Installing the Operator
$ git clone https://github.com/coreos/prometheus-operator
$ cd prometheus-operator/contrib/kube-prometheus/manifests/
$ ls
00namespace-namespace.yaml node-exporter-clusterRole.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml node-exporter-daemonset.yaml
......
prometheus-serviceMonitorKubelet.yaml
We make one simple modification. By default, this ServiceMonitor scrapes node data from the kubelet's secure port 10250; as mentioned earlier, for safety those metrics have been moved to the read-only port 10255, so we only need to change https-metrics in the file to http-metrics.
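The substitution can be done in one pass with sed. A sketch against a stand-in file (the real target is prometheus-serviceMonitorKubelet.yaml in the manifests directory; note the final file shown below also uses scheme: http, so that field is rewritten too):

```shell
# Stand-in snippet to demonstrate the edit; /tmp/sm-demo.yaml plays the
# role of prometheus-serviceMonitorKubelet.yaml.
cat > /tmp/sm-demo.yaml <<'EOF'
- port: https-metrics
  scheme: https
EOF
# Rewrite the port name and the scheme in place:
sed -i 's/https-metrics/http-metrics/g; s/scheme: https/scheme: http/g' /tmp/sm-demo.yaml
cat /tmp/sm-demo.yaml
```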
root@k8s-master-1:~/k8s_manifests/prometheus-operator# cat prometheus-serviceMonitorKubelet.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kubelet
  name: kubelet
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 30s
    port: http-metrics
    scheme: http
    tlsConfig:
      insecureSkipVerify: true
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 30s
    path: /metrics/cadvisor
    port: http-metrics
    scheme: http
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet
Once the modification is done, run the apply command directly in that directory to create all the resources:
root@k8s-master-1:~/k8s_manifests/prometheus-operator# kubectl apply -f .
root@k8s-master-1:~/k8s_manifests/prometheus-operator# ls
00namespace-namespace.yaml grafana-dashboardDefinitions.yaml node-exporter-clusterRole.yaml prometheus-clusterRoleBinding.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml grafana-dashboardSources.yaml node-exporter-clusterRoleBinding.yaml prometheus-k8s-ingress.yaml
0prometheus-operator-0prometheusCustomResourceDefinition.yaml grafana-deployment.yaml node-exporter-daemonset-017.yaml prometheus-kubeSchedulerService.yaml
0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml grafana-ingress.yaml node-exporter-daemonset.yaml prometheus-prometheus.yaml
0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml grafana-service.yaml node-exporter-service.yaml prometheus-roleBindingConfig.yaml
0prometheus-operator-clusterRole.yaml grafana-serviceAccount.yaml node-exporter-serviceAccount.yaml prometheus-roleBindingSpecificNamespaces.yaml
0prometheus-operator-clusterRoleBinding.yaml kube-controller-manager-endpoints.yaml node-exporter-serviceMonitor.yaml prometheus-roleConfig.yaml
0prometheus-operator-deployment.yaml kube-controller-manager-service.yaml prometheus-adapter-apiService.yaml prometheus-roleSpecificNamespaces.yaml
0prometheus-operator-service.yaml kube-scheduler-endpoints.yaml prometheus-adapter-clusterRole.yaml prometheus-rules.yaml
0prometheus-operator-serviceAccount.yaml kube-scheduler-service.yaml prometheus-adapter-clusterRoleBinding.yaml prometheus-service.yaml
0prometheus-operator-serviceMonitor.yaml kube-state-metrics-clusterRole.yaml prometheus-adapter-clusterRoleBindingDelegator.yaml prometheus-serviceAccount.yaml
alertmanager-alertmanager.yaml kube-state-metrics-clusterRoleBinding.yaml prometheus-adapter-clusterRoleServerResources.yaml prometheus-serviceMonitor.yaml
alertmanager-secret.yaml kube-state-metrics-deployment.yaml prometheus-adapter-configMap.yaml prometheus-serviceMonitorApiserver.yaml
alertmanager-service.yaml kube-state-metrics-role.yaml prometheus-adapter-deployment.yaml prometheus-serviceMonitorCoreDNS.yaml
alertmanager-serviceAccount.yaml kube-state-metrics-roleBinding.yaml prometheus-adapter-roleBindingAuthReader.yaml prometheus-serviceMonitorKubeControllerManager.yaml
alertmanager-serviceMonitor.yaml kube-state-metrics-service.yaml prometheus-adapter-service.yaml prometheus-serviceMonitorKubeScheduler.yaml
coredns-metrics-service.yaml kube-state-metrics-serviceAccount.yaml prometheus-adapter-serviceAccount.yaml prometheus-serviceMonitorKubelet.yaml
grafana-dashboardDatasources.yaml kube-state-metrics-serviceMonitor.yaml prometheus-clusterRole.yaml
After deployment, a namespace named monitoring is created and all the resource objects are deployed under that namespace. In addition, the Operator automatically creates 4 CRD resource objects:
root@k8s-master-1:~/k8s_manifests/prometheus-operator# kubectl get crd |grep coreos
alertmanagers.monitoring.coreos.com 17d
prometheuses.monitoring.coreos.com 17d
prometheusrules.monitoring.coreos.com 17d
servicemonitors.monitoring.coreos.com 17d
We can view all the Pods in the monitoring namespace. alertmanager and prometheus are managed by StatefulSet controllers, and there is also a core prometheus-operator Pod, which controls the other resource objects and watches them for changes:
root@k8s-master-1:~/k8s_manifests/prometheus-operator# kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 4 17d
alertmanager-main-1 2/2 Running 6 17d
alertmanager-main-2 2/2 Running 4 17d
grafana-6d6c4d998d-v9djh 1/1 Running 0 8d
kube-state-metrics-5c5c6f7f8f-frwpk 4/4 Running 0 14d
loki-5c5d8d7d7d-gcvcx 1/1 Running 0 8d
loki-grafana-996d8c8fc-shm29 1/1 Running 0 8d
loki-promtail-cpqq9 1/1 Running 0 8d
loki-promtail-k786c 1/1 Running 0 8d
loki-promtail-lmmn2 1/1 Running 0 8d
loki-promtail-xlb8b 1/1 Running 0 8d
node-exporter-8gdh4 2/2 Running 2 14d
node-exporter-cdmbk 2/2 Running 0 14d
node-exporter-pqzbf 2/2 Running 0 14d
node-exporter-x4968 2/2 Running 0 14d
prometheus-adapter-69466cc54b-vgqpg 1/1 Running 2 17d
prometheus-k8s-0 3/3 Running 8 13d
prometheus-k8s-1 3/3 Running 3 11d
prometheus-operator-56954c76b5-rjlbq 1/1 Running 0 14d
root@k8s-master-1:~/k8s_manifests/prometheus-operator# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.68.209.89 <none> 9093/TCP 17d
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 17d
grafana ClusterIP 10.68.149.168 <none> 3000/TCP 17d
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 17d
loki ClusterIP 10.68.118.118 <none> 3100/TCP 8d
loki-grafana ClusterIP 10.68.77.53 <none> 80/TCP 8d
node-exporter ClusterIP None <none> 9100/TCP 17d
prometheus-adapter ClusterIP 10.68.217.16 <none> 443/TCP 17d
prometheus-k8s ClusterIP 10.68.193.174 <none> 9090/TCP 17d
prometheus-operated ClusterIP None <none> 9090/TCP 17d
prometheus-operator ClusterIP None <none> 8080/TCP 17d
Deploying an Ingress to allow external access
root@k8s-master-1:~/k8s_manifests/prometheus-operator# cat grafana-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana-ui
  namespace: monitoring
spec:
  rules:
  - host: grafana.k8s.io
    http:
      paths:
      - backend:
          serviceName: grafana
          servicePort: service
        path: /
status:
  loadBalancer: {}
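For the host rule above to work from a workstation, grafana.k8s.io has to resolve to a node running the ingress controller; on a lab cluster this is usually a hosts-file entry. A sketch against a stand-in file (192.168.0.10 is a placeholder IP, not taken from the original text):

```shell
# Map the Ingress host to an ingress-controller node IP.
# /tmp/hosts-demo stands in for /etc/hosts; 192.168.0.10 is a placeholder.
echo "192.168.0.10 grafana.k8s.io" >> /tmp/hosts-demo
grep 'grafana.k8s.io' /tmp/hosts-demo
```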
Monitoring binary-deployed components
Because of the way this cluster was deployed, the core master components kube-scheduler and kube-controller-manager were started from binaries rather than running as Pods. This is an important point: it directly affects how the ServiceMonitor must be defined. Let's first look at the ServiceMonitor resource for the kube-scheduler component (prometheus-serviceMonitorKubeScheduler.yaml):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
And the corresponding one for kube-controller-manager (prometheus-serviceMonitorKubeControllerManager.yaml):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    metricRelabelings:
    - action: drop
      regex: etcd_(debugging|disk|request|server).*
      sourceLabels:
      - __name__
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
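The metricRelabelings entry above drops every controller-manager metric whose name matches etcd_(debugging|disk|request|server).* before it is stored. The same regex can be sanity-checked from a shell (the two sample metric names below are illustrative):

```shell
# Two sample metric names (illustrative); grep -Ev discards the ones the
# drop rule would match and keeps the rest, mimicking action: drop on __name__.
printf 'etcd_disk_wal_fsync_duration\nworkqueue_depth\n' \
  | grep -Ev '^etcd_(debugging|disk|request|server)'
# → workqueue_depth
```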
This is a typical ServiceMonitor declaration: through selector.matchLabels it matches Services in the kube-system namespace that carry the label k8s-app=kube-scheduler. But no such Service exists in our cluster yet, so we have to create one manually:
kube-controller-manager-service.yaml
kube-scheduler-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
    targetPort: 10251
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
Change the listen address in kube-controller-manager.service to 0.0.0.0:
ExecStart=/opt/kube/bin/kube-controller-manager \
  --address=0.0.0.0 \
  --master=http://0.0.0.0:8080
Likewise, change the listen address in kube-scheduler.service to 0.0.0.0:
ExecStart=/opt/kube/bin/kube-scheduler \
  --address=0.0.0.0 \
  --master=http://0.0.0.0:8080 \
After editing the unit files, run systemctl daemon-reload and restart both services so that Prometheus can reach the metrics endpoints on ports 10251 and 10252.