# Introduction to Prometheus (on Kubernetes)
This article does not cover Alertmanager or remote storage; I will add those when time permits!
1. Introduction to Prometheus
* Prometheus is an open-source systems monitoring tool. Based on configured jobs, it periodically scrapes (pulls) metrics over HTTP(S) from specified targets. Targets can be defined statically or discovered automatically. Prometheus stores the scraped metrics in local or remote storage.
* Prometheus collects metrics with a pull model. Compared with push, pull allows configuration to be centralized and makes it easy to build different monitoring views on top of the same targets.
* Prometheus joined the CNCF in 2016 and was the second project to do so, after Kubernetes!
1.1 Architecture
Prometheus Server: the core component; it scrapes and stores time series data and provides the query interface;
Jobs/Exporters: the clients; they collect metrics and expose them over HTTP at /metrics. Many applications already support Prometheus natively and serve /metrics directly. For anything that does not provide /metrics itself, such as the operating system, use an existing exporter or write your own;
Pushgateway: designed for push-style workloads; short-lived jobs push their metrics to the Pushgateway, and Prometheus Server then pulls them from there;
Alertmanager: the alerting component; it responds according to preconfigured rules, for example by sending e-mail;
Web UI: Prometheus ships with a simple built-in web console for querying metrics and inspecting configuration and service discovery. In practice, Grafana is usually used for browsing metrics and building dashboards, with Prometheus as its data source;
1.2 Data Model
Prometheus stores metrics as time series; each series consists of a notation plus samples:
* Notation: a metric name together with a set of labels:
<metric name>{<label name>=<label value>, ...}
* Samples: each sample is a 64-bit floating-point value paired with a millisecond-precision timestamp.
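For illustration, a single scraped series in Prometheus' text exposition format looks like this — metric name, labels, value, and an optional millisecond timestamp (the numbers here are made up):

```
api_http_requests_total{method="POST", handler="/messages"} 1027 1395066363000
```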
2惕橙、安裝部署
2.1、環(huán)境清單
系統(tǒng)環(huán)境
root@master:~# uname -a
Linux master 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@master:~# kubectl get nodes -o wide
NAME      STATUS    ROLES     AGE       VERSION           EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
master    Ready     master    11d       v1.9.0+coreos.0   <none>        Ubuntu 16.04.2 LTS   4.4.0-62-generic   docker://17.12.0-ce
node1     Ready     <none>    11d       v1.9.0+coreos.0   <none>        Ubuntu 16.04.2 LTS   4.4.0-62-generic   docker://17.12.0-ce
node2     Ready     <none>    11d       v1.9.0+coreos.0   <none>        Ubuntu 16.04.2 LTS   4.4.0-62-generic   docker://17.12.0-ce
node3     Ready     <none>    11d       v1.9.0+coreos.0   <none>        Ubuntu 16.04.2 LTS   4.4.0-62-generic   docker://17.12.0-ce
root@master:~# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE       IP                NODE
kube-system   calico-node-64btj                       1/1       Running   0          11d       192.168.115.213   node3
kube-system   calico-node-8wqtc                       1/1       Running   0          11d       192.168.115.211   node1
kube-system   calico-node-hrmql                       1/1       Running   0          11d       192.168.115.210   master
kube-system   calico-node-wvgtc                       1/1       Running   0          11d       192.168.115.212   node2
kube-system   kube-apiserver-master                   1/1       Running   0          11d       192.168.115.210   master
kube-system   kube-controller-manager-master          1/1       Running   0          11d       192.168.115.210   master
kube-system   kube-dns-7d9c4d7876-wxss9               3/3       Running   0          11d       10.233.75.2       node2
kube-system   kube-dns-7d9c4d7876-xbxbg               3/3       Running   0          11d       10.233.102.129    node1
kube-system   kube-proxy-gprzq                        1/1       Running   0          11d       192.168.115.211   node1
kube-system   kube-proxy-k9gpk                        1/1       Running   0          11d       192.168.115.213   node3
kube-system   kube-proxy-kwl5c                        1/1       Running   0          11d       192.168.115.212   node2
kube-system   kube-proxy-plxpc                        1/1       Running   0          11d       192.168.115.210   master
kube-system   kube-scheduler-master                   1/1       Running   0          11d       192.168.115.210   master
kube-system   kube-state-metrics-868cf44b5f-g8qfj     2/2       Running   0          6d        10.233.102.157    node1
kube-system   kubedns-autoscaler-564b455d77-7rm9g     1/1       Running   0          11d       10.233.75.1       node2
kube-system   kubernetes-dashboard-767994d8b8-wmzs7   1/1       Running   0          11d       10.233.75.3       node2
kube-system   nginx-proxy-node1                       1/1       Running   0          11d       192.168.115.211   node1
kube-system   nginx-proxy-node2                       1/1       Running   0          11d       192.168.115.212   node2
kube-system   nginx-proxy-node3                       1/1       Running   0          11d       192.168.115.213   node3
kube-system   tiller-deploy-f9b69765d-lvw8k           1/1       Running   0          11d       10.233.71.5       node3
root@master:~# showmount -e
Export list for master:
/nfs *
Create the namespace
root@master:~/kubernetes/prometheus# cat namespace.yml
---
apiVersion: v1
kind: Namespace
metadata:
  name: ns-monitor
  labels:
    name: ns-monitor
root@master:~/kubernetes/prometheus# kubectl apply -f namespace.yml
2.2钉跷、部署node-exporter
node-exporter.yml文件內(nèi)容
---
kind: DaemonSet
apiVersion: apps/v1beta2
metadata:
  labels:
    app: node-exporter
  name: node-exporter
  namespace: ns-monitor
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: 192.168.101.88:5000/prom/node-exporter:v0.15.2
          ports:
            - containerPort: 9100
              protocol: TCP
      hostNetwork: true
      hostPID: true
      tolerations:
        - effect: NoSchedule
          operator: Exists
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: node-exporter
  name: node-exporter-service
  namespace: ns-monitor
spec:
  ports:
    - port: 9100
      targetPort: 9100
  selector:
    app: node-exporter
  clusterIP: None
Deploy node-exporter
root@master:~/kubernetes/prometheus# kubectl apply -f node-exporter.yml
root@master:~/kubernetes/prometheus# kubectl get pods -n ns-monitor -o wide
NAME                  READY     STATUS    RESTARTS   AGE       IP                NODE
node-exporter-br7wz   1/1       Running   0          3h        192.168.115.210   master
node-exporter-jzc6f   1/1       Running   0          3h        192.168.115.212   node2
node-exporter-t9s2f   1/1       Running   0          3h        192.168.115.213   node3
node-exporter-trh52   1/1       Running   0          3h        192.168.115.211   node1
node-exporter collects each Kubernetes node's machine-level metrics, such as memory and CPU. It could be installed directly on every host; here we deploy it as a DaemonSet so it runs on every node, and set hostNetwork: true and hostPID: true so it can read the node's physical metrics.
tolerations is configured so that a pod also starts on the master node; by default my cluster does not schedule workloads on the master.
Check the node-exporter metrics
Point a browser at port 9100 on any node.
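The payload served on /metrics is Prometheus' plain-text exposition format. As a minimal sketch (my own helper, not part of node-exporter or any tool above; it ignores quoted commas and other edge cases), here is how one such line breaks down into metric name, labels, and value:

```python
import re

# Parse one line of the Prometheus text exposition format into
# (name, labels-dict, value). Comment lines (# HELP / # TYPE) return None.
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                     r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)')

def parse_metric_line(line):
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    m = LINE_RE.match(line)
    if not m:
        return None
    labels = {}
    if m.group('labels'):
        for pair in m.group('labels').split(','):
            k, v = pair.split('=', 1)
            labels[k.strip()] = v.strip().strip('"')
    return m.group('name'), labels, float(m.group('value'))

print(parse_metric_line('node_cpu{cpu="cpu0",mode="idle"} 329906.86'))
# → ('node_cpu', {'cpu': 'cpu0', 'mode': 'idle'}, 329906.86)
```

The `node_cpu` series parsed here is exactly what the alert rule later in this article aggregates over.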
2.3 Deploying Prometheus
Contents of prometheus.yml
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - watch
      - list
  - nonResourceURLs: ["/metrics"]
    verbs:
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: ns-monitor
  labels:
    app: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: ns-monitor
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
      # load the alerting rules mounted from the prometheus-rules ConfigMap below
      - "/etc/prometheus/rules/cpu-usage.rule"
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'grafana'
        static_configs:
          - targets:
              - 'grafana-service.ns-monitor:3000'
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, then disable certificate verification below. Note that
          # certificate verification is an integral part of a secure infrastructure
          # so this should only be disabled in a controlled environment. You can
          # disable certificate verification by uncommenting the line below.
          #
          # insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        # Keep only the default/kubernetes service endpoints for the https port. This
        # will add targets for each API server which Kubernetes adds an endpoint to
        # the default/kubernetes service.
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
      # Scrape config for nodes (kubelet).
      #
      # Rather than connecting directly to the node, the scrape is proxied though the
      # Kubernetes apiserver. This means it will work if Prometheus is running out of
      # cluster, or can't connect to nodes for some other reason (e.g. because of
      # firewalling).
      - job_name: 'kubernetes-nodes'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics
      # Scrape config for Kubelet cAdvisor.
      #
      # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
      # (those whose names begin with 'container_') have been removed from the
      # Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
      # retrieve those metrics.
      #
      # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
      # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
      # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
      # the --cadvisor-port=0 Kubelet flag).
      #
      # This job is not necessary and should be removed in Kubernetes 1.6 and
      # earlier versions, or it will cause the metrics to be scraped twice.
      - job_name: 'kubernetes-cadvisor'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      # Scrape config for service endpoints.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
      # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
      # to set this to `https` & most likely set the `tls_config` of the scrape config.
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      # service then set this appropriately.
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
      # Example scrape config for probing services via the Blackbox Exporter.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - role: service
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox-exporter.example.com:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: kubernetes_name
      # Example scrape config for probing ingresses via the Blackbox Exporter.
      #
      # The relabeling allows the actual ingress scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-ingresses'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: ingress
        relabel_configs:
          - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
            regex: (.+);(.+);(.+)
            replacement: ${1}://${2}${3}
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox-exporter.example.com:9115
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_ingress_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_ingress_name]
            target_label: kubernetes_name
      # Example scrape config for pods
      #
      # The relabeling allows the actual pod scrape endpoint to be configured via the
      # following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
      # pod's declared ports (default is a port-free target if none are declared).
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  cpu-usage.rule: |
    groups:
      - name: NodeCPUUsage
        rules:
          - alert: NodeCPUUsage
            expr: (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
            for: 2m
            labels:
              severity: "page"
            annotations:
              summary: "{{$labels.instance}}: High CPU usage detected"
              description: "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "prometheus-data-pv"
  labels:
    name: prometheus-data-pv
    release: stable
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfs/prometheus/data
    server: 192.168.115.210
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: ns-monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: prometheus-data-pv
      release: stable
---
kind: Deployment
apiVersion: apps/v1beta2
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: ns-monitor
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
        - name: prometheus
          image: 192.168.101.88:5000/prom/prometheus:v2.2.1
          volumeMounts:
            - mountPath: /prometheus
              name: prometheus-data-volume
            - mountPath: /etc/prometheus/prometheus.yml
              name: prometheus-conf-volume
              subPath: prometheus.yml
            - mountPath: /etc/prometheus/rules
              name: prometheus-rules-volume
          ports:
            - containerPort: 9090
              protocol: TCP
      volumes:
        - name: prometheus-data-volume
          persistentVolumeClaim:
            claimName: prometheus-data-pvc
        - name: prometheus-conf-volume
          configMap:
            name: prometheus-conf
        - name: prometheus-rules-volume
          configMap:
            name: prometheus-rules
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
---
kind: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: prometheus
  name: prometheus-service
  namespace: ns-monitor
spec:
  ports:
    - port: 9090
      targetPort: 9090
  selector:
    app: prometheus
  type: NodePort
Notes:
1. In a Kubernetes environment with RBAC enabled, a ServiceAccount and the related permissions are configured for Prometheus;
2. Prometheus uses local storage by default, under /prometheus; a PVC is provided for it;
3. A ConfigMap holds Prometheus' prometheus.yml configuration file, mounted at the default path /etc/prometheus/prometheus.yml;
For the prometheus.yml options, see the official documentation.
For scraping Kubernetes metrics, see the official examples.
For relabel_configs, see the official documentation.
4. Prometheus is deployed as a Deployment with a matching Service, exposed via NodePort;
Pay particular attention:
When prometheus-data-volume is mounted, the mount point is owned by root by default and other users cannot write to it, while Prometheus runs as nobody:nogroup by default. So with the defaults, mounting /prometheus directly makes Prometheus fail to start. The fix:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
The UID and GID of nobody:nogroup are both 65534; you can check /etc/passwd inside the container to confirm.
Deploy Prometheus
root@master:~/kubernetes/prometheus# kubectl apply -f prometheus.yml
root@master:~/kubernetes/prometheus# kubectl get pods -n ns-monitor -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP                NODE
node-exporter-br7wz          1/1       Running   0          6h        192.168.115.210   master
node-exporter-jzc6f          1/1       Running   0          6h        192.168.115.212   node2
node-exporter-t9s2f          1/1       Running   0          6h        192.168.115.213   node3
node-exporter-trh52          1/1       Running   0          6h        192.168.115.211   node1
prometheus-985cd7c77-766sc   1/1       Running   0          20m       10.233.71.47      node3
View the Prometheus web UI
Point a browser at the NodePort of the Prometheus Service.
View the targets
View service discovery
Prometheus processes metrics according to the relabel_configs in /etc/prometheus/prometheus.yml, for example dropping or replacing labels.
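As a rough illustration of what a `replace` relabel rule does (a simplified sketch, not Prometheus' actual implementation), here is the `__address__`/`prometheus.io/port` rewrite from the kubernetes-service-endpoints job. Note that Prometheus writes the replacement as `$1:$2`, while Python's `re` expansion syntax is `\g<1>:\g<2>`:

```python
import re

# Simplified "action: replace" relabel rule: join the source label values
# with ";", fullmatch the regex, and write the expanded replacement into
# the target label. On no match, the labels are left untouched.
def relabel_replace(labels, source_labels, regex, replacement, target_label):
    value = ';'.join(labels.get(l, '') for l in source_labels)
    m = re.fullmatch(regex, value)
    if m:
        labels = dict(labels)
        labels[target_label] = m.expand(replacement)
    return labels

labels = {'__address__': '10.233.75.2:8080',
          '__meta_kubernetes_service_annotation_prometheus_io_port': '9100'}
out = relabel_replace(labels,
                      ['__address__',
                       '__meta_kubernetes_service_annotation_prometheus_io_port'],
                      r'([^:]+)(?::\d+)?;(\d+)', r'\g<1>:\g<2>', '__address__')
print(out['__address__'])  # 10.233.75.2:9100
```

The annotation's port (9100) replaces the port from the discovered address, which is exactly how the `prometheus.io/port` override works.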
Prometheus' own metrics
Browse to /metrics
2.4 Deploying Grafana
Contents of grafana.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "grafana-data-pv"
  labels:
    name: grafana-data-pv
    release: stable
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfs/grafana/data
    server: 192.168.115.210
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data-pvc
  namespace: ns-monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: grafana-data-pv
      release: stable
---
kind: Deployment
apiVersion: apps/v1beta2
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: ns-monitor
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: 192.168.101.88:5000/grafana/grafana:5.0.4
          env:
            - name: GF_AUTH_BASIC_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "false"
          readinessProbe:
            httpGet:
              path: /login
              port: 3000
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-data-volume
          ports:
            - containerPort: 3000
              protocol: TCP
      volumes:
        - name: grafana-data-volume
          persistentVolumeClaim:
            claimName: grafana-data-pvc
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: grafana
  name: grafana-service
  namespace: ns-monitor
spec:
  ports:
    - port: 3000
      targetPort: 3000
  selector:
    app: grafana
  type: NodePort
Notes:
1. Grafana data is stored on NFS, basic authentication is enabled, and anonymous access is disabled;
Deploy Grafana
root@master:~/kubernetes/prometheus# kubectl apply -f grafana.yml
root@master:~/kubernetes/prometheus# kubectl get pods -n ns-monitor -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP                NODE
grafana-55494b59d6-6k4km     1/1       Running   0          2d        10.233.71.0       node3
node-exporter-br7wz          1/1       Running   0          6h        192.168.115.210   master
node-exporter-jzc6f          1/1       Running   0          6h        192.168.115.212   node2
node-exporter-t9s2f          1/1       Running   0          6h        192.168.115.213   node3
node-exporter-trh52          1/1       Running   0          6h        192.168.115.211   node1
prometheus-985cd7c77-766sc   1/1       Running   0          20m       10.233.71.47      node3
Configure Grafana
Log in to Grafana. Since the service is exposed via NodePort, look up the port on the Service; the default credentials are admin/admin.
root@master:~/kubernetes/prometheus# kubectl get svc -n ns-monitor
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
grafana-service         NodePort    10.233.13.130   <none>        3000:32712/TCP   2d
node-exporter-service   ClusterIP   None            <none>        9100/TCP         6h
prometheus-service      NodePort    10.233.57.158   <none>        9090:32014/TCP   26m
After logging in, follow Grafana's guided setup.
Configure Prometheus as a data source and import the Prometheus and Grafana dashboards.
Import the Kubernetes dashboard template; a download link is attached below.
View the dashboard
Every panel in the dashboard can be edited, saved and rolled back!
If the instance drop-down does not display properly, open Settings → Variables and adjust the Regex of the $instance variable; it can simply be cleared.
Setting up the data source, importing dashboards, installing plugins and so on can also be scripted in grafana.yml, but the process is fairly involved; this article covers the UI-based steps, and automation may be added later.
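As a pointer for that automation: since Grafana 5.0 (the version deployed above), data sources can be provisioned from files instead of the UI. A sketch of such a provisioning file follows — the file name is my own choice, and delivering it would take an extra ConfigMap mounted under /etc/grafana/provisioning/datasources, which is not part of the grafana.yml above:

```yaml
# datasource.yml - Grafana data source provisioning (Grafana >= 5.0)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Service name and namespace from the manifests above
    url: http://prometheus-service.ns-monitor:9090
    isDefault: true
```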
3瑰妄、參考資料
https://prometheus.io/docs/
http://docs.grafana.org/
https://github.com/prometheus/prometheus/tree/release-2.2/documentation/examples
https://github.com/giantswarm/kubernetes-prometheus
https://github.com/zalando-incubator/kubernetes-on-aws/pull/861
http://yunlzheng.github.io/2018/01/17/prometheus-sd-and-relabel/
4陷嘴、附件下載
Kubernetes的Grafana監(jiān)控模版:https://pan.baidu.com/s/1y7HDQCPXy9LCAzA01uzIBQ
---------------------
Author: 迷途的攻城獅 (798570156)
Source: CSDN
Original: https://blog.csdn.net/chenleiking/article/details/80009529
Copyright notice: this is the author's original post; please include a link to the original when reposting!