# Introduction to Prometheus (on Kubernetes)
This article does not cover Alertmanager or remote storage; I will add those when time permits!
1. Introduction to Prometheus
* Prometheus is an open-source systems monitoring tool. Based on configured jobs, it periodically scrapes (pulls) metrics over HTTP(S) from specified targets. Targets can be defined statically or discovered automatically. Prometheus stores the scraped metrics in local or remote storage.
* Prometheus collects metrics with a pull model. Compared with push, pull allows configuration to be centralized and makes it easy to build different monitoring views on top of the same targets.
* Prometheus joined the CNCF in 2016 and was the second project to do so, after Kubernetes!
1.1 Architecture
Prometheus Server: the core component; it scrapes and stores time series data and provides the query interface;
Jobs/Exporters: the clients; they collect metrics and expose them over HTTP at /metrics. Many applications already support Prometheus natively and serve /metrics directly. For anything that does not provide /metrics itself, such as the operating system, use an existing exporter or write your own;
Pushgateway: designed for push-style workloads; short-lived jobs push their metrics to the Pushgateway, and Prometheus Server then pulls them from there;
Alertmanager: the alerting component; it responds according to preconfigured rules, for example by sending e-mail;
Web UI: Prometheus ships with a simple built-in web console for querying metrics and inspecting configuration and service discovery. In practice, Grafana is usually used for browsing metrics and building dashboards, with Prometheus as its data source;
1.2 Data Model
Prometheus stores metrics as time series; each series consists of a notation plus samples:
* Notation: a metric name together with a set of labels:
<metric name>{<label name>=<label value>, ...}
* Samples: each sample is a 64-bit floating-point value paired with a millisecond-precision timestamp.
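For illustration, a single scraped series in Prometheus' text exposition format looks like this — metric name, labels, value, and an optional millisecond timestamp (the numbers here are made up):

```
api_http_requests_total{method="POST", handler="/messages"} 1027 1395066363000
```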
2惕橙、安裝部署
2.1、環(huán)境清單
系統(tǒng)環(huán)境
root@master:~# uname -a
Linux master 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@master:~# kubectl get nodes -o wide
NAME      STATUS    ROLES     AGE       VERSION           EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
master    Ready     master    11d       v1.9.0+coreos.0   <none>        Ubuntu 16.04.2 LTS   4.4.0-62-generic   docker://17.12.0-ce
node1     Ready     <none>    11d       v1.9.0+coreos.0   <none>        Ubuntu 16.04.2 LTS   4.4.0-62-generic   docker://17.12.0-ce
node2     Ready     <none>    11d       v1.9.0+coreos.0   <none>        Ubuntu 16.04.2 LTS   4.4.0-62-generic   docker://17.12.0-ce
node3     Ready     <none>    11d       v1.9.0+coreos.0   <none>        Ubuntu 16.04.2 LTS   4.4.0-62-generic   docker://17.12.0-ce
root@master:~# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE       IP                NODE
kube-system   calico-node-64btj                       1/1       Running   0          11d       192.168.115.213   node3
kube-system   calico-node-8wqtc                       1/1       Running   0          11d       192.168.115.211   node1
kube-system   calico-node-hrmql                       1/1       Running   0          11d       192.168.115.210   master
kube-system   calico-node-wvgtc                       1/1       Running   0          11d       192.168.115.212   node2
kube-system   kube-apiserver-master                   1/1       Running   0          11d       192.168.115.210   master
kube-system   kube-controller-manager-master          1/1       Running   0          11d       192.168.115.210   master
kube-system   kube-dns-7d9c4d7876-wxss9               3/3       Running   0          11d       10.233.75.2       node2
kube-system   kube-dns-7d9c4d7876-xbxbg               3/3       Running   0          11d       10.233.102.129    node1
kube-system   kube-proxy-gprzq                        1/1       Running   0          11d       192.168.115.211   node1
kube-system   kube-proxy-k9gpk                        1/1       Running   0          11d       192.168.115.213   node3
kube-system   kube-proxy-kwl5c                        1/1       Running   0          11d       192.168.115.212   node2
kube-system   kube-proxy-plxpc                        1/1       Running   0          11d       192.168.115.210   master
kube-system   kube-scheduler-master                   1/1       Running   0          11d       192.168.115.210   master
kube-system   kube-state-metrics-868cf44b5f-g8qfj     2/2       Running   0          6d        10.233.102.157    node1
kube-system   kubedns-autoscaler-564b455d77-7rm9g     1/1       Running   0          11d       10.233.75.1       node2
kube-system   kubernetes-dashboard-767994d8b8-wmzs7   1/1       Running   0          11d       10.233.75.3       node2
kube-system   nginx-proxy-node1                       1/1       Running   0          11d       192.168.115.211   node1
kube-system   nginx-proxy-node2                       1/1       Running   0          11d       192.168.115.212   node2
kube-system   nginx-proxy-node3                       1/1       Running   0          11d       192.168.115.213   node3
kube-system   tiller-deploy-f9b69765d-lvw8k           1/1       Running   0          11d       10.233.71.5       node3
root@master:~# showmount -e
Export list for master:
/nfs *
Create the namespace
root@master:~/kubernetes/prometheus# cat namespace.yml
---
apiVersion: v1
kind: Namespace
metadata:
  name: ns-monitor
  labels:
    name: ns-monitor
root@master:~/kubernetes/prometheus# kubectl apply -f namespace.yml
2.2钉跷、部署node-exporter
node-exporter.yml文件內(nèi)容
---
kind: DaemonSet
apiVersion: apps/v1beta2
metadata:
  labels:
    app: node-exporter
  name: node-exporter
  namespace: ns-monitor
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: 192.168.101.88:5000/prom/node-exporter:v0.15.2
          ports:
            - containerPort: 9100
              protocol: TCP
      hostNetwork: true
      hostPID: true
      tolerations:
        - effect: NoSchedule
          operator: Exists
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: node-exporter
  name: node-exporter-service
  namespace: ns-monitor
spec:
  ports:
    - port: 9100
      targetPort: 9100
  selector:
    app: node-exporter
  clusterIP: None
Deploy node-exporter
root@master:~/kubernetes/prometheus# kubectl apply -f node-exporter.yml
root@master:~/kubernetes/prometheus# kubectl get pods -n ns-monitor -o wide
NAME                  READY     STATUS    RESTARTS   AGE       IP                NODE
node-exporter-br7wz   1/1       Running   0          3h        192.168.115.210   master
node-exporter-jzc6f   1/1       Running   0          3h        192.168.115.212   node2
node-exporter-t9s2f   1/1       Running   0          3h        192.168.115.213   node3
node-exporter-trh52   1/1       Running   0          3h        192.168.115.211   node1
node-exporter collects each Kubernetes node's machine-level metrics, such as memory and CPU. It could be installed directly on every host; here we deploy it as a DaemonSet so it runs on every node, and set hostNetwork: true and hostPID: true so it can read the node's physical metrics.
tolerations is configured so that a pod also starts on the master node; by default my cluster does not schedule workloads on the master.
Check the node-exporter metrics
Point a browser at port 9100 on any node.
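The payload served on /metrics is Prometheus' plain-text exposition format. As a minimal sketch (my own helper, not part of node-exporter or any tool above; it ignores quoted commas and other edge cases), here is how one such line breaks down into metric name, labels, and value:

```python
import re

# Parse one line of the Prometheus text exposition format into
# (name, labels-dict, value). Comment lines (# HELP / # TYPE) return None.
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                     r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)')

def parse_metric_line(line):
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    m = LINE_RE.match(line)
    if not m:
        return None
    labels = {}
    if m.group('labels'):
        for pair in m.group('labels').split(','):
            k, v = pair.split('=', 1)
            labels[k.strip()] = v.strip().strip('"')
    return m.group('name'), labels, float(m.group('value'))

print(parse_metric_line('node_cpu{cpu="cpu0",mode="idle"} 329906.86'))
# → ('node_cpu', {'cpu': 'cpu0', 'mode': 'idle'}, 329906.86)
```

The `node_cpu` series parsed here is exactly what the alert rule later in this article aggregates over.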
2.3 Deploying Prometheus
Contents of prometheus.yml
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - watch
      - list
  - nonResourceURLs: ["/metrics"]
    verbs:
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: ns-monitor
  labels:
    app: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: ns-monitor
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
      # load the alerting rules mounted from the prometheus-rules ConfigMap below
      - "/etc/prometheus/rules/cpu-usage.rule"
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'grafana'
        static_configs:
          - targets:
              - 'grafana-service.ns-monitor:3000'
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, then disable certificate verification below. Note that
          # certificate verification is an integral part of a secure infrastructure
          # so this should only be disabled in a controlled environment. You can
          # disable certificate verification by uncommenting the line below.
          #
          # insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        # Keep only the default/kubernetes service endpoints for the https port. This
        # will add targets for each API server which Kubernetes adds an endpoint to
        # the default/kubernetes service.
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
      # Scrape config for nodes (kubelet).
      #
      # Rather than connecting directly to the node, the scrape is proxied though the
      # Kubernetes apiserver. This means it will work if Prometheus is running out of
      # cluster, or can't connect to nodes for some other reason (e.g. because of
      # firewalling).
      - job_name: 'kubernetes-nodes'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics
      # Scrape config for Kubelet cAdvisor.
      #
      # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
      # (those whose names begin with 'container_') have been removed from the
      # Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
      # retrieve those metrics.
      #
      # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
      # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
      # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
      # the --cadvisor-port=0 Kubelet flag).
      #
      # This job is not necessary and should be removed in Kubernetes 1.6 and
      # earlier versions, or it will cause the metrics to be scraped twice.
      - job_name: 'kubernetes-cadvisor'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      # Scrape config for service endpoints.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
      # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
      # to set this to `https` & most likely set the `tls_config` of the scrape config.
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      # service then set this appropriately.
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
      # Example scrape config for probing services via the Blackbox Exporter.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - role: service
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox-exporter.example.com:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: kubernetes_name
      # Example scrape config for probing ingresses via the Blackbox Exporter.
      #
      # The relabeling allows the actual ingress scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-ingresses'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: ingress
        relabel_configs:
          - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
            regex: (.+);(.+);(.+)
            replacement: ${1}://${2}${3}
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox-exporter.example.com:9115
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_ingress_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_ingress_name]
            target_label: kubernetes_name
      # Example scrape config for pods
      #
      # The relabeling allows the actual pod scrape endpoint to be configured via the
      # following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
      # pod's declared ports (default is a port-free target if none are declared).
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  cpu-usage.rule: |
    groups:
      - name: NodeCPUUsage
        rules:
          - alert: NodeCPUUsage
            expr: (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
            for: 2m
            labels:
              severity: "page"
            annotations:
              summary: "{{$labels.instance}}: High CPU usage detected"
              description: "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "prometheus-data-pv"
  labels:
    name: prometheus-data-pv
    release: stable
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfs/prometheus/data
    server: 192.168.115.210
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: ns-monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: prometheus-data-pv
      release: stable
---
kind: Deployment
apiVersion: apps/v1beta2
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: ns-monitor
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
        - name: prometheus
          image: 192.168.101.88:5000/prom/prometheus:v2.2.1
          volumeMounts:
            - mountPath: /prometheus
              name: prometheus-data-volume
            - mountPath: /etc/prometheus/prometheus.yml
              name: prometheus-conf-volume
              subPath: prometheus.yml
            - mountPath: /etc/prometheus/rules
              name: prometheus-rules-volume
          ports:
            - containerPort: 9090
              protocol: TCP
      volumes:
        - name: prometheus-data-volume
          persistentVolumeClaim:
            claimName: prometheus-data-pvc
        - name: prometheus-conf-volume
          configMap:
            name: prometheus-conf
        - name: prometheus-rules-volume
          configMap:
            name: prometheus-rules
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
---
kind: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: prometheus
  name: prometheus-service
  namespace: ns-monitor
spec:
  ports:
    - port: 9090
      targetPort: 9090
  selector:
    app: prometheus
  type: NodePort
Notes:
1. In a Kubernetes environment with RBAC enabled, a ServiceAccount and the related permissions are configured for Prometheus;
2. Prometheus uses local storage by default, under /prometheus; a PVC is provided for it;
3. A ConfigMap holds Prometheus' prometheus.yml configuration file, mounted at the default path /etc/prometheus/prometheus.yml;
For the prometheus.yml options, see the official documentation.
For scraping Kubernetes metrics, see the official examples.
For relabel_configs, see the official documentation.
4. Prometheus is deployed as a Deployment with a matching Service, exposed via NodePort;
Pay particular attention:
When prometheus-data-volume is mounted, the mount point is owned by root by default and other users cannot write to it, while Prometheus runs as nobody:nogroup by default. So with the defaults, mounting /prometheus directly makes Prometheus fail to start. The fix:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 65534
        fsGroup: 65534
      containers:
The UID and GID of nobody:nogroup are both 65534; you can check /etc/passwd inside the container to confirm.
Deploy Prometheus
root@master:~/kubernetes/prometheus# kubectl apply -f prometheus.yml
root@master:~/kubernetes/prometheus# kubectl get pods -n ns-monitor -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP                NODE
node-exporter-br7wz          1/1       Running   0          6h        192.168.115.210   master
node-exporter-jzc6f          1/1       Running   0          6h        192.168.115.212   node2
node-exporter-t9s2f          1/1       Running   0          6h        192.168.115.213   node3
node-exporter-trh52          1/1       Running   0          6h        192.168.115.211   node1
prometheus-985cd7c77-766sc   1/1       Running   0          20m       10.233.71.47      node3
View the Prometheus web UI
Point a browser at the NodePort of the Prometheus Service.
View the targets
View service discovery
Prometheus processes metrics according to the relabel_configs in /etc/prometheus/prometheus.yml, for example dropping or replacing labels.
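As a rough illustration of what a `replace` relabel rule does (a simplified sketch, not Prometheus' actual implementation), here is the `__address__`/`prometheus.io/port` rewrite from the kubernetes-service-endpoints job. Note that Prometheus writes the replacement as `$1:$2`, while Python's `re` expansion syntax is `\g<1>:\g<2>`:

```python
import re

# Simplified "action: replace" relabel rule: join the source label values
# with ";", fullmatch the regex, and write the expanded replacement into
# the target label. On no match, the labels are left untouched.
def relabel_replace(labels, source_labels, regex, replacement, target_label):
    value = ';'.join(labels.get(l, '') for l in source_labels)
    m = re.fullmatch(regex, value)
    if m:
        labels = dict(labels)
        labels[target_label] = m.expand(replacement)
    return labels

labels = {'__address__': '10.233.75.2:8080',
          '__meta_kubernetes_service_annotation_prometheus_io_port': '9100'}
out = relabel_replace(labels,
                      ['__address__',
                       '__meta_kubernetes_service_annotation_prometheus_io_port'],
                      r'([^:]+)(?::\d+)?;(\d+)', r'\g<1>:\g<2>', '__address__')
print(out['__address__'])  # 10.233.75.2:9100
```

The annotation's port (9100) replaces the port from the discovered address, which is exactly how the `prometheus.io/port` override works.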
Prometheus' own metrics
Browse to /metrics
2.4 Deploying Grafana
Contents of grafana.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "grafana-data-pv"
  labels:
    name: grafana-data-pv
    release: stable
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfs/grafana/data
    server: 192.168.115.210
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data-pvc
  namespace: ns-monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: grafana-data-pv
      release: stable
---
kind: Deployment
apiVersion: apps/v1beta2
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: ns-monitor
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: 192.168.101.88:5000/grafana/grafana:5.0.4
          env:
            - name: GF_AUTH_BASIC_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "false"
          readinessProbe:
            httpGet:
              path: /login
              port: 3000
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-data-volume
          ports:
            - containerPort: 3000
              protocol: TCP
      volumes:
        - name: grafana-data-volume
          persistentVolumeClaim:
            claimName: grafana-data-pvc
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: grafana
  name: grafana-service
  namespace: ns-monitor
spec:
  ports:
    - port: 3000
      targetPort: 3000
  selector:
    app: grafana
  type: NodePort
Notes:
1. Grafana data is stored on NFS, basic authentication is enabled, and anonymous access is disabled;
Deploy Grafana
root@master:~/kubernetes/prometheus# kubectl apply -f grafana.yml
root@master:~/kubernetes/prometheus# kubectl get pods -n ns-monitor -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP                NODE
grafana-55494b59d6-6k4km     1/1       Running   0          2d        10.233.71.0       node3
node-exporter-br7wz          1/1       Running   0          6h        192.168.115.210   master
node-exporter-jzc6f          1/1       Running   0          6h        192.168.115.212   node2
node-exporter-t9s2f          1/1       Running   0          6h        192.168.115.213   node3
node-exporter-trh52          1/1       Running   0          6h        192.168.115.211   node1
prometheus-985cd7c77-766sc   1/1       Running   0          20m       10.233.71.47      node3
Configure Grafana
Log in to Grafana. Since the service is exposed via NodePort, look up the port on the Service; the default credentials are admin/admin.
root@master:~/kubernetes/prometheus# kubectl get svc -n ns-monitor
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
grafana-service         NodePort    10.233.13.130   <none>        3000:32712/TCP   2d
node-exporter-service   ClusterIP   None            <none>        9100/TCP         6h
prometheus-service      NodePort    10.233.57.158   <none>        9090:32014/TCP   26m
After logging in, follow Grafana's guided setup.
Configure Prometheus as a data source and import the Prometheus and Grafana dashboards.
Import the Kubernetes dashboard template; a download link is attached below.
View the dashboard
Every panel in the dashboard can be edited, saved and rolled back!
If the instance drop-down does not display properly, open Settings → Variables and adjust the Regex of the $instance variable; it can simply be cleared.
Setting up the data source, importing dashboards, installing plugins and so on can also be scripted in grafana.yml, but the process is fairly involved; this article covers the UI-based steps, and automation may be added later.
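As a pointer for that automation: since Grafana 5.0 (the version deployed above), data sources can be provisioned from files instead of the UI. A sketch of such a provisioning file follows — the file name is my own choice, and delivering it would take an extra ConfigMap mounted under /etc/grafana/provisioning/datasources, which is not part of the grafana.yml above:

```yaml
# datasource.yml - Grafana data source provisioning (Grafana >= 5.0)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Service name and namespace from the manifests above
    url: http://prometheus-service.ns-monitor:9090
    isDefault: true
```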
3瑰妄、參考資料
https://prometheus.io/docs/
http://docs.grafana.org/
https://github.com/prometheus/prometheus/tree/release-2.2/documentation/examples
https://github.com/giantswarm/kubernetes-prometheus
https://github.com/zalando-incubator/kubernetes-on-aws/pull/861
http://yunlzheng.github.io/2018/01/17/prometheus-sd-and-relabel/
4陷嘴、附件下載
Kubernetes的Grafana監(jiān)控模版:https://pan.baidu.com/s/1y7HDQCPXy9LCAzA01uzIBQ
---------------------
Author: 迷途的攻城獅 (798570156)
Source: CSDN
Original: https://blog.csdn.net/chenleiking/article/details/80009529
Copyright notice: this is the author's original post; please include a link to the original when reposting!