1 What is Prometheus
Quoted from the Prometheus official website:
From metrics to insight
Power your metrics and alerting with a leading open-source monitoring solution.
In other words, Prometheus is a leading open-source monitoring solution.
2 Prometheus features
2.1 Highly dimensional data model and time series
Prometheus stores all of its data as time series: streams of timestamped samples, ordered by time, that belong to the same metric name and the same set of labeled dimensions.
2.1.1 Time series
A time series is a series of data points indexed (or listed or graphed) in time order.
In other words, a time series is a series of data points arranged in time order.
2.1.2 Metric names and labels
Every time series is uniquely identified by its metric name and a set of key-value pairs; these key-value pairs are what we call labels.
The metric name specifies the general feature of the system being measured, for example: http_requests_total - the total number of HTTP requests received. Metric names must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*.
Labels are what make Prometheus's dimensional data model possible: for a given metric name, any combination of labels identifies a particular dimensional instantiation of that metric, for example: all HTTP requests that used the POST method against the /api/tracks handler. Label names must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*.
2.1.3 Notation
Given a metric name and a set of labels, a time series is written using the following notation:
<metric name>{<label name>=<label value>, ...}
For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" is written as:
api_http_requests_total{method="POST", handler="/messages"}
2.2 PromQL
A query language built on top of the highly dimensional data model; it is not covered in depth here.
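As a small taste, the following expression (a hypothetical query, assuming the api_http_requests_total metric from the previous section is actually being scraped) returns the per-second rate of POST requests to /messages averaged over the last five minutes:
rate(api_http_requests_total{method="POST", handler="/messages"}[5m])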
2.3 Efficient storage
Prometheus stores time series data in memory and on local disk; it does not depend on distributed storage, and a single node works on its own. Scaling is achieved through functional sharding and federation.
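For reference, the local storage location and how long samples are kept are controlled by startup flags; a minimal sketch (the data path and retention period below are arbitrary example values):
./prometheus --config.file=prometheus.yml --storage.tsdb.path=/var/lib/prometheus --storage.tsdb.retention.time=15d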
2.4 Excellent visualization
Integration with Grafana gives users very intuitive and attractive visualizations.
2.5 Collects data by pulling, or by pushing through an intermediary gateway
2.6 Discovers monitoring targets through service discovery or static configuration
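To illustrate the last two points, here is a minimal, hypothetical prometheus.yml (the job name and target address are made up for illustration): Prometheus pulls metrics from the statically configured target every 15 seconds.
global:
  scrape_interval: 15s                      # how often Prometheus pulls from its targets
scrape_configs:
  - job_name: "example-app"                 # hypothetical job name
    static_configs:
      - targets: ["192.168.56.102:8080"]    # hypothetical, statically configured target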
3 Architecture and components
3.1 Architecture diagram
3.2 Components
3.2.1 Prometheus Server
Responsible for collecting and storing the data, and for supporting the PromQL query language.
3.2.2 Push gateway
Supports short-lived jobs: the push gateway lets ephemeral and batch jobs expose their metrics to Prometheus. Because such jobs may not exist long enough for Prometheus to scrape them, they push their metrics to the push gateway instead, and the push gateway then exposes those metrics to Prometheus. The push gateway acts purely as a metrics cache; it does no computation.
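As a quick sketch (assuming a Pushgateway is reachable at pushgateway.example.com:9091 and a hypothetical batch job called backup_job), a job can push a metric with nothing more than curl:
# Push one sample; the Pushgateway caches it until it is overwritten or deleted
echo "backup_last_success_timestamp_seconds $(date +%s)" | \
  curl --data-binary @- http://pushgateway.example.com:9091/metrics/job/backup_job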
3.2.3 Exporters
Exporters expose the existing metrics of third-party systems as Prometheus metrics.
A list of exporters is available on the exporter default port wiki, and also on the EXPORTERS AND INTEGRATIONS page.
3.2.4 Alertmanager
The Alertmanager handles alerts sent by client applications such as the Prometheus server: it deduplicates and groups incoming alerts and routes them to the correct receiver, such as PagerDuty or OpsGenie. The Alertmanager also takes care of silencing and inhibition of alerts.
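As a minimal sketch of what its configuration looks like (the receiver name, e-mail address and SMTP relay below are assumptions for illustration), the following alertmanager.yml groups alerts by their name and routes everything to a single e-mail receiver:
route:
  group_by: ["alertname"]                   # group incoming alerts by alert name
  receiver: "ops-mail"                      # hypothetical default receiver
receivers:
  - name: "ops-mail"
    email_configs:
      - to: "ops@example.com"               # hypothetical recipient
        from: "alertmanager@example.com"
        smarthost: "smtp.example.com:587"   # hypothetical SMTP relay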
4 Strengths and weaknesses of Prometheus
Prometheus is very good at collecting purely numeric time series, so it fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In the microservices world, its multi-dimensional data collection and querying are a distinctive, competitive strength.
Prometheus's greatest value is reliability: users can look at statistics about the monitored system at any time, even while the system itself is failing. It cannot, however, guarantee 100% accuracy. If, for example, you bill by request count, Prometheus may not capture every single request; in that case Prometheus is not a good fit.
5 Prometheus and Kubernetes
Prometheus is a close relative of Kubernetes: Google has said that Kubernetes descends from its Borg cluster system, and Prometheus shares its fundamental design concepts with Borgmon, the monitoring system that accompanied Borg. Today both Prometheus and Kubernetes are governed by the Cloud Native Computing Foundation (CNCF). On a technical level, Kubernetes exposes its internal metrics in a format that Prometheus can consume.
5.1 Ways to integrate Prometheus with Kubernetes
- Prometheus Operator
- kube-prometheus
- kubernetes addon
5.1.1 Prometheus Operator
An Operator is a piece of software, introduced by CoreOS, that operates other software and folds the operational knowledge people have collected into its deployment process.
The Prometheus Operator makes it easy to install Prometheus and to manage and configure Prometheus instances with simple declarative configuration. Its core idea is to decouple the deployment of Prometheus instances from the configuration of the entities being monitored, so that running Prometheus on Kubernetes is as simple as possible.
The Prometheus Operator introduces additional resources into Kubernetes for declaring the desired state of Prometheus and Alertmanager clusters. These resources include:
- Prometheus
- Alertmanager
- ServiceMonitor
- PrometheusRule
The Alertmanager is out of scope for this article and will be covered separately in a later post.
The Prometheus resource declaratively describes the desired state of a Prometheus deployment, while a ServiceMonitor describes a set of targets to be monitored by Prometheus.
The Operator in the figure above ensures that, at any point in time, for every Prometheus resource in the Kubernetes cluster there is a set of Prometheus servers running with the desired configuration. Each Prometheus instance is bound to its own configuration, which specifies which targets to watch and therefore which metrics to scrape.
Users can specify this configuration by hand, or let the Operator generate it from ServiceMonitors. A ServiceMonitor resource specifies how to retrieve metrics from a set of services. A Prometheus resource object can dynamically include ServiceMonitor objects by their labels; the Operator configures the Prometheus instance to monitor all services covered by those ServiceMonitors and keeps the configuration in sync with changes in the cluster.
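To sketch how that label-based selection fits together (the names, the team label and the port name below are assumptions for illustration), a ServiceMonitor might select the Services of an example application, and a Prometheus resource would in turn pick up that ServiceMonitor via its team label:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app              # hypothetical name
  labels:
    team: frontend               # label the Prometheus resource below selects on
spec:
  selector:
    matchLabels:
      app: example-app           # selects Services carrying this label
  endpoints:
    - port: web                  # name of the Service port to scrape
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                  # hypothetical name
spec:
  serviceMonitorSelector:
    matchLabels:
      team: frontend             # picks up the ServiceMonitor above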
5.1.2 kube-prometheus
kube-prometheus combines the Prometheus Operator with a collection of manifests to help users get started with monitoring Kubernetes itself and the applications running on top of it, providing a full-stack monitoring configuration.
6 Deploying Prometheus
6.1 Deployment environment
No. | Node name | Role | Memory | IP | Version |
---|---|---|---|---|---|
1 | master.mike.com | master | 2GB | 192.168.56.101 | v1.13.4 |
2 | node1.mike.com | node | 2GB | 192.168.56.102 | v1.13.4 |
3 | node2.mike.com | node | 2GB | 192.168.56.103 | v1.13.4 |
6.2 Quickly deploying Prometheus with kube-prometheus
6.2.1 Prerequisites
- Have a Kubernetes cluster ready; see section 6.1
- Make sure the following flags are set on the cluster's kubelets. They tell the kubelet to use tokens for authentication and authorization, which allows more fine-grained and easier access control:
- --authentication-token-webhook=true
- --authorization-mode=Webhook
- kube-prometheus already includes a resource metrics API server, which provides the same functionality as metrics-server. If metrics-server is already deployed in the cluster, uninstall it first (a quick check is shown below); if not, skip this step.
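One possible way to check whether metrics-server is present (a sketch; metrics-server is usually deployed as a Deployment in the kube-system namespace, but your cluster may differ):
[root@master ~]# kubectl -n kube-system get deployment metrics-server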
6.2.2 Clone the kube-prometheus repository
[root@master ~]# git clone https://github.com/coreos/kube-prometheus.git
Cloning into 'kube-prometheus'...
remote: Enumerating objects: 49, done.
remote: Counting objects: 100% (49/49), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 5763 (delta 19), reused 19 (delta 2), pack-reused 5714
Receiving objects: 100% (5763/5763), 3.72 MiB | 714.00 KiB/s, done.
Resolving deltas: 100% (3396/3396), done.
6.2.3 Quickly deploy the monitoring stack
My network sits outside the Great Firewall, so when the Kubernetes resources were created from the manifest files, Kubernetes pulled all the required Docker images from the various registries without any obstruction. If you are deploying from behind the firewall, download all the images referenced in the manifests to the cluster nodes in advance.
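One way to see which images would need to be mirrored (a simple sketch, run from the root of the cloned repository) is to grep the manifests for image references:
[root@master kube-prometheus]# grep -rhoE 'image: .+' manifests/ | sort -u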
Create the resources
[root@master kube-prometheus]# kubectl create -f manifests/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-k8s-cluster-rsrc-use created
configmap/grafana-dashboard-k8s-node-rsrc-use created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
Check that the resources are ready
[root@master kube-prometheus]# until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done
NAME CREATED AT
servicemonitors.monitoring.coreos.com 2019-05-13T05:38:41Z
[root@master kube-prometheus]# until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
NAMESPACE NAME AGE
monitoring alertmanager 28s
monitoring coredns 25s
monitoring grafana 27s
monitoring kube-apiserver 25s
monitoring kube-controller-manager 25s
monitoring kube-scheduler 25s
monitoring kube-state-metrics 27s
monitoring kubelet 25s
monitoring node-exporter 27s
monitoring prometheus 25s
monitoring prometheus-operator 28s
7 Accessing Prometheus
By default, the Prometheus stack that kube-prometheus deploys into the Kubernetes cluster is only reachable from inside the cluster; every service it creates is of type ClusterIP:
[root@master kube-prometheus]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.102.26.246 <none> 9093/TCP 84m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 83m
grafana ClusterIP 10.111.252.205 <none> 3000/TCP 84m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 84m
node-exporter ClusterIP None <none> 9100/TCP 84m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 84m
prometheus-k8s ClusterIP 10.108.202.31 <none> 9090/TCP 83m
prometheus-operated ClusterIP None <none> 9090/TCP 83m
prometheus-operator ClusterIP None <none> 8080/TCP 84m
To access Prometheus from outside the cluster, the services have to be exposed to the outside world. The sections below show how.
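If you only need temporary access for a quick look, a kubectl port-forward from a machine with cluster credentials is a lighter-weight alternative to changing the service type (forwarding local port 9090 here is just an example):
[root@master ~]# kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090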
7.1 Accessing Prometheus from outside through a NodePort
7.1.1 Change the type of the prometheus-k8s service
Edit the prometheus-service.yaml file so that it looks as follows (note the nodePort and type fields):
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  ports:
  - name: web
    nodePort: 32090
    port: 9090
    targetPort: web
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
  type: NodePort
Apply prometheus-service.yaml
[root@master manifests]# kubectl apply -f prometheus-service.yaml --force
service/prometheus-k8s created
Check the new type of the prometheus-k8s service and its nodePort:
[root@master kube-prometheus]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.102.26.246 <none> 9093/TCP 139m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 139m
grafana ClusterIP 10.111.252.205 <none> 3000/TCP 139m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 139m
node-exporter ClusterIP None <none> 9100/TCP 139m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 139m
prometheus-k8s NodePort 10.96.91.11 <none> 9090:32090/TCP 19s
prometheus-operated ClusterIP None <none> 9090/TCP 138m
prometheus-operator ClusterIP None <none> 8080/TCP 139m
7.1.2 Access Prometheus through the nodePort and a node IP
We know that the master node's IP is 192.168.56.101 and that we set the nodePort to 32090 above, so from outside the cluster Prometheus is reachable at http://192.168.56.101:32090
The home page looks like this:
At this point we can already use the Prometheus UI and PromQL to query all the metrics that have been collected. The built-in UI is fairly spartan and limited, though, so in the following steps we will use the Grafana instance that has already been deployed to present the data from Prometheus with much nicer dashboards.
7.2 Accessing Grafana through the nodePort and a node IP
7.2.1 Change the type of the grafana service
Edit the grafana-service.yaml file so that it looks as follows (note the nodePort and type fields):
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32030
  selector:
    app: grafana
  type: NodePort
Apply grafana-service.yaml
[root@master manifests]# kubectl apply -f grafana-service.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service/grafana configured
Check the new type of the grafana service and its nodePort:
[root@master manifests]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.102.26.246 <none> 9093/TCP 174m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 173m
grafana NodePort 10.111.252.205 <none> 3000:32030/TCP 174m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 174m
node-exporter ClusterIP None <none> 9100/TCP 174m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 174m
prometheus-k8s NodePort 10.96.91.11 <none> 9090:32090/TCP 35m
prometheus-operated ClusterIP None <none> 9090/TCP 173m
prometheus-operator ClusterIP None <none> 8080/TCP 174m
7.2.2 Access Grafana through the nodePort and a node IP
We know that the master node's IP is 192.168.56.101 and that we set the nodePort to 32030 above, so from outside the cluster Grafana is reachable at http://192.168.56.101:32030 (default username/password: admin/admin)
The home page looks like this:
Notice that the Grafana deployed by kube-prometheus already has its data source configured to point to the Prometheus instance in the same cluster, and it ships with a large number of predefined dashboards, so it is very easy to use.
7.3 Accessing the Alertmanager through the nodePort and a node IP
7.3.1 Change the type of the alertmanager-main service
Edit the alertmanager-service.yaml file so that it looks as follows (note the nodePort and type fields):
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30093
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
  type: NodePort
Apply alertmanager-service.yaml
[root@master manifests]# kubectl apply -f alertmanager-service.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service/alertmanager-main configured
Check the new type of the alertmanager-main service and its nodePort:
[root@master kube-prometheus]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main NodePort 10.102.26.246 <none> 9093:30093/TCP 3h10m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 3h9m
grafana NodePort 10.111.252.205 <none> 3000:32030/TCP 3h10m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 3h10m
node-exporter ClusterIP None <none> 9100/TCP 3h10m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 3h10m
prometheus-k8s NodePort 10.96.91.11 <none> 9090:32090/TCP 51m
prometheus-operated ClusterIP None <none> 9090/TCP 3h9m
prometheus-operator ClusterIP None <none> 8080/TCP 3h10m
7.3.2 Access alertmanager-main through the nodePort and a node IP
We know that the master node's IP is 192.168.56.101 and that we set the nodePort to 30093 above, so from outside the cluster the Alertmanager is reachable at http://192.168.56.101:30093