1 What is Prometheus
Quoted from the Prometheus official website:
From metrics to insight
Power your metrics and alerting with a leading open-source monitoring solution.
In other words, Prometheus is a leading open-source monitoring solution.
2 Prometheus features
2.1 Highly dimensional data model and time series
Prometheus stores all of its data as time series: streams of timestamped samples, ordered by time, that belong to the same metric name and the same set of labeled dimensions.
2.1.1 Time series
A time series is a series of data points indexed (or listed or graphed) in time order.
In other words, a time series is a series of data points arranged in time order.
2.1.2 Metric names and labels
Every time series is uniquely identified by its metric name and a set of key-value pairs; these key-value pairs are what we call labels.
The metric name specifies the general feature of the system being measured, for example: http_requests_total - the total number of HTTP requests received. Metric names must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*.
Labels are what make Prometheus's dimensional data model possible: for a given metric name, any combination of labels identifies a particular dimensional instantiation of that metric, for example: all HTTP requests that used the POST method against the /api/tracks handler. Label names must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*.
2.1.3 Notation
Given a metric name and a set of labels, a time series is written using the following notation:
<metric name>{<label name>=<label value>, ...}
For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" is written as:
api_http_requests_total{method="POST", handler="/messages"}
2.2 PromQL
A query language built on top of the highly dimensional data model; it is not covered in depth here.
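As a small taste, the following expression (a hypothetical query, assuming the api_http_requests_total metric from the previous section is actually being scraped) returns the per-second rate of POST requests to /messages averaged over the last five minutes:
rate(api_http_requests_total{method="POST", handler="/messages"}[5m])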
2.3 Efficient storage
Prometheus stores time series data in memory and on local disk; it does not depend on distributed storage, and a single node works on its own. Scaling is achieved through functional sharding and federation.
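For reference, the local storage location and how long samples are kept are controlled by startup flags; a minimal sketch (the data path and retention period below are arbitrary example values):
./prometheus --config.file=prometheus.yml --storage.tsdb.path=/var/lib/prometheus --storage.tsdb.retention.time=15d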
2.4 Excellent visualization
Integration with Grafana gives users very intuitive and attractive visualizations.
2.5 Collects data by pulling, or by pushing through an intermediary gateway
2.6 Discovers monitoring targets through service discovery or static configuration
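To illustrate the last two points, here is a minimal, hypothetical prometheus.yml (the job name and target address are made up for illustration): Prometheus pulls metrics from the statically configured target every 15 seconds.
global:
  scrape_interval: 15s                      # how often Prometheus pulls from its targets
scrape_configs:
  - job_name: "example-app"                 # hypothetical job name
    static_configs:
      - targets: ["192.168.56.102:8080"]    # hypothetical, statically configured target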
3 Architecture and components
3.1 Architecture diagram
3.2 Components
3.2.1 Prometheus Server
Responsible for collecting and storing the data, and for supporting the PromQL query language.
3.2.2 Push gateway
Supports short-lived jobs: the push gateway lets ephemeral and batch jobs expose their metrics to Prometheus. Because such jobs may not exist long enough for Prometheus to scrape them, they push their metrics to the push gateway instead, and the push gateway then exposes those metrics to Prometheus. The push gateway acts purely as a metrics cache; it does no computation.
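As a quick sketch (assuming a Pushgateway is reachable at pushgateway.example.com:9091 and a hypothetical batch job called backup_job), a job can push a metric with nothing more than curl:
# Push one sample; the Pushgateway caches it until it is overwritten or deleted
echo "backup_last_success_timestamp_seconds $(date +%s)" | \
  curl --data-binary @- http://pushgateway.example.com:9091/metrics/job/backup_job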
3.2.3 Exporters
Exporters expose the existing metrics of third-party systems as Prometheus metrics.
A list of exporters is available on the exporter default port wiki, and also on the EXPORTERS AND INTEGRATIONS page.
3.2.4 Alertmanager
The Alertmanager handles alerts sent by client applications such as the Prometheus server: it deduplicates and groups incoming alerts and routes them to the correct receiver, such as PagerDuty or OpsGenie. The Alertmanager also takes care of silencing and inhibition of alerts.
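As a minimal sketch of what its configuration looks like (the receiver name, e-mail address and SMTP relay below are assumptions for illustration), the following alertmanager.yml groups alerts by their name and routes everything to a single e-mail receiver:
route:
  group_by: ["alertname"]                   # group incoming alerts by alert name
  receiver: "ops-mail"                      # hypothetical default receiver
receivers:
  - name: "ops-mail"
    email_configs:
      - to: "ops@example.com"               # hypothetical recipient
        from: "alertmanager@example.com"
        smarthost: "smtp.example.com:587"   # hypothetical SMTP relay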
4 Strengths and weaknesses of Prometheus
Prometheus is very good at collecting purely numeric time series, so it fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In the microservices world, its multi-dimensional data collection and querying are a distinctive, competitive strength.
Prometheus's greatest value is reliability: users can look at statistics about the monitored system at any time, even while the system itself is failing. It cannot, however, guarantee 100% accuracy. If, for example, you bill by request count, Prometheus may not capture every single request; in that case Prometheus is not a good fit.
5 Prometheus and Kubernetes
Prometheus is a close relative of Kubernetes: Google has said that Kubernetes descends from its Borg cluster system, and Prometheus shares its fundamental design concepts with Borgmon, the monitoring system that accompanied Borg. Today both Prometheus and Kubernetes are governed by the Cloud Native Computing Foundation (CNCF). On a technical level, Kubernetes exposes its internal metrics in a format that Prometheus can consume.
5.1 Ways to integrate Prometheus with Kubernetes
- Prometheus Operator
- kube-prometheus
- kubernetes addon
5.1.1 Prometheus Operator
An Operator is a piece of software, introduced by CoreOS, that operates other software and folds the operational knowledge people have collected into its deployment process.
The Prometheus Operator makes it easy to install Prometheus and to manage and configure Prometheus instances with simple declarative configuration. Its core idea is to decouple the deployment of Prometheus instances from the configuration of the entities being monitored, so that running Prometheus on Kubernetes is as simple as possible.
The Prometheus Operator introduces additional resources into Kubernetes for declaring the desired state of Prometheus and Alertmanager clusters. These resources include:
- Prometheus
- Alertmanager
- ServiceMonitor
- PrometheusRule
The Alertmanager is out of scope for this article and will be covered separately in a later post.
The Prometheus resource declaratively describes the desired state of a Prometheus deployment, while a ServiceMonitor describes a set of targets to be monitored by Prometheus.
The Operator in the figure above ensures that, at any point in time, for every Prometheus resource in the Kubernetes cluster there is a set of Prometheus servers running with the desired configuration. Each Prometheus instance is bound to its own configuration, which specifies which targets to watch and therefore which metrics to scrape.
Users can specify this configuration by hand, or let the Operator generate it from ServiceMonitors. A ServiceMonitor resource specifies how to retrieve metrics from a set of services. A Prometheus resource object can dynamically include ServiceMonitor objects by their labels; the Operator configures the Prometheus instance to monitor all services covered by those ServiceMonitors and keeps the configuration in sync with changes in the cluster.
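To sketch how that label-based selection fits together (the names, the team label and the port name below are assumptions for illustration), a ServiceMonitor might select the Services of an example application, and a Prometheus resource would in turn pick up that ServiceMonitor via its team label:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app              # hypothetical name
  labels:
    team: frontend               # label the Prometheus resource below selects on
spec:
  selector:
    matchLabels:
      app: example-app           # selects Services carrying this label
  endpoints:
    - port: web                  # name of the Service port to scrape
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                  # hypothetical name
spec:
  serviceMonitorSelector:
    matchLabels:
      team: frontend             # picks up the ServiceMonitor above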
5.1.2 kube-prometheus
kube-prometheus combines the Prometheus Operator with a collection of manifests to help users get started with monitoring Kubernetes itself and the applications running on top of it, providing a full-stack monitoring configuration.
6 Deploying Prometheus
6.1 Deployment environment
No. | Node name | Role | Memory | IP | Version |
---|---|---|---|---|---|
1 | master.mike.com | master | 2GB | 192.168.56.101 | v1.13.4 |
2 | node1.mike.com | node | 2GB | 192.168.56.102 | v1.13.4 |
3 | node2.mike.com | node | 2GB | 192.168.56.103 | v1.13.4 |
6.2 Quickly deploying Prometheus with kube-prometheus
6.2.1 Prerequisites
- Have a Kubernetes cluster ready; see section 6.1
- Make sure the following flags are set on the cluster's kubelets. They tell the kubelet to use tokens for authentication and authorization, which allows more fine-grained and easier access control:
- --authentication-token-webhook=true
- --authorization-mode=Webhook
- kube-prometheus already includes a resource metrics API server, which provides the same functionality as metrics-server. If metrics-server is already deployed in the cluster, uninstall it first (a quick check is shown below); if not, skip this step.
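One possible way to check whether metrics-server is present (a sketch; metrics-server is usually deployed as a Deployment in the kube-system namespace, but your cluster may differ):
[root@master ~]# kubectl -n kube-system get deployment metrics-server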
6.2.2 Clone the kube-prometheus repository
[root@master ~]# git clone https://github.com/coreos/kube-prometheus.git
Cloning into 'kube-prometheus'...
remote: Enumerating objects: 49, done.
remote: Counting objects: 100% (49/49), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 5763 (delta 19), reused 19 (delta 2), pack-reused 5714
Receiving objects: 100% (5763/5763), 3.72 MiB | 714.00 KiB/s, done.
Resolving deltas: 100% (3396/3396), done.
6.2.3 Quickly deploy the monitoring stack
My network sits outside the Great Firewall, so when the Kubernetes resources were created from the manifest files, Kubernetes pulled all the required Docker images from the various registries without any obstruction. If you are deploying from behind the firewall, download all the images referenced in the manifests to the cluster nodes in advance.
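One way to see which images would need to be mirrored (a simple sketch, run from the root of the cloned repository) is to grep the manifests for image references:
[root@master kube-prometheus]# grep -rhoE 'image: .+' manifests/ | sort -u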
Create the resources
[root@master kube-prometheus]# kubectl create -f manifests/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-k8s-cluster-rsrc-use created
configmap/grafana-dashboard-k8s-node-rsrc-use created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
Check that the resources are ready
[root@master kube-prometheus]# until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done
NAME CREATED AT
servicemonitors.monitoring.coreos.com 2019-05-13T05:38:41Z
[root@master kube-prometheus]# until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
NAMESPACE NAME AGE
monitoring alertmanager 28s
monitoring coredns 25s
monitoring grafana 27s
monitoring kube-apiserver 25s
monitoring kube-controller-manager 25s
monitoring kube-scheduler 25s
monitoring kube-state-metrics 27s
monitoring kubelet 25s
monitoring node-exporter 27s
monitoring prometheus 25s
monitoring prometheus-operator 28s
7 Accessing Prometheus
By default, the Prometheus stack that kube-prometheus deploys into the Kubernetes cluster is only reachable from inside the cluster; every service it creates is of type ClusterIP:
[root@master kube-prometheus]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.102.26.246 <none> 9093/TCP 84m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 83m
grafana ClusterIP 10.111.252.205 <none> 3000/TCP 84m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 84m
node-exporter ClusterIP None <none> 9100/TCP 84m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 84m
prometheus-k8s ClusterIP 10.108.202.31 <none> 9090/TCP 83m
prometheus-operated ClusterIP None <none> 9090/TCP 83m
prometheus-operator ClusterIP None <none> 8080/TCP 84m
To access Prometheus from outside the cluster, the services have to be exposed to the outside world. The sections below show how.
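If you only need temporary access for a quick look, a kubectl port-forward from a machine with cluster credentials is a lighter-weight alternative to changing the service type (forwarding local port 9090 here is just an example):
[root@master ~]# kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090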
7.1 Accessing Prometheus from outside through a NodePort
7.1.1 Change the type of the prometheus-k8s service
Edit the prometheus-service.yaml file so that it looks as follows (note the nodePort and type fields):
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  ports:
  - name: web
    nodePort: 32090
    port: 9090
    targetPort: web
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
  type: NodePort
Apply prometheus-service.yaml
[root@master manifests]# kubectl apply -f prometheus-service.yaml --force
service/prometheus-k8s created
Check the new type of the prometheus-k8s service and its nodePort:
[root@master kube-prometheus]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.102.26.246 <none> 9093/TCP 139m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 139m
grafana ClusterIP 10.111.252.205 <none> 3000/TCP 139m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 139m
node-exporter ClusterIP None <none> 9100/TCP 139m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 139m
prometheus-k8s NodePort 10.96.91.11 <none> 9090:32090/TCP 19s
prometheus-operated ClusterIP None <none> 9090/TCP 138m
prometheus-operator ClusterIP None <none> 8080/TCP 139m
7.1.2 Access Prometheus through the nodePort and a node IP
We know that the master node's IP is 192.168.56.101 and that we set the nodePort to 32090 above, so from outside the cluster Prometheus is reachable at http://192.168.56.101:32090
The home page looks like this:
At this point we can already use the Prometheus UI and PromQL to query all the metrics that have been collected. The built-in UI is fairly spartan and limited, though, so in the following steps we will use the Grafana instance that has already been deployed to present the data from Prometheus with much nicer dashboards.
7.2 Accessing Grafana through the nodePort and a node IP
7.2.1 Change the type of the grafana service
Edit the grafana-service.yaml file so that it looks as follows (note the nodePort and type fields):
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32030
  selector:
    app: grafana
  type: NodePort
Apply grafana-service.yaml
[root@master manifests]# kubectl apply -f grafana-service.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service/grafana configured
Check the new type of the grafana service and its nodePort:
[root@master manifests]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.102.26.246 <none> 9093/TCP 174m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 173m
grafana NodePort 10.111.252.205 <none> 3000:32030/TCP 174m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 174m
node-exporter ClusterIP None <none> 9100/TCP 174m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 174m
prometheus-k8s NodePort 10.96.91.11 <none> 9090:32090/TCP 35m
prometheus-operated ClusterIP None <none> 9090/TCP 173m
prometheus-operator ClusterIP None <none> 8080/TCP 174m
7.2.2 Access Grafana through the nodePort and a node IP
We know that the master node's IP is 192.168.56.101 and that we set the nodePort to 32030 above, so from outside the cluster Grafana is reachable at http://192.168.56.101:32030 (default username/password: admin/admin)
The home page looks like this:
Notice that the Grafana deployed by kube-prometheus already has its data source configured to point to the Prometheus instance in the same cluster, and it ships with a large number of predefined dashboards, so it is very easy to use.
7.3 Accessing the Alertmanager through the nodePort and a node IP
7.3.1 Change the type of the alertmanager-main service
Edit the alertmanager-service.yaml file so that it looks as follows (note the nodePort and type fields):
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30093
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
  type: NodePort
Apply alertmanager-service.yaml
[root@master manifests]# kubectl apply -f alertmanager-service.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service/alertmanager-main configured
Check the new type of the alertmanager-main service and its nodePort:
[root@master kube-prometheus]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main NodePort 10.102.26.246 <none> 9093:30093/TCP 3h10m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 3h9m
grafana NodePort 10.111.252.205 <none> 3000:32030/TCP 3h10m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 3h10m
node-exporter ClusterIP None <none> 9100/TCP 3h10m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 3h10m
prometheus-k8s NodePort 10.96.91.11 <none> 9090:32090/TCP 51m
prometheus-operated ClusterIP None <none> 9090/TCP 3h9m
prometheus-operator ClusterIP None <none> 8080/TCP 3h10m
7.3.2 Access alertmanager-main through the nodePort and a node IP
We know that the master node's IP is 192.168.56.101 and that we set the nodePort to 30093 above, so from outside the cluster the Alertmanager is reachable at http://192.168.56.101:30093