Kubernetes Monitoring with Prometheus + Grafana

What is Prometheus

Prometheus is an open-source monitoring and alerting solution originally built at SoundCloud. Development began in 2012, and since the project was open-sourced on GitHub in 2015 it has attracted 9k+ stars and adoption by many large companies. In 2016 Prometheus became the second project hosted by the CNCF, after Kubernetes. As a new-generation open-source solution, many of its ideas align closely with Google's SRE philosophy.

Basic architecture:


(architecture diagram)

Installing Prometheus:
1. Create the ConfigMap for Prometheus and Alertmanager (Prometheus's alerting component).
The YAML manifests used below are available on GitHub; clone them if you want ready-made copies.
cat prometheus-cm.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: kube-ops
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-ops
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s
    rule_files:
    - /etc/prometheus/rules.yml
    alerting:
      alertmanagers:
        - static_configs:
          - targets: ["localhost:9093"]
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
        - targets: ['localhost:9090']
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
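      # Keep only the endpoint of the kubernetes service in the default namespace
      # on its https port: that endpoint is the apiserver itself.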
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
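      # Scrape each kubelet through the apiserver proxy: copy node labels over,
      # point __address__ at the apiserver service, and rewrite the metrics path
      # to the per-node proxy URL.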
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
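      # Same apiserver-proxy trick as the nodes job, but hitting the kubelet's
      # /metrics/cadvisor subpath for container-level metrics.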
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-node-exporter'
      scheme: http
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
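      # node-exporter is exposed through NodePort 31672 (see the Service in step 2),
      # so rewrite the discovered kubelet address (port 10250) to that port.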
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:31672'
        target_label: __address__
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
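      # Scrape only Services annotated prometheus.io/scrape: "true"; the optional
      # scheme/path/port annotations override the scrape defaults when present.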
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
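      # Probe Services annotated prometheus.io/probe: "true" via the blackbox
      # exporter; replace blackbox-exporter.example.com:9115 with your own address.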
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
  rules.yml: |
    groups:
    - name: test-rule
      rules:
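      # NOTE: these expressions use pre-0.16 node_exporter metric names
      # (node_filesystem_size, node_memory_MemTotal, node_cpu); newer exporters
      # rename them with _bytes / _seconds_total suffixes.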
      - alert: NodeFilesystemUsage
        expr: (node_filesystem_size{device="rootfs"} - node_filesystem_free{device="rootfs"}) / node_filesystem_size{device="rootfs"} * 100 > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Filesystem usage detected"
          description: "{{$labels.instance}}: Filesystem usage is above 80% (current value is: {{ $value }})"
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: NodeCPUUsage
        expr: (100 - (avg by (instance) (irate(node_cpu{job="kubernetes-node-exporter",mode="idle"}[5m])) * 100)) > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High CPU usage detected"
          description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: kube-ops
data:
  config.yml: |-
    global:
      resolve_timeout: 5m
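    # Alerts labeled team=node match the sub-route below; everything else
    # falls back to the top-level webhook receiver.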
    route:
      receiver: webhook
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      group_by: [alertname]
      routes:
      - receiver: webhook
        group_wait: 10s
        match:
          team: node
    receivers:
    - name: webhook
      webhook_configs:
      - url: 'http://apollo/hooks/dingtalk/'
        send_resolved: true
      - url: 'http://apollo/hooks/prome/'
        send_resolved: true

kubectl apply -f prometheus-cm.yaml
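
A quick sanity check that both ConfigMaps were created in the kube-ops namespace (assuming your kubectl context points at the target cluster):

kubectl get configmap -n kube-ops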

2. Deploy node-exporter (it needs to run on every node, so use a DaemonSet).
vim node-exporter.yaml

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-ops
  labels:
    k8s-app: node-exporter
spec:
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - image: prom/node-exporter:v0.15.2  # pre-0.16 tag, so metric names match the alert rules in rules.yml
        name: node-exporter
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http

---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-ops
spec:
  ports:
  - name: http
    port: 9100
    nodePort: 31672
    protocol: TCP
  type: NodePort
  selector:
    k8s-app: node-exporter

kubectl apply -f node-exporter.yaml
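
To verify that a node-exporter pod landed on every node and that the NodePort answers (<node-ip> is a placeholder for any node's address; 31672 comes from the Service above):

kubectl get pods -n kube-ops -l k8s-app=node-exporter -o wide
curl -s http://<node-ip>:31672/metrics | head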

3. Install Prometheus + Alertmanager (note: the Prometheus data volume is mounted from a CephFS PV/PVC, which you need to create yourself; PV/PVC templates are in the GitHub repo mentioned earlier).
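
The Deployment below runs under a prometheus ServiceAccount that this post never defines; for the kubernetes_sd_configs above to work, it must be able to get, list, and watch nodes, services, endpoints, and pods. A minimal RBAC sketch, assuming RBAC is enabled in the cluster (the author's GitHub repo has the original manifests; the exact rules below are an assumption, saved here under the hypothetical name prometheus-rbac.yaml):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-ops
---
# Allow Prometheus to discover and scrape cluster targets via the apiserver.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-ops

kubectl apply -f prometheus-rbac.yaml
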
vim prometheus-deployment.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus
  name: prometheus
  namespace: kube-ops
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 0
      containers:
      - image: prom/prometheus:v2.0.0
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
          name: http
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 200m
            memory: 1Gi
      - image: quay.io/prometheus/alertmanager:v0.12.0
        name: alertmanager
        args:
        - "-config.file=/etc/alertmanager/config.yml"
        - "-storage.path=/alertmanager"
        ports:
        - containerPort: 9093
          protocol: TCP
          name: http
        volumeMounts:
        - name: alertmanager-config-volume
          mountPath: /etc/alertmanager
        resources:
          requests:
            cpu: 50m
            memory: 50Mi
          limits:
            cpu: 200m
            memory: 200Mi
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-pvc
      - configMap:
          name: prometheus-config
        name: config-volume
      - name: alertmanager-config-volume
        configMap:
          name: alertmanager

kubectl apply -f prometheus-deployment.yaml
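
Once applied, the pod should show 2/2 ready (the prometheus and alertmanager containers). A quick way to peek at the UI before exposing it in step 4 is a port-forward, then browse http://localhost:9090:

kubectl get pods -n kube-ops -l k8s-app=prometheus
kubectl port-forward -n kube-ops $(kubectl get pod -n kube-ops -l k8s-app=prometheus -o name) 9090:9090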

4. Expose the Prometheus port (an Ingress rule works too; Grafana below is exposed via Ingress, so pick whichever you prefer).
vim prometheus-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-ops
  labels:
    k8s-app: prometheus
spec:
  selector:
    k8s-app: prometheus
  type: NodePort
  ports:
    - name: web
      port: 9090
      targetPort: http

kubectl apply -f prometheus-svc.yaml
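
The Service is assigned a random NodePort; look it up, then open http://<node-ip>:<node-port>/targets and check that every scrape job is up (<node-ip> is any node's address):

kubectl get svc prometheus -n kube-ops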

5. Install Grafana (Prometheus's built-in graphing is fairly limited, so pair it with the much more capable visualization tool Grafana).
The Grafana install below uses a CephFS PV/PVC and an Ingress.
vim grafana.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: grafana
  name: grafana
  namespace: kube-ops
spec:
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    k8s-app: grafana
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: grafana
  name: grafana-ingress
  namespace: kube-ops
spec:
  rules:
  - host: grafana.io
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  cephfs:
    monitors:
      - 192.168.0.231:6789
      - 192.168.0.242:6789
      - 192.168.0.211:6789
    path: /data/system/grafana
    user: admin
    secretRef:
      name: ceph-secret
    readOnly: false
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: kube-ops
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-ops
spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: gcr.io/google_containers/heapster-grafana-amd64:v4.4.3
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /var
          name: grafana
          subPath: grafana/data
        - mountPath: /ssl
          name: ssl
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: INFLUXDB_HOST
          value: influxdb.kube-system
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "false"
        - name: GF_SERVER_ROOT_URL
          value: /
        - name: GF_SMTP_ENABLED
          value: "true"
        - name: GF_ALERTING_ENABLED
          value: "true"
        - name: GF_ALERTING_EXECUTE_ALERTS
          value: "true"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
          initialDelaySeconds: 30
          timeoutSeconds: 2
      volumes:
      - name: ssl
        hostPath:
          path: /etc/ssl/certs
      - name: grafana
        persistentVolumeClaim:
          claimName: grafana-pvc

kubectl apply -f grafana.yaml
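
Check that the Ingress was created, then point grafana.io at a node IP in your local hosts file (assuming the ingress controller is reachable on the nodes; <node-ip> is a placeholder):

kubectl get ingress -n kube-ops
echo '<node-ip> grafana.io' | sudo tee -a /etc/hosts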

The Ingress above uses grafana.io as the hostname, so resolve it locally to a node IP (if you don't have a load balancer in front).
Open the Grafana UI and add the data source; since Grafana and Prometheus run in the same namespace, the Prometheus data source URL is simply http://prometheus:9090 (the Service from step 4).

(screenshot: Grafana data source configuration)

Then set up dashboards; the two below are recommended:
https://grafana.com/dashboards/1621
https://grafana.com/dashboards/315
Dashboard 315 is used here.
The result looks like the following.
(screenshot: dashboard 315)
