Configuring alerting rules
Deploying Alertmanager
- Download the binary package
https://prometheus.io/download/
Installing Alertmanager
- Installation steps
[root@prometheus ~]# tar xf alertmanager-0.20.0-rc.0.linux-amd64.tar.gz -C /usr/local/
[root@prometheus ~]# ln -sv /usr/local/alertmanager-0.20.0-rc.0.linux-amd64 /usr/local/Prometheus_alertmanager
[root@prometheus ~]# cd /usr/local/Prometheus_alertmanager/
Configuring Alertmanager email alerts
- Edit the Alertmanager configuration file
[root@prometheus Prometheus_alertmanager]# cp alertmanager.yml alertmanager.yml.bak
[root@prometheus Prometheus_alertmanager]# vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'    # SMTP server address
  smtp_from: 'xxx@163.com'             # sender address
  smtp_auth_username: 'xxx@163.com'    # mailbox user
  smtp_auth_password: 'xxxxx'          # mail client authorization password
  smtp_require_tls: false
route:                                 # route defines how alerts are dispatched
  group_by: ["alertname"]              # labels used to group alerts
  group_wait: 30s                      # when the first alert of a new group arrives, wait 30s so that further alerts of the same group go out in one notification
  group_interval: 30s                  # how long to wait before notifying about new alerts added to a group that has already been notified
  repeat_interval: 20m                 # how long to wait before repeating a notification for alerts that are still firing
  receiver: Node_warning               # default receiver; to route by label, uncomment the routes section below
  # routes:                            # child routes decide which receiver gets which alerts
  # - receiver: 'Node_warning'
  #   continue: true
  #   group_wait: 10s
  #   match_re:
  #     service: mysql|cassandra       # alerts with service=mysql or service=cassandra go to this receiver
  # - receiver: 'MySQL_warning'
  #   group_wait: 10s
  #   match_re:                        # route by label: alerts with severity=warning go to MySQL_warning
  #     severity: warning
receivers:                             # receivers define who the alerts are sent to
- name: 'Node_warning'
  email_configs:
  - to: 'xxx@126.com'
#- name: 'MySQL_warning'
#  email_configs:
#  - to: 'xxx@qq.com'
Testing
- Check the configuration
[root@prometheus Prometheus_alertmanager]# ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml' SUCCESS
Found:
- global config
- route
- 0 inhibit rules
- 1 receivers
- 0 templates
The check passed.
- Configure systemd to start Alertmanager
[root@prometheus ~]# vim /lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target
[Service]
ExecStart=/usr/local/Prometheus_alertmanager/alertmanager --config.file=/usr/local/Prometheus_alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target
# Reload the systemd unit files, start the service, and enable it at boot
[root@prometheus ~]# systemctl daemon-reload
[root@prometheus ~]# systemctl start alertmanager
[root@prometheus ~]# systemctl enable alertmanager
Open http://alertmanager_ip:9093 in a browser to see the Alertmanager web interface.
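To confirm that routing and mail delivery work without waiting for a real alert, one option is to push a hand-made alert into Alertmanager's v2 HTTP API. The sketch below assumes Alertmanager is reachable at http://localhost:9093; the alert name TestAlert, its labels, and the two helper function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: Alertmanager listens on localhost:9093 (override with AM_URL).
AM_URL="${AM_URL:-http://localhost:9093}"

# Print a minimal alert payload for the v2 API.
# The alert name and label values are invented for this test.
test_alert_payload() {
  cat <<'EOF'
[{"labels":{"alertname":"TestAlert","severity":"warning","instance":"manual-test"},
  "annotations":{"summary":"Manual test alert sent with curl"}}]
EOF
}

# POST the payload; Alertmanager routes it like an alert coming from Prometheus.
send_test_alert() {
  test_alert_payload | curl -sS -X POST -H 'Content-Type: application/json' \
    --data-binary @- "$AM_URL/api/v2/alerts"
}
```

After the service is up, running send_test_alert should make the alert appear in the web interface and, once group_wait has elapsed, produce an email to the default receiver. Because the payload sets no end time, Alertmanager should treat the alert as resolved after resolve_timeout (5m in the configuration above) if it is not sent again.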
Connecting Prometheus to Alertmanager
- Edit the Prometheus configuration file and add the Alertmanager address
[root@prometheus ~]# cd /usr/local/Prometheus
[root@prometheus Prometheus]# vim prometheus.yml
# Connect to Alertmanager
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093
# Rule file paths (more than one can be listed)
rule_files:
  - "rules/node_rules.yml"
  # - "rules/mysql_rules.yml"
- Configure the alerting rules
[root@prometheus Prometheus]# mkdir rules
[root@prometheus Prometheus]# vim rules/node_rules.yml
groups:
- name: test
  rules:
  - alert: HighMemoryUsage
    expr: 100-(node_memory_Buffers_bytes+node_memory_Cached_bytes+node_memory_MemFree_bytes)/node_memory_MemTotal_bytes*100 > 90
    for: 30s    # how long the condition must hold before the alert is sent to Alertmanager
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }}: memory usage is too high"
      description: "{{ $labels.instance }} of job {{ $labels.job }}: memory usage is above 90%, current value [{{ $value }}]."
  - alert: HighCpuUsage
    # job is kept in the aggregation so that {{ $labels.job }} in the description is not empty
    expr: 100-avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance, job)*100 > 90
    for: 30s
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }}: CPU usage is too high"
      description: "{{ $labels.instance }} of job {{ $labels.job }}: CPU usage is above 90%, current value [{{ $value }}]."
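As a rough worked example, the for value in the rule file and the group_wait value in alertmanager.yml add up to a lower bound on how long it takes before the first email goes out. The sketch below only does that arithmetic with the values used in this article; it ignores the rule evaluation interval and mail delivery time, so real delays will be somewhat longer.

```shell
# Values copied from the configuration in this article.
for_s=30          # "for: 30s" in rules/node_rules.yml
group_wait_s=30   # "group_wait: 30s" in alertmanager.yml

# The alert stays PENDING for for_s, then Alertmanager holds the
# new alert group for group_wait_s before sending the first notification.
min_delay_s=$((for_s + group_wait_s))
echo "first email no earlier than ${min_delay_s}s after the expression first evaluates to true"
```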
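Testing the rules
- Check the alerting rules and reload Prometheus (the /-/reload endpoint only responds if Prometheus was started with --web.enable-lifecycle)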
[root@prometheus Prometheus]# curl -XPOST http://localhost:9090/-/reload
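The reload call itself does not tell you much when a rule file has a syntax error, so it can help to validate first. The sketch below wraps promtool check config, which also checks the rule files listed under rule_files, and only reloads when validation passes. It assumes promtool sits next to the prometheus binary in /usr/local/Prometheus (as in the release tarball) and that Prometheus listens on localhost:9090; check_and_reload is a helper name invented here.

```shell
#!/bin/sh
# Assumptions: run from /usr/local/Prometheus; Prometheus on localhost:9090.
PROMTOOL="${PROMTOOL:-./promtool}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Reload only if the main config and every rule file it references validate.
check_and_reload() {
  "$PROMTOOL" check config "${1:-prometheus.yml}" || return 1
  curl -fsS -X POST "$PROM_URL/-/reload"
}
```

Usage would be check_and_reload prometheus.yml. A single rule file can also be validated on its own with ./promtool check rules rules/node_rules.yml. If the lifecycle endpoint is not enabled, sending SIGHUP to the Prometheus process is another way to trigger a reload.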
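- The Alerts page of the Prometheus web interface shows the state of each alert
- Green means the alert is inactive, that is, everything is normal.
- PENDING (shown in yellow) means the alert has not yet been sent to Alertmanager, because the rule sets for: 30s.
- Once the condition has held for 30s, the state changes from PENDING to FIRING (shown in red). Only then does Prometheus send the alert to Alertmanager, where it shows up as one alert.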
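The same states can be checked from the command line, which is handy on a server without a browser. The sketch below queries the alerts endpoints of both services over HTTP. It assumes the default ports used in this article; prom_alerts and am_alerts are helper names invented for illustration.

```shell
#!/bin/sh
# Assumptions: Prometheus on localhost:9090, Alertmanager on localhost:9093.
PROM_URL="${PROM_URL:-http://localhost:9090}"
AM_URL="${AM_URL:-http://localhost:9093}"

# Alerts as Prometheus sees them (JSON; each alert carries a "state"
# field that is either "pending" or "firing").
prom_alerts() { curl -fsS "$PROM_URL/api/v1/alerts"; }

# Alerts that have actually reached Alertmanager (JSON list from the v2 API).
am_alerts() { curl -fsS "$AM_URL/api/v2/alerts"; }
```

While an alert is still PENDING it should appear only in the prom_alerts output; once it turns FIRING it should show up in am_alerts as well. The amtool binary shipped with Alertmanager offers a similar view with ./amtool alert query --alertmanager.url=http://localhost:9093.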