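In one sentence: pair it with Prometheus to monitor web pages.
A more precise description:
Use blackbox_exporter to monitor URL status, IP availability, port status, and TLS certificate expiry.
We already monitor host resource usage, container state, and the runtime data of databases and middleware. These are the infrastructure that supports the business and its services. White-box monitoring shows their actual internal state, and watching the metrics lets us anticipate likely problems and tune out potential sources of uncertainty.
For a complete monitoring setup, however, extensive white-box monitoring should be complemented by an appropriate amount of black-box monitoring. Black-box monitoring tests the external visibility of a service from the user's point of view. Common forms are HTTP probes and TCP probes, which check whether a site or service is reachable and how quickly it responds.
The biggest difference between the two is that black-box monitoring is failure-oriented: when a failure occurs, it detects it quickly. White-box monitoring focuses on proactively finding or predicting potential problems. A well-rounded monitoring setup should find potential problems from the white-box side and quickly detect problems that have already occurred from the black-box side.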
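1. Introduction to blackbox_exporter
Blackbox Exporter is the official black-box monitoring solution from the Prometheus community. It lets users probe the network over HTTP, HTTPS, DNS, TCP, and ICMP.
HTTP/HTTPS: URL/API availability checks
TCP: port listening checks
ICMP: host liveness checks
DNS: domain name resolution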
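Each probe type above corresponds to a module in the exporter's blackbox.yml. The following is a minimal sketch, assuming one module per probe type. The module names tcp_connect, icmp_ping and dns_example, the 5s timeouts, the example.com query, and the CONFIG_OUT path are illustrative assumptions, not taken from this article.

```shell
# Write an illustrative blackbox.yml with one module per probe type.
# CONFIG_OUT is a hypothetical path; point it at your real config location.
CONFIG_OUT="${CONFIG_OUT:-./blackbox.example.yml}"

cat > "$CONFIG_OUT" <<'EOF'
modules:
  http_2xx:            # HTTP/HTTPS: passes on any 2xx status by default
    prober: http
    timeout: 5s
  tcp_connect:         # TCP: passes if the port accepts a connection
    prober: tcp
    timeout: 5s
  icmp_ping:           # ICMP: passes if the host answers an echo request
    prober: icmp
    timeout: 5s
  dns_example:         # DNS: the probe target is the DNS server to query
    prober: dns
    timeout: 5s
    dns:
      query_name: "example.com"   # name the server is asked to resolve
      query_type: "A"
EOF

echo "wrote $CONFIG_OUT"
```

The module name is what a Prometheus job later passes as the module parameter, so a job that uses http_2xx needs a module with exactly that name in the loaded file.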
2. Installing and deploying blackbox_exporter
This assumes Prometheus and Alertmanager are already installed and configured.
Installation
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.22.0/blackbox_exporter-0.22.0.linux-amd64.tar.gz
tar -xf blackbox_exporter-0.22.0.linux-amd64.tar.gz -C /apps/
cd /apps/
mv blackbox_exporter-0.22.0.linux-amd64/ blackbox_exporter
Start on boot
[root@monitoring ~]# vim /etc/systemd/system/blackbox-exporter.service
[root@monitoring ~]# cat /etc/systemd/system/blackbox-exporter.service
[Unit]
Description=Prometheus Blackbox Exporter
After=network.target
[Service]
Type=simple
User=root
Group=root
ExecStart=/apps/blackbox_exporter/blackbox_exporter \
--config.file=/apps/blackbox_exporter/blackbox.yml \
--web.listen-address=:9115
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@monitoring ~]# systemctl enable --now blackbox-exporter.service
Created symlink /etc/systemd/system/multi-user.target.wants/blackbox-exporter.service → /etc/systemd/system/blackbox-exporter.service.
[root@monitoring ~]# systemctl status blackbox-exporter.service
● blackbox-exporter.service - Prometheus Blackbox Exporter
Loaded: loaded (/etc/systemd/system/blackbox-exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2022-09-27 16:56:04 CST; 1min 13s ago
Main PID: 29832 (blackbox_export)
Tasks: 8 (limit: 49440)
Memory: 4.9M
CGroup: /system.slice/blackbox-exporter.service
└─29832 /apps/blackbox_exporter/blackbox_exporter --config.file=/apps/blackbox_exporter/blackbox.yml --web.listen-address=:9115
Sep 27 16:56:04 monitoring systemd[1]: Started Prometheus Blackbox Exporter.
...
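To confirm the exporter is answering, you can ask it to run a single probe by hand. This is a sketch under a few assumptions: the exporter listens on 127.0.0.1:9115 as configured above, the loaded blackbox.yml defines an http_2xx module, and curl is installed. probe_once is a hypothetical helper name and example.com is a placeholder target.

```shell
# Assumed exporter address; matches --web.listen-address=:9115 above.
EXPORTER="${EXPORTER:-127.0.0.1:9115}"

# probe_once MODULE TARGET
# Ask the exporter to probe TARGET with MODULE and print only the
# probe_success sample (1 = probe passed, 0 = probe failed).
probe_once() {
  curl -sG "http://${EXPORTER}/probe" \
    --data-urlencode "module=$1" \
    --data-urlencode "target=$2" | grep '^probe_success'
}

# Example invocation (needs a running exporter):
#   probe_once http_2xx https://example.com
```

Using -G with --data-urlencode lets curl build the query string, so target URLs that contain their own query parameters are passed through intact.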
Creating a Blackbox job in Prometheus
Go to the Prometheus installation directory (e.g. /opt/prometheus) and edit the prometheus.yml file. Under scrape_configs, add a new job named blackbox with the following snippet:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Expect an HTTP 2xx response.
    static_configs:
      - targets:
        - https://gritfy.com
        - https://www.google.com
        - https://middlewareinventory.com
        - https://devopsjunction.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
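You can change the URLs to targets of your choice, but the relabel_configs should remain the same.
Here, one scrape job is defined per probe module (such as http_2xx), and the job's targets are set directly to the sites we want to probe. Before samples are collected, relabel_configs rewrites each target dynamically: the first rule copies the listed URL from __address__ into __param_target (which becomes the target query parameter), the second copies it into the instance label, and the third replaces __address__ with the Blackbox Exporter's own address, 127.0.0.1:9115.
These three relabel steps greatly reduce the complexity of the Prometheus job configuration.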
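The effect of the three relabel rules can be traced by hand. The sketch below is not how Prometheus implements relabeling; it only mimics the three assignments in plain shell variables to show what ends up being scraped and which instance label is attached. relabel_target is a hypothetical helper name.

```shell
EXPORTER_ADDR="127.0.0.1:9115"   # replacement value from the third rule
MODULE="http_2xx"                # from params.module in the job

# relabel_target URL
# Mimic the three relabel rules for one listed target.
relabel_target() {
  address="$1"                   # initial __address__: the listed target URL
  param_target="$address"        # rule 1: __address__ -> __param_target
  instance="$param_target"       # rule 2: __param_target -> instance
  address="$EXPORTER_ADDR"       # rule 3: __address__ := exporter address
  # Prometheus URL-encodes the target in the real request; it is shown
  # unencoded here for readability.
  printf 'scrape=http://%s/probe?module=%s&target=%s instance=%s\n' \
    "$address" "$MODULE" "$param_target" "$instance"
}

relabel_target "https://gritfy.com"
# scrape=http://127.0.0.1:9115/probe?module=http_2xx&target=https://gritfy.com instance=https://gritfy.com
```

Every target is scraped through the exporter, while the instance label still carries the probed URL, which is what the alert annotations later refer to as $labels.instance.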
Configuring Alerts and Rules in Prometheus
As part of alert triggering, we are going to set up alerts for two scenarios:
- SSLCertExpiringSoon (within 24 days)
- EndpointDown (endpoint down or returning an invalid response)
To generate alerts, we first need to create rules in Prometheus.
When a rule's condition is satisfied, Prometheus sends the alert to Alertmanager.
Create a new Rule file
Go to the Prometheus installation directory, i.e. /opt/prometheus, and create a new directory named rules. Under the rules directory, create a new file named blackbox-rules.yml:
groups:
  - name: Blackbox rules
    rules:
      - alert: SSLCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 24
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "TLS certificate will expire in {{ $value | humanizeDuration }} (instance {{ $labels.instance }})"
      - alert: EndpointDown
        expr: probe_success == 0
        for: 10m
        labels:
          severity: "critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
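For the SSL certificate expiry rule: probe_ssl_earliest_cert_expiry is the Unix timestamp at which the earliest-expiring certificate in the chain expires, so subtracting time() gives the seconds remaining. If that value drops below 86400 * 24 (86400 seconds per day times 24 days), the alert triggers.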
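The threshold arithmetic can be checked in a shell. This sketch only reproduces the comparison from the rule expression with fixed example timestamps. cert_expiring_soon is a hypothetical helper, and the timestamps are made up for illustration.

```shell
SECONDS_PER_DAY=86400
THRESHOLD=$((SECONDS_PER_DAY * 24))   # 24 days expressed in seconds
echo "threshold: $THRESHOLD"          # threshold: 2073600

# cert_expiring_soon EXPIRY_EPOCH NOW_EPOCH
# Succeeds when the remaining lifetime is below the threshold, i.e. the
# same comparison as: probe_ssl_earliest_cert_expiry - time() < 86400 * 24
cert_expiring_soon() {
  [ $(( $1 - $2 )) -lt "$THRESHOLD" ]
}

now=1700000000                          # made-up "current" time
expiry=$((now + 10 * SECONDS_PER_DAY))  # certificate with 10 days left
if cert_expiring_soon "$expiry" "$now"; then
  echo "expiring soon: $(( (expiry - now) / SECONDS_PER_DAY )) days left"
fi
# expiring soon: 10 days left
```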
Adding the rule to prometheus.yml
The rule file can now be added to our prometheus.yml configuration file. Under rule_files, add the path of the newly created rule file, rules/blackbox-rules.yml:
rule_files:
  - "rules/blackbox-rules.yml"
Enabling Alertmanager in prometheus.yml
When you add new rules, you also have to enable the alertmanager configuration, which is disabled by default. It should point to localhost:9093:
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
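Once the rule file and the alerting block are in place, it may be worth validating both files before restarting Prometheus. This sketch assumes promtool (shipped alongside Prometheus) is on the PATH and that the installation lives in /opt/prometheus. validate_prometheus is a hypothetical helper name.

```shell
# Assumed installation directory; override PROM_DIR if yours differs.
PROM_DIR="${PROM_DIR:-/opt/prometheus}"

# Check the rule file first, then the main configuration.
# Returns non-zero as soon as one of the checks fails.
validate_prometheus() {
  promtool check rules "${PROM_DIR}/rules/blackbox-rules.yml" &&
    promtool check config "${PROM_DIR}/prometheus.yml"
}

# Example invocation:
#   validate_prometheus && echo "configuration OK"
```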