Source: https://blog.51cto.com/xiaoluoge/2476375
Author: 小羅ge11
Overview
There are already many ways to build a MySQL monitoring platform: monitoring based on Lepus (天兔), or secondary development on top of Zabbix. I believe many of my peers are already running one of these. My own choice is Prometheus + Grafana. Put simply, my production environment already uses Prometheus, and Grafana covers my day-to-day needs. For an introduction and installation, you can refer to this:
1. First, a look at our monitoring results: MySQL master-slave replication
2. MySQL status:
3. Buffer pool status:
Deploying the exporter
1. Install the exporter
[root@controller2 opt]# wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.10.0/mysqld_exporter-0.10.0.linux-amd64.tar.gz
[root@controller2 opt]# tar -xf mysqld_exporter-0.10.0.linux-amd64.tar.gz
2. Add a MySQL account:
GRANT SELECT, PROCESS, SUPER, REPLICATION CLIENT, RELOAD ON *.* TO 'exporter'@'%' IDENTIFIED BY '123456';
flush privileges;
3. Edit the configuration file:
[root@controller2 mysqld_exporter-0.10.0.linux-amd64]# cat /opt/mysqld_exporter-0.10.0.linux-amd64/.my.cnf
[client]
user=exporter
password=123456
4. Create the systemd unit file:
[root@controller2 mysqld_exporter-0.10.0.linux-amd64]# cat /etc/systemd/system/mysql_exporter.service
[Unit]
Description=mysql Monitoring System
Documentation=mysql Monitoring System
[Service]
ExecStart=/opt/mysqld_exporter-0.10.0.linux-amd64/mysqld_exporter \
-collect.info_schema.processlist \
-collect.info_schema.innodb_tablespaces \
-collect.info_schema.innodb_metrics \
-collect.perf_schema.tableiowaits \
-collect.perf_schema.indexiowaits \
-collect.perf_schema.tablelocks \
-collect.engine_innodb_status \
-collect.perf_schema.file_events \
-collect.binlog_size \
-collect.info_schema.clientstats \
-collect.perf_schema.eventswaits \
-config.my-cnf=/opt/mysqld_exporter-0.10.0.linux-amd64/.my.cnf
[Install]
WantedBy=multi-user.target
5. Add the scrape configuration to the Prometheus server:
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.1.11:9104','192.168.1.12:9104']
6. Check that values are returned:
Normally we can query mysql_up to see whether MySQL monitoring has taken effect and whether the server is up:
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
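The check above can also be scripted. The following is a minimal sketch, assuming the exporter output has already been fetched as text (for example from port 9104 on one of the targets); `parse_metrics` and `mysql_is_up` are hypothetical helper names, and the parser is deliberately simplified: it assumes no timestamps follow the sample values.

```python
def parse_metrics(text):
    """Parse simple Prometheus text-format lines into {sample_name: float}.

    Simplified on purpose: assumes each sample line is "<name{labels}> <value>"
    with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def mysql_is_up(text):
    """Return True only if the exporter reports mysql_up == 1."""
    return parse_metrics(text).get("mysql_up") == 1.0


SAMPLE = """\
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
"""

print(mysql_is_up(SAMPLE))
```

Note that a missing mysql_up sample is treated as "not up", which also covers the case where the exporter returned nothing useful.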
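Monitoring metrics
Whenever we monitor anything, we need to stay clear about what we are monitoring and which metrics matter; only then can we monitor our services well. For MySQL, we can usually gauge how the server is running with the following metrics: master-slave replication status, query throughput, slow queries, connection counts, buffer pool usage, and query execution performance.
Master-slave replication metrics:
1. Replication thread monitoring:
Many companies run a master-slave replication setup most of the time, so monitoring the two replication threads is very important. In MySQL we usually do this with the command: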
MariaDB [(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.1.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000045
Read_Master_Log_Pos: 72904854
Relay_Log_File: mariadb-relay-bin.000127
Relay_Log_Pos: 72905142
Relay_Master_Log_File: mysql-bin.000045
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
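If both the Slave_IO_Running and Slave_SQL_Running threads are healthy, the replication cluster is in a healthy state.
In the samples returned by MySQLD Exporter, mysql_slave_status_slave_sql_running reports the state of the SQL thread; the IO thread is reported by mysql_slave_status_slave_io_running.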
# HELP mysql_slave_status_slave_sql_running Generic metric from SHOW SLAVE STATUS.
# TYPE mysql_slave_status_slave_sql_running untyped
mysql_slave_status_slave_sql_running{channel_name="",connection_name="",master_host="172.16.1.1",master_uuid=""} 1
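2. Replication lag:
The output of show slave status contains another key field, Seconds_Behind_Master. Seconds_Behind_Master represents the delay between the SQL thread and the IO thread on the slave. As we know, in a MySQL replication setup the slave first pulls the binlog from the master to local storage (via the IO thread) and then replays it via the SQL thread; Seconds_Behind_Master reflects the part of the local relay log that has not yet been executed. So once the slave has executed everything in its local relay log (which is really just the binlog, conventionally called the relay log on the slave), show slave status reports 0: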
Seconds_Behind_Master: 0
In the samples returned by MySQLD Exporter, mysql_slave_status_seconds_behind_master provides this value.
# HELP mysql_slave_status_seconds_behind_master Generic metric from SHOW SLAVE STATUS.
# TYPE mysql_slave_status_seconds_behind_master untyped
mysql_slave_status_seconds_behind_master{channel_name="",connection_name="",master_host="172.16.1.1",master_uuid=""} 0
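To show how the three replication metrics fit together, here is a small sketch that folds them into a single status. The function name `replication_status` and the 30-second lag threshold are assumptions chosen for illustration, not part of the exporter.

```python
def replication_status(io_running, sql_running, seconds_behind, max_lag=30):
    """Classify replication health from three exporter samples.

    io_running / sql_running: values of mysql_slave_status_slave_io_running
    and mysql_slave_status_slave_sql_running (1 means the thread is running).
    seconds_behind: value of mysql_slave_status_seconds_behind_master.
    max_lag: hypothetical threshold in seconds, chosen for this example.
    """
    # Either thread being down means replication is not progressing at all.
    if io_running != 1 or sql_running != 1:
        return "broken"
    # Both threads run, but the SQL thread has not caught up yet.
    if seconds_behind > max_lag:
        return "lagging"
    return "healthy"


print(replication_status(1, 1, 0))
print(replication_status(1, 1, 95))
print(replication_status(1, 0, 0))
```

Checking the thread states before the lag matters, because a stopped SQL thread makes the lag value meaningless.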
Query throughput:
Speaking of throughput, how do we measure it? Generally, we can measure it by MySQL's insert, select, delete, and update operations.
To capture throughput, MySQL has an internal counter called Questions (a server status variable, in MySQL terms) that is incremented for every statement sent by a client. The client-centric view given by Questions is often easier to interpret than the related Queries counter, which also counts statements executed inside stored programs, as well as commands such as PREPARE and DEALLOCATE PREPARE that run as part of server-side prepared statements. We can query it with:
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Questions";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Questions | 15071 |
+---------------+-------+
In the samples returned by MySQLD Exporter, mysql_global_status_questions reflects the current value of the Questions counter:
# HELP mysql_global_status_questions Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_questions untyped
mysql_global_status_questions 13253
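Because Prometheus has a very rich query language, we can use this ever-increasing counter to see how fast queries are growing over a short window and set threshold alerts on it. For example, the following returns the per-second query rate averaged over the last 2 minutes: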
rate(mysql_global_status_questions[2m])
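To make the idea behind rate() concrete, the sketch below computes a per-second rate from a few counter samples and handles counter resets (for example after a MySQL restart). It is a simplification under stated assumptions: the real PromQL rate() also extrapolates to the window boundaries, which this version does not, and `per_second_rate` is a hypothetical name.

```python
def per_second_rate(samples):
    """Approximate a per-second rate from counter samples.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when there are too few samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # The counter dropped, so it was reset; everything counted
            # since the reset is new increase.
            increase += cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Three scrapes of mysql_global_status_questions, 60 seconds apart
# (the later two values are made up for the example).
print(per_second_rate([(0, 13253), (60, 13853), (120, 14453)]))  # 10.0
```

Reset handling is the reason a raw difference between the first and last sample is not enough for counters like Questions.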
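That is the overall total. We can also break it down into reads and writes to better understand the database workload and locate possible bottlenecks. Typically, read queries are captured by the Com_select metric, while write queries increment one of three status variables, depending on the statement: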
Writes = Com_insert + Com_update + Com_delete
Below we fetch the number of inserts with a command:
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Com_insert";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_insert | 10578 |
+---------------+-------+
In the samples returned by MySQLD Exporter's /metrics endpoint, mysql_global_status_commands_total gives the number of times each kind of command has been executed on the instance:
# HELP mysql_global_status_commands_total Total number of executed MySQL commands.
# TYPE mysql_global_status_commands_total counter
mysql_global_status_commands_total{command="create_trigger"} 0
mysql_global_status_commands_total{command="create_udf"} 0
mysql_global_status_commands_total{command="create_user"} 1
mysql_global_status_commands_total{command="create_view"} 0
mysql_global_status_commands_total{command="dealloc_sql"} 0
mysql_global_status_commands_total{command="delete"} 3369
mysql_global_status_commands_total{command="delete_multi"} 0
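The read/write split described earlier (Writes = Com_insert + Com_update + Com_delete) can be sketched directly on top of these labelled samples. The helper name `read_write_totals` is hypothetical, and the select and update values in the example are made up; only the insert and delete values come from the output shown in this article.

```python
# Command labels that count as writes, following
# Writes = Com_insert + Com_update + Com_delete.
WRITE_COMMANDS = ("insert", "update", "delete")


def read_write_totals(commands):
    """Split command counters into (reads, writes).

    commands: {command label: counter value}, as exposed by
    mysql_global_status_commands_total{command="..."}.
    Missing labels are treated as 0.
    """
    reads = commands.get("select", 0)
    writes = sum(commands.get(name, 0) for name in WRITE_COMMANDS)
    return reads, writes


# "select" and "update" values are invented for this example.
counts = {"select": 52000, "insert": 10578, "update": 420, "delete": 3369}
reads, writes = read_write_totals(counts)
print(reads, writes)
```

Because these are cumulative counters, the totals are mainly useful as inputs to a rate calculation rather than as absolute numbers.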
Slow query performance
In terms of query performance, slow queries are also an important alerting metric. MySQL provides a Slow_queries counter that is incremented each time a query's execution time exceeds long_query_time, which defaults to 10 seconds. The current long_query_time setting can be checked in MySQL with:
MariaDB [(none)]> SHOW VARIABLES LIKE 'long_query_time';
+-----------------+-----------+
| Variable_name | Value |
+-----------------+-----------+
| long_query_time | 10.000000 |
+-----------------+-----------+
1 row in set (0.00 sec)
We can also change the threshold:
MariaDB [(none)]> SET GLOBAL long_query_time = 5;
Query OK, 0 rows affected (0.00 sec)
Then we can query the Slow_queries count of the MySQL instance with SQL:
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Slow_queries";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Slow_queries | 0 |
+---------------+-------+
1 row in set (0.00 sec)
In the samples returned by MySQLD Exporter, the mysql_global_status_slow_queries metric shows the current value of Slow_queries:
# HELP mysql_global_status_slow_queries Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_slow_queries untyped
mysql_global_status_slow_queries 0
Likewise, with PromQL we can query the rate at which slow queries grew over a given period:
rate(mysql_global_status_slow_queries[5m])
Connection monitoring
Monitoring client connections is important, because once the available connections are exhausted, new client connections are refused. MySQL's default connection limit is 151.
MariaDB [(none)]> SHOW VARIABLES LIKE 'max_connections';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 151 |
+-----------------+-------+
We can of course raise this value in the configuration file. Its counterpart is the current number of connections: once it exceeds the configured maximum, we get the familiar "Too many connections" error. Let's look up the current connection count:
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Threads_connected";
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Threads_connected | 41 |
+-------------------+-------+
MySQL also provides the Threads_running metric, which helps separate the threads actively processing queries at any given moment from connections that are open but idle.
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Threads_running";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| Threads_running | 10 |
+-----------------+-------+
If the server really does reach the max_connections limit, it starts refusing new connections. When that happens, the Connection_errors_max_connections metric starts to increase, and so does Aborted_connects, which tracks all failed connection attempts.
In the samples returned by MySQLD Exporter:
# HELP mysql_global_variables_max_connections Generic gauge metric from SHOW GLOBAL VARIABLES.
# TYPE mysql_global_variables_max_connections gauge
mysql_global_variables_max_connections 151
The maximum number of connections.
# HELP mysql_global_status_threads_connected Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_threads_connected untyped
mysql_global_status_threads_connected 41
The current number of connections.
# HELP mysql_global_status_threads_running Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_threads_running untyped
mysql_global_status_threads_running 1
The number of currently active connections (threads that are running).
# HELP mysql_global_status_aborted_connects Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_aborted_connects untyped
mysql_global_status_aborted_connects 31
The cumulative number of failed connection attempts.
# HELP mysql_global_status_connection_errors_total Total number of MySQL connection errors.
# TYPE mysql_global_status_connection_errors_total counter
mysql_global_status_connection_errors_total{error="internal"} 0
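# Errors caused inside the server, such as memory or disk problems
mysql_global_status_connection_errors_total{error="max_connections"} 0
# Errors caused by exceeding max_connections
With a PromQL expression, we can query how many connections are still available:
mysql_global_variables_max_connections - mysql_global_status_threads_connected
Query the number of failed connection attempts, which includes connections MySQL refused: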
mysql_global_status_aborted_connects
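The same arithmetic as the "available connections" expression can be written out in a few lines, which also makes it easy to derive a usage ratio for alerting. This is only a sketch: `connection_usage`, `connections_nearly_exhausted`, and the 0.8 threshold are hypothetical choices for illustration.

```python
def connection_usage(max_connections, threads_connected):
    """Return (available connections, fraction of the limit in use)."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    available = max_connections - threads_connected
    used_ratio = threads_connected / max_connections
    return available, used_ratio


def connections_nearly_exhausted(max_connections, threads_connected, threshold=0.8):
    """True when usage is at or above the (hypothetical) threshold."""
    _, used_ratio = connection_usage(max_connections, threads_connected)
    return used_ratio >= threshold


# Values taken from the sample output above: limit 151, 41 connected.
available, used_ratio = connection_usage(151, 41)
print(available, round(used_ratio, 3))
print(connections_nearly_exhausted(151, 41))
```

Alerting on a ratio rather than a fixed count keeps the rule valid when max_connections is raised later.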
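Buffer pool:
InnoDB, MySQL's default storage engine, uses a memory area called the buffer pool to cache table and index data. Buffer pool metrics are resource metrics rather than work metrics, so they are more useful for investigating performance problems than for detecting them. If database performance starts to degrade while disk I/O keeps climbing, enlarging the buffer pool often brings performance back.
By default the buffer pool is relatively small, at 128 MiB. MySQL suggests, however, that it can be increased to as much as 80% of physical memory on a dedicated database server. Let's check the current value: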
MariaDB [(none)]> show global variables like 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name | Value |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
In the samples returned by MySQLD Exporter, this is exposed as mysql_global_variables_innodb_buffer_pool_size.
# HELP mysql_global_variables_innodb_buffer_pool_size Generic gauge metric from SHOW GLOBAL VARIABLES.
# TYPE mysql_global_variables_innodb_buffer_pool_size gauge
mysql_global_variables_innodb_buffer_pool_size 1.34217728e+08
Innodb_buffer_pool_read_requests records the number of read requests made to the buffer pool. It can be checked with the following command:
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Innodb_buffer_pool_read_requests";
+----------------------------------+-------------+
| Variable_name | Value |
+----------------------------------+-------------+
| Innodb_buffer_pool_read_requests | 38465 |
+----------------------------------+-------------+
In the samples returned by MySQLD Exporter, this is exposed as mysql_global_status_innodb_buffer_pool_read_requests.
# HELP mysql_global_status_innodb_buffer_pool_read_requests Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_buffer_pool_read_requests untyped
mysql_global_status_innodb_buffer_pool_read_requests 2.7711547168e+10
When the buffer pool cannot satisfy a request, MySQL has to read the data from disk. Innodb_buffer_pool_reads records the number of such reads from disk. Reading from memory is generally much faster than reading from disk, so if Innodb_buffer_pool_reads starts to grow, it may indicate a database performance problem.
Innodb_buffer_pool_reads can be checked with the following command:
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Innodb_buffer_pool_reads";
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| Innodb_buffer_pool_reads | 138 |
+--------------------------+-------+
1 row in set (0.00 sec)
In the samples returned by MySQLD Exporter, this is exposed as mysql_global_status_innodb_buffer_pool_reads.
# HELP mysql_global_status_innodb_buffer_pool_reads Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_buffer_pool_reads untyped
mysql_global_status_innodb_buffer_pool_reads 138
With the metrics above and real monitoring scenarios in mind, we can use PromQL to build a number of monitoring items quickly. For example, the per-second rate of disk reads over the last two minutes:
rate(mysql_global_status_innodb_buffer_pool_reads[2m])
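The two buffer pool counters are often combined into a hit ratio: the share of read requests served from memory instead of disk. The sketch below shows that calculation; `buffer_pool_hit_ratio` is a hypothetical helper, and using lifetime counter values (rather than rates over a window) is a simplification made for the example.

```python
def buffer_pool_hit_ratio(read_requests, disk_reads):
    """Fraction of buffer pool read requests served without going to disk.

    read_requests: Innodb_buffer_pool_read_requests
    disk_reads:    Innodb_buffer_pool_reads
    Returns None when there have been no read requests yet.
    """
    if read_requests <= 0:
        return None
    return 1.0 - disk_reads / read_requests


# Values from the SHOW GLOBAL STATUS output shown above.
ratio = buffer_pool_hit_ratio(38465, 138)
print(f"{ratio:.4%}")
```

A ratio that keeps falling while disk I/O rises is the situation in which enlarging innodb_buffer_pool_size is worth investigating.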
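Official dashboard template IDs
Those were just a few example metrics. Next we use Grafana to add dashboards for MySQLD_Exporter:
Master-slave replication monitoring (template 7371):
MySQL status monitoring (template 7362):
Buffer pool status (template 7365):
Simple alerting rules
Dashboards alone are not enough: without alerting rules our monitoring is incomplete. Here are our alerting rules: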
groups:
- name: MySQL-rules
  rules:
  - alert: MySQL Status
    # mysql_up is 0 when the exporter cannot reach MySQL
    expr: mysql_up == 0
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: MySQL has stopped !!!"
      description: "Checks whether the MySQL database is running"
  - alert: MySQL Slave IO Thread Status
    expr: mysql_slave_status_slave_io_running == 0
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: MySQL Slave IO Thread has stopped !!!"
      description: "Checks the state of the MySQL replication IO thread"
  - alert: MySQL Slave SQL Thread Status
    expr: mysql_slave_status_slave_sql_running == 0
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: MySQL Slave SQL Thread has stopped !!!"
      description: "Checks the state of the MySQL replication SQL thread"
  - alert: MySQL Slave Delay Status
    # Replication lag is reported by seconds_behind_master
    expr: mysql_slave_status_seconds_behind_master > 30
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: MySQL Slave Delay is more than 30s !!!"
      description: "Checks MySQL replication lag"
  - alert: Mysql_Too_Many_Connections
    # threads_connected is a gauge, so compare it directly instead of using rate()
    expr: mysql_global_status_threads_connected > 200
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: too many connections"
      description: "{{$labels.instance}}: too many connections, please handle it (current value is: {{ $value }})"
  - alert: Mysql_Too_Many_slow_queries
    expr: rate(mysql_global_status_slow_queries[5m]) > 3
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: too many slow queries, please check"
      description: "{{$labels.instance}}: Mysql slow_queries is more than 3 per second (current value is: {{ $value }})"
2. Add the rules to Prometheus:
rule_files:
  - "rules/*.yml"
3. Open the web UI and we can see that the rules have taken effect:
Summary
That completes monitoring of the key MySQL states. You can round out your own monitoring with more MySQL metrics; this setup is what I run in production, so feel free to use it as a reference.