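Preface
About Actuator:
Anyone who has looked into monitoring for Spring Boot will know the Spring Boot Actuator subproject, which gives applications powerful monitoring capabilities. Starting with Spring Boot 2.x, Actuator uses Micrometer as its underlying metrics layer, which makes monitoring more capable and more flexible. Micrometer is a metrics facade, comparable to SLF4J in the logging world. Through Micrometer, an application can integrate with many monitoring systems, including the one this article covers: Prometheus.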
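About Prometheus:
Prometheus is an open-source system that combines monitoring, alerting, and a time series database (TSDB). It was originally developed at SoundCloud, most of its components are written in Go, and its design was inspired by Google's Borgmon monitoring system. It is hosted by the CNCF and has graduated from incubation. The open-source community around it is very active, and in terms of performance Prometheus can support clusters of more than ten thousand machines.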
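Prometheus features:
- A multidimensional data model in which time series are identified by a metric name and key-value pairs
- A flexible query language: PromQL
- No dependency on distributed storage; each server node is autonomous
- Time series collection through a pull model over HTTP
- Pushing time series is supported through an intermediary gateway
- Targets are discovered through service discovery or static configuration
- Support for a variety of graphing and dashboard options, such as Grafana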
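To make the first feature more concrete, here is a minimal Java sketch of how a time series can be identified by a metric name plus a set of key-value labels. `SeriesKey` is a hypothetical illustration type, not a class from Prometheus or Micrometer, and the sketch only shows the idea of the data model, not how Prometheus stores anything.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical illustration type; not part of Prometheus or Micrometer.
final class SeriesKey {
    private final String metricName;
    // Sorted copy, so the order in which labels are supplied does not matter.
    private final TreeMap<String, String> labels;

    SeriesKey(String metricName, Map<String, String> labels) {
        this.metricName = Objects.requireNonNull(metricName);
        this.labels = new TreeMap<>(labels);
    }

    // Two keys denote the same series only if name and all labels match.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SeriesKey)) {
            return false;
        }
        SeriesKey that = (SeriesKey) other;
        return metricName.equals(that.metricName) && labels.equals(that.labels);
    }

    @Override
    public int hashCode() {
        return Objects.hash(metricName, labels);
    }

    // Renders the key in the familiar name{label="value",...} notation.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(metricName).append('{');
        String separator = "";
        for (Map.Entry<String, String> e : labels.entrySet()) {
            sb.append(separator).append(e.getKey())
              .append("=\"").append(e.getValue()).append('"');
            separator = ",";
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        SeriesKey heap = new SeriesKey("jvm_memory_used_bytes",
                Map.of("application", "prometheus-demo", "area", "heap"));
        SeriesKey nonheap = new SeriesKey("jvm_memory_used_bytes",
                Map.of("application", "prometheus-demo", "area", "nonheap"));
        System.out.println(heap);
        System.out.println("Same series? " + heap.equals(nonheap));
    }
}
```

The same metric name with a different label value is a different series, which is what lets a query language such as PromQL filter and group by labels.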
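About Grafana:
Grafana is an open-source, cross-platform application written in Go for metrics analytics, visualization, and alerting. It queries the collected data, presents it visually, and sends timely notifications. Grafana supports many data sources and display styles; in short, it is a powerful and good-looking tool for visualizing monitoring metrics.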
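Creating the project
The main goal of this article is to monitor a microservice. With the concepts behind these tools briefly covered, let's get hands-on. First create a simple Spring Boot project with the following main dependencies: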
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
- Tip: to integrate with a different monitoring system, only the registry dependency needs to change. For example, to integrate with Influx, change the dependency to micrometer-registry-influx.
Edit the project configuration:
server:
  port: 9562
spring:
  application:
    # Application name
    name: prometheus-demo
management:
  endpoints:
    web:
      exposure:
        # Expose Actuator's /actuator/prometheus endpoint
        include: 'prometheus'
  metrics:
    tags:
      # Add a common tag to every metric, set here to the application name. Micrometer tags become Prometheus labels, which allow more flexible filtering
      application: ${spring.application.name}
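After completing the steps above, run a quick test to check that the endpoint returns monitoring data. Start the project and visit the /actuator/prometheus endpoint. Normally it returns content like the following: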
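If you prefer to check the endpoint from code rather than a browser, the following is a minimal sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11 or later). It assumes the application runs on `localhost` with port 9562 as configured above; the class name, the helper methods, and the "# TYPE" heuristic are my own illustration, not part of Actuator.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class ActuatorEndpointCheck {

    // Builds the URL of the Prometheus endpoint for a given host and port.
    static URI endpointUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/actuator/prometheus");
    }

    // Heuristic only: the text format normally contains "# TYPE" comment lines.
    static boolean looksLikeMetrics(String body) {
        return body.lines().anyMatch(line -> line.startsWith("# TYPE "));
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        // Assumption: the demo application is running locally on port 9562.
        HttpRequest request = HttpRequest.newBuilder(endpointUri("localhost", 9562))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP status: " + response.statusCode());
            System.out.println("Looks like metrics: " + looksLikeMetrics(response.body()));
        } catch (IOException e) {
            // Covers connection refused and timeouts.
            System.out.println("Endpoint not reachable: " + e.getMessage());
        }
    }
}
```

A 200 status together with "# TYPE" lines in the body suggests the endpoint is exposed correctly; a 404 usually means the `management.endpoints.web.exposure.include` setting did not take effect.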
# HELP process_start_time_seconds Start time of the process since unix epoch.
# TYPE process_start_time_seconds gauge
process_start_time_seconds{application="prometheus-demo",} 1.577697308142E9
# HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
# TYPE jvm_buffer_memory_used_bytes gauge
jvm_buffer_memory_used_bytes{application="prometheus-demo",id="mapped",} 0.0
jvm_buffer_memory_used_bytes{application="prometheus-demo",id="direct",} 16384.0
# HELP tomcat_sessions_expired_sessions_total
# TYPE tomcat_sessions_expired_sessions_total counter
tomcat_sessions_expired_sessions_total{application="prometheus-demo",} 0.0
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of minor GC",application="prometheus-demo",cause="Metadata GC Threshold",} 1.0
jvm_gc_pause_seconds_sum{action="end of minor GC",application="prometheus-demo",cause="Metadata GC Threshold",} 0.006
jvm_gc_pause_seconds_count{action="end of major GC",application="prometheus-demo",cause="Metadata GC Threshold",} 1.0
jvm_gc_pause_seconds_sum{action="end of major GC",application="prometheus-demo",cause="Metadata GC Threshold",} 0.032
jvm_gc_pause_seconds_count{action="end of minor GC",application="prometheus-demo",cause="Allocation Failure",} 1.0
jvm_gc_pause_seconds_sum{action="end of minor GC",application="prometheus-demo",cause="Allocation Failure",} 0.008
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_max{action="end of minor GC",application="prometheus-demo",cause="Metadata GC Threshold",} 0.006
jvm_gc_pause_seconds_max{action="end of major GC",application="prometheus-demo",cause="Metadata GC Threshold",} 0.032
jvm_gc_pause_seconds_max{action="end of minor GC",application="prometheus-demo",cause="Allocation Failure",} 0.008
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{application="prometheus-demo",area="heap",id="PS Survivor Space",} 0.0
jvm_memory_used_bytes{application="prometheus-demo",area="heap",id="PS Old Gen",} 1.3801776E7
jvm_memory_used_bytes{application="prometheus-demo",area="nonheap",id="Metaspace",} 3.522832E7
jvm_memory_used_bytes{application="prometheus-demo",area="nonheap",id="Code Cache",} 6860800.0
jvm_memory_used_bytes{application="prometheus-demo",area="heap",id="PS Eden Space",} 1.9782928E7
jvm_memory_used_bytes{application="prometheus-demo",area="nonheap",id="Compressed Class Space",} 4825568.0
# HELP logback_events_total Number of error level events that made it to the logs
# TYPE logback_events_total counter
logback_events_total{application="prometheus-demo",level="info",} 7.0
logback_events_total{application="prometheus-demo",level="trace",} 0.0
logback_events_total{application="prometheus-demo",level="warn",} 0.0
logback_events_total{application="prometheus-demo",level="debug",} 0.0
logback_events_total{application="prometheus-demo",level="error",} 0.0
# HELP process_uptime_seconds The uptime of the Java virtual machine
# TYPE process_uptime_seconds gauge
process_uptime_seconds{application="prometheus-demo",} 30.499
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
# TYPE jvm_buffer_count_buffers gauge
jvm_buffer_count_buffers{application="prometheus-demo",id="mapped",} 0.0
jvm_buffer_count_buffers{application="prometheus-demo",id="direct",} 2.0
# HELP system_cpu_count The number of processors available to the Java virtual machine
# TYPE system_cpu_count gauge
system_cpu_count{application="prometheus-demo",} 6.0
# HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads{application="prometheus-demo",} 22.0
# HELP tomcat_sessions_alive_max_seconds
# TYPE tomcat_sessions_alive_max_seconds gauge
tomcat_sessions_alive_max_seconds{application="prometheus-demo",} 0.0
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{application="prometheus-demo",area="heap",id="PS Survivor Space",} 1.5204352E7
jvm_memory_committed_bytes{application="prometheus-demo",area="heap",id="PS Old Gen",} 1.31596288E8
jvm_memory_committed_bytes{application="prometheus-demo",area="nonheap",id="Metaspace",} 3.7879808E7
jvm_memory_committed_bytes{application="prometheus-demo",area="nonheap",id="Code Cache",} 6881280.0
jvm_memory_committed_bytes{application="prometheus-demo",area="heap",id="PS Eden Space",} 1.76685056E8
jvm_memory_committed_bytes{application="prometheus-demo",area="nonheap",id="Compressed Class Space",} 5373952.0
# HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
# TYPE jvm_buffer_total_capacity_bytes gauge
jvm_buffer_total_capacity_bytes{application="prometheus-demo",id="mapped",} 0.0
jvm_buffer_total_capacity_bytes{application="prometheus-demo",id="direct",} 16384.0
# HELP jvm_gc_live_data_size_bytes Size of old generation memory pool after a full GC
# TYPE jvm_gc_live_data_size_bytes gauge
jvm_gc_live_data_size_bytes{application="prometheus-demo",} 1.3801776E7
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{application="prometheus-demo",area="heap",id="PS Survivor Space",} 1.5204352E7
jvm_memory_max_bytes{application="prometheus-demo",area="heap",id="PS Old Gen",} 2.841116672E9
jvm_memory_max_bytes{application="prometheus-demo",area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{application="prometheus-demo",area="nonheap",id="Code Cache",} 2.5165824E8
jvm_memory_max_bytes{application="prometheus-demo",area="heap",id="PS Eden Space",} 1.390411776E9
jvm_memory_max_bytes{application="prometheus-demo",area="nonheap",id="Compressed Class Space",} 1.073741824E9
# HELP jvm_threads_daemon_threads The current number of live daemon threads
# TYPE jvm_threads_daemon_threads gauge
jvm_threads_daemon_threads{application="prometheus-demo",} 18.0
# HELP jvm_threads_states_threads The current number of threads having NEW state
# TYPE jvm_threads_states_threads gauge
jvm_threads_states_threads{application="prometheus-demo",state="runnable",} 8.0
jvm_threads_states_threads{application="prometheus-demo",state="new",} 0.0
jvm_threads_states_threads{application="prometheus-demo",state="timed-waiting",} 2.0
jvm_threads_states_threads{application="prometheus-demo",state="blocked",} 0.0
jvm_threads_states_threads{application="prometheus-demo",state="waiting",} 12.0
jvm_threads_states_threads{application="prometheus-demo",state="terminated",} 0.0
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
# TYPE jvm_gc_memory_promoted_bytes_total counter
jvm_gc_memory_promoted_bytes_total{application="prometheus-demo",} 8296848.0
# HELP tomcat_sessions_active_max_sessions
# TYPE tomcat_sessions_active_max_sessions gauge
tomcat_sessions_active_max_sessions{application="prometheus-demo",} 0.0
# HELP tomcat_sessions_created_sessions_total
# TYPE tomcat_sessions_created_sessions_total counter
tomcat_sessions_created_sessions_total{application="prometheus-demo",} 0.0
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
# TYPE jvm_gc_memory_allocated_bytes_total counter
jvm_gc_memory_allocated_bytes_total{application="prometheus-demo",} 1.36924824E8
# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage{application="prometheus-demo",} 0.10024585094452443
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage{application="prometheus-demo",} 0.38661791030714154
# HELP tomcat_sessions_active_current_sessions
# TYPE tomcat_sessions_active_current_sessions gauge
tomcat_sessions_active_current_sessions{application="prometheus-demo",} 0.0
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes{application="prometheus-demo",} 7195.0
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{application="prometheus-demo",exception="None",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**",} 1.0
http_server_requests_seconds_sum{application="prometheus-demo",exception="None",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**",} 0.012429856
# HELP http_server_requests_seconds_max
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{application="prometheus-demo",exception="None",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**",} 0.012429856
# HELP jvm_gc_max_data_size_bytes Max size of old generation memory pool
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes{application="prometheus-demo",} 2.841116672E9
# HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads{application="prometheus-demo",} 22.0
# HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
# TYPE jvm_classes_unloaded_classes_total counter
jvm_classes_unloaded_classes_total{application="prometheus-demo",} 1.0
# HELP tomcat_sessions_rejected_sessions_total
# TYPE tomcat_sessions_rejected_sessions_total counter
tomcat_sessions_rejected_sessions_total{application="prometheus-demo",} 0.0
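The data returned by this endpoint is what Prometheus consumes. Most metrics carry a # HELP comment that explains their meaning, so the output should not be hard to read. For example: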
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{application="prometheus-demo",area="heap",id="PS Survivor Space",} 0.0
This means that in the prometheus-demo application, the PS Survivor Space region of the heap currently uses 0.0 bytes.
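The reading above can be reproduced in code. The following is a minimal Java sketch that splits one sample line into metric name, labels, and value. `SampleLine` is a hypothetical helper of my own, and the parser is deliberately simplified: it assumes label values contain no escaped quotes and that the value is a number `Double.parseDouble` accepts, so it is not a complete parser for the exposition format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a full exposition-format parser.
final class SampleLine {
    // Matches one label such as area="heap"; escaped quotes are not supported.
    private static final Pattern LABEL =
            Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"([^\"]*)\"");

    final String name;
    final Map<String, String> labels;
    final double value;

    private SampleLine(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    static SampleLine parse(String line) {
        String trimmed = line.trim();
        int open = trimmed.indexOf('{');
        int close = trimmed.lastIndexOf('}');
        Map<String, String> labels = new LinkedHashMap<>();
        String name;
        String rest;
        if (open >= 0 && close > open) {
            name = trimmed.substring(0, open);
            // The trailing comma before '}' is simply ignored by the matcher.
            Matcher m = LABEL.matcher(trimmed.substring(open + 1, close));
            while (m.find()) {
                labels.put(m.group(1), m.group(2));
            }
            rest = trimmed.substring(close + 1).trim();
        } else {
            // A sample without labels: "name value".
            int space = trimmed.indexOf(' ');
            name = trimmed.substring(0, space);
            rest = trimmed.substring(space + 1).trim();
        }
        // The first token is the value; an optional timestamp may follow.
        String valueToken = rest.split("\\s+")[0];
        return new SampleLine(name, labels, Double.parseDouble(valueToken));
    }

    public static void main(String[] args) {
        SampleLine s = parse("jvm_memory_used_bytes{application=\"prometheus-demo\","
                + "area=\"heap\",id=\"PS Survivor Space\",} 0.0");
        System.out.println(s.name + " " + s.labels + " " + s.value);
    }
}
```

Here `name` is the metric, `labels` holds the application, area, and id pairs, and `value` is the measured number of bytes.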
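Installing the Prometheus service
The next step is to install Prometheus on a server so that it can collect monitoring data from the endpoint the microservice exposes. To keep things simple I install it with Docker here; for other installation methods, see the official installation documentation.
First prepare a configuration file for Prometheus: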
[root@localhost ~]# mkdir /etc/prometheus
[root@localhost ~]# vim /etc/prometheus/prometheus.yml
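scrape_configs:
  # Any name will do; plain English without special characters is recommended
  - job_name: 'spring'
    # How often to scrape
    scrape_interval: 15s
    # Timeout for each scrape
    scrape_timeout: 10s
    # Path of the endpoint to scrape
    metrics_path: '/actuator/prometheus'
    # Addresses of the services to scrape, i.e. the microservice's IP and port
    static_configs:
      - targets: ['192.168.1.252:9562']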
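The purpose of this configuration file is to make Prometheus request http://192.168.1.252:9562/actuator/prometheus automatically every 15 seconds. For more configuration options, see the official Prometheus Configuration documentation.
Finally, start the Prometheus service with Docker: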
[root@localhost ~]# docker run -d -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus --config.file=/etc/prometheus/prometheus.yml
Once it has started, visit http://{ip}:9090 to see the Prometheus home page:
Click Insert metric at cursor to choose a metric, click Graph to display the metric as a graph, and click the Execute button to see a result like the following:
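What each control does:
- Insert metric at cursor: choose the metric to display
- Graph: display the metric as a graph
- Execute: run the query and draw the graph
- Add Graph: add a graph for another metric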
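The same queries can also be issued without the web UI, through the Prometheus HTTP API's instant query endpoint, `/api/v1/query`. The sketch below only builds the request URL with the PromQL expression URL-encoded. The base URL `http://localhost:9090` and the class name are assumptions for illustration; substitute the address of your own Prometheus server.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PromQueryUrl {

    // Builds the instant-query URL for a PromQL expression.
    static URI instantQuery(String baseUrl, String promql) {
        // Braces, quotes, and '=' in PromQL must be percent-encoded in a query string.
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    public static void main(String[] args) {
        // Assumption: Prometheus is reachable on localhost:9090.
        URI uri = instantQuery("http://localhost:9090",
                "jvm_threads_live_threads{application=\"prometheus-demo\"}");
        System.out.println(uri);
    }
}
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a JSON document containing the current value of the metric.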
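Grafana visualization
In the previous section we set up the Prometheus service and took a brief look at its built-in visualization UI. That UI is not pleasant to use, though, and its features are limited. Next we integrate Grafana to build a friendlier monitoring visualization platform that is closer to what is used in production.
Grafana also needs to be installed on a server. To keep things simple I again use Docker; for other installation methods, see the official installation documentation.
With Docker, a single command is enough to start Grafana: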
[root@localhost ~]# docker run -d -p 3000:3000 grafana/grafana
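Configuring the monitoring data source
Once Grafana has started, visit http://{ip}:3000/login to log in. The default username and password are both admin:
After logging in, the home page looks like this:
First add the source of the monitoring data. Click Add data source on the home page to see a screen like the following:
Click Prometheus here to see a screen like the following, where you enter the details of the Prometheus service:
After a successful save, the following message appears: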
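As an alternative to clicking through the UI, Grafana also exposes an HTTP API for creating data sources (`POST /api/datasources`). The sketch below only prepares the two pieces such a request needs: the basic-auth header and the JSON body. The data source name, the Prometheus URL, and the class name are assumptions for illustration, and actually sending the request is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class GrafanaDataSourceRequest {

    // Value for the HTTP Authorization header when using basic authentication.
    static String basicAuthHeader(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // JSON body for POST /api/datasources. No escaping is performed, so the
    // arguments must not contain quotes or backslashes.
    static String payload(String name, String prometheusUrl) {
        return "{\"name\":\"" + name + "\",\"type\":\"prometheus\",\"url\":\""
                + prometheusUrl + "\",\"access\":\"proxy\"}";
    }

    public static void main(String[] args) {
        // Assumptions: Grafana on localhost:3000 with the default admin/admin
        // account, and Prometheus reachable from Grafana at the URL below.
        System.out.println("POST http://localhost:3000/api/datasources");
        System.out.println("Authorization: " + basicAuthHeader("admin", "admin"));
        System.out.println("Content-Type: application/json");
        System.out.println(payload("Prometheus", "http://localhost:9090"));
    }
}
```

Note that when Grafana runs in a Docker container, `localhost` inside the container is not the host machine, so the Prometheus URL usually needs to be the server's IP address.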
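Creating a monitoring dashboard
Click the + button in the navigation bar, then click Dashboard, to see a screen like the following:
Click Add Query to see a screen like the following:
Add a metric query at the position marked with the red box. The available metric names are those listed by the Spring Boot application's /actuator/prometheus endpoint, for example jvm_memory_used_bytes, jvm_threads_states_threads, and jvm_threads_live_threads.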
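Grafana offers helpful auto-completion and supports fairly complex calculations such as aggregation, sums, and averages. To draw more than one line, click the Add Query button. As shown in the figure above, I drew two lines on the chart, one for daemon threads and one for peak threads.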
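To illustrate what such an aggregation does, consider a PromQL query like `sum by (area) (jvm_memory_used_bytes{application="prometheus-demo"})`: it groups the series by their area label and adds up the values in each group. The Java sketch below performs the same grouping by hand on the sample values from the endpoint output shown earlier. `SumByArea` and `Sample` are hypothetical names, and this only mimics the arithmetic, not how Prometheus evaluates queries.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SumByArea {

    // Hypothetical holder for one sample of jvm_memory_used_bytes.
    static final class Sample {
        final String area;
        final String id;
        final double value;

        Sample(String area, String id, double value) {
            this.area = area;
            this.id = id;
            this.value = value;
        }
    }

    // Groups samples by their "area" label and adds up the values per group.
    static Map<String, Double> sumByArea(List<Sample> samples) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sample s : samples) {
            totals.merge(s.area, s.value, Double::sum);
        }
        return totals;
    }

    // Values copied from the jvm_memory_used_bytes lines in the sample output.
    static List<Sample> exampleSamples() {
        return List.of(
                new Sample("heap", "PS Survivor Space", 0.0),
                new Sample("heap", "PS Old Gen", 1.3801776E7),
                new Sample("nonheap", "Metaspace", 3.522832E7),
                new Sample("nonheap", "Code Cache", 6860800.0),
                new Sample("heap", "PS Eden Space", 1.9782928E7),
                new Sample("nonheap", "Compressed Class Space", 4825568.0));
    }

    public static void main(String[] args) {
        sumByArea(exampleSamples())
                .forEach((area, total) -> System.out.println(area + " = " + total));
    }
}
```

Six series collapse into two, one total for heap and one for nonheap, which is usually a more readable line chart than six separate memory pools.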
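Click the button shown below and fill in Title to set the chart title:
To add a new chart to the dashboard, click the button in the upper-left corner of the figure above:
Then follow the steps shown below:
To save the dashboard, click the save button in the upper-right corner: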
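Dashboard marketplace
At this point we have integrated Grafana with Prometheus and have fairly rich, flexible charts: the metrics we care about can be placed on a dashboard. The configuration is not hard, but it does take a fair amount of time.
Is there a ready-made dashboard that is powerful, general-purpose, and usable out of the box? Yes. Go to Grafana Labs - Dashboards and enter a keyword to search for a dashboard:
As shown above, there are several dashboards that use Prometheus as the data source and support Micrometer. The following is a short demonstration of how to use the JVM (Micrometer) dashboard. Click JVM (Micrometer) to open its details page, shown below:
The page describes the dashboard's features and configuration in detail. The management.metrics.tags.application setting it mentions was already configured earlier, when we edited the Spring Boot project's configuration. The number 4701 marked with a red box in the upper-right corner of the page is important because it is the dashboard's ID.
Back on the Grafana home page, import the dashboard by following the steps shown below:
After entering the ID you will see a screen like the following. Select the data source and click Import:
You will then see a screen like the following; the dashboard already covers the metrics we usually care about:
The selector bar at the top of the page lets you switch between services/applications:
There are also some other useful dashboards that you can explore on your own; they are not covered in detail here: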