Kafka monitoring in practice (jmxtrans + InfluxDB + Grafana)
Reposted from http://navyaijm.blog.51cto.com/4647068/1958376
Preface
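Since last week I have been looking for a good Kafka monitoring tool. I tried KafkaOffsetMonitor, Burrow, kafka-monitor, and Kafka-Manager. Each has its strengths and weaknesses, which I won't go into here; you can read about them in their git repositories. They mostly monitor reads and writes per topic and do not provide monitoring for the cluster as a whole, such as partitions, latency, and memory usage. By chance I came across jmxtrans, a collector that gathers data from Java applications over JMX. Its output can go to Graphite, StatsD, Ganglia, InfluxDB, and others. Our existing monitoring happens to use InfluxDB for storage and Grafana for display, so below I describe a complete jmxtrans + InfluxDB + Grafana solution for monitoring Kafka. It requires no extra development work and uses everything as shipped.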
Environment
Roles
10.10.10.10 InfluxDb
10.10.10.100 Grafana
10.10.30.69 jmxtrans
Kafka cluster
10.10.20.14 node1
10.10.20.15 node2
10.10.20.16 node3
10.10.20.17 node4
Software versions
influxdb-1.2.4-1.x86_64
grafana-4.1.1-1484211277.x86_64
jmxtrans-266.rpm
kafka_2.10-0.9.0.0.jar.asc
Architecture diagram
[Figure: architecture diagram]
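Configuration planning
- jmxtrans can be deployed on every Kafka node or on a single machine. I chose the latter because my cluster is small and this keeps the configuration files in one place. For a larger cluster, consider deploying it on each node.
- The jmxtrans configuration files are split into global metrics (per Kafka node) and topic metrics. Global metrics use one file per node, named like base_10.10.20.14.json. Topic metrics use one file per topic and node, named like falcon_monitor_us_17.json, where the suffix is the last octet of the node's IP.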
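The naming rules above can be sketched as two small helpers. This is only an illustration: the function names are my own, and the "last octet of the IP" rule for topic files is inferred from the example file names, not from anything jmxtrans requires.

```python
def base_config_name(node_ip: str) -> str:
    """File name for a node's global-metrics config, e.g. base_10.10.20.14.json."""
    return f"base_{node_ip}.json"


def topic_config_name(topic: str, node_ip: str) -> str:
    """File name for a topic config on one node; the suffix is the IP's last octet."""
    last_octet = node_ip.rsplit(".", 1)[-1]
    return f"{topic}_{last_octet}.json"


if __name__ == "__main__":
    nodes = ["10.10.20.14", "10.10.20.15", "10.10.20.16", "10.10.20.17"]
    for ip in nodes:
        print(base_config_name(ip), topic_config_name("falcon_monitor_us", ip))
```

With four nodes and three topics this yields the sixteen file names that appear in the directory listing later in this post.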
Monitored metrics
Global metrics
Inbound traffic per second:
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec"
"attr" : [ "Count" ]
"resultAlias":"BytesInPerSec"
"tags" : {"application" : "BytesInPerSec"}
Outbound traffic per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec"
"attr" : [ "Count" ]
"resultAlias":"BytesOutPerSec"
"tags" : {"application" : "BytesOutPerSec"}
Rejected traffic per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec"
"attr" : [ "Count" ]
"resultAlias":"BytesRejectedPerSec"
"tags" : {"application" : "BytesRejectedPerSec"}
Messages written per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec"
"attr" : [ "Count" ]
"resultAlias":"MessagesInPerSec"
"tags" : {"application" : "MessagesInPerSec"}
FetchFollower requests per second
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower"
"attr" : [ "Count" ]
"resultAlias":"RequestsPerSec"
"tags" : {"request" : "FetchFollower"}
FetchConsumer requests per second
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer"
"attr" : [ "Count" ]
"resultAlias":"RequestsPerSec"
"tags" : {"request" : "FetchConsumer"}
Produce requests per second
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce"
"attr" : [ "Count" ]
"resultAlias":"RequestsPerSec"
"tags" : {"request" : "Produce"}
Memory usage
"obj" : "java.lang:type=Memory"
"attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ]
"resultAlias":"MemoryUsage"
"tags" : {"application" : "MemoryUsage"}
GC time and count
"obj" : "java.lang:type=GarbageCollector,name=*"
"attr" : [ "CollectionCount","CollectionTime" ]
"resultAlias":"GC"
"tags" : {"application" : "GC"}
Thread usage
"obj" : "java.lang:type=Threading"
"attr" : [ "PeakThreadCount","ThreadCount" ]
"resultAlias":"Thread"
"tags" : {"application" : "Thread"}
Maximum number of messages by which a replica lags behind the leader partition
"obj" : "kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica"
"attr" : [ "Value" ]
"resultAlias":"ReplicaFetcherManager"
"tags" : {"application" : "MaxLag"}
Number of partitions on this broker
"obj" : "kafka.server:type=ReplicaManager,name=PartitionCount"
"attr" : [ "Value" ]
"resultAlias":"ReplicaManager"
"tags" : {"application" : "PartitionCount"}
Number of under-replicated partitions (partitions whose replicas are not all in sync)
"obj" : "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
"attr" : [ "Value" ]
"resultAlias":"ReplicaManager"
"tags" : {"application" : "UnderReplicatedPartitions"}
Number of leader replicas on this broker
"obj" : "kafka.server:type=ReplicaManager,name=LeaderCount"
"attr" : [ "Value" ]
"resultAlias":"ReplicaManager"
"tags" : {"application" : "LeaderCount"}
Total time taken by a FetchConsumer request
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer"
"attr" : [ "Count","Max" ]
"resultAlias":"TotalTimeMs"
"tags" : {"application" : "FetchConsumer"}
Total time taken by a FetchFollower request
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower"
"attr" : [ "Count","Max" ]
"resultAlias":"TotalTimeMs"
"tags" : {"application" : "FetchFollower"}
Total time taken by a Produce request
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce"
"attr" : [ "Count","Max" ]
"resultAlias":"TotalTimeMs"
"tags" : {"application" : "Produce"}
Topic metrics
Inbound traffic per second for falcon_monitor_us
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=falcon_monitor_us"
"attr" : [ "Count" ]
"resultAlias":"falcon_monitor_us"
"tags" : {"application" : "BytesInPerSec"}
Outbound traffic per second for falcon_monitor_us
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=falcon_monitor_us"
"attr" : [ "Count" ]
"resultAlias":"falcon_monitor_us"
"tags" : {"application" : "BytesOutPerSec"}
Messages written per second for falcon_monitor_us
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=falcon_monitor_us"
"attr" : [ "Count" ]
"resultAlias":"falcon_monitor_us"
"tags" : {"application" : "MessagesInPerSec"}
Log end offset of each partition of falcon_monitor_us
"obj" : "kafka.log:type=Log,name=LogEndOffset,topic=falcon_monitor_us,partition=*"
"attr" : [ "Value" ]
"resultAlias":"falcon_monitor_us"
"tags" : {"application" : "LogEndOffset"}
Parameter description
obj
The JMX ObjectName, that is, the metric we want to monitor
attr
An attribute of the ObjectName; think of it as the value of the metric we want to monitor
resultAlias
The metric name; in InfluxDB this becomes the MEASUREMENT name
tags
InfluxDB tags, which distinguish different metrics stored in the same MEASUREMENT. We use them when drawing graphs in Grafana, so I recommend tagging every metric.
For global monitoring, each metric maps to one MEASUREMENT, and all Kafka nodes write the same metric to the same MEASUREMENT. For topic monitoring, all Kafka nodes write a topic's metrics to one MEASUREMENT named after the topic.
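To make the mapping concrete, here is a rough sketch of how one sample could look as an InfluxDB line-protocol point: resultAlias becomes the measurement, tags become tag pairs, and each attr becomes a field. It is a simplified illustration, not what jmxtrans emits byte for byte. The helper name, the "hostname" tag, and the sample value are assumptions, and escaping of special characters is left out.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict) -> str:
    """Render one point as 'measurement,tag=value,... field=value,...'.

    Assumes at least one tag and no characters that need escaping.
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part}"


point = to_line_protocol(
    "BytesInPerSec",                                   # resultAlias -> measurement
    {"application": "BytesInPerSec",                   # tags from the config
     "hostname": "10.10.20.14"},                       # hypothetical per-node tag
    {"Count": 123456.0},                               # attr -> field (made-up value)
)
print(point)
```

Because every node writes to the same measurement, a per-node tag like the one above is what lets a Grafana query separate or group the series by broker.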
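Installation
Kafka
I won't cover installing the Kafka cluster here; the important part is how Kafka is started. Because we collect Kafka's monitoring data over JMX, the JMX port must be enabled when Kafka starts. Start it like this: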
cd /data/kafka/bin/
JMX_PORT=9999 nohup ./kafka-server-start.sh ../config/server.properties >/dev/null 2>&1 &
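Since jmxtrans runs on a separate machine, it is worth confirming that each broker's JMX port accepts connections from there before writing any config. The sketch below only tests that a TCP connection can be opened; it does not speak the JMX protocol, and the function name is my own.

```python
import socket


def jmx_port_reachable(host: str, port: int = 9999, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling jmx_port_reachable("10.10.20.14") from the jmxtrans host, and likewise for the other three nodes, should return True once Kafka has been started with JMX_PORT=9999.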
InfluxDB
yum -y install influxdb ## install
/etc/init.d/influxdb start ## start the service
[root@ip-10-10-10-10 jmxtrans]# influx
Connected to http://localhost:8086 version 1.3.2
InfluxDB shell version: 1.3.2
> CREATE USER "root" WITH PASSWORD '123456' WITH ALL PRIVILEGES ## add an account
>
Grafana
yum -y install grafana ## install
/etc/init.d/grafana-server start ## start the service
jmxtrans
wget http://central.maven.org/maven2/org/jmxtrans/jmxtrans/266/jmxtrans-266.rpm
rpm -ivh jmxtrans-266.rpm ## install
/etc/init.d/jmxtrans start ## start
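Configuration
This section covers how to write the jmxtrans collection config files and what to watch for when configuring graphs in Grafana. Kafka and InfluxDB configuration is not covered.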
jmxtrans
- By default jmxtrans reads the config files under /var/lib/jmxtrans to collect data, so all the config files for collecting Kafka monitoring data go in this directory. Here is how my files are named:
[root@ip-10-10-30-69 jmxtrans]# ll
total 96
-rw-r--r-- 1 root root 1657 Aug 18 17:03 article-feedback-10min-json_14.json
-rw-r--r-- 1 root root 1657 Aug 18 17:03 article-feedback-10min-json_15.json
-rw-r--r-- 1 root root 1657 Aug 18 17:04 article-feedback-10min-json_16.json
-rw-r--r-- 1 root root 1657 Aug 18 17:04 article-feedback-10min-json_17.json
-rw-r--r-- 1 root root 8430 Aug 22 08:24 base_10.10.20.14.json
-rw-r--r-- 1 root root 8431 Aug 22 08:24 base_10.10.20.15.json
-rw-r--r-- 1 root root 8431 Aug 22 08:25 base_10.10.20.16.json
-rw-r--r-- 1 root root 8431 Aug 22 08:25 base_10.10.20.17.json
-rw-r--r-- 1 root root 2027 Aug 21 16:19 falcon_monitor_us_14.json
-rw-r--r-- 1 root root 2027 Aug 21 16:20 falcon_monitor_us_15.json
-rw-r--r-- 1 root root 2484 Aug 21 20:58 falcon_monitor_us_16.json
-rw-r--r-- 1 root root 2027 Aug 21 16:20 falcon_monitor_us_17.json
-rw-r--r-- 1 root root 2147 Aug 21 17:43 highgmp-articles-through-primary_14.json
-rw-r--r-- 1 root root 2147 Aug 21 17:46 highgmp-articles-through-primary_15.json
-rw-r--r-- 1 root root 2147 Aug 21 17:46 highgmp-articles-through-primary_16.json
-rw-r--r-- 1 root root 2147 Aug 21 17:47 highgmp-articles-through-primary_17.json
[root@ip-10-10-30-69 jmxtrans]# pwd
/var/lib/jmxtrans
- Global monitoring config file, using 10.10.20.14 as an example:
[root@ip-10-10-30-69 jmxtrans]# cat base_10.10.20.14.json
{
"servers" : [ {
"port" : "9999",
"host" : "10.10.20.14",
"queries" : [ {
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
"attr" : [ "Count","OneMinuteRate" ],
"resultAlias":"BytesInPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesInPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec",
"attr" : [ "Count","OneMinuteRate" ],
"resultAlias":"BytesOutPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesOutPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec",
"attr" : [ "Count","OneMinuteRate" ],
"resultAlias":"BytesRejectedPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesRejectedPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
"attr" : [ "Count","OneMinuteRate" ],
"resultAlias":"MessagesInPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "MessagesInPerSec"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer",
"attr" : [ "Count" ],
"resultAlias":"RequestsPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"request" : "FetchConsumer"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower",
"attr" : [ "Count" ],
"resultAlias":"RequestsPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"request" : "FetchFollower"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce",
"attr" : [ "Count" ],
"resultAlias":"RequestsPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"request" : "Produce"}
} ]
},
{
"obj" : "java.lang:type=Memory",
"attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
"resultAlias":"MemoryUsage",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "MemoryUsage"}
} ]
},
{
"obj" : "java.lang:type=GarbageCollector,name=*",
"attr" : [ "CollectionCount","CollectionTime" ],
"resultAlias":"GC",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "GC"}
} ]
},
{
"obj" : "java.lang:type=Threading",
"attr" : [ "PeakThreadCount","ThreadCount" ],
"resultAlias":"Thread",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "Thread"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica",
"attr" : [ "Value" ],
"resultAlias":"ReplicaFetcherManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "MaxLag"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaManager,name=PartitionCount",
"attr" : [ "Value" ],
"resultAlias":"ReplicaManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "PartitionCount"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
"attr" : [ "Value" ],
"resultAlias":"ReplicaManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "UnderReplicatedPartitions"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaManager,name=LeaderCount",
"attr" : [ "Value" ],
"resultAlias":"ReplicaManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "LeaderCount"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer",
"attr" : [ "Count","Max" ],
"resultAlias":"TotalTimeMs",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "FetchConsumer"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower",
"attr" : [ "Count","Max" ],
"resultAlias":"TotalTimeMs",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "FetchFollower"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce",
"attr" : [ "Count","Max" ],
"resultAlias":"TotalTimeMs",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "Produce"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaManager,name=IsrShrinksPerSec",
"attr" : [ "Count" ],
"resultAlias":"ReplicaManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "IsrShrinksPerSec"}
} ]
}
]
} ]
}
- Topic monitoring config file, using falcon_monitor_us on node 10.10.20.14 as an example:
[root@ip-10-10-30-69 jmxtrans]# cat falcon_monitor_us_14.json
{
"servers" : [ {
"port" : "9999",
"host" : "10.10.20.14",
"queries" : [ {
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=falcon_monitor_us",
"attr" : [ "Count" ],
"resultAlias":"falcon_monitor_us",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesInPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=falcon_monitor_us",
"attr" : [ "Count" ],
"resultAlias":"falcon_monitor_us",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesOutPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=falcon_monitor_us",
"attr" : [ "Count" ],
"resultAlias":"falcon_monitor_us",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "MessagesInPerSec"}
} ]
},
{
"obj" : "kafka.log:type=Log,name=LogEndOffset,topic=falcon_monitor_us,partition=*",
"attr" : [ "Value" ],
"resultAlias":"falcon_monitor_us",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "LogEndOffset"}
} ]
}
]
} ]
}
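Both files above repeat the same outputWriters block in every query, and the topic files differ only in the topic name. One way to avoid copy-and-paste mistakes is to generate them. The following is a minimal sketch, not part of jmxtrans: the helper names are my own, and the writer settings are simply copied from the configs shown above.

```python
import json

# Writer settings copied from the configs above; every query repeats them.
WRITER = {
    "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
    "url": "http://10.10.10.10:8086/",
    "username": "root",
    "password": "root",
    "database": "jmxDB",
}


def make_query(obj, attr, alias, tags):
    """Build one entry of the "queries" array."""
    writer = dict(WRITER, tags=tags)  # copy, so WRITER itself is never modified
    return {"obj": obj, "attr": attr, "resultAlias": alias, "outputWriters": [writer]}


def make_config(host, specs, port="9999"):
    """Build a complete config document for one Kafka node."""
    queries = [make_query(*spec) for spec in specs]
    return {"servers": [{"port": port, "host": host, "queries": queries}]}


def topic_specs(topic):
    """(obj, attr, resultAlias, tags) tuples for the four per-topic metrics."""
    specs = []
    for name in ("BytesInPerSec", "BytesOutPerSec", "MessagesInPerSec"):
        obj = f"kafka.server:type=BrokerTopicMetrics,name={name},topic={topic}"
        specs.append((obj, ["Count"], topic, {"application": name}))
    # partition=* is a JMX wildcard that matches every partition of the topic.
    obj = f"kafka.log:type=Log,name=LogEndOffset,topic={topic},partition=*"
    specs.append((obj, ["Value"], topic, {"application": "LogEndOffset"}))
    return specs


if __name__ == "__main__":
    config = make_config("10.10.20.14", topic_specs("falcon_monitor_us"))
    print(json.dumps(config, indent=2))
```

The global config can be produced the same way by passing make_config a list of tuples for the broker-level metrics. Writing the result with json.dump into /var/lib/jmxtrans, one file per node, reproduces the layout used in this post.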
Grafana configuration
- Add a data source
The Url, Database, User, and Password must match what is written in the jmxtrans collection config files. Then click Save & Test; a success message means it is working
- Create a dashboard, then configure a graph for each metric there
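Because the data source settings have to match the jmxtrans files, a quick consistency check can save some head-scratching. The sketch below is an assumption-laden helper of my own, not a Grafana or jmxtrans feature: it collects the writer settings from a parsed config and compares them with the values typed into Grafana, ignoring a trailing slash on the URL.

```python
def writer_settings(config: dict) -> set:
    """Collect the distinct (url, database, username, password) tuples in a config."""
    found = set()
    for server in config["servers"]:
        for query in server["queries"]:
            for writer in query["outputWriters"]:
                found.add((writer["url"].rstrip("/"), writer["database"],
                           writer["username"], writer["password"]))
    return found


def matches_datasource(config: dict, url: str, database: str,
                       user: str, password: str) -> bool:
    """True when every writer in the config uses exactly the data source's settings."""
    return writer_settings(config) == {(url.rstrip("/"), database, user, password)}


# A cut-down config with a single query, in the same shape as the files above.
sample = {"servers": [{"port": "9999", "host": "10.10.20.14", "queries": [{
    "obj": "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
    "attr": ["Count"], "resultAlias": "BytesInPerSec",
    "outputWriters": [{"url": "http://10.10.10.10:8086/", "username": "root",
                       "password": "root", "database": "jmxDB"}]}]}]}

print(matches_datasource(sample, "http://10.10.10.10:8086", "jmxDB", "root", "root"))
```

In practice the config would come from json.load on a file under /var/lib/jmxtrans instead of the inline sample.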
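Key points
- Metrics collected as Count need a calculation in Grafana to produce the value we want. BytesInPerSec, for example, is a cumulative value; to get traffic per second we compute (current sample - previous sample) / 60, because jmxtrans collects data once per minute. The query is configured as follows:
Because data is collected once per minute, set both group by and derivative to 1 minute; because we want traffic per second, divide by 60 in math
- Choosing the Y-axis unit, for example a traffic unit, a time unit, or no unit for messages per second. One example of each follows
Set a traffic unit: click the graph, choose "Edit" to open the edit page, switch to the Axes tab, Unit -> data (Metric) -> bytes
Set a time unit: click the graph, choose "Edit" to open the edit page, switch to the Axes tab, Unit -> time -> milliseconds (ms)
Show raw values with no unit: click the graph, choose "Edit" to open the edit page, switch to the Axes tab, Unit -> none -> none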
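The per-second calculation for Count metrics described in the first point can be checked by hand with a few lines of Python. This is only a sketch of the arithmetic that the Grafana derivative and math settings perform; the function name and the sample readings are made up.

```python
def per_second_rates(samples, interval_seconds=60):
    """Turn cumulative Count samples into per-second rates.

    samples: counter readings taken interval_seconds apart, oldest first.
    """
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        rates.append((cur - prev) / interval_seconds)
    return rates


counts = [1_200_000, 1_260_000, 1_380_000]  # made-up BytesInPerSec Count readings
print(per_second_rates(counts))  # [1000.0, 2000.0]
```

One thing to keep in mind: Count restarts from zero when a broker restarts, so the difference for that interval is negative. InfluxQL's non_negative_derivative function is one way to drop such points.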
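Takeaways
- Working out which Kafka metrics JMX exposes, and what type each value is, cost me many detours. Searching Google and Baidu turned up lists that others had compiled, but when I tried them one by one many did not work, either because they were written incorrectly or because they were for a different version with a different syntax. In the end I found jconsole, a tool that connects to a local or remote JMX port and shows every metric being exposed. Install the JDK on Windows and you will find it in the bin directory.
On consumer lag, the official documentation describes metrics with type=consumer-fetch-manager-metrics, but I could not find them at all when connecting with jconsole. If anyone using this monitoring setup can help me with this, I would be grateful. The official monitoring metrics are listed here: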
[http://kafka.apache.org/documentation/#monitoring]