Environment:

| golang | sarama | kafka |
|---|---|---|
| 1.15 | 1.19 | v2.5.0, three-node kafka cluster, 3 partitions |
Symptom: the Go microservice's memory usage exceeded 1 GB. The logs contained a large number of kafka-related errors, and inspecting the kafka cluster showed that one of the broker containers had crashed.

Question: why does the client report errors en masse when only one broker in the kafka cluster is down?
Inspecting memory usage with pprof

Fetch mem-1.memprof from the beego admin page, then:

```shell
go tool pprof mem-1.memprof
(pprof) web   # render the call graph to see where memory is held
```
The call stack holding the memory is:

withRecover
> backgroundMetadataUpdater
> refreshMetadata
> RefreshMetadata
> tryRefreshMetadata
> ...
Locating the problem
- Searching the sarama source, backgroundMetadataUpdater is only called from sarama's NewClient.
- When the Go business code creates a Consumer, it ultimately calls sarama's NewClient.
- In the business code, if creating the consumer fails, it retries forever at 10-second intervals.
- backgroundMetadataUpdater creates a ticker whose interval is RefreshFrequency (10 minutes by default), so backgroundMetadataUpdater blocks for up to 10 minutes.
```go
// In the Go business code: if creating the consumer fails,
// retry forever at 10-second intervals.
for {
	consumer, err = cluster.NewConsumer(k.addr, group, topics, k.ccfg)
	if err != nil {
		logs.Error("new kafka consumer is error:", err.Error())
		time.Sleep(10 * time.Second)
		continue
	}
	logs.Info("new kafka consumer is success!")
	break
}
```
sarama-cluster: NewConsumer calls sarama's NewClient

```go
func NewConsumer(addrs []string, groupID string, topics []string, config *Config) (*Consumer, error) {
	client, err := NewClient(addrs, config)
	if err != nil {
		return nil, err
	}
	...
}
```
```go
func (client *client) backgroundMetadataUpdater() {
	defer close(client.closed)

	if client.conf.Metadata.RefreshFrequency == time.Duration(0) {
		return
	}

	ticker := time.NewTicker(client.conf.Metadata.RefreshFrequency)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			if err := client.refreshMetadata(); err != nil {
				Logger.Println("Client background metadata update:", err)
			}
		case <-client.closer:
			return
		}
	}
}
```
Why did NewClient fail when only one broker in the kafka cluster was down?

Describing the topic inside the kafka container showed only one entry under Replicas and Isr. The kafka official configuration docs explain that automatically created topics only get 3 replicas if default.replication.factor is set accordingly.
Review the following settings in the Advanced kafka-broker category, and modify as needed:

- `auto.create.topics.enable`: Enable automatic creation of topics on the server. If this property is set to true, then attempts to produce, consume, or fetch metadata for a nonexistent topic automatically create the topic with the default replication factor and number of partitions. The default is enabled.
- `default.replication.factor`: Specifies default replication factors for automatically created topics. For high availability production systems, you should set this value to at least 3.
- `num.partitions`: Specifies the default number of log partitions per topic, for automatically created topics. The default value is 1. Change this setting based on the requirements related to your topic and partition design.