Definition
A thousand readers see a thousand Hamlets. If anyone is qualified to define what Kafka is, it is the official documentation:
Apache Kafka® is a distributed streaming platform.
The documentation also defines what a streaming platform is. A streaming platform has three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
- Store streams of records in a fault-tolerant durable way.
- Process streams of records as they occur.
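The first capability is MQ-style publish/subscribe, the second is fault-tolerant storage, and the third is data processing. Kafka can therefore replace message middleware such as ActiveMQ. The figure below shows how the official documentation positions Kafka:
A few important Kafka concepts: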
- Kafka is run as a cluster on one or more servers that can span multiple datacenters.
- The Kafka cluster stores streams of records in categories called topics.
- Each record consists of a key, a value, and a timestamp.
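Architecture
The Kafka architecture is shown in the figure below. The essence of message middleware is produce, store, consume. As the figure shows, in Kafka's design the producers, the consumers, and the message storage can all be scaled horizontally to raise the processing capacity of the whole cluster; Kafka is a distributed system by nature. One important feature the figure does not show is replication. When creating a topic you specify the number of partitions and can also specify the number of replicas. The replica count must not exceed the number of brokers, otherwise topic creation fails with: Replication factor: 2 larger than available brokers: 1. Among the replicas of a partition exactly one is the leader and the rest are followers. By default only the leader serves reads and writes; the followers replicate the leader's data and act as standbys. When the leader fails, a new leader is elected from the followers, which is how Kafka achieves high availability.
Image source: https://en.wikipedia.org/wiki/File:Overview_of_Apache_Kafka.svg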
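The replication rules described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Kafka's implementation: `create_replicas` and `fail_broker` are hypothetical helpers, replicas are placed round-robin, and the new leader is simply the first surviving follower (real Kafka tracks which replicas are in sync before electing one).

```python
def create_replicas(brokers, partitions, replication_factor):
    """Place replicas round-robin; the first replica of a partition is its leader."""
    if replication_factor > len(brokers):
        # Same wording as the error message quoted in the text
        raise ValueError(
            f"Replication factor: {replication_factor} "
            f"larger than available brokers: {len(brokers)}"
        )
    assignment = {}
    for p in range(partitions):
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


def fail_broker(assignment, broker):
    """Remove a failed broker; promote a follower wherever it was the leader."""
    for state in assignment.values():
        if state["leader"] == broker:
            # Toy election: first remaining follower wins, or the partition goes offline
            state["leader"] = state["followers"].pop(0) if state["followers"] else None
        elif broker in state["followers"]:
            state["followers"].remove(broker)
    return assignment


if __name__ == "__main__":
    layout = create_replicas(brokers=[0, 1, 2], partitions=3, replication_factor=2)
    print(layout[0])                      # {'leader': 0, 'followers': [1]}
    print(fail_broker(layout, 0)[0])      # {'leader': 1, 'followers': []}
```

With three brokers and a replication factor of 2, losing broker 0 leaves every partition with a leader, which is the high-availability property the text describes.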
topic
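The figure below shows the anatomy of a topic. Kafka has only the concept of a topic; it has nothing like the one-to-one Queue in ActiveMQ (ActiveMQ has both Topic and Queue). A topic can have any number of partitions, and the partition count can be changed dynamically, but it can only be increased, never decreased. Messages within a single partition are ordered; there is no ordering across partitions. New messages are written by appending to the end of the log, and this sequential write pattern is a large part of why Kafka's throughput is so high (some benchmarks suggest that sequential disk writes can be faster than random memory writes).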
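The partition rules above (append-only writes, per-partition ordering, grow-only partition count) can be sketched with an in-memory model. The `Topic` class and its method names are hypothetical and exist only for illustration; a real Kafka partition is an on-disk log, not a Python list.

```python
class Topic:
    """Toy topic: each partition is an append-only list, and a list index is the offset."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        """Append a record to the end of one partition and return its offset."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def add_partitions(self, new_total):
        """The partition count may grow but never shrink."""
        if new_total < len(self.partitions):
            raise ValueError("partition count can only be increased")
        self.partitions.extend([] for _ in range(new_total - len(self.partitions)))


if __name__ == "__main__":
    topic = Topic("orders", 2)            # "orders" is a made-up topic name
    print(topic.append(0, "a"))           # 0
    print(topic.append(0, "b"))           # 1
    print(topic.append(1, "c"))           # 0 (offsets are per partition)
    topic.add_partitions(4)
    print(len(topic.partitions))          # 4
```

Offsets restart at 0 in every partition, which is why ordering is only meaningful within one partition.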
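topic definition
Official definition: A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
For example, after an order is paid successfully, a message is published to a topic named TOPIC_PAYMENT_ORDER_SUCCESS. The points system can subscribe to this topic and award points to the user, the membership system can subscribe to it and raise the member's growth value, and Ant Manor in Alipay can hand out feed after a successful payment.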
Disk vs. memory speed comparison
As the figure below shows, sequential disk writes (Sequential, disk) reach 53.2M, while random memory writes (Random, memory) reach 36.7M.
Image source: http://searene.me/2017/07/09/Why-is-Kafka-so-fast/
durable
Kafka's storage policy for the message log is: The Kafka cluster durably persists all published records, whether or not they have been consumed, using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem.
In other words, Kafka persists every message whether or not it has been consumed. How long the message log is kept is decided by configuration, through the settings prefixed with log.retention, such as log.retention.ms, log.retention.minutes, log.retention.hours, and log.retention.bytes. For example, with a retention period of two days, messages remain readable by offset for two days. Once they expire, Kafka deletes the log files to free disk space.
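Time-based retention can be sketched as follows. This is a simplified illustration, not how the broker works internally: the `RetainedLog` class is hypothetical, it expires individual records, and it takes the current time as a parameter so the behaviour is deterministic, whereas Kafka deletes whole log files.

```python
from collections import deque

TWO_DAYS_MS = 2 * 24 * 60 * 60 * 1000  # plays the role of log.retention.ms


class RetainedLog:
    """Toy log that discards records older than retention_ms, consumed or not."""

    def __init__(self, retention_ms):
        self.retention_ms = retention_ms
        self.records = deque()   # entries are (offset, timestamp_ms, value)
        self.next_offset = 0

    def append(self, value, now_ms):
        self.records.append((self.next_offset, now_ms, value))
        self.next_offset += 1

    def expire(self, now_ms):
        """Drop records from the head of the log once they exceed the retention period."""
        while self.records and now_ms - self.records[0][1] > self.retention_ms:
            self.records.popleft()

    def read(self, offset):
        """Return the value at offset, or None if it has expired or never existed."""
        for off, _, value in self.records:
            if off == offset:
                return value
        return None


if __name__ == "__main__":
    log = RetainedLog(TWO_DAYS_MS)
    log.append("paid:order-1", now_ms=0)
    log.append("paid:order-2", now_ms=TWO_DAYS_MS // 2)
    log.expire(now_ms=TWO_DAYS_MS + 1)
    print(log.read(0))   # None (older than two days)
    print(log.read(1))   # paid:order-2
```

Note that offsets are never reused: after the first record expires, the surviving record is still at offset 1.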
consumer
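The figure below illustrates consuming one partition of a topic. How Kafka picks a partition among the partitions of each topic will be covered in a later article. As the figure shows, a consumer locates and reads messages by offset, and the offset each consumer holds is its own consumption progress.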
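The idea that each consumer keeps its own offset into a shared partition can be sketched like this. The `Consumer` class, `poll`, and `seek` here are a hypothetical in-memory model written for this example, not the Kafka client API.

```python
class Consumer:
    """Toy consumer: reads a shared partition log and tracks only its own offset."""

    def __init__(self, name, partition_log):
        self.name = name
        self.log = partition_log
        self.offset = 0  # this consumer's private progress marker

    def poll(self, max_records=1):
        """Return up to max_records starting at the current offset, then advance."""
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Move the progress marker, for example to re-read older records."""
        self.offset = offset


if __name__ == "__main__":
    partition = ["m0", "m1", "m2", "m3"]
    fast = Consumer("fast", partition)
    slow = Consumer("slow", partition)
    print(fast.poll(3))               # ['m0', 'm1', 'm2']
    print(slow.poll(1))               # ['m0']
    print(fast.offset, slow.offset)   # 3 1
    fast.seek(1)
    print(fast.poll(2))               # ['m1', 'm2']
```

Reading does not remove anything from the log, so one consumer's progress never affects another's.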
consumer group
- Each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
- If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.
- If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
That is, among all the consumers of a consumer group subscribed to a topic, any given message is consumed by only one of them. If there are several consumer groups, they do not interfere with one another. The figure below illustrates consumer groups. A topic has 4 partitions: P0, P1, P2, and P3. Consumer Group A has two consumers, C1 and C2. Consumer Group B has four consumers, C3, C4, C5, and C6. If a producer now sends a message, it is consumed by exactly one of C1 and C2 in Consumer Group A, and by exactly one of C3, C4, C5, and C6 in Consumer Group B.
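The P0 to P3 example above can be worked through in a few lines. This is a sketch under an assumption: partitions are spread over a group's consumers round-robin, which is chosen here only for simplicity, and `assign` and `deliver` are hypothetical helpers rather than Kafka APIs.

```python
def assign(partitions, consumers):
    """Give each partition exactly one owner within a group (round-robin, by assumption)."""
    return {p: consumers[i % len(consumers)] for i, p in enumerate(partitions)}


def deliver(partition, groups):
    """For a record written to `partition`, return the one receiving consumer per group."""
    return {group: owners[partition] for group, owners in groups.items()}


if __name__ == "__main__":
    partitions = ["P0", "P1", "P2", "P3"]
    groups = {
        "A": assign(partitions, ["C1", "C2"]),
        "B": assign(partitions, ["C3", "C4", "C5", "C6"]),
    }
    print(groups["A"])            # {'P0': 'C1', 'P1': 'C2', 'P2': 'C1', 'P3': 'C2'}
    print(deliver("P2", groups))  # {'A': 'C1', 'B': 'C5'}
```

Because every partition has a single owner per group, a record is load balanced inside a group and broadcast across groups, matching the two bullet points quoted above.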