Kafka Notes

Basic

Kafka combines three roles: a messaging system, a storage system, and a stream processing platform.

APIs

Kafka has four core APIs:

  • The Producer API allows an application to publish a stream of records to one or more Kafka topics.
  • The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
  • The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
  • The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.

Topics and Logs

A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.

For each topic, the Kafka cluster maintains a partitioned log.

Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
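
The append-only behaviour and offset assignment described above can be illustrated with a small in-memory model. This is only a sketch: PartitionLog and its methods are hypothetical names invented for illustration, not part of any Kafka client, and a real partition lives on disk rather than in a Python list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: records are only ever appended."""
    records: List[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Append a record and return the offset assigned to it."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[bytes]:
        """Return up to max_records records, starting at the given offset."""
        return self.records[offset:offset + max_records]


log = PartitionLog()
first = log.append(b"a")
second = log.append(b"b")
third = log.append(b"c")
# Offsets are sequential within the partition, and a reader can start
# from any offset it likes.
tail = log.read(second)
```

Because a consumer only needs to remember an offset, reading is cheap and independent of other readers in this model.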

The Kafka cluster durably persists all published records, whether or not they have been consumed, for a configurable retention period. Once a record's retention period expires, the record is discarded to free up space.
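
Time-based retention can be sketched as follows. The class RetainedLog and the explicit "now" arguments are assumptions made so the example is deterministic; the sketch also expires records one at a time, whereas a real broker removes data in larger units, so treat it as an illustration of the idea only.

```python
from collections import deque
from typing import Deque, Tuple


class RetainedLog:
    """Toy log that discards records older than a retention period."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self.base_offset = 0  # offset of the oldest record still retained
        self.records: Deque[Tuple[float, bytes]] = deque()

    def append(self, value: bytes, now: float) -> int:
        """Store the record with its timestamp and return its offset."""
        self.records.append((now, value))
        return self.base_offset + len(self.records) - 1

    def expire(self, now: float) -> int:
        """Drop expired records from the head; return how many were dropped."""
        dropped = 0
        while self.records and now - self.records[0][0] > self.retention_seconds:
            self.records.popleft()
            self.base_offset += 1
            dropped += 1
        return dropped


retained = RetainedLog(retention_seconds=60)
old_offset = retained.append(b"old", now=0)
new_offset = retained.append(b"new", now=50)
dropped = retained.expire(now=70)   # only the record written at t=0 is too old
next_offset = retained.append(b"later", now=71)
```

Note that expiry never renumbers the surviving records: offsets keep increasing, and consumption plays no part in deciding what is discarded.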

Partitions and Distribution

The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions, so it can handle an arbitrary amount of data. Second, they act as the unit of parallelism: different partitions can be read by different consumers at the same time.

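The partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance. The number of replicas is the topic's replication factor; the producer setting "request.required.acks" (named "acks" in newer clients) does not change the number of replicas, it controls how many acknowledgements the producer waits for before treating a write as complete.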

Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers automatically becomes the new leader (the election is coordinated through ZooKeeper). Each server acts as a leader for some of its partitions and a follower for others, so load is well balanced within the cluster.

Each partition is stored as a sequence of segments. A segment is the unit in which the partition's log is written to files on disk.
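
As a rough illustration of how a partition split into segments can be searched, the sketch below assumes each segment is identified by the offset of its first record (its base offset) and that its log file is named after that offset, zero-padded to 20 digits. The function names are hypothetical, chosen only for this example.

```python
import bisect
from typing import List


def segment_for_offset(base_offsets: List[int], offset: int) -> int:
    """Return the base offset of the segment that would hold `offset`.

    base_offsets must be sorted in ascending order.
    """
    index = bisect.bisect_right(base_offsets, offset) - 1
    if index < 0:
        raise ValueError("offset is older than the first retained segment")
    return base_offsets[index]


def segment_file_name(base_offset: int) -> str:
    """Name the segment's log file after its zero-padded base offset."""
    return f"{base_offset:020d}.log"


segments = [0, 1000, 2500]          # base offsets of three segments
base = segment_for_offset(segments, 1200)
name = segment_file_name(base)
```

A binary search over base offsets is enough to find the right file, which is one reason splitting the log into segments keeps lookups cheap as the partition grows.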

Producers

Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the record).
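
The two strategies mentioned above, round-robin and key-based, can be sketched in a few lines. SimplePartitioner is a hypothetical class, and CRC32 is used purely as a convenient stand-in hash; it is not claimed to be the hash function Kafka's own clients use.

```python
import zlib
from itertools import count
from typing import Optional


class SimplePartitioner:
    """Pick a partition: round-robin without a key, hash-based with one."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._counter = count()  # drives the round-robin choice

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread records evenly to balance load.
            return next(self._counter) % self.num_partitions
        # With a key: the same key always maps to the same partition.
        return zlib.crc32(key) % self.num_partitions


partitioner = SimplePartitioner(num_partitions=3)
unkeyed = [partitioner.partition(None) for _ in range(4)]
keyed_a = partitioner.partition(b"user-1")
keyed_b = partitioner.partition(b"user-1")
```

Key-based partitioning matters because ordering is only guaranteed within a partition: sending all records for one key to one partition keeps them in order relative to each other.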

Consumers

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.

The partitions in the topic are assigned to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group.
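
One simple way to hand out partitions so that each has exactly one owner in the group is round-robin over the sorted member list. The function below is a hypothetical sketch of that idea, not a description of the assignment strategies that ship with Kafka.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int],
                      consumers: List[str]) -> Dict[str, List[int]]:
    """Give every partition to exactly one consumer in the group."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


two_members = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
three_members = assign_partitions([0, 1], ["c1", "c2", "c3"])
```

With more consumers than partitions, some members receive nothing, which is why the partition count caps the useful size of a consumer group.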

As a Messaging System

Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each record goes to one of them; in publish-subscribe the record is broadcast to all consumers. Each of these two models has a strength and a weakness. The strength of queuing is that it allows you to divide up the processing of data over multiple consumer instances, which lets you scale your processing. Unfortunately, queues aren't multi-subscriber: once one process reads the data, it's gone. Publish-subscribe allows you to broadcast data to multiple processes, but has no way of scaling processing since every message goes to every subscriber.

The consumer group concept in Kafka generalizes these two concepts. As with a queue the consumer group allows you to divide up processing over a collection of processes (the members of the consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple consumer groups.
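
The combined behaviour can be sketched with a toy delivery function: every group sees every record (publish-subscribe), while inside a group each record goes to a single member (queuing). The group names, member names, and the modulo-based ownership rule are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def deliver(records: List[Tuple[int, str]],
            groups: Dict[str, List[str]]) -> Dict[Tuple[str, str], List[str]]:
    """Deliver (partition, value) records to each group's owning member."""
    received: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for group, members in groups.items():
        for partition, value in records:
            # Within a group, one member owns each partition.
            owner = members[partition % len(members)]
            received[(group, owner)].append(value)
    return dict(received)


records = [(0, "a"), (1, "b"), (0, "c")]
groups = {"billing": ["b1", "b2"], "audit": ["a1"]}
received = deliver(records, groups)
```

Here the "billing" group splits the work between two members, while the single-member "audit" group independently receives the full stream.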

As a Storage System

Data written to Kafka is written to disk and replicated for fault-tolerance. Kafka allows producers to wait on acknowledgement so that a write isn't considered complete until it is fully replicated and guaranteed to persist even if the server written to fails.

Stream Processing

In Kafka a stream processor is anything that takes continual streams of data from input topics, performs some processing on this input, and produces continual streams of data to output topics.
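
The shape of a stream processor (read from an input stream, transform, write to an output stream) can be sketched with a Python generator. This only mirrors the idea: plain iterables stand in for topics, and uppercase_non_empty is a hypothetical transformation, not part of the Streams API.

```python
from typing import Iterable, Iterator


def uppercase_non_empty(input_topic: Iterable[str]) -> Iterator[str]:
    """Consume records one at a time, filter and transform them, emit results."""
    for record in input_topic:
        if record:                # drop empty records
            yield record.upper()  # transform the rest


input_records = ["click", "", "view"]
output_records = list(uppercase_non_empty(input_records))
```

Because the generator handles one record at a time, it works the same way on an unbounded input, which is the defining property of stream processing.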
