Why is Kafka's throughput so high?
Kafka's send path looks a lot like TCP's. When the client calls producer.send(msg), Kafka's main thread does not immediately invoke the network layer to push the message to the Kafka broker; instead, it places the message into a data structure called the RecordAccumulator.
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
        serializedValue, headers, interceptCallback, remainingWaitMs);
Placing the message into the RecordAccumulator is only the first step; the actual network send does not even happen on the current main thread, so the whole send path is organized as an asynchronous call. Once the message has really been sent by the network layer and the broker has acknowledged it, the result is delivered back through a Future-based callback. To keep this asynchronous chain intact, the append into the RecordAccumulator returns a RecordAppendResult.
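This hand-off pattern, where the caller only enqueues work and gets a Future back while another thread does the real I/O later, can be sketched in a few lines of self-contained Java. This is a simplified model, not Kafka's code: a single-thread executor stands in for Kafka's Sender/network thread, and the names are invented for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncSendSketch {
    // A single background thread stands in for Kafka's Sender (network) thread.
    static final ExecutorService sender = Executors.newSingleThreadExecutor();

    // "send" only enqueues the work and returns a Future immediately,
    // the same shape as KafkaProducer.send(record).
    static Future<String> send(String msg) {
        return sender.submit(() -> "acked:" + msg); // broker ack simulated here
    }

    public static void main(String[] args) throws Exception {
        Future<String> f = send("hello");   // returns right away, nothing sent yet
        System.out.println(f.get());        // blocks until the "broker" responds: acked:hello
        sender.shutdown();
    }
}
```

The key point mirrored here is that the calling thread never blocks on the network; it only blocks, optionally, when it asks the Future for the result.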
Now let's come back and look at the RecordAccumulator data structure itself.
As the figure shows, RecordAccumulator is essentially a composite structure, a ConcurrentMap<TopicPartition, Deque<ProducerBatch>>: the key is a TopicPartition, and the value is a double-ended queue (Deque) whose elements are ProducerBatch objects.
For example, when sending a message for TopicPartition(topic1:0), the logic boils down to this: first, look up the Deque keyed by TopicPartition(topic1:0) (creating one if it does not exist); then take the last ProducerBatch in that Deque; and finally, append the message to that last ProducerBatch.
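The map-plus-deque layout can be modeled with a short, self-contained sketch. This is a deliberately simplified stand-in, not Kafka's implementation: a String stands in for TopicPartition, a List<String> stands in for ProducerBatch, and the fixed BATCH_SIZE plays the role of the batch.size config.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MiniAccumulator {
    static final int BATCH_SIZE = 3; // records per batch, stand-in for batch.size

    // TopicPartition -> Deque<ProducerBatch>, modeled as String -> Deque<List<String>>
    static final ConcurrentMap<String, Deque<List<String>>> batches = new ConcurrentHashMap<>();

    static void append(String tp, String msg) {
        // Step 1: find (or create) the partition's deque; the map handles this concurrently.
        Deque<List<String>> dq = batches.computeIfAbsent(tp, k -> new ArrayDeque<>());
        synchronized (dq) { // the Deque itself is not thread-safe
            // Step 2: take the last batch in the deque.
            List<String> last = dq.peekLast();
            if (last == null || last.size() >= BATCH_SIZE) {
                last = new ArrayList<>(); // no batch yet, or it is full: open a new one
                dq.addLast(last);
            }
            // Step 3: append the message to that last batch.
            last.add(msg);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 7; i++) append("topic1-0", "m" + i);
        System.out.println(batches.get("topic1-0").size()); // 7 records / 3 per batch -> prints 3
    }
}
```

Seven appends at a batch size of three leave the deque holding three batches (3 + 3 + 1), which is exactly the shape the real accumulator builds before the Sender drains it.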
private RecordAppendResult tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers,
                                     Callback callback, Deque<ProducerBatch> deque) {
    ProducerBatch last = deque.peekLast();
    if (last != null) {
        FutureRecordMetadata future = last.tryAppend(timestamp, key, value, headers, callback, time.milliseconds());
        if (future == null)
            last.closeForRecordAppends();
        else
            return new RecordAppendResult(future, deque.size() > 1 || last.isFull(), false);
    }
    return null;
}
So ProducerBatch is itself a container as well. As the code below shows, the message payloads are appended in order to a MemoryRecordsBuilder (recordsBuilder), while the per-record callback futures are appended in the same order to a List<Thunk> (thunks).
public FutureRecordMetadata tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, long now) {
    if (!recordsBuilder.hasRoomFor(timestamp, key, value, headers)) {
        return null;
    } else {
        Long checksum = this.recordsBuilder.append(timestamp, key, value, headers);
        this.maxRecordSize = Math.max(this.maxRecordSize, AbstractRecords.estimateSizeInBytesUpperBound(magic(),
                recordsBuilder.compressionType(), key, value, headers));
        this.lastAppendTime = now;
        FutureRecordMetadata future = new FutureRecordMetadata(this.produceFuture, this.recordCount,
                                                               timestamp, checksum,
                                                               key == null ? -1 : key.length,
                                                               value == null ? -1 : value.length);
        // we have to keep every future returned to the users in case the batch needs to be
        // split to several new batches and resent.
        thunks.add(new Thunk(callback, future));
        this.recordCount++;
        return future;
    }
}
至此击费,放入RecordAccumulator的過程算是講完了拢蛋,下一篇聊下從RecordAccumulator拿出來。
在結束這篇前蔫巩,有幾點注意下谆棱,Map是Concurrent系的,所以在TopicPartition級別是可以安全并發(fā)put圆仔、get垃瞧、remove它的Deque。但是當涉及到的是同一個TopicPartition時坪郭,操縱的其實是同一個Deque个从,而Deque不是一個并發(fā)安全的集合,所以在對某一個具體的Deque進行增刪改時,需要使用鎖嗦锐。
Deque<ProducerBatch> dq = getOrCreateDeque(tp);
synchronized (dq) {
    // Need to check if producer is closed again after grabbing the dequeue lock.
    if (closed)
        throw new KafkaException("Producer closed while send in progress");
    RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
    if (appendResult != null) {
        // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
        return appendResult;
    }
    MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
    ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
    FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));
    dq.addLast(batch);
    incomplete.add(batch);
    // Don't deallocate this buffer in the finally block as it's being used in the record batch
    buffer = null;
    return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
}