- The "log" here refers to the files in which Kafka stores the messages written to it;
- Kafka's log cleanup policies include:
  - time- and size-based deletion;
  - the compact cleanup policy;
- Here we focus on log cleaning under the compact policy (a config sketch for enabling it follows below).
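As a point of reference, compaction is switched on per topic through topic-level configuration (and the broker-side log.cleaner.enable must be true for the cleaner threads to run at all; depending on the broker version it may be off by default). Below is a minimal sketch using only standard Kafka config keys, with illustrative values:

import java.util.Properties

// Illustrative sketch: topic-level settings that enable the compact policy.
// The keys are standard Kafka topic configs; the values here are just examples.
val topicConfig = new Properties()
topicConfig.put("cleanup.policy", "compact")        // compact instead of time/size based deletion
topicConfig.put("min.cleanable.dirty.ratio", "0.5") // how dirty a log must be before it is eligible for cleaning
topicConfig.put("delete.retention.ms", "86400000")  // how long delete tombstones (null payloads) are kept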
Overview of the compact policy
- The official Kafka documentation covers this under "Log compaction";
- Compact means compaction. The policy can only be applied to specific topics, namely ones whose messages all carry a key; messages with the same key are merged and only the latest message is kept (a minimal sketch of these semantics follows below);
- During compaction, messages whose payload is null are also removed;
- Here is a diagram taken from the official site to give you a feel for it:
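To make these semantics concrete, here is a minimal, purely illustrative Scala sketch (not Kafka code) of what compaction logically does to a keyed stream: keep only the latest message per key, and eventually drop keys whose latest message is a null tombstone:

// Purely illustrative model of the compaction semantics, not Kafka's implementation.
// value == None models a message with a null payload (a delete tombstone).
case class Msg(key: String, value: Option[String], offset: Long)

def logicalCompact(messages: Seq[Msg]): Seq[Msg] = {
  // keep only the message with the highest offset for each key ...
  val latestPerKey = messages.groupBy(_.key).map { case (_, msgs) => msgs.maxBy(_.offset) }
  // ... and drop tombstones (in real Kafka only after delete.retention.ms has passed)
  latestPerKey.filter(_.value.isDefined).toSeq.sortBy(_.offset)
}

// Example: K1 is overwritten at offset 2, K2 is deleted by the tombstone at offset 3.
val in = Seq(Msg("K1", Some("v1"), 0), Msg("K2", Some("v2"), 1),
             Msg("K1", Some("v3"), 2), Msg("K2", None, 3))
// logicalCompact(in) == Seq(Msg("K1", Some("v3"), 2))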
States during log cleaning
- Three states are involved: LogCleaningInProgress, LogCleaningAborted, and LogCleaningPaused. Their names are largely self-explanatory; the comments from the source code are below:
- If a partition is to be cleaned, it enters the LogCleaningInProgress state.
- While a partition is being cleaned, it can be requested to be aborted and paused. Then the partition first enters the LogCleaningAborted state. Once the cleaning task is aborted, the partition enters the LogCleaningPaused state.
- While a partition is in the LogCleaningPaused state, it won't be scheduled for cleaning again, until cleaning is requested to be resumed.
- The LogCleanerManager class manages the cleaning state of every log and the transitions between those states (a sketch of the transitions follows the method signatures below):
def abortCleaning(topicAndPartition: TopicAndPartition)
def abortAndPauseCleaning(topicAndPartition: TopicAndPartition)
def resumeCleaning(topicAndPartition: TopicAndPartition)
def checkCleaningAborted(topicAndPartition: TopicAndPartition)
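The transition rules described in those comments can be sketched roughly as follows; this is an illustrative model with hypothetical helper functions, not the actual LogCleanerManager implementation:

// Illustrative sketch of the cleaning-state transitions (hypothetical types, not Kafka's).
sealed trait CleaningState
case object LogCleaningInProgress extends CleaningState
case object LogCleaningAborted    extends CleaningState
case object LogCleaningPaused     extends CleaningState

// None = the partition is not being cleaned and may be scheduled for cleaning.
def abortAndPause(state: Option[CleaningState]): Option[CleaningState] = state match {
  case Some(LogCleaningInProgress) => Some(LogCleaningAborted) // ask the running task to abort
  case other                       => other                    // nothing running: unchanged (simplified)
}

def onAbortComplete(state: Option[CleaningState]): Option[CleaningState] = state match {
  case Some(LogCleaningAborted) => Some(LogCleaningPaused)     // paused: won't be rescheduled
  case other                    => other
}

def resume(state: Option[CleaningState]): Option[CleaningState] = state match {
  case Some(LogCleaningPaused) => None                         // eligible for cleaning again
  case other                   => other
}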
Selecting the logs to clean
- Because compaction involves rewriting the log and index files, it is fairly IO-intensive, so Kafka throttles the work: each compaction run first determines, according to a set of rules, which TopicAndPartition logs should be cleaned;
- The LogToClean class represents a log that is to be cleaned:
private case class LogToClean(topicPartition: TopicAndPartition, log: Log, firstDirtyOffset: Long) extends Ordered[LogToClean] {
  // bytes that have already been cleaned (everything before firstDirtyOffset)
  val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size).sum
  // bytes still to be cleaned (from firstDirtyOffset up to the active segment)
  val dirtyBytes = log.logSegments(firstDirtyOffset, math.max(firstDirtyOffset, log.activeSegment.baseOffset)).map(_.size).sum
  val cleanableRatio = dirtyBytes / totalBytes.toDouble
  def totalBytes = cleanBytes + dirtyBytes
  // logs are ordered by how "dirty" they are
  override def compare(that: LogToClean): Int = math.signum(this.cleanableRatio - that.cleanableRatio).toInt
}
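A quick worked example of the cleanableRatio computed above (numbers are made up): with 400 MB already clean and 600 MB dirty, the ratio is 600 / 1000 = 0.6, which is above the default min.cleanable.dirty.ratio of 0.5, so this log would be a candidate:

// Illustrative numbers only.
val cleanBytes = 400L * 1024 * 1024   // bytes before firstDirtyOffset
val dirtyBytes = 600L * 1024 * 1024   // bytes from firstDirtyOffset up to the active segment
val cleanableRatio = dirtyBytes / (cleanBytes + dirtyBytes).toDouble
// cleanableRatio == 0.6, above the default min.cleanable.dirty.ratio of 0.5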
- firstDirtyOffset: the starting point of this cleaning round; the portion before this offset is what gets cleaned, by merging its keys with the messages that come after it;
- val cleanableRatio = dirtyBytes / totalBytes.toDouble is the fraction of the log that needs cleaning; the larger this value, the more likely the log is to be selected for cleaning;
- After every cleaning run, the position cleaned up to so far is recorded in the cleaner-offset-checkpoint file, which serves as the reference for deriving firstDirtyOffset in the next run;
def updateCheckpoints(dataDir: File, update: Option[(TopicAndPartition, Long)]) {
  inLock(lock) {
    val checkpoint = checkpoints(dataDir)
    val existing = checkpoint.read().filterKeys(logs.keys) ++ update
    checkpoint.write(existing)
  }
}
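For example, after a partition has been cleaned up to some end offset, the manager records it roughly like this (a hypothetical call based on the signature above; endOffset is not a variable from the original code):

// Hypothetical usage: remember that topicAndPartition has been cleaned up to endOffset,
// so the next run can use it as the basis for firstDirtyOffset.
updateCheckpoints(dataDir, Some((topicAndPartition, endOffset)))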
- Picking the log most in need of cleaning:
def grabFilthiestLog(): Option[LogToClean] = {
  inLock(lock) {
    val lastClean = allCleanerCheckpoints()
    val dirtyLogs = logs.filter {
      case (topicAndPartition, log) => log.config.compact // skip any logs marked for delete rather than dedupe
    }.filterNot {
      case (topicAndPartition, log) => inProgress.contains(topicAndPartition) // skip any logs already in-progress
    }.map {
      case (topicAndPartition, log) => // create a LogToClean instance for each
        // if the log segments are abnormally truncated and hence the checkpointed offset
        // is no longer valid, reset to the log starting offset and log the error event
        val logStartOffset = log.logSegments.head.baseOffset
        val firstDirtyOffset = {
          val offset = lastClean.getOrElse(topicAndPartition, logStartOffset)
          if (offset < logStartOffset) {
            error("Resetting first dirty offset to log start offset %d since the checkpointed offset %d is invalid."
              .format(logStartOffset, offset))
            logStartOffset
          } else {
            offset
          }
        }
        LogToClean(topicAndPartition, log, firstDirtyOffset)
    }.filter(ltc => ltc.totalBytes > 0) // skip any empty logs

    this.dirtiestLogCleanableRatio = if (!dirtyLogs.isEmpty) dirtyLogs.max.cleanableRatio else 0
    // and must meet the minimum threshold for dirty byte ratio
    val cleanableLogs = dirtyLogs.filter(ltc => ltc.cleanableRatio > ltc.log.config.minCleanableRatio)
    if (cleanableLogs.isEmpty) {
      None
    } else {
      val filthiest = cleanableLogs.max
      inProgress.put(filthiest.topicPartition, LogCleaningInProgress)
      Some(filthiest)
    }
  }
}
There is a fair amount of code here, but it is actually quite simple:
- Build a list of LogToClean objects from all the logs;
- From the list in step 1, keep only the LogToClean entries whose cleanableRatio is greater than the cleaning ratio configured for the log;
- From the list in step 2, pick the entry with the largest cleanableRatio; that is the log currently most in need of cleaning.
First, two diagrams grabbed from the web:
- The CleanerPoint here is the firstDirtyOffset we discussed above;
- The keys in the Log Tail are merged into the Log Head; in fact, because the OffsetMap is built over the Log Head portion, the key-merging range also extends up to the last Offset reached while building the OffsetMap;
The diagram below shows the whole compaction/merge process; Kafka's code is essentially this process translated into code.
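Before walking through the code, here is a rough outline of one cleaning pass, in the order the following sections cover it. This is simplified pseudocode with assumed parameter names (offsetMap, deleteHorizonMs), not the actual LogCleaner method:

// Simplified sketch of one cleaning pass; names and signatures are illustrative.
def cleanOnePass(toClean: LogToClean, offsetMap: OffsetMap, deleteHorizonMs: Long): Unit = {
  val log   = toClean.log
  val start = toClean.firstDirtyOffset

  // 1. Build the OffsetMap over the dirty "Log Head": key hash -> latest offset.
  val endOffset = buildOffsetMap(log, start, log.activeSegment.baseOffset, offsetMap)

  // 2. Regroup all segments below endOffset so the rewritten segments respect size limits.
  val groups = groupSegmentsBySize(log.logSegments(0, endOffset),
                                   log.config.segmentSize, log.config.maxIndexSize)

  // 3. Rewrite each group, keeping only the messages that pass shouldRetainMessage,
  //    then swap the cleaned segment in and advance the cleaner checkpoint.
  for (group <- groups)
    cleanSegments(log, group, offsetMap, deleteHorizonMs)
}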
Building the OffsetMap
- Build the OffsetMap for all the messages in the Log Head portion of figure 111.png above; the key of this map is the hash of message.key and the value is the offset of the current message;
- Implementation:
private[log] def buildOffsetMap(log: Log, start: Long, end: Long, map: OffsetMap): Long = {
  map.clear()
  val dirty = log.logSegments(start, end).toSeq
  info("Building offset map for log %s for %d segments in offset range [%d, %d).".format(log.name, dirty.size, start, end))
  // Add all the dirty segments. We must take at least map.slots * load_factor,
  // but we may be able to fit more (if there is lots of duplication in the dirty section of the log)
  var offset = dirty.head.baseOffset
  require(offset == start, "Last clean offset is %d but segment base offset is %d for log %s.".format(start, offset, log.name))
  val maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt
  var full = false
  for (segment <- dirty if !full) {
    checkDone(log.topicAndPartition)
    val segmentSize = segment.nextOffset() - segment.baseOffset
    require(segmentSize <= maxDesiredMapSize, "%d messages in segment %s/%s but offset map can fit only %d. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads".format(segmentSize, log.name, segment.log.file.getName, maxDesiredMapSize))
    if (map.size + segmentSize <= maxDesiredMapSize)
      offset = buildOffsetMapForSegment(log.topicAndPartition, segment, map)
    else
      full = true
  }
  info("Offset map for log %s complete.".format(log.name))
  offset
}
- Each LogSegment is read sequentially and the relevant information is put into the OffsetMap, whose key is the hash of message.key. There is a pitfall here: what happens if a hash collision occurs? (See the toy sketch after this list.)
- The OffsetMap being built has a size limit: it cannot exceed val maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt.
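To picture what the map holds, here is a toy, purely illustrative version of such a map (not Kafka's actual SkimpyOffsetMap, which stores a fixed-size hash of the key in a preallocated byte buffer). A real hash collision would indeed make two different keys share one entry; with the 16-byte MD5 hash used by default this is astronomically unlikely, but it is not impossible:

import java.nio.ByteBuffer
import java.security.MessageDigest

// Toy illustration of the idea behind the offset map: key-hash -> latest offset.
// Not Kafka's SkimpyOffsetMap; a collision here would silently merge two keys,
// which is exactly the pitfall raised above.
class ToyOffsetMap {
  private val digest = MessageDigest.getInstance("MD5")
  private val map = scala.collection.mutable.Map[ByteBuffer, Long]()

  private def hashOf(key: Array[Byte]): ByteBuffer = ByteBuffer.wrap(digest.digest(key))

  def put(key: Array[Byte], offset: Long): Unit = map(hashOf(key)) = offset
  def get(key: Array[Byte]): Long = map.getOrElse(hashOf(key), -1L) // -1 means "not found", like map.get above
}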
Regrouping the LogSegments to be cleaned
- After compaction, each original LogSegment will inevitably shrink, so the segments are regrouped in preparation for rewriting the Log and Index files;
- The grouping rule is simple: group by segment size and index size, such that each group's total segment size does not exceed the configured segment size, its index file does not exceed the configured maximum index size, and the offset span within a group does not exceed Int.MaxValue (a worked example follows the code below).
private[log] def groupSegmentsBySize(segments: Iterable[LogSegment], maxSize: Int, maxIndexSize: Int): List[Seq[LogSegment]] = {
  var grouped = List[List[LogSegment]]()
  var segs = segments.toList
  while(!segs.isEmpty) {
    var group = List(segs.head)
    var logSize = segs.head.size
    var indexSize = segs.head.index.sizeInBytes
    segs = segs.tail
    while(!segs.isEmpty &&
          logSize + segs.head.size <= maxSize &&
          indexSize + segs.head.index.sizeInBytes <= maxIndexSize &&
          segs.head.index.lastOffset - group.last.index.baseOffset <= Int.MaxValue) {
      group = segs.head :: group
      logSize += segs.head.size
      indexSize += segs.head.index.sizeInBytes
      segs = segs.tail
    }
    grouped ::= group.reverse
  }
  grouped.reverse
}
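For intuition, a made-up example: with maxSize = 1 GB and consecutive segments of 300 MB, 400 MB, 500 MB and 200 MB, the first group would be [300 MB, 400 MB] (adding the 500 MB segment would push it past 1 GB), and the second group would be [500 MB, 200 MB], assuming the index-size and Int.MaxValue offset-span constraints are not hit first.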
Doing the actual cleaning, group by group
- The cleaning process walks through every LogSegment that needs cleaning, filters out the messages that should be retained according to a set of rules, and rewrites them into a new log file;
- A message is retained if it satisfies the following rules:
  - the message's key can be found in the OffsetMap, and the current message's offset is not smaller than the offset stored in the OffsetMap;
  - if the message's value is null (a delete tombstone), it is only kept while the segment's last modified time is still later than the delete horizon, i.e. not enough time has passed yet to discard the tombstone;
private def shouldRetainMessage(source: kafka.log.LogSegment,
                                map: kafka.log.OffsetMap,
                                retainDeletes: Boolean,
                                entry: kafka.message.MessageAndOffset): Boolean = {
  val key = entry.message.key
  if (key != null) {
    val foundOffset = map.get(key)
    /* two cases in which we can get rid of a message:
     *   1) if there exists a message with the same key but higher offset
     *   2) if the message is a delete "tombstone" marker and enough time has passed
     */
    val redundant = foundOffset >= 0 && entry.offset < foundOffset
    val obsoleteDelete = !retainDeletes && entry.message.isNull
    !redundant && !obsoleteDelete
  } else {
    stats.invalidMessage()
    false
  }
}
- The cleaning itself:
  - in essence it is the process of filtering the messages to retain according to the rules above and then rewriting the Log and Index files;
  - for the details of how the files are written, see the post "Kafka中Message存儲相關(guān)類大揭密" (on Kafka's message-storage classes).
private[log] def cleanSegments(log: Log,
                               segments: Seq[LogSegment],
                               map: OffsetMap,
                               deleteHorizonMs: Long) {
  // create a new segment with the suffix .cleaned appended to both the log and index name
  val logFile = new File(segments.head.log.file.getPath + Log.CleanedFileSuffix)
  logFile.delete()
  val indexFile = new File(segments.head.index.file.getPath + Log.CleanedFileSuffix)
  indexFile.delete()
  val messages = new FileMessageSet(logFile, fileAlreadyExists = false, initFileSize = log.initFileSize(), preallocate = log.config.preallocate)
  val index = new OffsetIndex(indexFile, segments.head.baseOffset, segments.head.index.maxIndexSize)
  val cleaned = new LogSegment(messages, index, segments.head.baseOffset, segments.head.indexIntervalBytes, log.config.randomSegmentJitter, time)
  try {
    // clean segments into the new destination segment
    for (old <- segments) {
      val retainDeletes = old.lastModified > deleteHorizonMs
      info("Cleaning segment %s in log %s (last modified %s) into %s, %s deletes."
        .format(old.baseOffset, log.name, new Date(old.lastModified), cleaned.baseOffset, if(retainDeletes) "retaining" else "discarding"))
      cleanInto(log.topicAndPartition, old, cleaned, map, retainDeletes)
    }
    // trim excess index
    index.trimToValidSize()
    // flush new segment to disk before swap
    cleaned.flush()
    // update the modification date to retain the last modified date of the original files
    val modified = segments.last.lastModified
    cleaned.lastModified = modified
    // swap in new segment
    info("Swapping in cleaned segment %d for segment(s) %s in log %s.".format(cleaned.baseOffset, segments.map(_.baseOffset).mkString(","), log.name))
    log.replaceSegments(cleaned, segments)
  } catch {
    case e: LogCleaningAbortedException =>
      cleaned.delete()
      throw e
  }
}