Kafka Log Cleanup - LogCleaner

  • The "log" here refers to the files in which Kafka stores the messages written to it;
  • Kafka offers two kinds of log cleanup policies:
    1. deletion based on time and size;
    2. the compact cleanup policy;
  • This article focuses on log cleaning under the compact policy.

Compact policy overview
  • Official Kafka documentation: Log compaction;
  • Compaction can only be applied to topics whose messages all carry a key: messages with the same key are merged and only the latest message per key is kept;
  • During compaction, messages whose payload is null (delete tombstones) are eventually removed as well;
  • The diagram below is taken from the official documentation; a minimal sketch of this behaviour follows the figure:
[Figure 110.png: log compaction diagram from the Kafka documentation]
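Before diving into the source code, here is a minimal, self-contained sketch (plain Scala, not Kafka code; Record, compact and the sample data are made up for illustration) of the net effect shown in the figure: only the latest record per key survives, and null-value tombstones are eventually dropped as well.

object CompactionSemanticsSketch {
  // value = None models a delete "tombstone" (a message whose payload is null)
  case class Record(offset: Long, key: String, value: Option[String])

  def compact(records: Seq[Record]): Seq[Record] =
    records.groupBy(_.key).values
      .map(_.maxBy(_.offset))        // keep only the latest record for each key
      .filter(_.value.isDefined)     // tombstones are removed as well (eventually)
      .toSeq.sortBy(_.offset)

  def main(args: Array[String]): Unit = {
    val log = Seq(
      Record(0, "K1", Some("V1")),
      Record(1, "K2", Some("V2")),
      Record(2, "K1", Some("V3")),   // supersedes offset 0
      Record(3, "K2", None))         // tombstone for K2
    compact(log).foreach(println)    // prints only Record(2,K1,Some(V3))
  }
}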
Cleaning states
  • Three states are involved: LogCleaningInProgress, LogCleaningAborted, and LogCleaningPaused. Their meanings are easy to guess from the names; the comment in the source code reads:
  • "If a partition is to be cleaned, it enters the LogCleaningInProgress state. While a partition is being cleaned, it can be requested to be aborted and paused. Then the partition first enters the LogCleaningAborted state. Once the cleaning task is aborted, the partition enters the LogCleaningPaused state. While a partition is in the LogCleaningPaused state, it won't be scheduled for cleaning again, until cleaning is requested to be resumed."
  • The LogCleanerManager class manages the state of every log being cleaned and the transitions between those states through the four methods below (an illustrative model of the transitions follows them):
def abortCleaning(topicAndPartition: TopicAndPartition)
def abortAndPauseCleaning(topicAndPartition: TopicAndPartition)
def resumeCleaning(topicAndPartition: TopicAndPartition)
def checkCleaningAborted(topicAndPartition: TopicAndPartition) 
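The sketch below is an illustrative model, not the actual LogCleanerManager code, of the transitions those methods drive; it assumes that a partition with no entry in the in-progress map is simply eligible for cleaning.

sealed trait LogCleaningState
case object LogCleaningInProgress extends LogCleaningState
case object LogCleaningAborted extends LogCleaningState
case object LogCleaningPaused extends LogCleaningState

object CleaningStateModel {
  // abortAndPauseCleaning: an in-progress partition is first marked aborted so the
  // running cleaner gets interrupted; an idle partition can be paused right away.
  def abortAndPause(current: Option[LogCleaningState]): LogCleaningState = current match {
    case Some(LogCleaningInProgress) => LogCleaningAborted
    case _                           => LogCleaningPaused
  }

  // Once the cleaner thread notices the abort (via checkCleaningAborted), the
  // partition is moved to paused; paused partitions are never rescheduled.
  def onCleaningAborted(current: LogCleaningState): LogCleaningState = current match {
    case LogCleaningAborted => LogCleaningPaused
    case other              => other
  }

  // resumeCleaning: dropping the entry makes the partition eligible for cleaning again.
  def resume(states: Map[String, LogCleaningState], partition: String): Map[String, LogCleaningState] =
    states - partition
}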
Selecting the logs to clean
  • Because compaction rewrites the log and index files, it is fairly IO-intensive, so Kafka throttles it: before each compaction run it first decides, by rule, which TopicAndPartition logs should be cleaned;
  • The LogToClean class represents a log to be cleaned:
private case class LogToClean(topicPartition: TopicAndPartition, log: Log, firstDirtyOffset: Long) extends Ordered[LogToClean] {
  val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size).sum
  val dirtyBytes = log.logSegments(firstDirtyOffset, math.max(firstDirtyOffset, log.activeSegment.baseOffset)).map(_.size).sum
  val cleanableRatio = dirtyBytes / totalBytes.toDouble
  def totalBytes = cleanBytes + dirtyBytes
  override def compare(that: LogToClean): Int = math.signum(this.cleanableRatio - that.cleanableRatio).toInt
}
  1. firstDirtyOffset: the starting point of this cleaning run; the portion of the log before it is what actually gets cleaned, by merging its keys with the messages that come after it;
  2. cleanableRatio = dirtyBytes / totalBytes.toDouble is the proportion of the log that needs cleaning; the larger this value, the more likely the log is to be selected for cleaning;
  3. After each cleaning run, the position cleaned up to is recorded in the cleaner-offset-checkpoint file, which is used to derive firstDirtyOffset for the next run (a sketch of the checkpoint file layout follows the code below);
def updateCheckpoints(dataDir: File, update: Option[(TopicAndPartition,Long)]) {
    inLock(lock) {
      val checkpoint = checkpoints(dataDir)
      val existing = checkpoint.read().filterKeys(logs.keys) ++ update
      checkpoint.write(existing)
    }
  }
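The cleaner-offset-checkpoint file itself is a small plain-text file in each log directory. The reader below is a minimal sketch assuming the usual checkpoint layout (a version line, an entry-count line, then one "topic partition offset" line per entry); it is illustrative only and not the actual OffsetCheckpoint class.

import scala.io.Source

object CleanerCheckpointSketch {
  // Assumed file layout:
  //   0                 <- format version
  //   2                 <- number of entries
  //   my-topic 0 12345  <- topic, partition, offset cleaned up to (next firstDirtyOffset)
  //   my-topic 1 67890
  def read(path: String): Map[(String, Int), Long] = {
    val source = Source.fromFile(path)
    try {
      source.getLines().drop(2).map { line =>      // skip the version and size lines
        val Array(topic, partition, offset) = line.trim.split("\\s+")
        (topic, partition.toInt) -> offset.toLong
      }.toMap
    } finally source.close()
  }
}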
  • Picking the log most in need of cleaning:
def grabFilthiestLog(): Option[LogToClean] = {
    inLock(lock) {
      val lastClean = allCleanerCheckpoints()
      val dirtyLogs = logs.filter {
        case (topicAndPartition, log) => log.config.compact  // skip any logs marked for delete rather than dedupe
      }.filterNot {
        case (topicAndPartition, log) => inProgress.contains(topicAndPartition) // skip any logs already in-progress
      }.map {
        case (topicAndPartition, log) => // create a LogToClean instance for each
          // if the log segments are abnormally truncated and hence the checkpointed offset
          // is no longer valid, reset to the log starting offset and log the error event
          val logStartOffset = log.logSegments.head.baseOffset
          val firstDirtyOffset = {
            val offset = lastClean.getOrElse(topicAndPartition, logStartOffset)
            if (offset < logStartOffset) {
              error("Resetting first dirty offset to log start offset %d since the checkpointed offset %d is invalid."
                    .format(logStartOffset, offset))
              logStartOffset
            } else {
              offset
            }
          }
          LogToClean(topicAndPartition, log, firstDirtyOffset)
      }.filter(ltc => ltc.totalBytes > 0) // skip any empty logs

      this.dirtiestLogCleanableRatio = if (!dirtyLogs.isEmpty) dirtyLogs.max.cleanableRatio else 0
      // and must meet the minimum threshold for dirty byte ratio
      val cleanableLogs = dirtyLogs.filter(ltc => ltc.cleanableRatio > ltc.log.config.minCleanableRatio)
      if(cleanableLogs.isEmpty) {
        None
      } else {
        val filthiest = cleanableLogs.max
        inProgress.put(filthiest.topicPartition, LogCleaningInProgress)
        Some(filthiest)
      }
    }
  }

The code looks long but is actually quite simple:

  1. Build a list of LogToClean objects from all the logs;
  2. From that list, keep only the LogToClean entries whose cleanableRatio exceeds the minimum cleanable ratio configured for the log;
  3. From the filtered list, take the entry with the largest cleanableRatio: that is the log most in need of cleaning (a toy example follows).
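A toy illustration of those three steps (the byte counts and the 0.5 threshold are made up; the real code uses LogToClean and log.config.minCleanableRatio, i.e. min.cleanable.dirty.ratio, as shown above):

object FilthiestLogSketch {
  case class Candidate(name: String, cleanBytes: Long, dirtyBytes: Long) {
    def cleanableRatio: Double = dirtyBytes.toDouble / (cleanBytes + dirtyBytes)
  }

  def main(args: Array[String]): Unit = {
    val minCleanableRatio = 0.5
    val candidates = Seq(
      Candidate("t1-0", cleanBytes = 800, dirtyBytes = 200),  // ratio 0.2 -> filtered out
      Candidate("t1-1", cleanBytes = 300, dirtyBytes = 700),  // ratio 0.7
      Candidate("t2-0", cleanBytes = 100, dirtyBytes = 900))  // ratio 0.9 -> the filthiest
    val filthiest = candidates
      .filter(_.cleanableRatio > minCleanableRatio)           // step 2
      .sortBy(_.cleanableRatio).lastOption                    // step 3
    println(filthiest)                                        // Some(Candidate(t2-0,100,900))
  }
}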
First, two diagrams (taken from the web):
[Figure 111.png: cleaner point, Log Head and Log Tail]
  1. The CleanerPoint here is the firstDirtyOffset we discussed above;
  2. Keys in the Log Tail are merged against the Log Head; since the OffsetMap is built from the Log Head, the range whose keys take part in the merge actually extends up to the offset that the OffsetMap construction last reached;

The next diagram shows the whole compaction and merge process; Kafka's code is essentially this process translated into code.

[Figure 112.png: the overall compaction/merge process]

Building the OffsetMap
  • An OffsetMap is built over all messages in the Log Head part of figure 111.png above; the key of this map is the hash of message.key, and the value is the offset of the latest message seen for that key;
  • Implementation:
private[log] def buildOffsetMap(log: Log, start: Long, end: Long, map: OffsetMap): Long = {
    map.clear()
    val dirty = log.logSegments(start, end).toSeq
    info("Building offset map for log %s for %d segments in offset range [%d, %d).".format(log.name, dirty.size, start, end))
    
    // Add all the dirty segments. We must take at least map.slots * load_factor,
    // but we may be able to fit more (if there is lots of duplication in the dirty section of the log)
    var offset = dirty.head.baseOffset
    require(offset == start, "Last clean offset is %d but segment base offset is %d for log %s.".format(start, offset, log.name))
    val maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt
    var full = false
    for (segment <- dirty if !full) {
      checkDone(log.topicAndPartition)
      val segmentSize = segment.nextOffset() - segment.baseOffset

      require(segmentSize <= maxDesiredMapSize, "%d messages in segment %s/%s but offset map can fit only %d. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads".format(segmentSize,  log.name, segment.log.file.getName, maxDesiredMapSize))
      if (map.size + segmentSize <= maxDesiredMapSize)
        offset = buildOffsetMapForSegment(log.topicAndPartition, segment, map)
      else
        full = true
    }
    info("Offset map for log %s complete.".format(log.name))
    offset
  }
  1. Each LogSegment is read sequentially and its entries are put into the OffsetMap, keyed by the hash of message.key. There is a subtlety here: what happens if two different keys hash to the same value? (A simplified map sketch follows this list.)
  2. The OffsetMap being built has a size limit: it must not exceed maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt.
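On the hash-collision question: the map used by the cleaner (SkimpyOffsetMap) keys its entries by a fixed-size cryptographic digest of message.key (MD5 by default), so an accidental collision between two different keys is astronomically unlikely, although in theory a collision would silently make the older key's latest message look superseded. The sketch below shows only the put/get semantics, using a plain Map instead of the real pre-allocated, open-addressing buffer.

import java.security.MessageDigest

class SimpleOffsetMapSketch {
  private val digester = MessageDigest.getInstance("MD5")
  private var entries  = Map.empty[Seq[Byte], Long]   // digest of key -> latest offset seen

  private def hash(key: Array[Byte]): Seq[Byte] = digester.digest(key).toSeq

  def put(key: Array[Byte], offset: Long): Unit =
    entries += (hash(key) -> offset)                  // later offsets overwrite earlier ones

  def get(key: Array[Byte]): Long =
    entries.getOrElse(hash(key), -1L)                 // -1 means the key was never seen
}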
Regrouping the LogSegments to be cleaned
  • After compaction the individual LogSegments will inevitably shrink, so the segments are regrouped in preparation for rewriting the log and index files;
  • The grouping rule is simple: segments are grouped greedily by size, such that each group's total segment size does not exceed the configured segment size, its total index size does not exceed the configured maximum index size, and its offset span does not exceed Int.MaxValue (a toy run of this rule follows the code below).
private[log] def groupSegmentsBySize(segments: Iterable[LogSegment], maxSize: Int, maxIndexSize: Int): List[Seq[LogSegment]] = {
    var grouped = List[List[LogSegment]]()
    var segs = segments.toList
    while(!segs.isEmpty) {
      var group = List(segs.head)
      var logSize = segs.head.size
      var indexSize = segs.head.index.sizeInBytes
      segs = segs.tail
      while(!segs.isEmpty &&
            logSize + segs.head.size <= maxSize &&
            indexSize + segs.head.index.sizeInBytes <= maxIndexSize &&
            segs.head.index.lastOffset - group.last.index.baseOffset <= Int.MaxValue) {
        group = segs.head :: group
        logSize += segs.head.size
        indexSize += segs.head.index.sizeInBytes
        segs = segs.tail
      }
      grouped ::= group.reverse
    }
    grouped.reverse
  }
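A toy run of the same greedy rule on made-up segment sizes (SegInfo and the limits are illustrative; the real code works on LogSegment and its index as above, and the Int.MaxValue offset-span check is omitted here for brevity):

object GroupingSketch {
  case class SegInfo(baseOffset: Long, size: Int, indexSize: Int)

  // Greedy grouping: keep appending segments to the current group while the
  // accumulated log size and index size stay within their limits.
  def group(segments: List[SegInfo], maxSize: Int, maxIndexSize: Int): List[List[SegInfo]] =
    segments match {
      case Nil => Nil
      case head :: tail =>
        var grp = List(head)
        var logSize = head.size
        var indexSize = head.indexSize
        var rest = tail
        while (rest.nonEmpty &&
               logSize + rest.head.size <= maxSize &&
               indexSize + rest.head.indexSize <= maxIndexSize) {
          grp = rest.head :: grp
          logSize += rest.head.size
          indexSize += rest.head.indexSize
          rest = rest.tail
        }
        grp.reverse :: group(rest, maxSize, maxIndexSize)
    }

  def main(args: Array[String]): Unit = {
    val segs = List(SegInfo(0, 60, 6), SegInfo(100, 30, 3), SegInfo(200, 50, 5), SegInfo(300, 20, 2))
    // with maxSize = 100 and maxIndexSize = 10 the segments split into two groups:
    // [baseOffset 0, 100] and [baseOffset 200, 300]
    println(group(segs, maxSize = 100, maxIndexSize = 10))
  }
}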
The actual cleaning pass over the regrouped segments
  • The cleaning pass walks through every LogSegment that needs cleaning and, according to the rules below, filters out the messages to retain and rewrites them into a new log file;
  • A message is retained only if it has a key and (see the toy replay after the code below):
    1. the key is found in the OffsetMap and the message's offset is not smaller than the offset stored in the OffsetMap for that key, i.e. it has not been superseded by a newer message with the same key;
    2. it is not an obsolete delete tombstone: a message with a null value is kept only while its segment's last-modified time is still later than the delete horizon (deleteHorizonMs); after that, it is discarded.
private def shouldRetainMessage(source: kafka.log.LogSegment,
                                  map: kafka.log.OffsetMap,
                                  retainDeletes: Boolean,
                                  entry: kafka.message.MessageAndOffset): Boolean = {
    val key = entry.message.key
    if (key != null) {
      val foundOffset = map.get(key)
      /* two cases in which we can get rid of a message:
       *   1) if there exists a message with the same key but higher offset
       *   2) if the message is a delete "tombstone" marker and enough time has passed
       */
      val redundant = foundOffset >= 0 && entry.offset < foundOffset
      val obsoleteDelete = !retainDeletes && entry.message.isNull
      !redundant && !obsoleteDelete
    } else {
      stats.invalidMessage()
      false
    }
  }
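The standalone snippet below replays those two rules on toy data (Entry and the map literal are made up; the real check takes a MessageAndOffset and the OffsetMap built earlier):

object RetentionRuleSketch {
  case class Entry(offset: Long, key: Option[String], value: Option[String])   // value None = tombstone

  def shouldRetain(latestOffsetForKey: Map[String, Long], retainDeletes: Boolean, e: Entry): Boolean =
    e.key match {
      case None => false                                             // keyless messages are invalid for compacted topics
      case Some(k) =>
        val found          = latestOffsetForKey.getOrElse(k, -1L)
        val redundant      = found >= 0 && e.offset < found          // superseded by a newer message with the same key
        val obsoleteDelete = !retainDeletes && e.value.isEmpty       // tombstone past the delete horizon
        !redundant && !obsoleteDelete
    }

  def main(args: Array[String]): Unit = {
    val map = Map("K1" -> 10L)
    println(shouldRetain(map, retainDeletes = false, Entry(5,  Some("K1"), Some("old"))))  // false: superseded
    println(shouldRetain(map, retainDeletes = false, Entry(10, Some("K1"), Some("new"))))  // true: latest for K1
    println(shouldRetain(map, retainDeletes = false, Entry(12, Some("K1"), None)))         // false: obsolete tombstone
    println(shouldRetain(map, retainDeletes = true,  Entry(12, Some("K1"), None)))         // true: tombstone still retained
  }
}

The cleanSegments method below drives the pass over each group, writing the retained messages into a new .cleaned segment and swapping it in at the end: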
private[log] def cleanSegments(log: Log,
                                 segments: Seq[LogSegment], 
                                 map: OffsetMap, 
                                 deleteHorizonMs: Long) {
    // create a new segment with the suffix .cleaned appended to both the log and index name
    val logFile = new File(segments.head.log.file.getPath + Log.CleanedFileSuffix)
    logFile.delete()
    val indexFile = new File(segments.head.index.file.getPath + Log.CleanedFileSuffix)
    indexFile.delete()
    val messages = new FileMessageSet(logFile, fileAlreadyExists = false, initFileSize = log.initFileSize(), preallocate = log.config.preallocate)
    val index = new OffsetIndex(indexFile, segments.head.baseOffset, segments.head.index.maxIndexSize)
    val cleaned = new LogSegment(messages, index, segments.head.baseOffset, segments.head.indexIntervalBytes, log.config.randomSegmentJitter, time)

    try {
      // clean segments into the new destination segment
      for (old <- segments) {
        val retainDeletes = old.lastModified > deleteHorizonMs
        info("Cleaning segment %s in log %s (last modified %s) into %s, %s deletes."
            .format(old.baseOffset, log.name, new Date(old.lastModified), cleaned.baseOffset, if(retainDeletes) "retaining" else "discarding"))
        cleanInto(log.topicAndPartition, old, cleaned, map, retainDeletes)
      }

      // trim excess index
      index.trimToValidSize()

      // flush new segment to disk before swap
      cleaned.flush()

      // update the modification date to retain the last modified date of the original files
      val modified = segments.last.lastModified
      cleaned.lastModified = modified

      // swap in new segment
      info("Swapping in cleaned segment %d for segment(s) %s in log %s.".format(cleaned.baseOffset, segments.map(_.baseOffset).mkString(","), log.name))
      log.replaceSegments(cleaned, segments)
    } catch {
      case e: LogCleaningAbortedException =>
        cleaned.delete()
        throw e
    }
  }

Kafka Source Code Analysis - Series Index
