更多關(guān)注spark streaming源碼分析之流程詳解
DStreamGraph的作用是什么呢?
- DStreamGraph通過持有所有的inputstream和outputstream鲤看,劃分提交job
- 清理卫袒,spark streaming中一直接收數(shù)據(jù)秀存,會不會把內(nèi)存撐爆清蚀?checkpoint的data什么時候清理碰辅?什么時候更新汁雷?
//1. add input/output DS
def addInputStream(inputStream: InputDStream[_]) {
this.synchronized {
inputStream.setGraph(this)
inputStreams += inputStream
}
}
//output 同上
//2. 調(diào)用持有的output ds實例提交作業(yè)
def generateJobs(time: Time): Seq[Job] = {
.....
val jobs = this.synchronized {
outputStreams.flatMap { outputStream =>
val jobOption = outputStream.generateJob(time)
jobOption.foreach(_.setCallSite(outputStream.creationSite))
jobOption
}
}
jobs
}
//3.清理,spark streaming中一直接收數(shù)據(jù)轩褐,會不會把內(nèi)存撐爆椎咧?checkpoint的data什么時候清理?什么時候更新把介?等等見博客:
def clearMetadata(time: Time)
def updateCheckpointData(time: Time)
def clearCheckpointData(time: Time)
def restoreCheckpointData()