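Kubernetes version: 1.13.2
This continues from the previous two parts:
Kubernetes GarbageCollector Controller source code analysis (Part 1)
Kubernetes GarbageCollector Controller source code analysis (Part 2)
Main steps
The GarbageCollector Controller source code breaks down into the following parts: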
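- monitors act as producers and put changed resources into the graphChanges queue; meanwhile restMapper periodically checks the resource types in the cluster and refreshes the monitors;
- runProcessGraphChanges takes changed items off the graphChanges queue and, depending on the situation, puts them into the attemptToDelete queue;
- runProcessGraphChanges takes changed items off the graphChanges queue and, depending on the situation, puts them into the attemptToOrphan queue;
- runAttemptToDeleteWorker takes items off the attemptToDelete queue and tries to delete garbage resources;
- runAttemptToOrphanWorker takes items off the attemptToOrphan queue and handles the resources to be orphaned.
The previous part covered items 2 and 3; this part covers items 4 and 5.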
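The flow above can be pictured with a small sketch. This is not the controller's code: the event type and its two boolean fields are hypothetical simplifications, and plain slices stand in for the rate-limited work queues. It only illustrates the producer/consumer shape, with one routing step feeding two queues.

```go
package main

import "fmt"

// event is a hypothetical, simplified stand-in for an item on graphChanges.
type event struct {
	name             string
	ownerGone        bool // the owner no longer exists, so the item may be garbage
	orphanDependents bool // the item is being deleted with the orphan finalizer
}

// route plays the role of runProcessGraphChanges in this sketch: it drains
// graphChanges and decides which work queue, if any, each item belongs to.
func route(graphChanges <-chan event) (attemptToDelete, attemptToOrphan []string) {
	for ev := range graphChanges {
		switch {
		case ev.orphanDependents:
			attemptToOrphan = append(attemptToOrphan, ev.name)
		case ev.ownerGone:
			attemptToDelete = append(attemptToDelete, ev.name)
		}
	}
	return attemptToDelete, attemptToOrphan
}

// demoRoute feeds three made-up events through route.
func demoRoute() ([]string, []string) {
	graphChanges := make(chan event, 3)
	// The monitors are the producers.
	graphChanges <- event{name: "pod-a", ownerGone: true}
	graphChanges <- event{name: "deploy-b", orphanDependents: true}
	graphChanges <- event{name: "cm-c"}
	close(graphChanges)
	return route(graphChanges)
}

func main() {
	del, orphan := demoRoute()
	fmt.Println("attemptToDelete:", del)
	fmt.Println("attemptToOrphan:", orphan)
}
```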
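Finalizers
Before reading the code below, it helps to understand finalizers.
An object's finalizers are logic that must run before the object is deleted. Every object's finalizers field must be empty before the object can be removed. Finalizers are a general-purpose API: besides blocking cascading deletion, they can also be used to hook in logic that runs before an object is deleted: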
type ObjectMeta struct {
// ...
Finalizers []string
}
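Finalizers run before the object is deleted. Each time a finalizer completes successfully, it removes itself from the Finalizers array. Once the last finalizer has been removed, the API server deletes the object.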
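A minimal sketch of that rule, assuming a trimmed-down object type that is not the real ObjectMeta: the deleting flag stands in for a non-nil deletionTimestamp, and "example.com/cleanup" is a made-up finalizer name.

```go
package main

import "fmt"

// object is a hypothetical, trimmed-down stand-in for a Kubernetes object.
type object struct {
	name       string
	deleting   bool // stands in for a non-nil deletionTimestamp
	finalizers []string
}

// removeFinalizer drops one finalizer and reports whether the object may now
// be removed from storage: it must be marked for deletion and have no
// finalizers left.
func removeFinalizer(o *object, name string) bool {
	kept := make([]string, 0, len(o.finalizers))
	for _, f := range o.finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	o.finalizers = kept
	return o.deleting && len(o.finalizers) == 0
}

// demoFinalizers removes two finalizers in turn from an object that is
// already marked for deletion.
func demoFinalizers() (afterFirst, afterSecond bool) {
	o := &object{
		name:       "rs-1",
		deleting:   true,
		finalizers: []string{"orphan", "example.com/cleanup"},
	}
	afterFirst = removeFinalizer(o, "orphan")
	afterSecond = removeFinalizer(o, "example.com/cleanup")
	return afterFirst, afterSecond
}

func main() {
	first, second := demoFinalizers()
	fmt.Println(first, second) // prints "false true"
}
```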
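By default, deleting an object deletes all of its dependents. In some cases, though, we only want to delete the object itself and avoid a cascading delete. For this the garbage collection mechanism introduces the OrphanFinalizer, which is added to or removed from the Finalizers array before the object is deleted.
This finalizer watches for update events on the object and removes the object from the OwnerReferences array of all its dependents. At the same time it removes any OwnerReferences on those dependents that are no longer valid, and finally removes the OrphanFinalizer from the Finalizers array.
With the OrphanFinalizer, all dependents of a Kubernetes object can be kept when the object is deleted, giving users a more flexible way to keep or delete objects.
It is also worth reading the official "Garbage Collection" documentation:
Garbage Collection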
The attemptToDelete queue
Open $GOPATH\src\k8s.io\kubernetes\pkg\controller\garbagecollector\garbagecollector.go:
func (gc *GarbageCollector) runAttemptToDeleteWorker() {
for gc.attemptToDeleteWorker() {
}
}
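The worker takes a resource off the attemptToDelete queue and calls gc.attemptToDeleteItem(n) to process it. If an error occurs along the way, the item is added back to the attemptToDelete queue with rate limiting.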
func (gc *GarbageCollector) attemptToDeleteWorker() bool {
// take the next resource to attempt deletion on from the queue
item, quit := gc.attemptToDelete.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToDelete.Done(item)
n, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
err := gc.attemptToDeleteItem(n)
if err != nil {
if _, ok := err.(*restMappingError); ok {
// There are at least two ways this can happen:
// 1. The reference is to an object of a custom type that has not yet been
// recognized by gc.restMapper (this is a transient error).
// 2. The reference is to an invalid group/version. We don't currently
// have a way to distinguish this from a valid type we will recognize
// after the next discovery sync.
// For now, record the error and retry.
klog.V(5).Infof("error syncing item %s: %v", n, err)
} else {
utilruntime.HandleError(fmt.Errorf("error syncing item %s: %v", n, err))
}
// retry if garbage collection of an object failed.
gc.attemptToDelete.AddRateLimited(item)
} else if !n.isObserved() {
// requeue if item hasn't been observed via an informer event yet.
// otherwise a virtual node for an item added AND removed during watch reestablishment can get stuck in the graph and never removed.
// see https://issue.k8s.io/56121
klog.V(5).Infof("item %s hasn't been observed via informer yet", n.identity)
gc.attemptToDelete.AddRateLimited(item)
}
return true
}
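The requeue-on-error behavior of the worker can be sketched without client-go. This is an illustration only: a plain slice replaces the work queue, maxAttempts is a hypothetical cap that the real queue does not have, and the backoff delay that AddRateLimited applies between retries is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// processWithRetry drains a FIFO queue and re-adds an item whenever handle
// fails, up to maxAttempts tries per item. It returns how many times each
// item was handled.
func processWithRetry(items []string, maxAttempts int, handle func(item string, attempt int) error) map[string]int {
	attempts := make(map[string]int)
	queue := append([]string(nil), items...)
	for len(queue) > 0 {
		item := queue[0]
		queue = queue[1:]
		attempts[item]++
		if err := handle(item, attempts[item]); err != nil && attempts[item] < maxAttempts {
			queue = append(queue, item) // stands in for AddRateLimited
		}
	}
	return attempts
}

// demoRetry handles two items; item "b" fails on its first two attempts.
func demoRetry() map[string]int {
	return processWithRetry([]string{"a", "b"}, 5, func(item string, attempt int) error {
		if item == "b" && attempt < 3 {
			return errors.New("transient error")
		}
		return nil
	})
}

func main() {
	fmt.Println(demoRetry()) // prints "map[a:1 b:3]"
}
```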
The key method is attemptToDeleteItem:
func (gc *GarbageCollector) attemptToDeleteItem(item *node) error {
klog.V(2).Infof("processing item %s", item.identity)
// "being deleted" is an one-way trip to the final deletion. We'll just wait for the final deletion, and then process the object's dependents.
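// item is marked as being deleted (deletionTimestamp is non-nil) and is not deleting its dependents (as the previous part showed, deletingDependents is set to true only when item is deleted in foreground mode)
// so an item that is being deleted in Orphan or Background mode returns immediately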
if item.isBeingDeleted() && !item.isDeletingDependents() {
klog.V(5).Infof("processing item %s returned at once, because its DeletionTimestamp is non-nil", item.identity)
return nil
}
// TODO: It's only necessary to talk to the API server if this is a
// "virtual" node. The local graph could lag behind the real status, but in
// practice, the difference is small.
// fetch the object using the identity stored in item
latest, err := gc.getObject(item.identity)
switch {
case errors.IsNotFound(err):
// the GraphBuilder can add "virtual" node for an owner that doesn't
// exist yet, so we need to enqueue a virtual Delete event to remove
// the virtual node from GraphBuilder.uidToNode.
klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
case err != nil:
return err
}
// UID mismatch
if latest.GetUID() != item.identity.UID {
klog.V(5).Infof("UID doesn't match, item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
}
// TODO: attemptToOrphanWorker() routine is similar. Consider merging
// attemptToOrphanWorker() into attemptToDeleteItem() as well.
// item is deleting its dependents (foreground deletion), so process those dependents
if item.isDeletingDependents() {
return gc.processDeletingDependentsItem(item)
}
// compute if we should delete the item
// read metadata.ownerReferences from the object
ownerReferences := latest.GetOwnerReferences()
if len(ownerReferences) == 0 {
// nothing to do for an object without owners
klog.V(2).Infof("object %s's doesn't have an owner, continue on next item", item.identity)
return nil
}
// solid: the owner exists and is either not being deleted or has no foregroundDeletion finalizer; dangling: the owner does not exist
// waitingForDependentsDeletion: the owner exists, its deletionTimestamp is non-nil, and it has the foregroundDeletion finalizer
solid, dangling, waitingForDependentsDeletion, err := gc.classifyReferences(item, ownerReferences)
if err != nil {
return err
}
klog.V(5).Infof("classify references of %s.\nsolid: %#v\ndangling: %#v\nwaitingForDependentsDeletion: %#v\n", item.identity, solid, dangling, waitingForDependentsDeletion)
switch {
// item has at least one owner that exists and is not waiting for its dependents to be deleted
case len(solid) != 0:
klog.V(2).Infof("object %#v has at least one existing owner: %#v, will not garbage collect", solid, item.identity)
if len(dangling) == 0 && len(waitingForDependentsDeletion) == 0 {
return nil
}
klog.V(2).Infof("remove dangling references %#v and waiting references %#v for object %s", dangling, waitingForDependentsDeletion, item.identity)
// waitingForDependentsDeletion needs to be deleted from the
// ownerReferences, otherwise the referenced objects will be stuck with
// the FinalizerDeletingDependents and never get deleted.
// owner UIDs to remove from item's ownerReferences
ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
// build the patch body
patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
// send the patch request
_, err = gc.patch(item, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
})
return err
// an owner of item is being deleted in foreground mode, and item has dependents
case len(waitingForDependentsDeletion) != 0 && item.dependentsLength() != 0:
deps := item.getDependents()
// iterate over item's dependents
for _, dep := range deps {
if dep.isDeletingDependents() {
// this circle detection has false positives, we need to
// apply a more rigorous detection if this turns out to be a
// problem.
// there are multiple workers run attemptToDeleteItem in
// parallel, the circle detection can fail in a race condition.
klog.V(2).Infof("processing object %s, some of its owners and its dependent [%s] have FinalizerDeletingDependents, to prevent potential cycle, its ownerReferences are going to be modified to be non-blocking, then the object is going to be deleted with Foreground", item.identity, dep.identity)
// build a patch that unsets BlockOwnerDeletion on all of item's ownerReferences, so that item no longer blocks deletion of its owners
patch, err := item.unblockOwnerReferencesStrategicMergePatch()
if err != nil {
return err
}
// apply the patch
if _, err := gc.patch(item, patch, gc.unblockOwnerReferencesJSONMergePatch); err != nil {
return err
}
break
}
}
// at least one owner of item has the foregroundDeletion finalizer and item itself has dependents, so item is deleted in foreground mode
klog.V(2).Infof("at least one owner of object %s has FinalizerDeletingDependents, and the object itself has dependents, so it is going to be deleted in Foreground", item.identity)
// the deletion event will be observed by the graphBuilder, so the item
// will be processed again in processDeletingDependentsItem. If it
// doesn't have dependents, the function will remove the
// FinalizerDeletingDependents from the item, resulting in the final
// deletion of the item.
policy := metav1.DeletePropagationForeground
return gc.deleteObject(item.identity, &policy)
default:
// item doesn't have any solid owner, so it needs to be garbage
// collected. Also, none of item's owners is waiting for the deletion of
// the dependents, so set propagationPolicy based on existing finalizers.
var policy metav1.DeletionPropagation
switch {
case hasOrphanFinalizer(latest):
// if an existing orphan finalizer is already on the object, honor it.
policy = metav1.DeletePropagationOrphan
case hasDeleteDependentsFinalizer(latest):
// if an existing foreground finalizer is already on the object, honor it.
policy = metav1.DeletePropagationForeground
default:
// otherwise, default to background.
policy = metav1.DeletePropagationBackground
}
klog.V(2).Infof("delete object %s with propagation policy %s", item.identity, policy)
// delete the object, which has no solid owner left
return gc.deleteObject(item.identity, &policy)
}
}
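In summary, it does the following:
1. If item is being deleted and is not deleting its dependents (that is, it is being deleted in Orphan or Background mode), return immediately.
2. If item is being deleted in foreground mode, call processDeletingDependentsItem to handle the dependents that block its deletion by putting them into the attemptToDelete queue.
3. Read item's ownerReferences and call classifyReferences to split them into three groups: solid (owners that exist and are not both being deleted and carrying the foregroundDeletion finalizer), dangling (owners that no longer exist), and waitingForDependentsDeletion (owners whose deletionTimestamp is non-nil and that carry the foregroundDeletion finalizer).
4. First switch case: solid is not empty, meaning item still has an owner that is not waiting on its dependents. If dangling and waitingForDependentsDeletion are both empty, return. Otherwise merge the UIDs of the two groups and send a patch request that removes the matching ownerReferences from item.
5. Second switch case: solid is empty, waitingForDependentsDeletion is not empty, and item has dependents. In other words, every owner of item either no longer exists or is being deleted in foreground mode. If one of item's dependents is itself deleting its dependents, item is patched so that it no longer blocks deletion of its owners. In either case item is then deleted in foreground mode.
6. Third switch case: when none of the above applies, item is deleted according to the finalizers it already carries, defaulting to Background.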
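The three switch cases in the list above reduce to a small decision table. The sketch below is a hypothetical restatement, not the controller's code: refCounts, action and decide are invented names, and it assumes the earlier checks (item not already being deleted, ownerReferences not empty) have passed.

```go
package main

import "fmt"

// action names what attemptToDeleteItem ends up doing in this sketch.
type action string

const (
	doNothing        action = "nothing"
	patchOwnerRefs   action = "patch ownerReferences"
	deleteForeground action = "delete (Foreground)"
	deleteOrphan     action = "delete (Orphan)"
	deleteBackground action = "delete (Background)"
)

// refCounts holds the sizes of the three groups returned by classifyReferences.
type refCounts struct {
	solid, dangling, waiting int
}

// decide mirrors the order of the switch cases described above.
func decide(c refCounts, dependents int, hasOrphanFinalizer, hasForegroundFinalizer bool) action {
	switch {
	case c.solid != 0:
		if c.dangling == 0 && c.waiting == 0 {
			return doNothing
		}
		return patchOwnerRefs
	case c.waiting != 0 && dependents != 0:
		return deleteForeground
	case hasOrphanFinalizer:
		return deleteOrphan
	case hasForegroundFinalizer:
		return deleteForeground
	default:
		return deleteBackground
	}
}

func main() {
	fmt.Println(decide(refCounts{solid: 1}, 0, false, false))
	fmt.Println(decide(refCounts{solid: 1, dangling: 1}, 0, false, false))
	fmt.Println(decide(refCounts{waiting: 1}, 2, false, false))
	fmt.Println(decide(refCounts{dangling: 1}, 0, true, false))
	fmt.Println(decide(refCounts{dangling: 1}, 0, false, false))
}
```

Note that a waiting owner only forces a foreground delete when item has dependents of its own. Without dependents the decision falls through to the finalizer-based default.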
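In more detail, processDeletingDependentsItem collects the dependents of item whose ownerReferences to item have BlockOwnerDeletion set to true. If there are none, it removes the foregroundDeletion finalizer from item. Otherwise it iterates over them and adds every dependent dep that has not yet started deleting its own dependents to the attemptToDelete queue.
// process an item that is waiting for its dependents to be deleted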
func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
// dependents that block the deletion of item
blockingDependents := item.blockingDependents()
// no dependent blocks the deletion of item, so remove item's foregroundDeletion finalizer
if len(blockingDependents) == 0 {
klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
}
// iterate over the dependents that block the deletion of item
for _, dep := range blockingDependents {
// if dep has not started deleting its own dependents, add dep to the attemptToDelete queue
if !dep.isDeletingDependents() {
klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
// add the dependent to the delete queue
gc.attemptToDelete.Add(dep)
}
}
return nil
}
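gc.classifyReferences(item, ownerReferences) iterates over item's owner list and calls isDangling on each. Owners that no longer exist go into the dangling list. Owners that are being deleted and carry the foregroundDeletion finalizer go into the waitingForDependentsDeletion list. Owners that are not being deleted, or whose finalizer is not foregroundDeletion, go into the solid list.
// classifyReferences splits latestReferences into three groups:
// solid: the owner exists and is not waitingForDependentsDeletion
// dangling: the owner does not exist
// waitingForDependentsDeletion: the owner exists, its deletionTimestamp is non-nil, and it has FinalizerDeletingDependents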
func (gc *GarbageCollector) classifyReferences(item *node, latestReferences []metav1.OwnerReference) (
solid, dangling, waitingForDependentsDeletion []metav1.OwnerReference, err error) {
// iterate over the owners of this node
for _, reference := range latestReferences {
// check whether the owner exists; isDangling == true means it does not. On error, the caller ends up re-adding item to attemptToDelete via AddRateLimited
isDangling, owner, err := gc.isDangling(reference, item)
if err != nil {
return nil, nil, nil, err
}
// collect owners that do not exist in the dangling slice
if isDangling {
dangling = append(dangling, reference)
continue
}
// the owner exists; get its accessor
ownerAccessor, err := meta.Accessor(owner)
if err != nil {
return nil, nil, nil, err
}
// the owner is being deleted and has the foregroundDeletion finalizer
if ownerAccessor.GetDeletionTimestamp() != nil && hasDeleteDependentsFinalizer(ownerAccessor) {
// the owner is waiting for its dependents to be deleted; collect it
waitingForDependentsDeletion = append(waitingForDependentsDeletion, reference)
} else {
// the owner is not being deleted, or has no foregroundDeletion finalizer
solid = append(solid, reference)
}
}
return solid, dangling, waitingForDependentsDeletion, nil
}
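The classification rule can be tried out in isolation. The sketch below assumes a made-up ownerState type and looks owners up in an in-memory map keyed by name. The real code asks the API server through isDangling and works with metav1.OwnerReference values.

```go
package main

import "fmt"

// ownerState is a hypothetical summary of what the API server would report
// about an owner that exists.
type ownerState struct {
	deleting            bool // deletionTimestamp is non-nil
	foregroundFinalizer bool // carries the foregroundDeletion finalizer
}

// classify splits owner names into the same three groups as classifyReferences.
// An owner missing from the owners map is treated as not existing.
func classify(owners map[string]ownerState, refs []string) (solid, dangling, waiting []string) {
	for _, ref := range refs {
		state, exists := owners[ref]
		switch {
		case !exists:
			dangling = append(dangling, ref)
		case state.deleting && state.foregroundFinalizer:
			waiting = append(waiting, ref)
		default:
			solid = append(solid, ref)
		}
	}
	return solid, dangling, waiting
}

// demoClassify classifies three made-up owners.
func demoClassify() (solid, dangling, waiting []string) {
	owners := map[string]ownerState{
		"live":        {},
		"terminating": {deleting: true, foregroundFinalizer: true},
		"background":  {deleting: true},
	}
	return classify(owners, []string{"live", "gone", "terminating", "background"})
}

func main() {
	solid, dangling, waiting := demoClassify()
	fmt.Println("solid:", solid)
	fmt.Println("dangling:", dangling)
	fmt.Println("waitingForDependentsDeletion:", waiting)
}
```

An owner that is being deleted without the foregroundDeletion finalizer ("background" here) still counts as solid, which matches the else branch in the code above.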
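gc.isDangling(reference, item) first checks the absentOwnerCache by owner UID. On a cache hit the owner is reported as absent right away. Otherwise it builds a request from the fields of the ownerReference and asks the apiserver for the owner object, looking it up in item's namespace. If the owner is not found, or is found with a different UID, the owner UID is added to the absentOwnerCache and the reference is reported as dangling (true). Only when the owner is found with a matching UID does the method return false together with the owner object.
// isDangling checks whether a reference points to an object that does not exist. If isDangling looks the referenced object up on the API server, it also returns its latest state.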
func (gc *GarbageCollector) isDangling(reference metav1.OwnerReference, item *node) (
dangling bool, owner *unstructured.Unstructured, err error) {
if gc.absentOwnerCache.Has(reference.UID) {
klog.V(5).Infof("according to the absentOwnerCache, object %s's owner %s/%s, %s does not exist", item.identity.UID, reference.APIVersion, reference.Kind, reference.Name)
return true, nil, nil
}
// TODO: we need to verify the reference resource is supported by the
// system. If it's not a valid resource, the garbage collector should i)
// ignore the reference when decide if the object should be deleted, and
// ii) should update the object to remove such references. This is to
// prevent objects having references to an old resource from being
// deleted during a cluster upgrade.
resource, namespaced, err := gc.apiResource(reference.APIVersion, reference.Kind)
if err != nil {
return false, nil, err
}
// TODO: It's only necessary to talk to the API server if the owner node
// is a "virtual" node. The local graph could lag behind the real
// status, but in practice, the difference is small.
owner, err = gc.dynamicClient.Resource(resource).Namespace(resourceDefaultNamespace(namespaced, item.identity.Namespace)).Get(reference.Name, metav1.GetOptions{})
switch {
case errors.IsNotFound(err):
gc.absentOwnerCache.Add(reference.UID)
klog.V(5).Infof("object %s's owner %s/%s, %s is not found", item.identity.UID, reference.APIVersion, reference.Kind, reference.Name)
return true, nil, nil
case err != nil:
return false, nil, err
}
if owner.GetUID() != reference.UID {
klog.V(5).Infof("object %s's owner %s/%s, %s is not found, UID mismatch", item.identity.UID, reference.APIVersion, reference.Kind, reference.Name)
gc.absentOwnerCache.Add(reference.UID)
return true, nil, nil
}
return false, owner, nil
}
The attemptToOrphan queue
Now for the code:
func (gc *GarbageCollector) runAttemptToOrphanWorker() {
for gc.attemptToOrphanWorker() {
}
}
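The worker loops forever, taking item resources off the attemptToOrphan queue and calling gc.orphanDependents(owner.identity, dependents) to remove item's ownerReference from each of item's dependents. If an error occurs along the way, the item is added back to the attemptToOrphan queue with rate limiting. Finally, the orphan finalizer is removed from item.
// attemptToOrphanWorker takes a node off attemptToOrphan, finds its dependents in the graph maintained by the GC, removes the node from
// its dependents' OwnerReferences, and finally updates the item to remove the orphan finalizer. If any of these steps fails, the node is added back to attemptToOrphan.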
func (gc *GarbageCollector) attemptToOrphanWorker() bool {
item, quit := gc.attemptToOrphan.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToOrphan.Done(item)
owner, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
// we don't need to lock each element, because they never get updated
owner.dependentsLock.RLock()
dependents := make([]*node, 0, len(owner.dependents))
for dependent := range owner.dependents {
dependents = append(dependents, dependent)
}
owner.dependentsLock.RUnlock()
// orphan the dependents
err := gc.orphanDependents(owner.identity, dependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("orphanDependents for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
return true
}
// update the owner, remove "orphaningFinalizer" from its finalizers list
// remove the orphan finalizer from item
err = gc.removeFinalizer(owner, metav1.FinalizerOrphanDependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("removeOrphanFinalizer for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
}
return true
}
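gc.orphanDependents(owner.identity, dependents) iterates over item's dependents and sends patch requests concurrently, removing from each dependent the ownerReference whose UID matches item. Errors are written to the errCh channel and finally returned to the caller as one aggregated error: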
// dependents are copies of pointers to the owner's dependents, they don't need to be locked.
func (gc *GarbageCollector) orphanDependents(owner objectReference, dependents []*node) error {
errCh := make(chan error, len(dependents))
wg := sync.WaitGroup{}
wg.Add(len(dependents))
for i := range dependents {
go func(dependent *node) {
defer wg.Done()
// the dependent.identity.UID is used as precondition
patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
_, err := gc.patch(dependent, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, owner.UID)
})
// note that if the target ownerReference doesn't exist in the
// dependent, strategic merge patch will NOT return an error.
if err != nil && !errors.IsNotFound(err) {
errCh <- fmt.Errorf("orphaning %s failed, %v", dependent.identity, err)
}
}(dependents[i])
}
wg.Wait()
close(errCh)
var errorsSlice []error
for e := range errCh {
errorsSlice = append(errorsSlice, e)
}
if len(errorsSlice) != 0 {
return fmt.Errorf("failed to orphan dependents of owner %s, got errors: %s", owner, utilerrors.NewAggregate(errorsSlice).Error())
}
klog.V(5).Infof("successfully updated all dependents of owner %s", owner)
return nil
}
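The same fan-out pattern, stripped of Kubernetes types, looks roughly like this. patchAll and the failing patch function are hypothetical. The point is the buffered error channel sized to the number of goroutines, which lets every goroutine report an error without blocking before the channel is closed and drained.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// patchAll runs patch for every dependent concurrently and collects the
// errors, following the shape of orphanDependents.
func patchAll(dependents []string, patch func(dependent string) error) []error {
	errCh := make(chan error, len(dependents)) // one slot per goroutine, so sends never block
	var wg sync.WaitGroup
	wg.Add(len(dependents))
	for _, d := range dependents {
		go func(dependent string) {
			defer wg.Done()
			if err := patch(dependent); err != nil {
				errCh <- fmt.Errorf("orphaning %s failed: %v", dependent, err)
			}
		}(d)
	}
	wg.Wait()
	close(errCh)
	var errs []error
	for e := range errCh {
		errs = append(errs, e)
	}
	return errs
}

// demoPatchAll patches three made-up dependents; only "b" fails.
func demoPatchAll() []string {
	errs := patchAll([]string{"a", "b", "c"}, func(dependent string) error {
		if dependent == "b" {
			return errors.New("conflict")
		}
		return nil
	})
	msgs := make([]string, 0, len(errs))
	for _, e := range errs {
		msgs = append(msgs, e.Error())
	}
	return msgs
}

func main() {
	fmt.Println(demoPatchAll()) // prints "[orphaning b failed: conflict]"
}
```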
deleteOwnerRefStrategicMergePatch builds the patch request body. The same function is also called from the first switch case in the attemptToDelete loop.
func deleteOwnerRefStrategicMergePatch(dependentUID types.UID, ownerUIDs ...types.UID) []byte {
var pieces []string
// build one delete directive per owner UID to remove
for _, ownerUID := range ownerUIDs {
pieces = append(pieces, fmt.Sprintf(`{"$patch":"delete","uid":"%s"}`, ownerUID))
}
// assemble the patch body
patch := fmt.Sprintf(`{"metadata":{"ownerReferences":[%s],"uid":"%s"}}`, strings.Join(pieces, ","), dependentUID)
return []byte(patch)
}
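To see what the generated patch looks like, here is a standalone copy of the function that uses plain strings instead of types.UID (an assumption made so that the sketch needs no Kubernetes imports). The UIDs are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// deleteOwnerRefPatch is a string-typed copy of deleteOwnerRefStrategicMergePatch.
func deleteOwnerRefPatch(dependentUID string, ownerUIDs ...string) []byte {
	var pieces []string
	for _, ownerUID := range ownerUIDs {
		pieces = append(pieces, fmt.Sprintf(`{"$patch":"delete","uid":"%s"}`, ownerUID))
	}
	patch := fmt.Sprintf(`{"metadata":{"ownerReferences":[%s],"uid":"%s"}}`, strings.Join(pieces, ","), dependentUID)
	return []byte(patch)
}

// isValidJSON reports whether the patch body is well-formed JSON.
func isValidJSON(p []byte) bool {
	return json.Valid(p)
}

func main() {
	p := deleteOwnerRefPatch("dep-1", "owner-a", "owner-b")
	fmt.Println(string(p))
	fmt.Println("valid JSON:", isValidJSON(p))
}
```

The top-level "uid" is the dependent's own UID. As the comment in orphanDependents says, it serves as a precondition for the patch.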
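Back to the original problem
After the Redis middleware was containerized, a redis cluster deployed in the test environment was unexpectedly deleted after the Kubernetes apiserver restarted (both the redis exporter statefulset and the redis statefulset).
Root cause
Reproducing the problem several times in the development environment showed that, after the apiserver restarted, the redis operator log contained no sign of the operator itself deleting the redis cluster (redis statefulset) or the monitoring instance (redis exporter). We then turned to the kube-controller-manager log, raised its verbosity to --v=5, reproduced the problem again, and eventually found the following in the kube-controller-manager log:
It shows that when the garbage collector processed the redis exporter statefulset, it found ownerReferences on it and looked up the owner, the redisCluster object redis-0826, in the exporter's namespace (monitoring). The redisCluster object redis-0826 actually lives in the kube-system namespace, so the lookup in monitoring returned 404 Not Found, and the garbage collector recorded the owner's UID in the absentOwnerCache as not existing.
Because the owner of the redis exporter statefulset appeared not to exist, the gc treated the statefulset as garbage and deleted it. Likewise, when it then processed the redis statefulset, it found in the cache that the owner did not exist, treated that statefulset as garbage too, and deleted it.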
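The chain of events just described can be replayed with a toy model. Everything here is a simplification: the collector type, the in-memory store keyed by "namespace/name", the UID string and the statefulset names are all made up. What it keeps from isDangling is the two relevant behaviors: the owner is looked up in the dependent's namespace, and a miss is cached by owner UID.

```go
package main

import "fmt"

// ownerRef carries no namespace, just like metav1.OwnerReference.
type ownerRef struct{ name, uid string }

type object struct {
	namespace, name string
	owner           ownerRef
}

// collector is a toy model of the lookup part of the garbage collector.
type collector struct {
	store  map[string]string // "namespace/name" -> UID of existing objects
	absent map[string]bool   // owner UID -> known absent (stands in for absentOwnerCache)
}

// isDangling looks the owner up in the dependent's own namespace and caches misses.
func (c *collector) isDangling(dep object) bool {
	if c.absent[dep.owner.uid] {
		return true
	}
	uid, ok := c.store[dep.namespace+"/"+dep.owner.name]
	if !ok || uid != dep.owner.uid {
		c.absent[dep.owner.uid] = true
		return true
	}
	return false
}

// simulateIncident processes the exporter first, then the redis statefulset,
// and returns the objects the toy collector would treat as garbage.
func simulateIncident() []string {
	c := &collector{
		store:  map[string]string{"kube-system/redis-0826": "uid-redis-0826"},
		absent: map[string]bool{},
	}
	owner := ownerRef{name: "redis-0826", uid: "uid-redis-0826"}
	dependents := []object{
		{namespace: "monitoring", name: "redis-exporter", owner: owner},
		{namespace: "kube-system", name: "redis", owner: owner},
	}
	var garbage []string
	for _, d := range dependents {
		if c.isDangling(d) {
			garbage = append(garbage, d.namespace+"/"+d.name)
		}
	}
	return garbage
}

func main() {
	fmt.Println(simulateIncident())
}
```

In this model the second statefulset sits in the same namespace as its owner, yet it is still flagged, purely because the earlier cross-namespace miss was cached under the owner's UID.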
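After reproducing the failure many more times, we found that it shows up with some probability when kube-controller-manager restarts. (When the apiserver restarts, kube-controller-manager also restarts itself after failing to connect to the apiserver several times.) It is probabilistic because it depends on the order in which the garbage collector adds resource objects to the attemptToDelete queue:
If the exporter statefulset in the monitoring namespace is synced first and the redis statefulset in the kube-system namespace afterwards, the failure occurs. In the opposite order it does not. This depends on the order in which the garbage collector lists all resources in the cluster (listwatch) at startup.
The failure does not occur while the apiserver and kube-controller-manager are running normally. The following logic in the garbage collector source explains why:
The garbage collector maintains a graph of owner/dependent relationships. When controller-manager starts, the nodes in that graph do not exist yet, so processing takes the first case of the switch in the figure above. Once the graph has been built, processing takes the second case. In the second case a resource object is added to the attemptToDelete queue only when its owner changes, which is why the failure does not occur while all components are running normally.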
The graph can be fetched from the endpoint below. The IP and port are those of controller-manager, and the output can be redirected to a tmp.dot file:
curl http://127.0.0.1:10252/debug/controllers/garbagecollector/graph
curl http://127.0.0.1:10252/debug/controllers/garbagecollector/graph?uid=11211212edsaddkqedmk12
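Then, using the Graphviz visualization tool, change into its bin directory and run the following command to generate an svg file that can be opened in a browser. Usage of Graphviz and dot is easy to look up online.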
dot -Tsvg -o graph2.svg tmp.dot
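Solution
When the redis operator creates a redis cluster, put the exporter in the same namespace as redis.
Reflections
1. The failure was caused mainly by an owner reference that crossed namespaces. When relying on garbage collection, follow the guidance on the official Kubernetes website as closely as possible.
As the official documentation explains, owner references are by design not allowed across namespaces, which means:
1) A namespaced dependent can only specify owners in the same namespace, or cluster-scoped owners.
2) A cluster-scoped dependent can only specify cluster-scoped owners, not namespaced owners.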
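These two rules fit in a single check. In the sketch below validOwner is a hypothetical helper, and an empty namespace string is assumed to mean "cluster-scoped". An operator could run a check like this before setting an ownerReference.

```go
package main

import "fmt"

// validOwner reports whether a dependent may reference an owner, following the
// two rules above. An empty namespace means the object is cluster-scoped.
func validOwner(dependentNamespace, ownerNamespace string) bool {
	if ownerNamespace == "" {
		// Cluster-scoped owners may own both namespaced and cluster-scoped dependents.
		return true
	}
	// A namespaced owner may only own dependents in its own namespace, which
	// also rules out cluster-scoped dependents (empty namespace).
	return dependentNamespace == ownerNamespace
}

func main() {
	fmt.Println(validOwner("kube-system", "kube-system")) // true: same namespace
	fmt.Println(validOwner("monitoring", "kube-system"))  // false: the case from this incident
	fmt.Println(validOwner("monitoring", ""))             // true: cluster-scoped owner
	fmt.Println(validOwner("", "kube-system"))            // false: cluster-scoped dependent, namespaced owner
}
```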
References
Official garbage collection documentation:
https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/
A detailed look at how the Kubernetes garbage collector is implemented:
https://draveness.me/kubernetes-garbage-collector#