Kubernetes Cluster Autoscaler

After deploying an application on Kubernetes, what do you do when user growth outpaces expectations and compute resources run short? Kubernetes's answer is auto-scaling: a set of autoscaling components work together to watch your cluster around the clock and adjust capacity dynamically to match user demand.

Autoscaling components

Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of Pods in a Replication Controller, Deployment, or Replica Set based on real-time CPU utilization. Paired with Metrics Server, it can also scale on other metrics.
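As a rough illustration of how HPA turns a metric into a replica count, the sketch below applies the scaling rule given in the Kubernetes documentation, ceil(currentReplicas * currentMetric / targetMetric). The function name and the guard for non-positive inputs are assumptions of this sketch, and the real controller also applies a tolerance and stabilization windows that are left out here.

```go
package main

import "math"

// desiredReplicas applies the documented HPA rule:
// ceil(currentReplicas * currentMetric / targetMetric).
// Hypothetical helper: tolerance and stabilization are omitted.
func desiredReplicas(currentReplicas int, currentMetric, targetMetric float64) int {
	// Without a usable target or replica count there is nothing to scale.
	if currentReplicas <= 0 || targetMetric <= 0 {
		return currentReplicas
	}
	ratio := currentMetric / targetMetric
	return int(math.Ceil(float64(currentReplicas) * ratio))
}
```

For example, 4 replicas running at 90% CPU against a 60% target would be scaled to 6.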

Vertical Pod Autoscaler (VPA)

VPA sets the resources a Pod requests automatically, based on what the Pod actually uses, and can adjust them at runtime.

Cluster Autoscaler (CA)

CA is a component that automatically scales the cluster's Nodes. If the cluster has Pods that cannot be scheduled, it adds Nodes so that those Pods can run; if it finds Nodes whose resource utilization is too low, it removes them to save resources.

Addon Resizer

This is a small add-on that runs as a sidecar and vertically scales another container in the same deployment. Its only strategy at present is to scale linearly with the number of nodes in the cluster. It is usually paired with Metrics Server so that Metrics Server can keep serving the metrics API as the cluster grows.

HPA scales stateless applications, VPA scales stateful applications, and CA guarantees the compute resources; used together, they form a complete autoscaling solution.

Cluster Autoscaler in detail

Of the four components above, HPA lives in the kubernetes code repository and is released with each kubernetes version, so it can be used directly without a separate deployment. The other three live in a repository maintained by the official community. Cluster Autoscaler v1.0 (GA) was released together with kubernetes 1.8; the remaining two are still beta.

Deployment

Cluster Autoscaler normally runs together with a cloud vendor. It exposes a Cloud Provider interface for vendors to implement, and the vendor adds and removes ECS-style compute nodes through its scaling group (Scaling Group) or node pool (Node Pool) feature.

Cloud vendors integrated so far (v1.18.1):

Alicloud: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/alicloud/README.md

Aws: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md

Azure: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md

Baiducloud: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/baiducloud/README.md

Digitalocean: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/digitalocean/README.md

GoogleCloud GCE: https://kubernetes.io/docs/tasks/administer-cluster/cluster-management/#upgrading-google-compute-engine-clusters

GoogleCloud GKE: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler

OpenStack Magnum: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/magnum/README.md

Packet: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/packet/README.md

Startup parameters: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca

How it works

Cluster Autoscaler abstracts a concept called NodeGroup, which corresponds to the cloud vendor's scaling group service. It uses the NodeGroups supplied by the CloudProvider to calculate the node resources in the cluster and scales accordingly.

Once started, Cluster Autoscaler periodically (every 10s by default) checks for unschedulable Pods and inspects Node resource usage, then performs the corresponding Scale UP or Scale Down operation.

Scale UP

When Cluster Autoscaler finds Pods that cannot be scheduled because of insufficient resources, it calls Scale UP to expand the cluster.

Scale UP only counts Nodes that belong to a NodeGroup, so all Worker Nodes can be handed over to scaling groups for management. Because a scaling group adds nodes asynchronously, Upcoming Nodes are taken into account as well.

To meet business needs, a cluster may contain Nodes of different sizes, so we can create several NodeGroups. On scale up, the strategy configured with the --expander option picks the node group to expand. Five strategies are supported:

random: picks a NodeGroup at random. This is the default when nothing is specified.
most-pods: picks the NodeGroup that can schedule the most Pods. For example, if some Pods are unschedulable because of a nodeSelector, this strategy prefers a NodeGroup that satisfies it so that most Pods can be scheduled.
least-waste: to avoid waste, prefers the NodeGroup with the smallest resource type that still satisfies the Pods' resource requests.
price: picks the cheapest NodeGroup according to the pricing model supplied by the CloudProvider.
priority: picks by configured priorities. It is more cumbersome to use and needs extra configuration; see the documentation.
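The idea behind most-pods and least-waste can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the Cluster Autoscaler code: the `option` type, its fields, and the `wastedShare` measure are hypothetical simplifications of what an expander compares.

```go
package main

// option is a hypothetical, simplified expansion option for one NodeGroup.
type option struct {
	nodeGroup   string
	nodeCount   int     // nodes that would be added
	podsHelped  int     // pending pods that would become schedulable
	wastedShare float64 // fraction of the new capacity left idle
}

// pick returns the best option according to better, or nil if there is none.
func pick(opts []option, better func(a, b option) bool) *option {
	if len(opts) == 0 {
		return nil
	}
	best := 0
	for i := 1; i < len(opts); i++ {
		if better(opts[i], opts[best]) {
			best = i
		}
	}
	return &opts[best]
}

// mostPods prefers the option that schedules the most pending pods.
func mostPods(opts []option) *option {
	return pick(opts, func(a, b option) bool { return a.podsHelped > b.podsHelped })
}

// leastWaste prefers the option that leaves the least capacity idle.
func leastWaste(opts []option) *option {
	return pick(opts, func(a, b option) bool { return a.wastedShare < b.wastedShare })
}
```

Ties keep the earlier option, which keeps the choice deterministic for a given input order.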

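If needed, Cluster Autoscaler can also balance the number of Nodes across similar NodeGroups, so that a single NodeGroup does not hit MaxSize and block new Nodes from joining. This is configured with the --balance-similar-node-groups option, which defaults to false.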
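One way to picture balancing is a greedy loop that always gives the next node to the smallest group that still has room. The sketch below is an assumption-laden illustration of that idea; the `group` type and `balance` function are hypothetical, and the real logic lives in BalanceScaleUpBetweenGroups.

```go
package main

// group is a hypothetical view of one of several similar NodeGroups.
type group struct {
	name    string
	size    int // current number of nodes
	maxSize int // upper bound for this group
}

// balance spreads newNodes over the groups and returns how many nodes
// each group receives, in the same order as the input.
func balance(groups []group, newNodes int) []int {
	added := make([]int, len(groups))
	for n := 0; n < newNodes; n++ {
		target := -1
		for i, g := range groups {
			current := g.size + added[i]
			if current >= g.maxSize {
				continue // this group is already full
			}
			if target == -1 || current < groups[target].size+added[target] {
				target = i
			}
		}
		if target == -1 {
			break // every group has reached MaxSize
		}
		added[target]++
	}
	return added
}
```

With groups of size 1 and 3 and four new nodes, the first group gets three and the second gets one, leaving both at 4.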

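After a few more steps, Cluster Autoscaler arrives at the number of Nodes to add and the NodeGroup to add them to, then calls IncreaseSize through the CloudProvider to enlarge the vendor's scaling group, which completes the scale up.

If anything in this description is unclear, see the ScaleUp source walkthrough further down.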

Scale Down

Scale down is an optional feature, configured with the --scale-down-enabled option, which defaults to true.

While monitoring Node resources, Cluster Autoscaler marks a Node as unneeded when it meets all three of the following conditions:

The sum of the CPU and memory requests of all Pods running on the Node is less than 50% of the Node's allocatable capacity. The threshold can be changed with the --scale-down-utilization-threshold option.
All Pods on the Node can be moved to other nodes.
The Node has no annotation that disables scale down.
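The three conditions above can be sketched as a single predicate. The `nodeUsage` type and its fields are hypothetical, and taking the larger of the CPU and memory ratios as the Node's utilization is an assumption of this sketch rather than something the text states.

```go
package main

// nodeUsage is a hypothetical summary of one Node for the scale-down check.
type nodeUsage struct {
	allocatableCPUMilli int64
	allocatableMemBytes int64
	requestedCPUMilli   int64 // sum of CPU requests of the pods on the node
	requestedMemBytes   int64 // sum of memory requests of the pods on the node
	podsMovable         bool  // all pods can be scheduled elsewhere
	scaleDownDisabled   bool  // node is annotated as not removable
}

// utilization returns the larger of the CPU and memory request ratios.
func utilization(n nodeUsage) float64 {
	if n.allocatableCPUMilli <= 0 || n.allocatableMemBytes <= 0 {
		return 1 // treat unknown capacity as fully used
	}
	cpu := float64(n.requestedCPUMilli) / float64(n.allocatableCPUMilli)
	mem := float64(n.requestedMemBytes) / float64(n.allocatableMemBytes)
	if cpu > mem {
		return cpu
	}
	return mem
}

// unneeded reports whether the node meets all three conditions.
func unneeded(n nodeUsage, threshold float64) bool {
	return utilization(n) < threshold && n.podsMovable && !n.scaleDownDisabled
}
```

A threshold of 0.5 corresponds to the 50% default described above.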

If a Node has been marked unneeded for more than 10 minutes (configurable with the --scale-down-unneeded-time option), Cluster Autoscaler deletes it by calling DeleteNodes through the CloudProvider. At most one unneeded Node is deleted at a time, but empty Nodes can be deleted in bulk, up to 10 at a time (configurable with the --max-empty-bulk-delete option).

In practice these are not the only criteria. Other conditions can also prevent a Node from being deleted, for example the NodeGroup having reached MinSize, or a Scale UP having happened within the past 10 minutes (configurable with the --scale-down-delay-after-add option). See the documentation for details.

Cluster Autoscaler's behavior is complex, but most of it can be configured through flags. If you need to tune it, read the documentation carefully: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md

How to implement a CloudProvider

If you use one of the cloud vendors already integrated above, you only need to name the vendor with the --cloud-provider option. If you want to integrate your own IaaS or need specific business logic, you have to implement the CloudProvider interface and the NodeGroup interface yourself and register them in the builder, so that they can be selected with the --cloud-provider parameter.

Builders are registered in builder_all.go under cloudprovider/builder. You can also add a build file of your own there and use Go's +build constraint to choose which CloudProvider is compiled in.

The CloudProvider and NodeGroup interfaces are defined in cloud_provider.go. Pay particular attention to the Refresh method: it is called at the start of every loop (every 10 seconds by default), which makes it the place to call the vendor's API and refresh the NodeGroup state. A common approach is to add a manager that holds this state. For anything that is unclear, refer to the other CloudProvider implementations.


type CloudProvider interface {
    // Name returns name of the cloud provider.
    Name() string

    // NodeGroups returns all node groups configured for this cloud provider.
    // This method is called several times within one loop, so it should not call the cloud vendor's API every time; store the state during Refresh instead.
    NodeGroups() []NodeGroup

    // NodeGroupForNode returns the node group for the given node, nil if the node
    // should not be processed by cluster autoscaler, or non-nil error if such
    // occurred. Must be implemented.
    // Same as above.
    NodeGroupForNode(*apiv1.Node) (NodeGroup, error)

    // Pricing returns pricing model for this cloud provider or error if not available.
    // Implementation optional.
    // Can be left unimplemented if the price expander is not used.
    Pricing() (PricingModel, errors.AutoscalerError)

    // GetAvailableMachineTypes get all machine types that can be requested from the cloud provider.
    // Implementation optional.
    // Not needed in practice; can be left unimplemented.
    GetAvailableMachineTypes() ([]string, error)

    // NewNodeGroup builds a theoretical node group based on the node definition provided. The node group is not automatically
    // created on the cloud provider side. The node group is not returned by NodeGroups() until it is created.
    // Implementation optional.
    // Normally this does not need to be implemented, but you can implement it if you want ClusterAutoscaler to create a default NodeGroup.
    // The better approach, though, is to define the default NodeGroup as a scaling group on the cloud side.
    NewNodeGroup(machineType string, labels map[string]string, systemLabels map[string]string,
        taints []apiv1.Taint, extraResources map[string]resource.Quantity) (NodeGroup, error)

    // GetResourceLimiter returns struct containing limits (max, min) for resources (cores, memory etc.).
    // The resource limiter is passed in at build time and normally should not be changed. Unless the cloud side explicitly tells users about the change, overriding it will confuse them.
    GetResourceLimiter() (*ResourceLimiter, error)

    // GPULabel returns the label added to nodes with GPU resource.
    // GPU related: if the cluster uses GPU resources, return the matching label. hack: we assume anything which is not cpu/memory to be a gpu.
    GPULabel() string

    // GetAvailableGPUTypes return all available GPU types cloud provider supports.
    // Same as above.
    GetAvailableGPUTypes() map[string]struct{}

    // Cleanup cleans up open resources before the cloud provider is destroyed, i.e. go routines etc.
    // CloudProvider is initialized only once, at startup; release whatever it still holds (goroutines and the like) here.
    Cleanup() error

    // Refresh is called before every main loop and can be used to dynamically update cloud provider state.
    // In particular the list of node groups returned by NodeGroups can change as a result of CloudProvider.Refresh().
    // Called from StaticAutoscaler RunOnce.
    Refresh() error
}

// NodeGroup contains configuration info and functions to control a set
// of nodes that have the same capacity and set of labels.
type NodeGroup interface {
    // MaxSize returns maximum size of the node group.
    MaxSize() int

    // MinSize returns minimum size of the node group.
    MinSize() int

    // TargetSize returns the current target size of the node group. It is possible that the
    // number of nodes in Kubernetes is different at the moment but should be equal
    // to Size() once everything stabilizes (new nodes finish startup and registration or
    // removed nodes are deleted completely). Implementation required.
    // This reports the scaling group's node count, which is not necessarily the same as the number of nodes in kubernetes.
    TargetSize() (int, error)

    // IncreaseSize increases the size of the node group. To delete a node you need
    // to explicitly name it and use DeleteNode. This function should wait until
    // node group size is updated. Implementation required.
    // The scale-up method: increases the scaling group's node count.
    IncreaseSize(delta int) error

    // DeleteNodes deletes nodes from this node group. Error is returned either on
    // failure or if the given node doesn't belong to this node group. This function
    // should wait until node group size is updated. Implementation required.
    // The nodes being deleted must belong to this node group.
    DeleteNodes([]*apiv1.Node) error

    // DecreaseTargetSize decreases the target size of the node group. This function
    // doesn't permit to delete any existing node and can be used only to reduce the
    // request for new nodes that have not been yet fulfilled. Delta should be negative.
    // It is assumed that cloud provider will not delete the existing nodes when there
    // is an option to just decrease the target. Implementation required.
    // When ClusterAutoscaler finds that the kubernetes node count and the scaling group's node count have disagreed for a long time, it calls this method to adjust.
    DecreaseTargetSize(delta int) error

    // Id returns an unique identifier of the node group.
    Id() string

    // Debug returns a string containing all information regarding this node group.
    Debug() string

    // Nodes returns a list of all nodes that belong to this node group.
    // It is required that Instance objects returned by this method have Id field set.
    // Other fields are optional.
    // This list should include also instances that might have not become a kubernetes node yet.
    // Returns every instance in the scaling group, even those that have not become kubernetes nodes yet.
    Nodes() ([]Instance, error)

    // TemplateNodeInfo returns a schedulernodeinfo.NodeInfo structure of an empty
    // (as if just started) node. This will be used in scale-up simulations to
    // predict what would a new node look like if a node group was expanded. The returned
    // NodeInfo is expected to have a fully populated Node object, with all of the labels,
    // capacity and allocatable information as well as all pods that are started on
    // the node by default, using manifest (most likely only kube-proxy). Implementation optional.
    // ClusterAutoscaler matches node info to node groups to evaluate resources; for an empty node group, it uses this method to build a virtual node info.
    TemplateNodeInfo() (*schedulernodeinfo.NodeInfo, error)

    // Exist checks if the node group really exists on the cloud provider side. Allows to tell the
    // theoretical node group from the real one. Implementation required.
    Exist() bool

    // Create creates the node group on the cloud provider side. Implementation optional.
    // Used together with CloudProvider.NewNodeGroup.
    Create() (NodeGroup, error)

    // Delete deletes the node group on the cloud provider side.
    // This will be executed only for autoprovisioned node groups, once their size drops to 0.
    // Implementation optional.
    Delete() error

    // Autoprovisioned returns true if the node group is autoprovisioned. An autoprovisioned group
    // was created by CA and can be deleted when scaled to 0.
    Autoprovisioned() bool
}
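The advice to refresh state in Refresh and serve NodeGroups from that state can be sketched with a small manager. Everything here is hypothetical and heavily simplified: `groupInfo`, `fetchFunc`, and `manager` are illustrative names, and the real interfaces above take Kubernetes types that are left out so the example stays self-contained.

```go
package main

import "sync"

// groupInfo is a hypothetical snapshot of one scaling group.
type groupInfo struct {
	id         string
	minSize    int
	maxSize    int
	targetSize int
}

// fetchFunc stands in for a call to the cloud vendor's API.
type fetchFunc func() ([]groupInfo, error)

// manager caches scaling group state between loop iterations.
type manager struct {
	mu     sync.RWMutex
	fetch  fetchFunc
	groups []groupInfo
}

// Refresh calls the vendor once and replaces the cached snapshot.
// On error the previous snapshot is kept.
func (m *manager) Refresh() error {
	groups, err := m.fetch()
	if err != nil {
		return err
	}
	m.mu.Lock()
	m.groups = groups
	m.mu.Unlock()
	return nil
}

// NodeGroups serves a copy of the cached snapshot without any remote call,
// so it is cheap to call many times per loop.
func (m *manager) NodeGroups() []groupInfo {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make([]groupInfo, len(m.groups))
	copy(out, m.groups)
	return out
}
```

Returning a copy means callers cannot modify the cached state by accident; a real provider would wrap each entry in its own NodeGroup implementation.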

ScaleUp source walkthrough


func ScaleUp(context *context.AutoscalingContext, processors *ca_processors.AutoscalingProcessors, clusterStateRegistry *clusterstate.ClusterStateRegistry, unschedulablePods []*apiv1.Pod, nodes []*apiv1.Node, daemonSets []*appsv1.DaemonSet, nodeInfos map[string]*schedulernodeinfo.NodeInfo, ignoredTaints taints.TaintKeySet) (*status.ScaleUpStatus, errors.AutoscalerError) {
    
    ......
    // Check whether every ready node in the cluster comes from the nodeGroups, and collect the nodes that belong to no group
    nodesFromNotAutoscaledGroups, err := utils.FilterOutNodesFromNotAutoscaledGroups(nodes, context.CloudProvider)
    if err != nil {
        return &status.ScaleUpStatus{Result: status.ScaleUpError}, err.AddPrefix("failed to filter out nodes which are from not autoscaled groups: ")
    }

    nodeGroups := context.CloudProvider.NodeGroups()
    gpuLabel := context.CloudProvider.GPULabel()
    availableGPUTypes := context.CloudProvider.GetAvailableGPUTypes()

    // The resource limiter is passed in when the cloud provider is built.
    // It can be changed inside the CloudProvider if needed, but that is not recommended because it confuses users.
    resourceLimiter, errCP := context.CloudProvider.GetResourceLimiter()
    if errCP != nil {
        return &status.ScaleUpStatus{Result: status.ScaleUpError}, errors.ToAutoscalerError(
            errors.CloudProviderError,
            errCP)
    }

    // Compute the remaining resource limits.
    // nodeInfos maps node groups to sample nodes.
    // A sample node prefers data from a real node; if the NodeGroup has no real node deployed yet, the Template node data is used.
    scaleUpResourcesLeft, errLimits := computeScaleUpResourcesLeftLimits(context.CloudProvider, nodeGroups, nodeInfos, nodesFromNotAutoscaledGroups, resourceLimiter)
    if errLimits != nil {
        return &status.ScaleUpStatus{Result: status.ScaleUpError}, errLimits.AddPrefix("Could not compute total resources: ")
    }

    // Work out, from the current nodes and the nodes in the NodeGroups, how many nodes are about to join the cluster.
    // A scaling group's increase size operation does not add nodes synchronously, so they are counted here for the resource calculations below.
    upcomingNodes := make([]*schedulernodeinfo.NodeInfo, 0)
    for nodeGroup, numberOfNodes := range clusterStateRegistry.GetUpcomingNodes() {
        ......
    }
    klog.V(4).Infof("Upcoming %d nodes", len(upcomingNodes))

    // Node groups that make it into the final selection
    expansionOptions := make(map[string]expander.Option, 0)
    ......
    // Node groups that cannot take new nodes because of a limit or an error, for example a group that has reached MaxSize
    skippedNodeGroups := map[string]status.Reasons{}
    // Filter the node groups, taking all of these conditions into account
    for _, nodeGroup := range nodeGroups {
    ......
    }
    if len(expansionOptions) == 0 {
        klog.V(1).Info("No expansion options")
        return &status.ScaleUpStatus{
            Result:                 status.ScaleUpNoOptionsAvailable,
            PodsRemainUnschedulable: getRemainingPods(podEquivalenceGroups, skippedNodeGroups),
            ConsideredNodeGroups:   nodeGroups,
        }, nil
    }

    ......
    // Pick the best node group to expand. The expander chooses a suitable node group; the default is RandomExpander, flag: expander
    // random: picks one at random, suitable when there is only one node group
    // most-pods: picks the node group that can schedule the most pods; for example, if some unschedulable pods have a nodeSelector, it prefers matching node groups so that most pods are satisfied
    // least-waste: prefers the node group with the smallest resource type that still satisfies the pods' resource requests
    // price: picks the cheapest according to the pricing model
    // priority: picks by priority
    bestOption := context.ExpanderStrategy.BestOption(options, nodeInfos)
    if bestOption != nil && bestOption.NodeCount > 0 {
    ......
        newNodes := bestOption.NodeCount

        // Recompute the number of nodes to add this time, taking upcomingNodes into account
        if context.MaxNodesTotal > 0 && len(nodes)+newNodes+len(upcomingNodes) > context.MaxNodesTotal {
            klog.V(1).Infof("Capping size to max cluster total size (%d)", context.MaxNodesTotal)
            newNodes = context.MaxNodesTotal - len(nodes) - len(upcomingNodes)
            if newNodes < 1 {
                return &status.ScaleUpStatus{Result: status.ScaleUpError}, errors.NewAutoscalerError(
                    errors.TransientError,
                    "max node total count already reached")
            }
        }

        createNodeGroupResults := make([]nodegroups.CreateNodeGroupResult, 0)
    
        // If the node group does not exist on the cloud vendor's side, try to create a cloud node group from the available information.
        // According to the author, no current CloudProvider implementation allows this, so the method looks redundant.
        // Cloud vendors do not want to, and should not, hand the permission to create cloud node groups to ClusterAutoscaler.
        if !bestOption.NodeGroup.Exist() {
            oldId := bestOption.NodeGroup.Id()
            createNodeGroupResult, err := processors.NodeGroupManager.CreateNodeGroup(context, bestOption.NodeGroup)
        ......
        }

        // Get the sample node of the best node group
        nodeInfo, found := nodeInfos[bestOption.NodeGroup.Id()]
        if !found {
            // This should never happen, as we already should have retrieved
            // nodeInfo for any considered nodegroup.
            klog.Errorf("No node info for: %s", bestOption.NodeGroup.Id())
            return &status.ScaleUpStatus{Result: status.ScaleUpError, CreateNodeGroupResults: createNodeGroupResults}, errors.NewAutoscalerError(
                errors.CloudProviderError,
                "No node info for best expansion option!")
        }

        // Based on CPU, memory and any GPU resources (hack: we assume anything which is not cpu/memory to be a gpu.), work out how many Nodes to add
        newNodes, err = applyScaleUpResourcesLimits(context.CloudProvider, newNodes, scaleUpResourcesLeft, nodeInfo, bestOption.NodeGroup, resourceLimiter)
        if err != nil {
            return &status.ScaleUpStatus{Result: status.ScaleUpError, CreateNodeGroupResults: createNodeGroupResults}, err
        }

        // Node groups to balance
        targetNodeGroups := []cloudprovider.NodeGroup{bestOption.NodeGroup}
        // If balancing is enabled through the balance-similar-node-groups flag,
        // find similar node groups and balance the node counts between them
        if context.BalanceSimilarNodeGroups {
        ......
        }
        // For the balancing strategy itself, see the (b *BalancingNodeGroupSetProcessor) BalanceScaleUpBetweenGroups method
        scaleUpInfos, typedErr := processors.NodeGroupSetProcessor.BalanceScaleUpBetweenGroups(context, targetNodeGroups, newNodes)
        if typedErr != nil {
            return &status.ScaleUpStatus{Result: status.ScaleUpError, CreateNodeGroupResults: createNodeGroupResults}, typedErr
        }
        klog.V(1).Infof("Final scale-up plan: %v", scaleUpInfos)
        // Start scaling up, through IncreaseSize
        for _, info := range scaleUpInfos {
            typedErr := executeScaleUp(context, clusterStateRegistry, info, gpu.GetGpuTypeForMetrics(gpuLabel, availableGPUTypes, nodeInfo.Node(), nil), now)
            if typedErr != nil {
                return &status.ScaleUpStatus{Result: status.ScaleUpError, CreateNodeGroupResults: createNodeGroupResults}, typedErr
            }
        }
        ......
    }
    ......
}
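The capping step in the middle of ScaleUp is easy to isolate. The sketch below restates that arithmetic as a standalone function; `capNewNodes` and its parameter names are hypothetical, while the logic follows the `MaxNodesTotal` check in the excerpt above.

```go
package main

// capNewNodes limits the number of nodes to add so that current, upcoming
// and new nodes together stay within maxTotal. A maxTotal of 0 or less
// means no cluster-wide limit. The boolean is false when the limit is
// already reached and nothing can be added.
func capNewNodes(current, upcoming, requested, maxTotal int) (int, bool) {
	if maxTotal <= 0 || current+upcoming+requested <= maxTotal {
		return requested, true
	}
	allowed := maxTotal - current - upcoming
	if allowed < 1 {
		return 0, false // max node total count already reached
	}
	return allowed, true
}
```

With 8 nodes running, 1 upcoming and a limit of 10, a request for 3 new nodes is cut down to 1.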

