Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. This can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources.
Context and problem
The load on a cloud application typically varies over time based on the number of active users or the types of activities they are performing. For example, more users are likely to be active during business hours, or the system might be required to perform computationally expensive analytics at the end of each month. There might also be sudden and unanticipated bursts in activity. If the processing requirements of the system exceed the capacity of the resources that are available, it'll suffer from poor performance and can even fail. If the system has to meet an agreed level of service, such failure could be unacceptable.
There are many strategies available for handling varying load in the cloud, depending on the business goals for the application. One strategy is to use autoscaling to match the provisioned resources to the user needs at any given time. This has the potential to consistently meet user demand, while optimizing running costs. However, while autoscaling can trigger the provisioning of additional resources, this provisioning isn't immediate. If demand grows quickly, there can be a window of time where there's a resource deficit.
Solution
An alternative strategy to autoscaling is to allow applications to use resources only up to a limit, and then throttle them when this limit is reached. The system should monitor how it's using resources so that, when usage exceeds the threshold, it can throttle requests from one or more users. This will enable the system to continue functioning and meet any service level agreements (SLAs) that are in place. For more information on monitoring resource usage, see the Instrumentation and Telemetry Guidance.
The system could implement several throttling strategies, including:
- Rejecting requests from an individual user who's already accessed system APIs more than n times per second over a given period of time. This requires the system to meter the use of resources for each tenant or user running an application. For more information, see the Service Metering Guidance.
- Disabling or degrading the functionality of selected nonessential services so that essential services can run unimpeded with sufficient resources. For example, if the application is streaming video output, it could switch to a lower resolution.
- Using load leveling to smooth the volume of activity (this approach is covered in more detail by the Queue-based Load Leveling pattern). In a multi-tenant environment, this approach will reduce the performance for every tenant. If the system must support a mix of tenants with different SLAs, the work for high-value tenants might be performed immediately. Requests for other tenants can be held back, and handled when the backlog has eased. The Priority Queue pattern could be used to help implement this approach.
- Deferring operations being performed on behalf of lower priority applications or tenants. These operations can be suspended or limited, with an exception generated to inform the tenant that the system is busy and that the operation should be retried later.
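As an illustration of the load-leveling strategy for tenants with different SLAs, the following is a minimal sketch of a backlog that serves high-value tenants first and holds other requests back until capacity is available. The class name `PriorityBacklog`, the tenant names, and the convention that a lower number means a more important tenant are assumptions made for this example, not part of the pattern itself.

```python
import heapq
import itertools


class PriorityBacklog:
    """Holds pending work; requests from more important tenants are served first."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so requests with equal priority keep their arrival order.
        self._order = itertools.count()

    def submit(self, priority, tenant, request):
        # Assumed convention: lower number = higher-value tenant.
        heapq.heappush(self._heap, (priority, next(self._order), tenant, request))

    def drain(self, capacity):
        """Return up to `capacity` requests to process now; the rest stay queued."""
        batch = []
        while self._heap and len(batch) < capacity:
            _, _, tenant, request = heapq.heappop(self._heap)
            batch.append((tenant, request))
        return batch

    def __len__(self):
        return len(self._heap)


backlog = PriorityBacklog()
backlog.submit(2, "standard-tenant", "report-1")
backlog.submit(1, "premium-tenant", "order-1")
backlog.submit(2, "standard-tenant", "report-2")
backlog.submit(1, "premium-tenant", "order-2")

# Only two requests can be handled right now: the premium tenant goes first.
first_batch = backlog.drain(capacity=2)
# The standard tenant's requests are handled once the backlog has eased.
second_batch = backlog.drain(capacity=5)
```

In a real system the backlog would typically be a durable message queue rather than an in-memory heap; the in-memory version only shows the ordering rule.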
The figure shows an area graph for resource use (a combination of memory, CPU, bandwidth, and other factors) against time for applications that are making use of three features. A feature is an area of functionality, such as a component that performs a specific set of tasks, a piece of code that performs a complex calculation, or an element that provides a service such as an in-memory cache. These features are labeled A, B, and C.
The area immediately below the line for a feature indicates the resources that are used by applications when they invoke this feature. For example, the area below the line for Feature A shows the resources used by applications that are making use of Feature A, and the area between the lines for Feature A and Feature B indicates the resources used by applications invoking Feature B. Aggregating the areas for each feature shows the total resource use of the system.
The previous figure illustrates the effects of deferring operations. Just prior to time T1, the total resources allocated to all applications using these features reach a threshold (the limit of resource use). At this point, the applications are in danger of exhausting the resources available. In this system, Feature B is less critical than Feature A or Feature C, so it's temporarily disabled and the resources that it was using are released. Between times T1 and T2, the applications using Feature A and Feature C continue running as normal. Eventually, the resource use of these two features diminishes to the point where, at time T2, there is sufficient capacity to enable Feature B again.
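The decision described here (drop the least critical feature at the threshold, and restore it once capacity returns) can be sketched as a small function. This is only one possible way to express it: the function name `enabled_features`, the `demand` dictionary of estimated resource needs, and the numeric values are hypothetical.

```python
def enabled_features(demand, limit, priority_order):
    """Choose which features to keep running so total use stays within `limit`.

    demand: dict mapping feature name -> estimated resource use if enabled
    limit: the threshold on total resource use
    priority_order: feature names, most critical first
    """
    kept = []
    total = 0
    for name in priority_order:
        # Keep a feature only if it still fits under the limit;
        # less critical features are considered last.
        if total + demand[name] <= limit:
            kept.append(name)
            total += demand[name]
    return kept


# Features A and C are more critical than B (as in the figure).
order = ["A", "C", "B"]

# Just before T1: the combined demand would exceed the limit, so B is disabled.
at_t1 = enabled_features({"A": 40, "B": 30, "C": 35}, limit=100, priority_order=order)

# At T2: A and C use less, so there is room to enable B again.
at_t2 = enabled_features({"A": 30, "B": 30, "C": 25}, limit=100, priority_order=order)
```

A production system would probably add some hysteresis (for example, re-enabling a feature only after usage has stayed low for a while) so that features do not flap on and off around the threshold.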
The autoscaling and throttling approaches can also be combined to help keep the applications responsive and within SLAs. If the demand is expected to remain high, throttling provides a temporary solution while the system scales out. Once the scale-out has completed, the full functionality of the system can be restored.
The next figure shows an area graph of the overall resource use by all applications running in a system against time, and illustrates how throttling can be combined with autoscaling.
At time T1, the threshold specifying the soft limit of resource use is reached. At this point, the system can start to scale out. However, if the new resources don't become available quickly enough, then the existing resources might be exhausted and the system could fail. To prevent this from occurring, the system is temporarily throttled, as described earlier. When autoscaling has completed and the additional resources are available, throttling can be relaxed.
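The interplay between the soft limit, scale-out, and throttling can be sketched as a small state holder. Everything named here (`CapacityController`, `soft_limit_ratio`, the 0.8 default, and the callback-style methods) is an assumption for illustration; an actual system would call its platform's autoscaling API where the comment indicates.

```python
from dataclasses import dataclass


@dataclass
class CapacityController:
    capacity: int                     # resource units currently provisioned
    soft_limit_ratio: float = 0.8     # hypothetical: fraction of capacity that triggers action
    scaling_in_progress: bool = False
    throttling: bool = False

    def observe(self, usage):
        """Called with each usage sample; reacts when the soft limit is reached."""
        if usage >= self.capacity * self.soft_limit_ratio:
            # In practice, request additional resources from the platform here.
            self.scaling_in_progress = True
            # Throttle until the new resources are actually available.
            self.throttling = True

    def scale_out_completed(self, added_capacity):
        """Called when the new resources are available; throttling is relaxed."""
        self.capacity += added_capacity
        self.scaling_in_progress = False
        self.throttling = False


controller = CapacityController(capacity=100)
controller.observe(50)             # below the soft limit: nothing happens
controller.observe(85)             # soft limit reached: scale out and throttle
throttled_during_scale_out = controller.throttling
controller.scale_out_completed(added_capacity=50)
```

Keeping the throttling flag tied to the completion of scale-out, rather than to a timer, matches the description above: throttling is relaxed only when the additional resources are known to be available.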
Issues and considerations
You should consider the following points when deciding how to implement this pattern:
- Throttling an application, and the strategy to use, is an architectural decision that impacts the entire design of a system. Throttling should be considered early in the application design process because it isn't easy to add once a system has been implemented.
- Throttling must be performed quickly. The system must be capable of detecting an increase in activity and reacting accordingly. The system must also be able to revert to its original state quickly after the load has eased. This requires that the appropriate performance data is continually captured and monitored.
- If a service needs to temporarily deny a user request, it should return a specific error code so the client application understands that the operation was refused because of throttling. The client application can wait for a period before retrying the request.
- Throttling can be used as a temporary measure while a system autoscales. In some cases it's better to simply throttle, rather than to scale, if a burst in activity is sudden and isn't expected to be long lived, because scaling can add considerably to running costs.
- If throttling is being used as a temporary measure while a system autoscales, and if resource demands grow very quickly, the system might not be able to continue functioning—even when operating in a throttled mode. If this isn't acceptable, consider maintaining larger capacity reserves and configuring more aggressive autoscaling.
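The point above about returning a specific error code and retrying later can be sketched from the client's side. The text does not name a particular code; for HTTP services, one common convention is status 429 (Too Many Requests) with a `Retry-After` header. The sketch below abstracts that into a hypothetical `ThrottledError` exception carrying the suggested wait, and `call_with_retry` is likewise an illustrative helper, not an existing library function.

```python
import time


class ThrottledError(Exception):
    """Raised by a (hypothetical) service client when a request is throttled."""

    def __init__(self, retry_after):
        super().__init__(f"throttled, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(operation, max_attempts=3, sleep=time.sleep):
    """Run operation(); if it is throttled, wait the suggested time and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up and let the caller see the throttling error
            sleep(err.retry_after)


# Usage: an operation that is throttled twice before it succeeds.
attempts = []

def submit_request():
    attempts.append(1)
    if len(attempts) < 3:
        raise ThrottledError(retry_after=0.5)
    return "ok"

waits = []
result = call_with_retry(submit_request, sleep=waits.append)  # record waits instead of sleeping
```

Passing `sleep` as a parameter is a design choice for testability; real clients would usually also cap the total wait time and add some random jitter.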
When to use this pattern
Use this pattern:
- To ensure that a system continues to meet service level agreements.
- To prevent a single tenant from monopolizing the resources provided by an application.
- To handle bursts in activity.
- To help cost-optimize a system by limiting the maximum resource levels needed to keep it functioning.
Example
The final figure illustrates how throttling can be implemented in a multi-tenant system. Users from each of the tenant organizations access a cloud-hosted application where they fill out and submit surveys. The application contains instrumentation that monitors the rate at which these users are submitting requests to the application.
In order to prevent the users from one tenant affecting the responsiveness and availability of the application for all other users, a limit is applied to the number of requests per second the users from any one tenant can submit. The application blocks requests that exceed this limit.
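A per-tenant limit of this kind could be sketched with a sliding one-second window per tenant. This is a minimal, single-process illustration under assumed names (`TenantRateLimiter`, `allow`, the tenant ids); it is not how the survey application in the example is actually implemented, and a multi-instance deployment would need shared state.

```python
import time
from collections import defaultdict, deque


class TenantRateLimiter:
    """Blocks requests from a tenant that exceed `max_per_second`."""

    def __init__(self, max_per_second, clock=time.monotonic):
        self.max_per_second = max_per_second
        self._clock = clock
        # tenant id -> timestamps of requests accepted within the last second
        self._recent = defaultdict(deque)

    def allow(self, tenant_id):
        now = self._clock()
        window = self._recent[tenant_id]
        # Forget requests that are at least one second old.
        while window and now - window[0] >= 1.0:
            window.popleft()
        if len(window) >= self.max_per_second:
            return False  # over the limit: the application blocks this request
        window.append(now)
        return True


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
limiter = TenantRateLimiter(max_per_second=2, clock=lambda: fake_now[0])

decisions = [limiter.allow("tenant-a") for _ in range(3)]  # third request is blocked
other_tenant_ok = limiter.allow("tenant-b")                # other tenants are unaffected
fake_now[0] = 1.0
after_one_second = limiter.allow("tenant-a")               # the window has moved on
```

Because each tenant has its own window, a burst from one tenant does not consume the allowance of any other tenant, which is the property the example relies on.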
Related patterns and guidance
The following patterns and guidance may also be relevant when implementing this pattern:
- Instrumentation and Telemetry Guidance. Throttling depends on gathering information about how heavily a service is being used. Describes how to generate and capture custom monitoring information.
- Service Metering Guidance. Describes how to meter the use of services in order to gain an understanding of how they are used. This information can be useful in determining how to throttle a service.
- Autoscaling Guidance. Throttling can be used as an interim measure while a system autoscales, or to remove the need for a system to autoscale. Contains information on autoscaling strategies.
- Queue-based Load Leveling pattern. Queue-based load leveling is a commonly used mechanism for implementing throttling. A queue can act as a buffer that helps to even out the rate at which requests sent by an application are delivered to a service.
- Priority Queue pattern. A system can use priority queuing as part of its throttling strategy to maintain performance for critical or higher value applications, while reducing the performance of less important applications.