Why rate limit
When a system calls downstream resources, it must account for the capacity those resources can actually provide. A resource that is constrained, or whose processing power is limited, deserves protection (at least while the downstream cannot raise its capacity, or cannot do so in the short term). A rate limiter or a similar protective mechanism prevents a downstream collapse from taking the whole service down with it.
Common algorithms
Two rate-limiting algorithms are most common: the leaky bucket and the token bucket.
Leaky bucket
(figure: leaky bucket / token bucket illustration)
Picture a bucket of finite capacity with a hole in the bottom: whenever there is water in the bucket, it drains at a fixed rate t (L/s). Water is poured in from the top. When the water poured in reaches the bucket's capacity and the inflow exceeds the outflow, the bucket is full. If water keeps arriving faster than t while the bucket is full, there are two options:
- stop pouring (block the producer), or
- let the excess overflow and be discarded.
The algorithm's parameters:
- bucket capacity
- outflow rate
The core questions:
- how to decide the bucket is full
- how to model the leak
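Both questions can be answered lazily: instead of draining the bucket on a timer, compute how much has leaked since the last call. The sketch below is illustrative only (the class name and parameters are not from any library); overflow is handled by dropping, one of the two policies above.

```java
// A minimal leaky-bucket sketch: the water level drains lazily on each call.
final class LeakyBucket {
    private final double capacity;      // bucket capacity (units of "water")
    private final double leakPerMicro;  // outflow rate, units per microsecond
    private double water = 0.0;         // current level
    private long lastMicros;            // last time we accounted for leakage

    LeakyBucket(double capacity, double leakPerMicro, long nowMicros) {
        this.capacity = capacity;
        this.leakPerMicro = leakPerMicro;
        this.lastMicros = nowMicros;
    }

    /** Try to pour `amount` in at nowMicros; false means the bucket would overflow. */
    boolean tryAdd(double amount, long nowMicros) {
        // leak whatever has drained since the last call ("how to model the leak")
        water = Math.max(0.0, water - (nowMicros - lastMicros) * leakPerMicro);
        lastMicros = nowMicros;
        if (water + amount > capacity) { // "how to decide the bucket is full"
            return false;                // overflow: drop (or block, per policy)
        }
        water += amount;
        return true;
    }
}
```

Passing the timestamp in explicitly keeps the model deterministic and easy to test; a real limiter would read a monotonic clock internally.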
Token bucket
The token bucket algorithm models the following scenario:
A bucket holds a bounded number of tokens, and tokens are added to it at a fixed rate. When a request arrives (or a packet shows up), we check whether the bucket holds enough tokens — one request may need several, and for a packet, its length equals the number of tokens it must obtain. If there are enough tokens, the request consumes them from the bucket; if not, some policy applies (block the caller, borrow against future tokens, and so on).
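A minimal token bucket is symmetrical to the leaky bucket: tokens refill lazily, and a request either gets all the tokens it needs or is rejected. The sketch below is illustrative (not Guava's implementation); note that, unlike Guava, this bucket starts full.

```java
// A minimal token-bucket sketch: tokens refill lazily at a fixed rate.
final class TokenBucket {
    private final double maxTokens;      // bucket capacity
    private final double tokensPerMicro; // refill rate, tokens per microsecond
    private double tokens;
    private long lastMicros;

    TokenBucket(double maxTokens, double tokensPerMicro, long nowMicros) {
        this.maxTokens = maxTokens;
        this.tokensPerMicro = tokensPerMicro;
        this.tokens = maxTokens; // a policy choice; Guava starts with zero stored permits
        this.lastMicros = nowMicros;
    }

    /** Consume `needed` tokens at nowMicros, or reject if the bucket holds too few. */
    boolean tryConsume(int needed, long nowMicros) {
        // refill whatever has accrued since the last call, capped at the bucket size
        tokens = Math.min(maxTokens, tokens + (nowMicros - lastMicros) * tokensPerMicro);
        lastMicros = nowMicros;
        if (tokens < needed) {
            return false; // not enough tokens: reject (or block / borrow, per policy)
        }
        tokens -= needed;
        return true;
    }
}
```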
The Guava RateLimiter implementation
Guava's implementation is closer to the token bucket: one second is divided into as many time slices as there are permits, and each slice corresponds to one token.
Key fields:
- nextFreeTicketMicros: the next point in time at which permits may be replenished. The name is a mouthful; the flow below makes it clearer.
- maxPermits: the maximum number of permits.
- storedPermits: the number of stored permits, which never exceeds maxPermits.
Implementation
One key (re)synchronization method does the heavy lifting; it is used both at initialization and on every acquire:
void resync(long nowMicros) {
// if nextFreeTicket is in the past, resync to now
if (nowMicros > nextFreeTicketMicros) {
double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
storedPermits = min(maxPermits, storedPermits + newPermits);
nextFreeTicketMicros = nowMicros;
}
}
If the current time (here and below, time elapsed since the limiter was created, not a wall-clock instant) has passed the previously recorded nextFreeTicketMicros, a resync takes place:
- the gap between the last recorded nextFreeTicketMicros and now is converted into newly available permits;
- the available permits are recomputed: if the new permits plus the existing ones stay below the maximum, the stored count grows by the new amount; otherwise it is capped at maxPermits;
- the next replenishment time is moved up to the current time.
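Plugging numbers into resync makes the bookkeeping concrete. The standalone model below mirrors Guava's field names but is a simplified sketch, not the real SmoothBursty; it assumes a rate of 10 permits per second, so each permit takes 100,000 microseconds to produce.

```java
// A standalone sketch of the resync bookkeeping (field names mirror Guava's).
final class ResyncModel {
    double maxPermits = 10.0;
    double storedPermits = 0.0;
    long nextFreeTicketMicros = 0L;
    double coolDownIntervalMicros = 1_000_000.0 / 10.0; // 10 permits/sec

    void resync(long nowMicros) {
        if (nowMicros > nextFreeTicketMicros) {
            // idle time since the last sync point buys new permits, capped at the max
            double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros;
            storedPermits = Math.min(maxPermits, storedPermits + newPermits);
            nextFreeTicketMicros = nowMicros;
        }
    }
}
```

After half a second of idleness, 5 permits have accumulated; after a further long idle stretch, the stored count is capped at maxPermits rather than growing without bound.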
Initialization
static RateLimiter create(SleepingStopwatch stopwatch, double permitsPerSecond) {
RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
rateLimiter.setRate(permitsPerSecond);
return rateLimiter;
}
A StopWatch is used for timing, mainly to measure the time elapsed since the limiter was created. The key fields are initialized (the main logic is, in effect, carried out through the resync method): nextFreeTicketMicros is set to the current time, maxPermits to the requested permits per second, and storedPermits to 0.
Acquiring permits (acquire)
acquire obtains the requested number of permits; when they are not yet available, it blocks for the required time before taking them, and it returns how long the call waited. The steps:
- attempt a resync;
- the wait returned to the caller is max(nextFreeTicketMicros - nowMicros, 0);
- permits actually consumed: min(requested permits, stored permits);
- permits that must be freshly produced (freshPermits): requested permits - consumed permits;
- wait to impose (waitMicros): freshPermits times the time needed to produce one permit;
- the next replenishment time is advanced: nextFreeTicketMicros += waitMicros;
- the stored permit count is reduced by the amount actually consumed.
From resync's guard condition, if (nowMicros > nextFreeTicketMicros), it is easy to see that when a request asks for more permits than can currently be supplied, the updated nextFreeTicketMicros ends up ahead of nowMicros, yet the current request's wait time is 0. That is, for an oversized request (larger than the permits currently on hand), the waiting only happens on the next request. Put bluntly: the previous caller digs the pit, and the next one falls in.
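The whole acquire bookkeeping, including this front-loaded debt, fits in a tiny standalone model. Field names mirror Guava's, but this is an illustrative sketch of SmoothBursty-style accounting, not the real class; it returns the wait imposed on the current caller instead of sleeping.

```java
// A simplified model of acquire: an oversized request pays nothing itself
// and pushes the wait onto the next caller.
final class AcquireModel {
    double maxPermits, storedPermits;
    long nextFreeTicketMicros;
    final double stableIntervalMicros; // time to produce one fresh permit

    AcquireModel(double permitsPerSecond, long nowMicros) {
        this.maxPermits = permitsPerSecond;
        this.storedPermits = 0.0;
        this.nextFreeTicketMicros = nowMicros;
        this.stableIntervalMicros = 1_000_000.0 / permitsPerSecond;
    }

    /** Returns how long (micros) the current caller must wait. */
    long acquire(int permits, long nowMicros) {
        if (nowMicros > nextFreeTicketMicros) { // resync, as above
            double fresh = (nowMicros - nextFreeTicketMicros) / stableIntervalMicros;
            storedPermits = Math.min(maxPermits, storedPermits + fresh);
            nextFreeTicketMicros = nowMicros;
        }
        long wait = Math.max(nextFreeTicketMicros - nowMicros, 0); // debt left by earlier callers
        double consumed = Math.min(permits, storedPermits);
        double freshPermits = permits - consumed;
        long waitMicros = (long) (freshPermits * stableIntervalMicros);
        nextFreeTicketMicros += waitMicros; // push the cost onto future requests
        storedPermits -= consumed;
        return wait;
    }
}
```

With a rate of 1 permit/second and an empty store, acquire(5) returns a wait of 0 but leaves a 5-second debt behind; a follow-up acquire(1) one second later inherits the remaining 4 seconds.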
Three cases are worth distinguishing:
- nextFreeTicketMicros is earlier than the current time and enough permits are stored;
- nextFreeTicketMicros is earlier than the current time but the stored permits are insufficient;
- nextFreeTicketMicros is later than the current time, where the main work is computing the blocking time; permit handout and time bookkeeping are the same as in the two cases above.
Trying to acquire (tryAcquire)
If nextFreeTicketMicros - timeout > nowMicros, then even after the full timeout elapses not a single permit will become available (as described above, as soon as any permit is available it can be handed out, regardless of how many were requested), so tryAcquire returns false immediately. Otherwise it follows the acquire flow.
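This feasibility check can be stated in one line. The wrapper class below is hypothetical; Guava performs an equivalent check internally (under the name canAcquire) before committing to a timed acquire.

```java
// Sketch of tryAcquire's fast-fail check: only worth waiting if the next
// permit becomes available within the timeout window.
final class TryAcquireCheck {
    static boolean canAcquire(long nextFreeTicketMicros, long nowMicros, long timeoutMicros) {
        // a permit is schedulable at nextFreeTicketMicros; reject if it lies
        // beyond nowMicros + timeoutMicros
        return nextFreeTicketMicros - timeoutMicros <= nowMicros;
    }
}
```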
Warm-up (warming up)
First create a limiter with a capacity of 100 permits per second, then have 20 threads concurrently acquire permits, each thread acquiring once. Test code:
public void testCurrent() {
    RateLimiter rateLimiter = RateLimiter.create(100);
    ExecutorService executorService = Executors.newFixedThreadPool(100);
    Runnable runnable = () -> {
        // wait up to 100ms for one permit; "F" marks a failed attempt
        if (!rateLimiter.tryAcquire(1, 100, TimeUnit.MILLISECONDS)) {
            System.out.println("F " + System.currentTimeMillis() + " " + Thread.currentThread().getName());
        } else {
            System.out.println(System.currentTimeMillis() + " " + Thread.currentThread().getName());
        }
    };
    for (int i = 0; i < 20; i++) {
        executorService.execute(runnable);
    }
    executorService.shutdown();
    try {
        executorService.awaitTermination(1, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
By the algorithm described above no output should start with F, yet in practice close to half of the 20 attempts fail:
1489453467102 pool-1-thread-1
1489453467102 pool-1-thread-2
1489453467104 pool-1-thread-3
1489453467104 pool-1-thread-4
1489453467105 pool-1-thread-5
1489453467105 pool-1-thread-6
1489453467105 pool-1-thread-7
1489453467107 pool-1-thread-8
1489453467107 pool-1-thread-9
F 1489453467108 pool-1-thread-15
F 1489453467108 pool-1-thread-16
F 1489453467109 pool-1-thread-17
F 1489453467109 pool-1-thread-18
F 1489453467109 pool-1-thread-19
F 1489453467109 pool-1-thread-20
1489453467219 pool-1-thread-10
1489453467239 pool-1-thread-11
1489453467259 pool-1-thread-12
1489453467274 pool-1-thread-13
1489453467297 pool-1-thread-14
The problem lies in initialization: storedPermits starts at 0, and the first thread's acquire happens almost immediately after creation, so once it has taken its permit the stored permit count is far below the declared maximum. After a few more threads have tried, the stored permits are exhausted, and the remaining tryAcquire calls fail.
The design philosophy of RateLimiter
The following is quoted from the Javadoc in com/google/common/util/concurrent/SmoothRateLimiter.java and is well worth reading:
How is the RateLimiter designed, and why?
The primary feature of a RateLimiter is its "stable rate", the maximum rate that it should allow at normal conditions. This is enforced by "throttling" incoming requests as needed, i.e. compute, for an incoming request, the appropriate throttle time, and make the calling thread wait as much.
The simplest way to maintain a rate of QPS is to keep the timestamp of the last granted request, and ensure that (1/QPS) seconds have elapsed since then. For example, for a rate of QPS=5 (5 tokens per second), if we ensure that a request isn't granted earlier than 200ms after the last one, then we achieve the intended rate. If a request comes and the last request was granted only 100ms ago, then we wait for another 100ms. At this rate, serving 15 fresh permits (i.e. for an acquire(15) request) naturally takes 3 seconds.
It is important to realize that such a RateLimiter has a very superficial memory of the past: it only remembers the last request. What if the RateLimiter was unused for a long period of time, then a request arrived and was immediately granted? This RateLimiter would immediately forget about that past underutilization. This may result in either underutilization or overflow, depending on the real world consequences of not using the expected rate.
Past underutilization could mean that excess resources are available. Then, the RateLimiter should speed up for a while, to take advantage of these resources. This is important when the rate is applied to networking (limiting bandwidth), where past underutilization typically translates to "almost empty buffers", which can be filled immediately.
On the other hand, past underutilization could mean that "the server responsible for handling the request has become less ready for future requests", i.e. its caches become stale, and requests become more likely to trigger expensive operations (a more extreme case of this example is when a server has just booted, and it is mostly busy with getting itself up to speed).
To deal with such scenarios, we add an extra dimension, that of "past underutilization", modeled by the "storedPermits" variable. This variable is zero when there is no underutilization, and it can grow up to maxStoredPermits, for sufficiently large underutilization. So, the requested permits, by an invocation acquire(permits), are served from:
- stored permits (if available)
- fresh permits (for any remaining permits)
How this works is best explained with an example:
For a RateLimiter that produces 1 token per second, every second that goes by with the RateLimiter being unused, we increase storedPermits by 1. Say we leave the RateLimiter unused for 10 seconds (i.e., we expected a request at time X, but we are at time X + 10 seconds before a request actually arrives; this is also related to the point made in the last paragraph), thus storedPermits becomes 10.0 (assuming maxStoredPermits >= 10.0). At that point, a request of acquire(3) arrives. We serve this request out of storedPermits, and reduce that to 7.0 (how this is translated to throttling time is discussed later). Immediately after, assume that an acquire(10) request arrives. We serve the request partly from storedPermits, using all the remaining 7.0 permits, and the remaining 3.0, we serve them by fresh permits produced by the rate limiter.
We already know how much time it takes to serve 3 fresh permits: if the rate is "1 token per second", then this will take 3 seconds. But what does it mean to serve 7 stored permits? As explained above, there is no unique answer. If we are primarily interested to deal with underutilization, then we want stored permits to be given out /faster/ than fresh ones, because underutilization = free resources for the taking. If we are primarily interested to deal with overflow, then stored permits could be given out /slower/ than fresh ones. Thus, we require a (different in each case) function that translates storedPermits to throttling time.
This role is played by storedPermitsToWaitTime(double storedPermits, double permitsToTake). The underlying model is a continuous function mapping storedPermits (from 0.0 to maxStoredPermits) onto the 1/rate (i.e. intervals) that is effective at the given storedPermits. "storedPermits" essentially measure unused time; we spend unused time buying/storing permits. Rate is "permits / time", thus "1 / rate = time / permits". Thus, "1/rate" (time / permits) times "permits" gives time, i.e., integrals on this function (which is what storedPermitsToWaitTime() computes) correspond to minimum intervals between subsequent requests, for the specified number of requested permits.
Here is an example of storedPermitsToWaitTime: If storedPermits == 10.0, and we want 3 permits, we take them from storedPermits, reducing them to 7.0, and compute the throttling for these as a call to storedPermitsToWaitTime(storedPermits = 10.0, permitsToTake = 3.0), which will evaluate the integral of the function from 7.0 to 10.0.
Using integrals guarantees that the effect of a single acquire(3) is equivalent to { acquire(1); acquire(1); acquire(1); }, or { acquire(2); acquire(1); }, etc, since the integral of the function in [7.0, 10.0] is equivalent to the sum of the integrals of [7.0, 8.0], [8.0, 9.0], [9.0, 10.0] (and so on), no matter what the function is. This guarantees that we handle correctly requests of varying weight (permits), /no matter/ what the actual function is - so we can tweak the latter freely. (The only requirement, obviously, is that we can compute its integrals).
Note well that if, for this function, we chose a horizontal line, at height of exactly (1/QPS), then the effect of the function is non-existent: we serve storedPermits at exactly the same cost as fresh ones (1/QPS is the cost for each). We use this trick later.
If we pick a function that goes /below/ that horizontal line, it means that we reduce the area of the function, thus time. Thus, the RateLimiter becomes /faster/ after a period of underutilization. If, on the other hand, we pick a function that goes /above/ that horizontal line, then it means that the area (time) is increased, thus storedPermits are more costly than fresh permits, thus the RateLimiter becomes /slower/ after a period of underutilization.
Last, but not least: consider a RateLimiter with rate of 1 permit per second, currently completely unused, and an expensive acquire(100) request comes. It would be nonsensical to just wait for 100 seconds, and /then/ start the actual task. Why wait without doing anything? A much better approach is to /allow/ the request right away (as if it was an acquire(1) request instead), and postpone /subsequent/ requests as needed. In this version, we allow starting the task immediately, and postpone by 100 seconds future requests, thus we allow for work to get done in the meantime instead of waiting idly.
This has important consequences: it means that the RateLimiter doesn't remember the time of the _last_ request, but it remembers the (expected) time of the next request. This also enables us to tell immediately (see tryAcquire(timeout)) whether a particular timeout is enough to get us to the point of the next scheduling time, since we always maintain that. And what we mean by "an unused RateLimiter" is also defined by that notion: when we observe that the "expected arrival time of the next request" is actually in the past, then the difference (now - past) is the amount of time that the RateLimiter was formally unused, and it is that amount of time which we translate to storedPermits. (We increase storedPermits with the amount of permits that would have been produced in that idle time). So, if rate == 1 permit per second, and arrivals come exactly one second after the previous, then storedPermits is _never_ increased -- we would only increase it for arrivals _later_ than the expected one second.