I. Introduction
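Most readers will already be familiar with Hystrix (GitHub: https://github.com/Netflix/Hystrix). The name means "porcupine", and as the name suggests, it is there to protect our systems. In a distributed system, a service may depend on many other services. When a dependency starts failing, with rising latency and timeouts, it can drag down the upstream service's endpoints and exhaust their threads. We need a mechanism that analyzes the availability of each dependency and, when its failure rate becomes abnormal, acts like a fuse: it cuts off the traffic so that a more serious outage is avoided.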
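Its most central component is the circuit breaker. We will analyze two questions:
- When does the breaker decide to open?
- When the dependency recovers, how does the breaker recover automatically?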
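II. Source Code Analysis
1. Overall Flow
We borrow a diagram from the Hystrix wiki to get a quick picture of the whole flow. This time we focus on steps 4 and 7 (checking whether the circuit is open, and calculating circuit health), that is, the circuit breaker logic.
2. Circuit Breaker Implementation
The circuit breaker interface is HystrixCircuitBreaker: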
/**
 * Circuit-breaker logic that is hooked into {@link HystrixCommand} execution and will stop allowing executions if failures have gone past the defined threshold.
 * <p>
 * The default (and only) implementation will then allow a single retry after a defined sleepWindow until the execution
 * succeeds at which point it will again close the circuit and allow executions again.
 */
public interface HystrixCircuitBreaker {

    /**
     * Every {@link HystrixCommand} requests asks this if it is allowed to proceed or not. It is idempotent and does
     * not modify any internal state, and takes into account the half-open logic which allows some requests through
     * after the circuit has been opened
     *
     * @return boolean whether a request should be permitted
     */
    boolean allowRequest();

    /**
     * Whether the circuit is currently open (tripped).
     *
     * @return boolean state of circuit breaker
     */
    boolean isOpen();

    /**
     * Invoked on successful executions from {@link HystrixCommand} as part of feedback mechanism when in a half-open state.
     */
    void markSuccess();

    /**
     * Invoked on unsuccessful executions from {@link HystrixCommand} as part of feedback mechanism when in a half-open state.
     */
    void markNonSuccess();

    /**
     * Invoked at start of command execution to attempt an execution. This is non-idempotent - it may modify internal
     * state.
     */
    boolean attemptExecution();
}
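We focus on two methods, allowRequest and isOpen: the first decides whether a request is allowed through, the second reports whether the breaker is open (tripped).
HystrixCircuitBreaker has two implementations:
- NoOpCircuitBreaker: a no-op implementation.
- HystrixCircuitBreakerImpl: the default implementation, and the focus of this analysis.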
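To make the idea of a "no-op" breaker concrete, here is a minimal sketch. SimpleCircuitBreaker and NoOpBreaker are hypothetical names, not Hystrix classes; the interface is trimmed to the two methods discussed here, and the behavior (always allow, never open) is an assumption about what an empty implementation should do, not a copy of Hystrix's NoOpCircuitBreaker.

```java
// Hypothetical, trimmed-down breaker interface: only the two methods this article focuses on.
interface SimpleCircuitBreaker {
    boolean allowRequest();

    boolean isOpen();
}

// Hypothetical no-op implementation: it never trips and never blocks traffic,
// which is what "breaker disabled" amounts to.
final class NoOpBreaker implements SimpleCircuitBreaker {
    @Override
    public boolean allowRequest() {
        return true;
    }

    @Override
    public boolean isOpen() {
        return false;
    }
}
```

Code written against the interface can swap NoOpBreaker for a real implementation without changing the call site.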
3. Call Stack
Let's take a closer look at the applyHystrixSemantics method.
private Observable<R> applyHystrixSemantics(final AbstractCommand<R> _cmd) {
    // mark that we're starting execution on the ExecutionHook
    // if this hook throws an exception, then a fast-fail occurs with no fallback. No state is left inconsistent
    executionHook.onStart(_cmd);

    /* determine if we're allowed to execute */
    if (circuitBreaker.allowRequest()) {
        // ...
    } else {
        return handleShortCircuitViaFallback();
    }
}
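In other words, the circuit breaker decides whether the normal invocation path runs at all; when allowRequest() returns false, the command short-circuits straight to handleShortCircuitViaFallback().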
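The gate above can be reduced to a few lines. The sketch below is hypothetical (GuardedCall and its parameters are not Hystrix API) and covers only the short-circuit branch; in Hystrix the fallback can also be triggered by failures, timeouts and rejections, which are left out here.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical reduction of the gate in applyHystrixSemantics: consult the breaker
// first, and go straight to the fallback when it refuses the request.
final class GuardedCall {
    private GuardedCall() {
    }

    static <R> R execute(BooleanSupplier allowRequest, Supplier<R> run, Supplier<R> fallback) {
        if (allowRequest.getAsBoolean()) {
            return run.get();        // normal path: the dependency is called
        }
        return fallback.get();       // short circuit: the dependency is never called
    }
}
```

The key property is that a short-circuited call never touches the dependency, which is exactly what relieves a struggling downstream service.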
4. Circuit Breaker Decision Logic
@Override
public boolean allowRequest() {
    if (properties.circuitBreakerForceOpen().get()) {
        // properties have asked us to force the circuit open so we will allow NO requests
        return false;
    }
    if (properties.circuitBreakerForceClosed().get()) {
        // we still want to allow isOpen() to perform it's calculations so we simulate normal behavior
        isOpen();
        // properties have asked us to ignore errors so we will ignore the results of isOpen and just allow all traffic through
        return true;
    }
    return !isOpen() || allowSingleTest();
}
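- If circuitBreakerForceOpen is set, every request is rejected.
- If circuitBreakerForceClosed is set, every request is allowed; isOpen() is still called so that its calculations run, but its result is ignored.
- Otherwise, if the breaker is closed, the request is allowed.
- If the breaker is open, allowSingleTest() may still let a small amount of traffic through to check whether the dependency has recovered.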
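The decision order in allowRequest can be written as a pure function, which makes the precedence easy to check: forceOpen wins over forceClosed, and allowSingleTest is only consulted while the breaker is open, thanks to the short-circuiting `||`. AllowRequestSketch is a hypothetical helper; the two booleans stand in for the circuitBreakerForceOpen and circuitBreakerForceClosed properties, and isOpen / allowSingleTest are passed in as suppliers.

```java
import java.util.function.BooleanSupplier;

// Hypothetical pure-function version of the allowRequest() decision order.
final class AllowRequestSketch {
    private AllowRequestSketch() {
    }

    static boolean allowRequest(boolean forceOpen, boolean forceClosed,
                                BooleanSupplier isOpen, BooleanSupplier allowSingleTest) {
        if (forceOpen) {
            return false;                     // forced open: reject every request
        }
        if (forceClosed) {
            isOpen.getAsBoolean();            // still evaluated, but the result is ignored
            return true;                      // forced closed: allow every request
        }
        // Closed: allow. Open: allow only if this caller gets the single trial.
        // '||' short-circuits, so allowSingleTest is not evaluated while closed.
        return !isOpen.getAsBoolean() || allowSingleTest.getAsBoolean();
    }
}
```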
Next, let's look at how isOpen is implemented.
@Override
public boolean isOpen() {
    if (circuitOpen.get()) {
        // if we're open we immediately return true and don't bother attempting to 'close' ourself as that is left to allowSingleTest and a subsequent successful test to close
        return true;
    }

    // we're closed, so let's see if errors have made us so we should trip the circuit open
    HealthCounts health = metrics.getHealthCounts();

    // Note: if the total request count has not reached the configured volume threshold, the breaker
    // will not open. With too few requests, the failure rate does not mean much.
    // check if we are past the statisticalWindowVolumeThreshold
    if (health.getTotalRequests() < properties.circuitBreakerRequestVolumeThreshold().get()) {
        // we are not past the minimum volume threshold for the statisticalWindow so we'll return false immediately and not calculate anything
        return false;
    }

    // Note: once the error percentage reaches the threshold, the breaker is opened.
    if (health.getErrorPercentage() < properties.circuitBreakerErrorThresholdPercentage().get()) {
        return false;
    } else {
        // Note: several threads may get here concurrently, so a CAS operation is used.
        // our failure rate is too high, trip the circuit
        if (circuitOpen.compareAndSet(false, true)) {
            // if the previousValue was false then we want to set the currentTime
            // Note: the opening time is recorded so that, once the sleep window has passed, a small
            // amount of traffic can be let through to decide whether to close the breaker again.
            circuitOpenedOrLastTestedTime.set(System.currentTimeMillis());
            return true;
        } else {
            // How could previousValue be true? If another thread was going through this code at the same time a race-condition could have
            // caused another thread to set it to true already even though we were in the process of doing the same
            // In this case, we know the circuit is open, so let the other thread set the currentTime and report back that the circuit is open
            return true;
        }
    }
}
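The logic of isOpen is straightforward: once enough requests have been recorded in the statistical window and the error percentage reaches the configured threshold, the breaker opens.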
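The trip condition can be modeled outside Hystrix with plain atomics. In this hypothetical TripSketch, totalRequests and errorPercentage stand in for HealthCounts, the two constructor arguments stand in for circuitBreakerRequestVolumeThreshold and circuitBreakerErrorThresholdPercentage, and the current time is passed in instead of being read from System.currentTimeMillis(), so the behavior is deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the isOpen() trip decision, using plain atomics.
final class TripSketch {
    private final long requestVolumeThreshold;     // stands in for circuitBreakerRequestVolumeThreshold
    private final int errorThresholdPercentage;    // stands in for circuitBreakerErrorThresholdPercentage
    private final AtomicBoolean circuitOpen = new AtomicBoolean(false);
    private final AtomicLong circuitOpenedOrLastTestedTime = new AtomicLong();

    TripSketch(long requestVolumeThreshold, int errorThresholdPercentage) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
    }

    boolean isOpen(long totalRequests, int errorPercentage, long nowMillis) {
        if (circuitOpen.get()) {
            return true;                           // already open: closing is left to the trial request
        }
        if (totalRequests < requestVolumeThreshold) {
            return false;                          // too few requests for the error rate to mean much
        }
        if (errorPercentage < errorThresholdPercentage) {
            return false;                          // error rate below the threshold
        }
        // Only the thread that flips false -> true records the opening time;
        // a thread that loses the race still reports the circuit as open.
        if (circuitOpen.compareAndSet(false, true)) {
            circuitOpenedOrLastTestedTime.set(nowMillis);
        }
        return true;
    }

    long openedAt() {
        return circuitOpenedOrLastTestedTime.get();
    }
}
```

With example thresholds of 20 requests and 50%, 10 requests at 100% errors do not trip the breaker (not enough volume), 20 requests at 49% do not either, and 20 requests at 50% do: the check is `<`, so reaching the threshold exactly is enough.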
Next, let's take a closer look at allowSingleTest.
public boolean allowSingleTest() {
    long timeCircuitOpenedOrWasLastTested = circuitOpenedOrLastTestedTime.get();
    // 1) if the circuit is open
    // 2) and it's been longer than 'sleepWindow' since we opened the circuit
    if (circuitOpen.get() && System.currentTimeMillis() > timeCircuitOpenedOrWasLastTested + properties.circuitBreakerSleepWindowInMilliseconds().get()) {
        // We push the 'circuitOpenedTime' ahead by 'sleepWindow' since we have allowed one request to try.
        // If it succeeds the circuit will be closed, otherwise another singleTest will be allowed at the end of the 'sleepWindow'.
        if (circuitOpenedOrLastTestedTime.compareAndSet(timeCircuitOpenedOrWasLastTested, System.currentTimeMillis())) {
            // if this returns true that means we set the time so we'll return true to allow the singleTest
            // if it returned false it means another thread raced us and allowed the singleTest before we did
            return true;
        }
    }
    return false;
}
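The logic is simple: once the circuit is open, no request gets through until the sleep window has elapsed. After that, the first caller to move circuitOpenedOrLastTestedTime forward via CAS gets a single trial request, and the window starts again from that moment. For example:
Suppose the breaker opens at 23:00:00.000 and the sleep window is 100 ms. Requests between 23:00:00.000 and 23:00:00.100 are all short-circuited; the first request after 23:00:00.100 is let through as a trial, and the next trial is only possible once more than another 100 ms have passed.
The trial request verifies whether the dependency has recovered: if it succeeds, the circuit is closed again; otherwise another trial is allowed at the end of the next sleep window.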
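The timing and the CAS race can be checked with a small, deterministic model. SingleTestSketch is hypothetical: trip() stands in for the moment isOpen() opens the circuit, and nowMillis replaces System.currentTimeMillis(). With a 100 ms window opened at time 0, calls at 50 and 100 are rejected, the call at 101 is the trial, and the next trial needs a time later than 201.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of allowSingleTest(): after the sleep window has elapsed,
// the CAS on the timestamp lets exactly one caller through per window.
final class SingleTestSketch {
    private final long sleepWindowMillis;
    private final AtomicBoolean circuitOpen = new AtomicBoolean(false);
    private final AtomicLong circuitOpenedOrLastTestedTime = new AtomicLong();

    SingleTestSketch(long sleepWindowMillis) {
        this.sleepWindowMillis = sleepWindowMillis;
    }

    // Stand-in for isOpen() tripping the breaker: record the opening time once.
    void trip(long nowMillis) {
        if (circuitOpen.compareAndSet(false, true)) {
            circuitOpenedOrLastTestedTime.set(nowMillis);
        }
    }

    boolean allowSingleTest(long nowMillis) {
        long last = circuitOpenedOrLastTestedTime.get();
        if (circuitOpen.get() && nowMillis > last + sleepWindowMillis) {
            // Moving the timestamp forward restarts the window; a thread that read
            // a stale 'last' loses the CAS and is rejected.
            return circuitOpenedOrLastTestedTime.compareAndSet(last, nowMillis);
        }
        return false;
    }

    // Demo helper: race several threads at the same instant and count the winners.
    static int countWinners(SingleTestSketch breaker, int threads, long nowMillis)
            throws InterruptedException {
        AtomicInteger winners = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                if (breaker.allowSingleTest(nowMillis)) {
                    winners.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return winners.get();
    }
}
```

However many threads race at the same instant, only one CAS from the old timestamp can succeed, so countWinners always reports a single trial per window.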
III. Summary
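This article gave a brief analysis of the circuit breaker's decision logic. A follow-up will focus on how the breaker's data is collected (HystrixCommandMetrics). In addition, Hystrix is built heavily around the command pattern and implements it on top of RxJava, which is also one of the harder parts of the codebase to follow.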