一景醇、現(xiàn)象
在服務(wù)客戶端中臀稚,每天都發(fā)現(xiàn)有幾個連接服務(wù)端異常的exception。少的時候沒有這樣的exception三痰,但多的時候一天有幾次吧寺,特別是周末的時候窜管,比較多。初步分析是網(wǎng)絡(luò)問題稚机,但是做了網(wǎng)絡(luò)實時監(jiān)控以后幕帆,發(fā)現(xiàn)即使網(wǎng)絡(luò)十分通常,也偶爾會出現(xiàn)這樣的問題赖条,所以網(wǎng)絡(luò)問題被排除了失乾。
出現(xiàn)問題的報錯如下
Caused by: java.net.SocketTimeoutException: Read timed out
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1890)
at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1885)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1884)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1457)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:122)
... 21 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:109)
... 21 more
二、找到問題
繼續(xù)找問題谋币,并且在服務(wù)端做了jstack的監(jiān)控仗扬,發(fā)現(xiàn)每當(dāng)出現(xiàn)這種問題的時候,服務(wù)端的jstack顯示蕾额,出現(xiàn)大量的線程block在了log4j打印日志上早芭,jstack如下:
"http-nio-7000-exec-102" #188 daemon prio=5 os_prio=0 tid=0x00007fb15005d000 nid=0x4ae7 waiting for monitor entry [0x00007fb21c34a000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:204)
- waiting to lock <0x000000008002def8> (a org.apache.log4j.Logger)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
.........
.........
at com.sun.proxy.$Proxy9.getAobi(Unknown Source)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:306)
at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:202)
at com.caucho.hessian.server.HessianServlet.invoke(HessianServlet.java:389)
at com.caucho.hessian.server.HessianServlet.service(HessianServlet.java:369)
at com.baitian.pay.behaviorlog.BehaviorLogHessianServlet.service(BehaviorLogHessianServlet.java:47)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1096)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
- locked <0x0000000082822d40> (a org.apache.tomcat.util.net.NioChannel)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x0000000080bbf3d0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
可見,有大量的線程阻塞了诅蝶,而打印log我們是放在代理上的退个,所以每次請求都會打印一次log,當(dāng)請求密集的時候调炬,就會出現(xiàn)大量線程阻塞在log4j的callAppenders上语盈,分析log4j的源碼如下:
/**
Call the appenders in the hierrachy starting at
<code>this</code>. If no appenders could be found, emit a
warning.
<p>This method calls all the appenders inherited from the
hierarchy circumventing any evaluation of whether to log or not
to log the particular log request.
@param event the event to log. */
public void callAppenders(LoggingEvent event) {
int writes = 0;
for(Category c = this; c != null; c=c.parent) {
// Protected against simultaneous call to addAppender, removeAppender,...
synchronized(c) {
if(c.aai != null) {
writes += c.aai.appendLoopOnAppenders(event);
}
if(!c.additive) {
break;
}
}
}
if(writes == 0) {
repository.emitNoAppenderWarning(this);
}
}
三、問題原因
查看源碼發(fā)現(xiàn)缰泡,log4j在打log的時候用的是古老的synchronized(c) 刀荒,因為行為日志用的是同一個Category,所以所有的服務(wù)端處理請求打出行為log的時候都會競爭這個鎖(在接口做了代理棘钞,打出行為log)缠借。
這就意味著,客戶端的每一次請求宜猜,都需要獲得這個鎖泼返,當(dāng)并發(fā)量大的時候就會出現(xiàn)鎖競爭的問題。
四姨拥、解決問題的方案
當(dāng)前能想到的解決方案有兩個:
- 把日志緩存的隊列里面绅喉,開出單獨的幾個線程負(fù)責(zé)打印log,這樣打印log和業(yè)務(wù)就能分開了叫乌,而不會因為打印log影響到正常的業(yè)務(wù)功能柴罐。
- 升級log4j1.x到log4j2.x,官方已經(jīng)說明憨奸,log4j2.x在多線程打印log的性能是log4j1.x的10倍革屠。原文如下:
Log4j 2 contains next-generation Asynchronous Loggers based on the LMAX Disruptor library. In multi-threaded scenarios Asynchronous Loggers have 10 times higher throughput and orders of magnitude lower latency than Log4j 1.x and Logback.