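Symptoms
While deploying a Java 7 service to production, we found that one machine could not serve traffic after the deployment finished. A large number of threads were blocked after the release, which triggered an alert.
Monitoring showed that the number of live threads in the JVM had reached 2k+, which is abnormal in itself. On top of that, nearly 1.8k of those threads were blocked, which means the service never started properly: this was a startup problem.
Analysis
The abnormally high thread count and the blocked threads are cause and effect. Because threads were blocked, more threads were needed to do the work, so new threads kept being created.
The next step was therefore to find out where the threads were blocked. A quick investigation showed that a large number of them were blocked at the same place: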
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:213)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:308)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
Here is the code at the point where the threads block: org.springframework.beans.factory.support.DefaultSingletonBeanRegistry#getSingleton(java.lang.String, boolean)
/**
 * Return the (raw) singleton object registered under the given name.
 * <p>Checks already instantiated singletons and also allows for an early
 * reference to a currently created singleton (resolving a circular reference).
 * @param beanName the name of the bean to look for
 * @param allowEarlyReference whether early references should be created or not
 * @return the registered singleton object, or {@code null} if none found
 */
protected Object getSingleton(String beanName, boolean allowEarlyReference) {
    Object singletonObject = this.singletonObjects.get(beanName);
    if (singletonObject == null && isSingletonCurrentlyInCreation(beanName)) {
        synchronized (this.singletonObjects) {
            singletonObject = this.earlySingletonObjects.get(beanName);
            if (singletonObject == null && allowEarlyReference) {
                ObjectFactory<?> singletonFactory = this.singletonFactories.get(beanName);
                if (singletonFactory != null) {
                    singletonObject = singletonFactory.getObject();
                    this.earlySingletonObjects.put(beanName, singletonObject);
                    this.singletonFactories.remove(beanName);
                }
            }
        }
    }
    return (singletonObject != NULL_OBJECT ? singletonObject : null);
}
The method org.springframework.beans.factory.support.DefaultSingletonBeanRegistry#getSingleton(java.lang.String, boolean) does contain a synchronized block. A thread has to acquire the lock before it can execute that block; otherwise it is blocked.
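To make the blocking behavior concrete, here is a minimal sketch, not taken from Spring, in which one thread keeps a monitor and a second thread that needs the same monitor ends up in the BLOCKED state. The class name BlockedStateSketch, the thread names, and the 5-second polling limit are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;

class BlockedStateSketch {
    /** Returns the state observed for a thread that contends for a monitor another thread holds. */
    static Thread.State observeContender() {
        final Object lock = new Object();                 // hypothetical stand-in for singletonObjects
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();                         // signal that the monitor is taken
                try {
                    release.await();                      // keep the monitor until told to let go
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");

        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // needs the same monitor, so it cannot get here while the holder owns it
            }
        }, "contender");

        try {
            holder.start();
            held.await();
            contender.start();
            // Poll until the contender reports BLOCKED, giving up after 5 seconds.
            long deadline = System.currentTimeMillis() + 5000;
            Thread.State state = contender.getState();
            while (state != Thread.State.BLOCKED && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
                state = contender.getState();
            }
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();                          // let both threads finish
        }
    }

    public static void main(String[] args) {
        System.out.println("contender state: " + observeContender());
    }
}
```

In a thread dump, a thread in this situation shows up as "waiting for monitor entry" with the state BLOCKED (on object monitor).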
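What the analysis establishes so far is that calling org.springframework.beans.factory.support.DefaultSingletonBeanRegistry#getSingleton(java.lang.String, boolean) can indeed block threads that compete for the synchronized lock. According to the alert, however, almost all threads were blocked, so there might be a deadlock that keeps the lock from ever being released and blocks every thread that enters the method. To find the actual cause, I first took a thread dump.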
"xxx-13-thread-1" daemon prio=10 tid=0x00007fa3790a1000 nid=0x48b4c waiting for monitor entry [0x00007fa38e17b000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:213)
- waiting to lock <0x0000000727af5b68> (a java.util.concurrent.ConcurrentHashMap)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:308)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
at org.springframework.aop.aspectj.annotation.BeanFactoryAspectInstanceFactory.getAspectInstance(BeanFactoryAspectInstanceFactory.java:83)
at org.springframework.aop.aspectj.annotation.LazySingletonAspectInstanceFactoryDecorator.getAspectInstance(LazySingletonAspectInstanceFactoryDecorator.java:53)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:627)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:616)
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:168)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:671)
...
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:736)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:671)
...
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Every blocked thread has the same stack as the one shown above. The key line is:
waiting to lock <0x0000000727af5b68> (a java.util.concurrent.ConcurrentHashMap)
0x0000000727af5b68 is the address of the object. The hint that follows it, (a java.util.concurrent.ConcurrentHashMap), also confirms that this is the synchronization object analyzed above. Here is that object again:
/** Cache of singleton objects: bean name --> bean instance */
private final Map<String, Object> singletonObjects = new ConcurrentHashMap<String, Object>(256);
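Now I needed to know which thread was holding the lock on object 0x0000000727af5b68 without releasing it, blocking all the others. To find the holder, search the thread dump for the string "locked <0x0000000727af5b68>". By the rules of monitor acquisition, only one thread can hold this object's lock at a time. The search turned up the following stack: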
"main" prio=10 tid=0x00007fa4a0018000 nid=0x4889e runnable [0x00007fa4a8fea000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.put(HashMap.java:494)
at org.apache.thrift.meta_data.FieldMetaData.addStructMetaDataMap(FieldMetaData.java:49)
...
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at com.sun.proxy.$Proxy346.<clinit>(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.reflect.Proxy.newInstance(Proxy.java:764)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:755)
at org.springframework.aop.framework.JdkDynamicAopProxy.getProxy(JdkDynamicAopProxy.java:122)
at org.springframework.aop.framework.JdkDynamicAopProxy.getProxy(JdkDynamicAopProxy.java:112)
at org.springframework.aop.framework.ProxyFactory.getProxy(ProxyFactory.java:96)
...
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1759)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1696)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1626)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:553)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:481)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:312)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
- locked <0x0000000727af5b68> (a java.util.concurrent.ConcurrentHashMap)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:308)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:756)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:867)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:542)
- locked <0x00000007292f2fb0> (a java.lang.Object)
at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:123)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:666)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:353)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:300)
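It is the main thread. It holds the lock on object 0x0000000727af5b68, and its state is RUNNABLE, so it is not blocked. In other words, this is not actually a deadlock. Most likely the main thread is stuck in an infinite loop, so the lock it holds is never released, and every other thread that needs the lock on 0x0000000727af5b68 stays blocked.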
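The manual search for the lock owner can also be done from inside a JVM. The sketch below is an illustration rather than part of the original investigation: it uses the standard java.lang.management API to ask which thread owns the monitor a blocked thread is waiting for, and what state that owner is in. The class name LockOwnerSketch, the thread names, and the busy loop that imitates a thread stuck while holding a lock are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class LockOwnerSketch {
    /** Returns "<owner name> <owner state>" for the monitor a blocked thread waits on, or null on timeout. */
    static String describeLockOwner() {
        final Object lock = new Object();                 // hypothetical stand-in for singletonObjects
        final CountDownLatch held = new CountDownLatch(1);
        final AtomicBoolean release = new AtomicBoolean(false);

        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                while (!release.get()) {
                    // busy loop: imitates a thread spinning while it owns the monitor
                }
            }
        }, "spinning-holder");

        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // blocked until the holder leaves its synchronized block
            }
        }, "contender");

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        try {
            holder.start();
            held.await();
            contender.start();
            long deadline = System.currentTimeMillis() + 5000;
            while (System.currentTimeMillis() < deadline) {
                ThreadInfo blocked = threads.getThreadInfo(contender.getId());
                // getLockOwnerId() is -1 until the thread is blocked on an owned monitor
                if (blocked != null && blocked.getLockOwnerId() != -1) {
                    ThreadInfo owner = threads.getThreadInfo(blocked.getLockOwnerId());
                    if (owner != null) {
                        return owner.getThreadName() + " " + owner.getThreadState();
                    }
                }
                Thread.sleep(10);
            }
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.set(true);                            // stop the busy loop so both threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLockOwner());
    }
}
```

An owner that is RUNNABLE rather than BLOCKED or WAITING is the same signal seen in the dump: the lock is held by a thread that is still executing, which points to a loop and not to a deadlock.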
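Seeing java.util.HashMap.put(HashMap.java:494) at the top of the stack gave me a jolt: it looked like I had found a bug in thrift, and it is worth going into in more detail.
HashMap has not been thread-safe in any Java version to date. If your scenario involves concurrent access to a Map, you cannot use HashMap, or you will run into problems of one kind or another. We were running Java 7, where concurrent access to a HashMap from multiple threads can send a thread into an infinite loop.
To illustrate, here is the put method of HashMap: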
public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) { // ----- line 494
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
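The loop never ends when e.next == e, that is, when some node's next pointer points back to the node itself. To verify this, I took a heap dump, loaded it into Eclipse Memory Analyzer Tool (referred to as MAT below), and clicked the button that brings up the thread list.
MAT shows each thread's name, its current stack, and the objects it holds, which makes it very convenient for investigating memory problems. From the list, I located the main thread.
Matching this against the looping code in HashMap.put, the e at that moment was the java.util.HashMap.Entry at 0x73ae67608. Its next field still pointed to itself, which is what kept the main thread spinning in that loop.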
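The effect of a self-referencing entry can be reproduced without HashMap at all. The following sketch uses a hypothetical Node class in place of java.util.HashMap.Entry, with a put()-style scan that is capped so it can give up, plus Floyd's tortoise-and-hare check as one way to detect such a cycle. None of this code comes from the JDK.

```java
class SelfLoopSketch {
    /** Hypothetical stand-in for a bucket entry: only the fields needed for the scan. */
    static final class Node {
        final String key;
        Node next;

        Node(String key) {
            this.key = key;
        }
    }

    /** Floyd's tortoise-and-hare check: true if following next never reaches null. */
    static boolean hasCycle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    /**
     * Walks the chain the same way the for loop in put() does, but gives up
     * after limit steps. Returns the chain length, or -1 if the limit was hit.
     */
    static int scanLength(Node head, int limit) {
        int steps = 0;
        for (Node e = head; e != null; e = e.next) {
            if (++steps > limit) {
                return -1;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        Node healthy = new Node("a");
        healthy.next = new Node("b");

        Node corrupted = new Node("c");
        corrupted.next = corrupted;          // the situation seen in the heap dump: next == itself

        System.out.println("healthy:   cycle=" + hasCycle(healthy) + ", scan=" + scanLength(healthy, 1000));
        System.out.println("corrupted: cycle=" + hasCycle(corrupted) + ", scan=" + scanLength(corrupted, 1000));
    }
}
```

The real loop in put() has no step limit, so once a bucket contains such an entry and the key being inserted does not match it, the scan can never reach null.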
For how the HashMap infinite loop comes about, see the article "Why HashMap Is Not Thread-Safe".
Resolution
The HashMap in question belongs to thrift's code, so let's look at the original source:
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//
package org.apache.thrift.meta_data;

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import org.apache.thrift.TBase;
import org.apache.thrift.TFieldIdEnum;

public class FieldMetaData implements Serializable {
    public final String fieldName;
    public final byte requirementType;
    public final FieldValueMetaData valueMetaData;
    private static Map<Class<? extends TBase>, Map<? extends TFieldIdEnum, FieldMetaData>> structMap = new HashMap();

    public FieldMetaData(String name, byte req, FieldValueMetaData vMetaData) {
        this.fieldName = name;
        this.requirementType = req;
        this.valueMetaData = vMetaData;
    }

    public static void addStructMetaDataMap(Class<? extends TBase> sClass, Map<? extends TFieldIdEnum, FieldMetaData> map) {
        structMap.put(sClass, map);
    }

    public static Map<? extends TFieldIdEnum, FieldMetaData> getStructMetaDataMap(Class<? extends TBase> sClass) {
        if(!structMap.containsKey(sClass)) {
            try {
                sClass.newInstance();
            } catch (InstantiationException var2) {
                throw new RuntimeException("InstantiationException for TBase class: " + sClass.getName() + ", message: " + var2.getMessage());
            } catch (IllegalAccessException var3) {
                throw new RuntimeException("IllegalAccessException for TBase class: " + sClass.getName() + ", message: " + var3.getMessage());
            }
        }
        return (Map)structMap.get(sClass);
    }
}
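Given the problem, there are two ways to fix it. One is to declare structMap as a thread-safe ConcurrentHashMap. The other is to synchronize access to structMap, that is, to add the synchronized keyword to the methods (or code blocks) that operate on structMap.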
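The two options can be sketched side by side on a simplified registry. This is an illustration only, not thrift's actual code: the class StructRegistrySketch, its method names, and the use of Map<String, String> as the metadata type are assumptions chosen to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StructRegistrySketch {
    // Option 1: replace the HashMap with a ConcurrentHashMap.
    // put() and get() are then safe to call from many threads without extra locking.
    private static final Map<Class<?>, Map<String, String>> CONCURRENT =
            new ConcurrentHashMap<Class<?>, Map<String, String>>();

    static void registerConcurrent(Class<?> type, Map<String, String> metaData) {
        CONCURRENT.put(type, metaData);
    }

    static Map<String, String> lookupConcurrent(Class<?> type) {
        return CONCURRENT.get(type);
    }

    // Option 2: keep the plain HashMap, but guard every access with the same lock.
    // Static synchronized methods all lock on StructRegistrySketch.class.
    private static final Map<Class<?>, Map<String, String>> GUARDED =
            new HashMap<Class<?>, Map<String, String>>();

    static synchronized void registerGuarded(Class<?> type, Map<String, String> metaData) {
        GUARDED.put(type, metaData);
    }

    static synchronized Map<String, String> lookupGuarded(Class<?> type) {
        return GUARDED.get(type);
    }

    public static void main(String[] args) {
        Map<String, String> metaData = new HashMap<String, String>();
        metaData.put("id", "required");

        registerConcurrent(String.class, metaData);
        registerGuarded(String.class, metaData);

        System.out.println(lookupConcurrent(String.class));
        System.out.println(lookupGuarded(String.class));
    }
}
```

One difference to keep in mind when choosing: ConcurrentHashMap rejects null keys and null values, while a synchronized HashMap accepts them. With the synchronized option, reads must take the lock as well as writes, otherwise a reader can still observe the map in the middle of a resize.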
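Excited, I was about to send thrift a PR, only to find that the current thrift source for this class had already changed.
thrift had already fixed the problem, using the synchronized approach. Upgrading to 0.9.3 or later avoids running into it again.
That PR addresses the ticket THRIFT-1618. To check whether the ticket matches our problem, I looked it up.
The ticket's status is CLOSED, meaning it has been resolved, and its description matches our situation.
Conclusion
To summarize the analysis above: concurrent access to a HashMap from multiple threads turned a bucket's linked list into a cycle during a Java 7 HashMap resize, which sent a thread into an infinite loop. The object lock held by that looping thread was never released, so every other thread requesting the lock was blocked. Upgrading thrift to 0.9.3 or later resolves the problem.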