Exploring StringTable to Improve YGC Performance

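A long time ago I read an article by 笨神, "JVM源碼分析之String.intern()導致的YGC不斷變長" (JVM Source Code Analysis: String.intern() Makes YGC Keep Getting Longer). The cause is that a YGC has to scan the StringTable, and String.intern() stores a reference to the string object in the StringTable. If String.intern() keeps adding more and more distinct objects, the StringTable grows, scanning it takes longer, and YGC pauses get longer. So how can we confirm that growing YGC time is caused by a growing StringTable?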

There is a flag for this: -XX:+PrintStringTableStatistics. The name says what it does: it prints StringTable statistics. More precisely, when the JVM process exits, it prints the StringTable statistics to standard output.

JDK version requirement: JDK 7u6+

Verifying the Problem

The verification code is as follows:

import java.util.UUID;

/**
 * @author afei
 * @version 1.0.0
 * @since 2017-08-16
 */
public class StringInternTest {
    public static void main(String[] args) throws Exception {
        for (int i=0; i<Integer.MAX_VALUE; i++){
            UUID.randomUUID().toString().intern();
            if (i>=100000 && i%100000==0){
                System.out.println("i="+i);
            }
        }
    }
}

Run it with the following command:

java -verbose:gc -XX:+PrintGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC  -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintStringTableStatistics -Xmx1g -Xms1g -Xmn64m StringInternTest

The GC log shows YGC time getting longer and longer:

[GC (Allocation Failure)  52480K->5691K(1042048K), 0.0109821 secs]
i=100000
[GC (Allocation Failure)  65261K->19731K(1042048K), 0.0233471 secs]
i=200000
[GC (Allocation Failure)  72211K->26796K(1042048K), 0.0266068 secs]
i=300000
[GC (Allocation Failure)  79276K->33860K(1042048K), 0.0262006 secs]
[GC (Allocation Failure)  86340K->40924K(1042048K), 0.0295842 secs]
... ...
[GC (Allocation Failure)  192868K->147456K(1042048K), 0.0661785 secs]
i=1400000
[GC (Allocation Failure)  199936K->154521K(1042048K), 0.0685919 secs]
[GC (Allocation Failure)  207001K->161585K(1042048K), 0.0707886 secs]
i=1500000
[GC (Allocation Failure)  214065K->168649K(1042048K), 0.0744149 secs]
[GC (Allocation Failure)  221129K->175714K(1042048K), 0.0766862 secs]
i=1600000
[GC (Allocation Failure)  228194K->182778K(1042048K), 0.0802783 secs]
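To follow the trend in a longer log without reading it by eye, the pause times can be extracted with a small parser. The sketch below is only an illustration: the class name GcPauseParser and its methods are hypothetical, and the regular expression assumes the exact -XX:+PrintGC line format shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post: pulls the pause time and
// the post-GC heap occupancy out of -XX:+PrintGC lines like the ones above.
class GcPauseParser {
    // Matches e.g. "[GC (Allocation Failure)  52480K->5691K(1042048K), 0.0109821 secs]"
    private static final Pattern GC_LINE = Pattern.compile(
            "\\[GC \\(.*?\\)\\s+(\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    /** Pause in seconds, or -1 if the line is not a GC line in this format. */
    static double pauseSeconds(String line) {
        Matcher m = GC_LINE.matcher(line);
        return m.find() ? Double.parseDouble(m.group(4)) : -1.0;
    }

    /** Heap occupancy after the collection in KB, or -1 if the line does not match. */
    static long heapAfterKb(String line) {
        Matcher m = GC_LINE.matcher(line);
        return m.find() ? Long.parseLong(m.group(2)) : -1L;
    }

    public static void main(String[] args) {
        String[] log = {
            "[GC (Allocation Failure)  52480K->5691K(1042048K), 0.0109821 secs]",
            "i=100000",
            "[GC (Allocation Failure)  228194K->182778K(1042048K), 0.0802783 secs]"
        };
        for (String line : log) {
            double pause = pauseSeconds(line);
            if (pause >= 0) { // skip the "i=..." progress lines
                System.out.println(heapAfterKb(line) + " KB used after GC, pause " + pause + " s");
            }
        }
    }
}
```

Printing the occupancy next to the pause makes it easy to see that both grow together as more strings are interned.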

String.intern() on 2.5 million String objects:

Command:
java -verbose:gc -XX:+PrintGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintStringTableStatistics -Xmx1g -Xms1g -Xmn64m StringInternTest

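When i=2500000, that is, after 2.5 million references have been added to the StringTable, kill the process. The StringTable information printed by the PrintStringTableStatistics flag then appears (the output also contains SymbolTable information, which this article does not discuss):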

StringTable statistics:
Number of buckets       :     60013 =    480104 bytes, avg   8.000
Number of entries       :   2568786 =  61650864 bytes, avg  24.000
Number of literals      :   2568786 = 287662080 bytes, avg 111.984
Total footprint         :           = 349793048 bytes
Average bucket size     :    42.804
Variance of bucket size :    43.104
Std. dev. of bucket size:     6.565
Maximum bucket size     :        82

At this point the YGC time has reached 0.12 s:
i=2500000
[GC (Allocation Failure) 320041K->274625K(1042048K), 0.1268211 secs]
[GC (Allocation Failure) 327105K->281693K(1042048K), 0.1236515 secs]

String.intern() on 5 million String objects:

Command:
java -verbose:gc -XX:+PrintGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintStringTableStatistics -Xmx1g -Xms1g -Xmn64m StringInternTest

When i=5000000, that is, after 5 million references have been added to the StringTable, kill the process. The output is as follows:

StringTable statistics:
Number of buckets       :     60013 =    480104 bytes, avg   8.000
Number of entries       :   5082093 = 121970232 bytes, avg  24.000
Number of literals      :   5082093 = 569152512 bytes, avg 111.992
Total footprint         :           = 691602848 bytes
Average bucket size     :    84.683
Variance of bucket size :    85.084
Std. dev. of bucket size:     9.224
Maximum bucket size     :       123

The YGC time has reached 0.24 s:
i=5000000
[GC (Allocation Failure) 595600K->550184K(1042048K), 0.2425553 secs]

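Interpreting the PrintStringTableStatistics output:

The PrintStringTableStatistics output shows that the StringTable has 60013 buckets by default and that each bucket takes 8 bytes (on a 32-bit system each bucket takes 4 bytes). Number of entries is the number of Hashtable entries, 2568786 in the first run, because we called String.intern() on about 2.5 million distinct String objects. Average bucket size is the average length of the linked list in a bucket, and Maximum bucket size is the length of the longest one. The larger the Average bucket size, the more severe the Hashtable collisions. Because the number of buckets is fixed at 60013, collisions get worse as more references are added to the StringTable, and YGC time keeps growing.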
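The figures in the two reports above can be cross-checked with simple arithmetic: the average bucket size is entries divided by buckets, and the total footprint is the sum of the three byte counts. The following is a minimal sketch of that check; the class and method names are hypothetical, and the literal byte counts are copied from the output rather than derived.

```java
import java.util.Locale;

// Hypothetical helper that re-derives numbers printed by
// -XX:+PrintStringTableStatistics from the raw counts in the same report.
class StringTableStatsCheck {
    /** Average chain length: entries spread over a fixed number of buckets. */
    static double averageBucketSize(long entries, long buckets) {
        return (double) entries / buckets;
    }

    /** Bucket array + entry nodes + the interned String data itself. */
    static long totalFootprint(long buckets, long bytesPerBucket,
                               long entries, long bytesPerEntry,
                               long literalBytes) {
        return buckets * bytesPerBucket + entries * bytesPerEntry + literalBytes;
    }

    static String fmt(double value) {
        return String.format(Locale.ROOT, "%.3f", value);
    }

    public static void main(String[] args) {
        // Run with about 2.5 million interned strings, default table size.
        System.out.println(fmt(averageBucketSize(2568786L, 60013L)));               // 42.804
        System.out.println(totalFootprint(60013L, 8, 2568786L, 24, 287662080L));    // 349793048

        // Run with about 5 million interned strings, default table size.
        System.out.println(fmt(averageBucketSize(5082093L, 60013L)));               // 84.683
        System.out.println(totalFootprint(60013L, 8, 5082093L, 24, 569152512L));    // 691602848
    }
}
```

Doubling the number of entries with the same 60013 buckets doubles the average chain length, which is exactly what the two reports show.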

String.intern() on 2.5 million String objects with -XX:StringTableSize=2500000:

Command:
java -verbose:gc -XX:+PrintGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintStringTableStatistics -Xmx1g -Xms1g -Xmn64m -XX:StringTableSize=2500000 StringInternTest

When i=2500000, kill the process. The output is as follows:

StringTable statistics:
Number of buckets       :   2500000 =  20000000 bytes, avg   8.000
Number of entries       :   2573556 =  61765344 bytes, avg  24.000
Number of literals      :   2573556 = 288196288 bytes, avg 111.984
Total footprint         :           = 369961632 bytes
Average bucket size     :     1.029
Variance of bucket size :     1.028
Std. dev. of bucket size:     1.014
Maximum bucket size     :        10

The YGC time dropped from 0.12 s to 0.09 s:
i=2500000
[GC (Allocation Failure) 320216K->274800K(1042048K), 0.0890073 secs]
[GC (Allocation Failure) 327280K->281865K(1042048K), 0.0926348 secs]

String.intern() on 5 million String objects with -XX:StringTableSize=5000000:

Command:
java -verbose:gc -XX:+PrintGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintStringTableStatistics -Xmx1g -Xms1g -Xmn64m -XX:StringTableSize=5000000 StringInternTest

When i=5000000, that is, after 5 million references have been added to the StringTable, kill the process. The output is as follows:

StringTable statistics:
Number of buckets       :   5000000 =  40000000 bytes, avg   8.000
Number of entries       :   5151776 = 123642624 bytes, avg  24.000
Number of literals      :   5151776 = 576957008 bytes, avg 111.992
Total footprint         :           = 740599632 bytes
Average bucket size     :     1.030
Variance of bucket size :     1.030
Std. dev. of bucket size:     1.015
Maximum bucket size     :         9

The YGC time dropped from 0.24 s to 0.17 s:
i=5000000
[GC (Allocation Failure) 595645K->550229K(1042048K), 0.1685862 secs]
[GC (Allocation Failure) 602709K->557293K(1042048K), 0.1706642 secs]

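Interpreting the PrintStringTableStatistics and StringTableSize results:

Once StringTableSize is set to a suitable value, so that the number of buckets matches the expected number of entries, the probability of collisions drops markedly. This is visible in Average bucket size and Maximum bucket size, both far smaller than in the runs without StringTableSize, and YGC time drops noticeably as well. In addition, it is best to use BTrace to find out where String.intern() is called so frequently, and to increase StringTableSize only after confirming that String.intern() is not being abused.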
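One way to reason about a "suitable" value is to start from the expected number of interned strings and a target average chain length. The sketch below is an assumption for illustration, not a rule from the JVM: suggestedBuckets is a hypothetical heuristic, and the 8 bytes per bucket is the 64-bit figure reported in the statistics above.

```java
// Hypothetical sizing heuristic: pick enough buckets that the expected
// number of interned strings yields a chosen average chain length.
class StringTableSizing {
    static long suggestedBuckets(long expectedEntries, double targetAverageChain) {
        if (expectedEntries <= 0 || targetAverageChain <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return (long) Math.ceil(expectedEntries / targetAverageChain);
    }

    /** Memory taken by the bucket array alone (one pointer per bucket). */
    static long bucketArrayBytes(long buckets, int pointerBytes) {
        return buckets * pointerBytes;
    }

    public static void main(String[] args) {
        long buckets = suggestedBuckets(2_500_000L, 1.0);
        long extra = bucketArrayBytes(buckets, 8) - bucketArrayBytes(60013L, 8);
        System.out.println("-XX:StringTableSize=" + buckets);
        System.out.println("extra bucket memory vs default: " + extra + " bytes");
    }
}
```

With 2.5 million expected entries and a target chain length of 1, this reproduces the value used in the experiment (2500000 buckets, 20000000 bytes for the bucket array).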

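A Follow-up Question

Since the StringTable is a Hashtable, why can't it improve performance on its own by growing the number of buckets through a rehash? Rehashing in the JVM's StringTable works a little differently: it does not increase the number of buckets. Instead, with the bucket count unchanged, it uses a new seed to try to even out the linked-list length of each bucket (which makes sense: if the StringTable could grow its buckets through a rehash, what would StringTableSize be for?). The rehash roughly works as shown in the figures below. The key point is that the number of buckets is the same before and after the rehash:
Suppose the data distribution before the rehash is (23, 4, 8, 2, 1, 5):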

[Figure: StringTable before rehash]

reash后可能數(shù)據分布（6,8,8,9,5,7）：

[Figure: StringTable after rehash]

The corresponding source code is in hashtable.cpp. Its first statement initializes a new _seed that is used for the hash computation that follows:

// Create a new table and using alternate hash code, populate the new table
// with the existing elements.   This can be used to change the hash code
// and could in the future change the size of the table.

template <class T, MEMFLAGS F> void Hashtable<T, F>::move_to(Hashtable<T, F>* new_table) {

  // Initialize the global seed for hashing.
  _seed = AltHashing::compute_seed();
  assert(seed() != 0, "shouldn't be zero");

  int saved_entry_count = this->number_of_entries();

  // Iterate through the table and create a new entry for the new table
  for (int i = 0; i < new_table->table_size(); ++i) {
    for (HashtableEntry<T, F>* p = bucket(i); p != NULL; ) {
      HashtableEntry<T, F>* next = p->next();
      T string = p->literal();
      // Use alternate hashing algorithm on the symbol in the first table
      unsigned int hashValue = string->new_hash(seed());
      // Get a new index relative to the new table (can also change size)
      int index = new_table->hash_to_index(hashValue);
      p->set_hash(hashValue);
      // Keep the shared bit in the Hashtable entry to indicate that this entry
      // can't be deleted.   The shared bit is the LSB in the _next field so
      // walking the hashtable past these entries requires
      // BasicHashtableEntry::make_ptr() call.
      bool keep_shared = p->is_shared();
      this->unlink_entry(p);
      new_table->add_entry(index, p);
      if (keep_shared) {
        p->set_shared();
      }
      p = next;
    }
  }
  // give the new table the free list as well
  new_table->copy_freelist(this);
  assert(new_table->number_of_entries() == saved_entry_count, "lost entry on dictionary copy?");

  // Destroy memory used by the buckets in the hashtable.  The memory
  // for the elements has been used in a new table and is not
  // destroyed.  The memory reuse will benefit resizing the SystemDictionary
  // to avoid a memory allocation spike at safepoint.
  BasicHashtable<F>::free_buckets();
}
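The effect of rehashing with a new seed while keeping the bucket count fixed can be imitated in a few lines of Java. This is a toy sketch only: seededHash is a made-up mixing function, not HotSpot's AltHashing, and the keys and seeds are arbitrary. It shows the two invariants described above: the number of buckets and the total number of entries stay the same, and only the distribution across buckets changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a seed-based rehash into a table of unchanged size.
class SeededRehashSketch {
    /** Made-up seeded string hash; stands in for the JVM's alternate hashing. */
    static int seededHash(String s, int seed) {
        int h = seed;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
            h ^= (h >>> 15); // extra mixing so the seed influences bucket choice
        }
        return h;
    }

    /** Counts how many keys land in each bucket for a given seed. */
    static int[] bucketSizes(List<String> keys, int buckets, int seed) {
        int[] sizes = new int[buckets];
        for (String key : keys) {
            sizes[Math.floorMod(seededHash(key, seed), buckets)]++;
        }
        return sizes;
    }

    static List<String> sampleKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = sampleKeys(43); // 43 = 23+4+8+2+1+5, as in the example
        int buckets = 6;                    // unchanged by the rehash
        System.out.println("seed A: " + Arrays.toString(bucketSizes(keys, buckets, 0)));
        System.out.println("seed B: " + Arrays.toString(bucketSizes(keys, buckets, 0x9E3779B9)));
    }
}
```

Because the bucket count never changes, a rehash can at best flatten a skewed distribution; it cannot lower the average chain length, which is fixed by entries divided by buckets.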

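Conclusion

YGC time problems are indeed hard to track down, and walking the StringTable is only one part of the picture. The PrintStringTableStatistics flag exposes the StringTable statistics of an application, and setting a reasonable StringTableSize reduces collisions and thereby YGC time. On the other hand, what does a larger StringTableSize cost? A little more memory, because each bucket needs 8 bytes (on a 64-bit system). Compared with the YGC improvement it brings, that memory is well worth spending. Unfortunately, the StringTable statistics are only printed when the JVM exits, which is a pity.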
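The StringTable statistics themselves are only available at exit, but cumulative GC counts and times can be read while the process is running through the standard java.lang.management API. The sketch below is a rough illustration under that assumption; the class name GcTimeProbe and the averageMillis helper are hypothetical, and the collector names it prints depend on the JVM and the GC flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical in-process probe: prints cumulative GC count and time per
// collector, so a rising average pause can be noticed before the JVM exits.
class GcTimeProbe {
    /** Average pause in milliseconds; 0 when no collection has happened yet. */
    static double averageMillis(long collectionCount, long collectionTimeMs) {
        return collectionCount > 0 ? (double) collectionTimeMs / collectionCount : 0.0;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long timeMs = gc.getCollectionTime(); // -1 if undefined for this collector
            System.out.println(gc.getName() + ": count=" + count
                    + ", total=" + timeMs + " ms"
                    + ", avg=" + averageMillis(count, timeMs) + " ms");
        }
    }
}
```

Sampling these values periodically from inside the application shows whether the average young-collection pause is creeping upward, which is the symptom examined throughout this article.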

Last edited: 2018.05.23 19:54:02
