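The string constant pool in the JVM is a somewhat mysterious thing, and books and websites give conflicting accounts of its details. This article relies on the most authoritative sources available and looks for an entry point from which to untangle the confusion. All referenced documents are linked.
The JVM discussed here is HotSpot. Unless stated otherwise, the JDK version is 1.8; versions 1.6 and 1.7 appear where a comparison is needed.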
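String interning
String interning is the fundamental reason the string constant pool exists. The English Wikipedia gives a very good explanation, which says roughly the following:
String interning means that the system keeps only one copy of each distinct string value, called an "intern", and that these copies are immutable. The distinct strings are stored in the string constant pool.
Each programming language has its own way of obtaining an intern from the pool or of interning a string, such as String.intern() in Java. In Java, all strings that can be determined at compile time are also interned automatically.
Strings are not the only values that can be interned. In Java, for example, Integer objects in the range [-128, 127] are cached in the nested class IntegerCache, which acts as an integer constant pool. Two int values in that range with the same numeric value refer, after autoboxing, to the same Integer object on the heap (that is, the intern); see the source of Integer.valueOf().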
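The Integer cache described above can be observed with a reference comparison. This is only a minimal sketch: the class and method names are made up for illustration, and the behavior outside [-128, 127] depends on the JVM configuration (for example -XX:AutoBoxCacheMax), so no result is claimed for it.

```java
// Sketch: autoboxing goes through Integer.valueOf(), which reuses cached
// Integer objects for values in [-128, 127].
final class IntegerCacheDemo {
    // True when boxing the same int value twice yields the same Integer object.
    static boolean boxesToSameObject(int value) {
        Integer a = value; // compiled to Integer.valueOf(value)
        Integer b = value;
        return a == b;     // reference comparison on purpose, not equals()
    }

    public static void main(String[] args) {
        // Inside the cached range both boxes are the same object.
        System.out.println(boxesToSameObject(127));
        // Outside the range the result depends on the JVM configuration.
        System.out.println(boxesToSameObject(128));
    }
}
```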
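String interning is a typical implementation of the flyweight pattern from design patterns, which is not covered further here.
String literals
The previous section mentioned the concept of a string literal. The Java Language Specification says:
A string literal consists of zero or more characters enclosed in double quotes. It is a reference to an instance of class String.
A string literal always refers to the same instance of class String. This is because string literals, and more generally strings that are the values of constant expressions, are interned using the method String.intern() so as to share unique instances.
String literals and string constant expressions are both "strings that can be determined at compile time" in the sense used above. Here is the example from the Java Language Specification: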
package testPackage;
class Test {
public static void main(String[] args) {
String hello = "Hello", lo = "lo";
System.out.print((hello == "Hello") + " ");
System.out.print((Other.hello == hello) + " ");
System.out.print((other.Other.hello == hello) + " ");
System.out.print((hello == ("Hel"+"lo")) + " ");
System.out.print((hello == ("Hel"+lo)) + " ");
System.out.println(hello == ("Hel"+lo).intern());
}
}
class Other { static String hello = "Hello"; }
package other;
public class Other { public static String hello = "Hello"; }
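The output is true true true true false true. This shows that:
- the string constant pool is global within the JVM and is independent of class and package scope;
- a string that cannot be determined at compile time (such as "Hel"+lo above) produces a new String object at run time (decompiling the JDK 8 class file shows that the concatenation is done with StringBuilder).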
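What counts as "determined at compile time" depends on whether every operand is a constant. The sketch below (class and method names are illustrative only) contrasts a final local variable, which makes "Hel" + lo a constant expression, with a non-final one, which forces the concatenation to happen at run time.

```java
// Sketch: a final variable with a constant initializer keeps the
// concatenation a compile-time constant expression; a non-final one does not.
final class ConstantExpressionDemo {
    static boolean finalOperandSharesLiteral() {
        final String lo = "lo";          // constant variable
        return "Hello" == ("Hel" + lo);  // folded by javac into the literal "Hello"
    }

    static boolean nonFinalOperandSharesLiteral() {
        String lo = "lo";                // not final, so not a constant variable
        return "Hello" == ("Hel" + lo);  // concatenated at run time into a new String
    }

    public static void main(String[] args) {
        System.out.println(finalOperandSharesLiteral());    // true
        System.out.println(nonFinalOperandSharesLiteral()); // false
    }
}
```

The second case matches the fifth value (false) in the specification's example above.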
String.intern()
In the JDK, String.intern() is a native method:
/**
* Returns a canonical representation for the string object.
* <p>
* A pool of strings, initially empty, is maintained privately by the
* class {@code String}.
* <p>
* When the intern method is invoked, if the pool already contains a
* string equal to this {@code String} object as determined by
* the {@link #equals(Object)} method, then the string from the pool is
* returned. Otherwise, this {@code String} object is added to the
* pool and a reference to this {@code String} object is returned.
* <p>
* It follows that for any two strings {@code s} and {@code t},
* {@code s.intern() == t.intern()} is {@code true}
* if and only if {@code s.equals(t)} is {@code true}.
* <p>
* All literal strings and string-valued constant expressions are
* interned. String literals are defined in section 3.10.5 of the
* <cite>The Java™ Language Specification</cite>.
*
* @return a string that has the same contents as this string, but is
* guaranteed to be from a pool of unique strings.
*/
public native String intern();
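Paraphrasing this Javadoc paragraph by paragraph:
The class String privately maintains a pool of strings that is initially empty.
When intern() is called, if the pool already contains a string whose content equals this string (as determined by equals()), the string from the pool is returned. Otherwise this string is added to the pool (interned) and a reference to it is returned.
For any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
All string literals and string-valued constant expressions are interned.
So the interning and constant pool mechanism cannot be found in the Java source code of the JDK; it is implemented at a lower level by the JVM.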
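The contract stated in the Javadoc above can be checked directly. A minimal sketch, with illustrative class and method names:

```java
// Sketch: s.intern() == t.intern() holds exactly when s.equals(t).
final class InternContractDemo {
    static boolean sameCanonical(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String s = new String("pool"); // distinct heap object
        String t = new String("pool"); // another distinct heap object
        System.out.println(s == t);                     // false: two separate objects
        System.out.println(sameCanonical(s, t));        // true: equal content, one canonical instance
        System.out.println(sameCanonical(s, "other"));  // false: different content
    }
}
```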
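Things are not quite that simple, though. The following questions need answers:
- Where in the JVM's memory does the string constant pool live?
- Does it store String objects, references to String objects, or both?
- How is it implemented internally, and how can it be tuned?
Location of the string constant pool
Since this involves the JVM memory layout, the classic diagram of the JVM runtime data areas comes first.
The official JDK 7 Release Notes contain the following passage: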
Area: HotSpot
Synopsis: In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation... (remainder omitted)
RFE: 6962931
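In short: in JDK 7, interned strings are no longer allocated in the permanent generation but in the main part of the Java heap (the young and old generations).
It follows that in JDK 6 the string constant pool lives in the permanent generation, which is HotSpot's implementation of the method area, and that from JDK 7 on it lives directly in the heap. The classic example from Understanding the Java Virtual Machine, 2nd edition (深入理解Java虚拟机) demonstrates this. It generates an endlessly increasing sequence of numeric strings and interns each one.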
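import java.util.ArrayList;
import java.util.List;

public class OOMExample {
    public static void main(String[] args) {
        // Keep references in a List so the interned strings cannot be garbage collected
        List<String> list = new ArrayList<String>();
        int i = 0;
        while (true) {
            list.add(String.valueOf(i++).intern());
        }
    }
}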
The same JVM flags are used for every run:
-Xms8m -Xmx8m -XX:PermSize=8m -XX:MaxPermSize=8m -XX:+UseParallelGC -XX:+PrintGCDetails
Then run it under JDK 6, 7 and 8 in turn and observe the output.
- JDK6
[GC [PSYoungGen: 2012K->304K(2368K)] 2012K->420K(7872K), 0.0014317 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2352K->320K(2368K)] 2468K->705K(7872K), 0.0013064 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
[GC [PSYoungGen: 1331K->288K(2368K)] 1717K->697K(7872K), 0.0007446 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[Full GC [PSYoungGen: 288K->0K(2368K)] [PSOldGen: 409K->617K(5504K)] 697K->617K(7872K) [PSPermGen: 8191K->8191K(8192K)], 0.0130018 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
[GC [PSYoungGen: 0K->0K(2368K)] 617K->617K(7872K), 0.0001804 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[Full GC [PSYoungGen: 0K->0K(2368K)] [PSOldGen: 617K->471K(5504K)] 617K->471K(7872K) [PSPermGen: 8191K->8180K(8192K)], 0.0134341 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
......
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.String.intern(Native Method)
at me.lmagics.OOMExample.main(OOMExample.java:16)
- JDK7
[GC [PSYoungGen: 2048K->507K(2560K)] 2048K->1651K(8192K), 0.0026340 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2555K->501K(2560K)] 3699K->3389K(8192K), 0.0028820 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2549K->496K(2560K)] 5437K->5192K(8192K), 0.0038110 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]
[Full GC [PSYoungGen: 496K->0K(2560K)] [ParOldGen: 4696K->5101K(5632K)] 5192K->5101K(8192K) [PSPermGen: 2603K->2602K(8192K)], 0.0622090 secs] [Times: user=0.27 sys=0.00, real=0.06 secs]
[Full GC [PSYoungGen: 2048K->1535K(2560K)] [ParOldGen: 5101K->5180K(5632K)] 7149K->6716K(8192K) [PSPermGen: 2602K->2602K(8192K)], 0.0550730 secs] [Times: user=0.28 sys=0.01, real=0.05 secs]
[Full GC [PSYoungGen: 2048K->2047K(2560K)] [ParOldGen: 5180K->5180K(5632K)] 7228K->7228K(8192K) [PSPermGen: 2602K->2602K(8192K)], 0.0287170 secs] [Times: user=0.14 sys=0.00, real=0.03 secs]
......
[Full GC [PSYoungGen: 2047K->2047K(2560K)] [ParOldGen: 5543K->5543K(5632K)] 7591K->7591K(8192K) [PSPermGen: 2602K->2602K(8192K)], 0.0285530 secs] [Times: user=0.16 sys=0.00, real=0.03 secs]
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
[Full GC [PSYoungGen: 2047K->0K(2560K)] [ParOldGen: 5546K->220K(5632K)] 7594K->220K(8192K) [PSPermGen: 2627K->2627K(8192K)], 0.0052340 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
at java.lang.Integer.toString(Integer.java:331)
at java.lang.String.valueOf(String.java:2954)
at me.lmagics.OOMExample.main(OOMExample.java:16)
- JDK8
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=8m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=8m; support was removed in 8.0
[GC (Allocation Failure) [PSYoungGen: 1536K->482K(2048K)] 1536K->1210K(7680K), 0.0017302 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 2018K->505K(2048K)] 2746K->2581K(7680K), 0.0021425 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 2041K->501K(2048K)] 4117K->3969K(7680K), 0.0021064 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
[GC (Allocation Failure) [PSYoungGen: 2037K->496K(2048K)] 5505K->5276K(7680K), 0.0025973 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[Full GC (Ergonomics) [PSYoungGen: 496K->0K(2048K)] [ParOldGen: 4780K->5090K(5632K)] 5276K->5090K(7680K), [Metaspace: 2652K->2652K(1056768K)], 0.0587041 secs] [Times: user=0.30 sys=0.01, real=0.05 secs]
[Full GC (Ergonomics) [PSYoungGen: 1412K->880K(2048K)] [ParOldGen: 5090K->5570K(5632K)] 6503K->6451K(7680K), [Metaspace: 2652K->2652K(1056768K)], 0.0334546 secs] [Times: user=0.17 sys=0.00, real=0.03 secs]
[Full GC (Ergonomics) [PSYoungGen: 1536K->1535K(2048K)] [ParOldGen: 5570K->5154K(5632K)] 7106K->6690K(7680K), [Metaspace: 2652K->2652K(1056768K)], 0.0320396 secs] [Times: user=0.15 sys=0.00, real=0.04 secs]
......
[Full GC (Ergonomics) [PSYoungGen: 1535K->1535K(2048K)] [ParOldGen: 5542K->5542K(5632K)] 7078K->7078K(7680K), [Metaspace: 2652K->2652K(1056768K)], 0.0273170 secs] [Times: user=0.17 sys=0.00, real=0.03 secs]
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
[Full GC (Ergonomics) [PSYoungGen: 1536K->0K(2048K)] [ParOldGen: 5545K->267K(5632K)] 7081K->267K(7680K), [Metaspace: 2677K->2677K(1056768K)], 0.0039194 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
at java.lang.Integer.toString(Integer.java:401)
at java.lang.String.valueOf(String.java:3099)
at me.lmagics.OOMExample.main(OOMExample.java:16)
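The output shows that:
- JDK 6 reports a PermGen OOM, which confirms that the string constant pool really is in the permanent generation;
- JDK 7 and 8 both report "GC overhead limit exceeded". HotSpot throws this error when the JVM spends more than 98% of its time in GC while recovering less than 2% of the heap. If the check is disabled with -XX:-UseGCOverheadLimit, the familiar "java.lang.OutOfMemoryError: Java heap space" is thrown after a while instead. This confirms that the string constant pool has moved to the heap;
- JDK 8 additionally warns that the permanent generation flags are ignored. That is because JDK 8 removed the permanent generation entirely and implements the method area with Metaspace instead. The GC log also shows the Metaspace figures.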
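Q: Why was the string constant pool moved from the permanent generation to the heap, and why was the permanent generation later replaced by Metaspace?
A: The permanent generation was an awkward implementation of the method area in HotSpot, and no other JVM implementation has one.
According to the Java Virtual Machine Specification:
The method area stores per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors.
Although the method area is logically part of the heap, simple implementations may choose not to either garbage collect or compact it.
In HotSpot the method area is garbage collected, by extending the generational GC of the heap to cover it. Because the data in the method area is more "static" than the data in the young and old generations, it was named the "permanent generation" to keep the naming consistent.
The main problem with the permanent generation is that it is hard to tune. Its size is fixed by the two flags -XX:PermSize and -XX:MaxPermSize. If it is set too small, a large constant pool or a lot of metadata from dynamically loaded classes leads straight to an OOM. If it is set too large, it takes space that could otherwise go to the heap and increases GC pressure.
In addition, the effort to merge the HotSpot and JRockit virtual machines began in the JDK 7 era. JRockit has no permanent generation, so HotSpot eventually dropped it as well. The new Metaspace lives in native memory, which removes the old fixed size limit and makes it more flexible. Further details on Metaspace are beyond the scope of this article; see the linked reference.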
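Which memory areas a given JVM actually has can be listed from inside a program through the standard java.lang.management API. The sketch below only prints whatever pools the running JVM reports; the class name is illustrative, and the pool names themselves are implementation-specific (a HotSpot JDK 8 or later would be expected to list a pool called Metaspace, while JDK 6/7 would list a permanent generation pool).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the memory pools reported by the running JVM.
final class MemoryPoolLister {
    // Returns one "name (type)" entry per memory pool.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```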
What the string constant pool stores
This question is hard to verify and therefore often leads to arguments.
Consider the following code:
public class StringPoolExample {
public static void main(String[] args) {
String s1 = new String("a"); // #1
s1.intern(); // #2
String s2 = "a"; // #3
System.out.println(s1 == s2);
String s3 = s2 + s2; // #4
s3.intern(); // #5
String s4 = "aa"; // #6
System.out.println(s3 == s4);
}
}
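On JDK 6 this code prints false false, but on JDK 7/8 it prints false true. The difference suggests that what the string constant pool stores has changed as well. The diagrams, drawn with ProcessOn, analyze this in detail:
- JDK6
How many objects does statement #1 create? This is a very common interview question, and the answer is two: one on the heap and one in the string constant pool. Because "a" is a literal, it is interned automatically. By the time statement #2 calls intern(), the pool already contains it. Statement #3 finds the "a" in the pool directly, so s1 and s2 refer to different addresses.
In statement #4 the value of the string referenced by s3 cannot be determined at compile time, so a new String object is created. When statement #5 calls intern(), "aa" is not yet in the pool, so a copy of it is added to the pool. Statement #6 also finds the "aa" in the pool directly, so s3 and s4 refer to different addresses as well.
- JDK7/8
Statements #1 to #3 behave as above and are not repeated here.
Why do statements #4 to #6 then yield true? Since the == operator compares the addresses of reference types, the only explanation is that s3 and s4 refer to the same address. The diagram above therefore needs one change:
When statement #5 executes, the String object "aa" exists on the heap but not in the pool. Instead of adding an object to the pool as JDK 6 does, the JVM now adds a reference to "aa". That reference and s3 both point to the same String object on the heap. When statement #6 finds "aa" in the pool, it actually finds the same reference as s3, so s3 == s4 holds.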
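The JDK 7/8 behavior analyzed above can be reduced to a single check: for a string that is built at run time and is not yet in the pool, intern() hands back the very same object. A minimal sketch, with an illustrative class name; it assumes the generated text has not been interned before, which is why a nanoTime suffix is appended. On HotSpot JDK 7 or later this is expected to print true, while by the analysis above JDK 6 would print false.

```java
// Sketch: on JDK 7+, interning a fresh run-time string registers that same
// heap object in the pool instead of creating a copy.
final class InternIdentityDemo {
    static boolean internReturnsSameObject(String prefix) {
        // Built at run time; the nanoTime suffix makes a prior pool entry very unlikely.
        String s = new StringBuilder(prefix).append(System.nanoTime()).toString();
        return s.intern() == s;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsSameObject("string-pool-"));
    }
}
```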
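Conclusion:
In JDK 6, the string constant pool holds String objects only.
In JDK 7/8, the pool holds the String object directly for string literals (and, of course, constant expressions). For a string that cannot be determined at compile time, calling intern() makes the pool hold a reference to the String object on the heap instead of creating another object inside the pool. The likely reason for this change is that the pool itself has moved to the heap, so keeping one object inside the pool and another outside it would be unnecessary; sharing one saves space.
The disassembled bytecode of the code above is attached below, together with the contents of the class file's constant pool: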
Constant pool:
#1 = Methodref #14.#33 // java/lang/Object."<init>":()V
#2 = Class #34 // java/lang/String
#3 = String #35 // a
#4 = Methodref #2.#36 // java/lang/String."<init>":(Ljava/lang/String;)V
#5 = Methodref #2.#37 // java/lang/String.intern:()Ljava/lang/String;
#6 = Fieldref #38.#39 // java/lang/System.out:Ljava/io/PrintStream;
#7 = Methodref #40.#41 // java/io/PrintStream.println:(Z)V
#8 = Class #42 // java/lang/StringBuilder
#9 = Methodref #8.#33 // java/lang/StringBuilder."<init>":()V
#10 = Methodref #8.#43 // java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#11 = Methodref #8.#44 // java/lang/StringBuilder.toString:()Ljava/lang/String;
#12 = String #45 // aa
#13 = Class #46 // me/lmagics/StringPoolExample
#14 = Class #47 // java/lang/Object
#15 = Utf8 <init>
#16 = Utf8 ()V
#17 = Utf8 Code
#18 = Utf8 LineNumberTable
#19 = Utf8 LocalVariableTable
#20 = Utf8 this
#21 = Utf8 Lme/lmagics/StringPoolExample;
#22 = Utf8 main
#23 = Utf8 ([Ljava/lang/String;)V
#24 = Utf8 args
#25 = Utf8 [Ljava/lang/String;
#26 = Utf8 s1
#27 = Utf8 Ljava/lang/String;
#28 = Utf8 s2
#29 = Utf8 s3
#30 = Utf8 s4
#31 = Utf8 SourceFile
#32 = Utf8 StringPoolExample.java
#33 = NameAndType #15:#16 // "<init>":()V
#34 = Utf8 java/lang/String
#35 = Utf8 a
#36 = NameAndType #15:#48 // "<init>":(Ljava/lang/String;)V
#37 = NameAndType #49:#50 // intern:()Ljava/lang/String;
#38 = Class #51 // java/lang/System
#39 = NameAndType #52:#53 // out:Ljava/io/PrintStream;
#40 = Class #54 // java/io/PrintStream
#41 = NameAndType #55:#56 // println:(Z)V
#42 = Utf8 java/lang/StringBuilder
#43 = NameAndType #57:#58 // append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#44 = NameAndType #59:#50 // toString:()Ljava/lang/String;
#45 = Utf8 aa
#46 = Utf8 me/lmagics/StringPoolExample
#47 = Utf8 java/lang/Object
#48 = Utf8 (Ljava/lang/String;)V
#49 = Utf8 intern
#50 = Utf8 ()Ljava/lang/String;
#51 = Utf8 java/lang/System
#52 = Utf8 out
#53 = Utf8 Ljava/io/PrintStream;
#54 = Utf8 java/io/PrintStream
#55 = Utf8 println
#56 = Utf8 (Z)V
#57 = Utf8 append
#58 = Utf8 (Ljava/lang/String;)Ljava/lang/StringBuilder;
#59 = Utf8 toString
{
public me.lmagics.StringPoolExample();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 3: 0
LocalVariableTable:
Start Length Slot Name Signature
0 5 0 this Lme/lmagics/StringPoolExample;
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=3, locals=5, args_size=1
0: new #2 // class java/lang/String
3: dup
4: ldc #3 // String a
6: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V
9: astore_1
10: aload_1
11: invokevirtual #5 // Method java/lang/String.intern:()Ljava/lang/String;
14: pop
15: ldc #3 // String a
17: astore_2
18: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
21: aload_1
22: aload_2
23: if_acmpne 30
26: iconst_1
27: goto 31
30: iconst_0
31: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
34: new #8 // class java/lang/StringBuilder
37: dup
38: invokespecial #9 // Method java/lang/StringBuilder."<init>":()V
41: aload_2
42: invokevirtual #10 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
45: aload_2
46: invokevirtual #10 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
49: invokevirtual #11 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
52: astore_3
53: aload_3
54: invokevirtual #5 // Method java/lang/String.intern:()Ljava/lang/String;
57: pop
58: ldc #12 // String aa
60: astore 4
62: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
65: aload_3
66: aload 4
68: if_acmpne 75
71: iconst_1
72: goto 76
75: iconst_0
76: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
79: return
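How the string constant pool is implemented
This information is unlikely to be available from the Oracle/Sun JDK itself, so the native code of OpenJDK is the place to learn roughly how the pool is implemented. Source mirrors of the various OpenJDK versions can be found on GitHub. The OpenJDK 7u branch is used here, and reading can start from the String class.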
- openjdk/jdk/src/share/native/java/lang/String.c
#include "jvm.h"
#include "java_lang_String.h"
JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
return JVM_InternString(env, this);
}
- openjdk/hotspot/src/share/vm/prims/jvm.h
/*
* java.lang.String
*/
JNIEXPORT jstring JNICALL
JVM_InternString(JNIEnv *env, jstring str);
- openjdk/hotspot/src/share/vm/prims/jvm.cpp
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
JVMWrapper("JVM_InternString");
JvmtiVMObjectAllocEventCollector oam;
if (str == NULL) return NULL;
oop string = JNIHandles::resolve_non_null(str);
oop result = StringTable::intern(string, CHECK_NULL);
return (jstring) JNIHandles::make_local(env, result);
JVM_END
- openjdk/hotspot/src/share/vm/classfile/symbolTable.hpp
class StringTable : public Hashtable<oop, mtSymbol> {
friend class VMStructs;
private:
// The string table
static StringTable* _the_table;
// Set if one bucket is out of balance due to hash algorithm deficiency
static bool _needs_rehashing;
// Claimed high water mark for parallel chunked scanning
static volatile int _parallel_claimed_idx;
static oop intern(Handle string_or_null, jchar* chars, int length, TRAPS);
oop basic_add(int index, Handle string_or_null, jchar* name, int len,
unsigned int hashValue, TRAPS);
oop lookup(int index, jchar* chars, int length, unsigned int hashValue);
// Apply the give oop closure to the entries to the buckets
// in the range [start_idx, end_idx).
static void buckets_oops_do(OopClosure* f, int start_idx, int end_idx);
// Unlink the entries to the buckets in the range [start_idx, end_idx).
static void buckets_unlink(BoolObjectClosure* is_alive, int start_idx, int end_idx, int* processed, int* removed);
StringTable() : Hashtable<oop, mtSymbol>((int)StringTableSize,
sizeof (HashtableEntry<oop, mtSymbol>)) {}
StringTable(HashtableBucket<mtSymbol>* t, int number_of_entries)
: Hashtable<oop, mtSymbol>((int)StringTableSize, sizeof (HashtableEntry<oop, mtSymbol>), t,
number_of_entries) {}
public:
// The string table
static StringTable* the_table() { return _the_table; }
static void create_table() {
assert(_the_table == NULL, "One string table allowed.");
_the_table = new StringTable();
}
static void create_table(HashtableBucket<mtSymbol>* t, int length,
int number_of_entries) {
assert(_the_table == NULL, "One string table allowed.");
assert((size_t)length == StringTableSize * sizeof(HashtableBucket<mtSymbol>),
"bad shared string size.");
_the_table = new StringTable(t, number_of_entries);
}
// GC support
// Delete pointers to otherwise-unreachable objects.
static void unlink(BoolObjectClosure* cl) {
int processed = 0;
int removed = 0;
unlink(cl, &processed, &removed);
}
static void unlink(BoolObjectClosure* cl, int* processed, int* removed);
// Serially invoke "f->do_oop" on the locations of all oops in the table.
static void oops_do(OopClosure* f);
// Possibly parallel version of the above
static void possibly_parallel_oops_do(OopClosure* f);
static void possibly_parallel_unlink(BoolObjectClosure* cl, int* processed, int* removed);
// Hashing algorithm, used as the hash value used by the
// StringTable for bucket selection and comparison (stored in the
// HashtableEntry structures). This is used in the String.intern() method.
static unsigned int hash_string(const jchar* s, int len);
// Internal test.
static void test_alt_hash() PRODUCT_RETURN;
// Probing
static oop lookup(Symbol* symbol);
// Interning
static oop intern(Symbol* symbol, TRAPS);
static oop intern(oop string, TRAPS);
static oop intern(const char *utf8_string, TRAPS);
// Debugging
static void verify();
static void dump(outputStream* st);
// Sharing
static void copy_buckets(char** top, char*end) {
the_table()->Hashtable<oop, mtSymbol>::copy_buckets(top, end);
}
static void copy_table(char** top, char*end) {
the_table()->Hashtable<oop, mtSymbol>::copy_table(top, end);
}
static void reverse() {
the_table()->Hashtable<oop, mtSymbol>::reverse();
}
// Rehash the symbol table if it gets out of balance
static void rehash_table();
static bool needs_rehashing() { return _needs_rehashing; }
// Parallel chunked scanning
static void clear_parallel_claimed_index() { _parallel_claimed_idx = 0; }
static int parallel_claimed_index() { return _parallel_claimed_idx; }
};
- openjdk/hotspot/src/share/vm/classfile/symbolTable.cpp
StringTable* StringTable::_the_table = NULL;
oop StringTable::intern(Handle string_or_null, jchar* name,
int len, TRAPS) {
unsigned int hashValue = hash_string(name, len);
int index = the_table()->hash_to_index(hashValue);
oop found_string = the_table()->lookup(index, name, len, hashValue);
// Found
if (found_string != NULL) return found_string;
debug_only(StableMemoryChecker smc(name, len * sizeof(name[0])));
assert(!Universe::heap()->is_in_reserved(name) || GC_locker::is_active(),
"proposed name of symbol must be stable");
Handle string;
// try to reuse the string if possible
if (!string_or_null.is_null() && (!JavaObjectsInPerm || string_or_null()->is_perm())) {
string = string_or_null;
} else {
string = java_lang_String::create_tenured_from_unicode(name, len, CHECK_NULL);
}
// Grab the StringTable_lock before getting the_table() because it could
// change at safepoint.
MutexLocker ml(StringTable_lock, THREAD);
// Otherwise, add to symbol to table
return the_table()->basic_add(index, string, name, len,
hashValue, CHECK_NULL);
}
oop StringTable::lookup(int index, jchar* name,
int len, unsigned int hash) {
int count = 0;
for (HashtableEntry<oop, mtSymbol>* l = bucket(index); l != NULL; l = l->next()) {
count++;
if (l->hash() == hash) {
if (java_lang_String::equals(l->literal(), name, len)) {
return l->literal();
}
}
}
// If the bucket size is too deep check if this hash code is insufficient.
if (count >= BasicHashtable<mtSymbol>::rehash_count && !needs_rehashing()) {
_needs_rehashing = check_rehash_table(count);
}
return NULL;
}
oop StringTable::basic_add(int index_arg, Handle string, jchar* name,
int len, unsigned int hashValue_arg, TRAPS) {
assert(java_lang_String::equals(string(), name, len),
"string must be properly initialized");
// Cannot hit a safepoint in this function because the "this" pointer can move.
No_Safepoint_Verifier nsv;
// Check if the symbol table has been rehashed, if so, need to recalculate
// the hash value and index before second lookup.
unsigned int hashValue;
int index;
if (use_alternate_hashcode()) {
hashValue = hash_string(name, len);
index = hash_to_index(hashValue);
} else {
hashValue = hashValue_arg;
index = index_arg;
}
// Since look-up was done lock-free, we need to check if another
// thread beat us in the race to insert the symbol.
oop test = lookup(index, name, len, hashValue); // calls lookup(u1*, int)
if (test != NULL) {
// Entry already added
return test;
}
HashtableEntry<oop, mtSymbol>* entry = new_entry(hashValue, string());
add_entry(index, entry);
return string();
}
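The code is very long, and I am no expert in C/C++, but it is still clear that the string constant pool is maintained with a data structure similar to HashMap/Hashtable, named StringTable.
StringTable::intern() also shows clearly that if the target string is found in the StringTable, it is returned directly. Otherwise the method checks whether the string handle passed in is null (and, when JavaObjectsInPerm is set, whether the object is in the permanent generation) to decide whether an existing heap object can be reused. If so, it keeps a reference to it (a Handle in the C++ code); if not, it creates a new String object. Finally the reference or object is added to the StringTable.
The size of the StringTable (that is, the number of hash buckets) is fixed at StringTableSize and never grows, and collisions are resolved by separate chaining. This means that if too many Strings enter the pool, hash collisions become serious and subsequent calls to String.intern() take longer.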
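To make the data structure concrete, here is a toy version in Java of a fixed-size hash table with separate chaining. It is only a sketch of the idea described above, not a port of HotSpot's StringTable: the class ToyStringTable and all of its members are invented for illustration, it uses String.hashCode() rather than HotSpot's hash function, and it ignores rehashing and GC.

```java
// Toy sketch of a fixed-bucket, separately chained intern table.
final class ToyStringTable {
    private static final class Entry {
        final int hash;
        final String literal;
        final Entry next;

        Entry(int hash, String literal, Entry next) {
            this.hash = hash;
            this.literal = literal;
            this.next = next;
        }
    }

    private final Entry[] buckets; // the bucket count never changes

    ToyStringTable(int bucketCount) {
        buckets = new Entry[bucketCount];
    }

    // Returns the canonical instance for s, adding s itself if none exists yet.
    synchronized String intern(String s) {
        int hash = s.hashCode();
        int index = Math.floorMod(hash, buckets.length);
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (e.hash == hash && e.literal.equals(s)) {
                return e.literal; // found: reuse the existing instance
            }
        }
        buckets[index] = new Entry(hash, s, buckets[index]); // prepend to the chain
        return s;
    }

    // Length of the longest chain; it grows as the table fills up.
    synchronized int longestChain() {
        int longest = 0;
        for (Entry head : buckets) {
            int length = 0;
            for (Entry e = head; e != null; e = e.next) {
                length++;
            }
            longest = Math.max(longest, length);
        }
        return longest;
    }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable(7); // deliberately tiny
        for (int i = 0; i < 1000; i++) {
            table.intern("s" + i);
        }
        System.out.println("longest chain: " + table.longestChain());
    }
}
```

With 1000 entries in 7 buckets every lookup has to walk a long chain, which is the slowdown the paragraph above describes for an undersized StringTable.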
The size of the StringTable can be tuned. First, the -XX:+PrintFlagsFinal flag shows the default size. In JDK 7 and 8 the value is 60013:
uintx StringTableSize = 60013 {product}
In JDK 6 and older builds of JDK 7 the default is 1009. 60013 is clearly the more suitable value.
The -XX:StringTableSize flag changes the size of the StringTable, for example:
-XX:StringTableSize=75979
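When changing the size manually, the usual advice is to estimate roughly how many strings the whole program needs to intern and then choose a prime number around twice that figure, which reduces collisions.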
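The rule of thumb above is easy to turn into a small helper. This is a hedged sketch: the class and method names are hypothetical, the figure of 38000 expected strings is made up, and the "prime at or above twice the estimate" policy is just one reading of the advice, not an official formula.

```java
// Sketch: suggest a StringTableSize as the first prime at or above
// twice the estimated number of interned strings.
final class StringTableSizeHint {
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Smallest prime that is greater than or equal to n.
    static int nextPrimeAtLeast(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    static int suggestedSize(int expectedInternedStrings) {
        return nextPrimeAtLeast(2 * expectedInternedStrings);
    }

    public static void main(String[] args) {
        int expected = 38000; // hypothetical estimate for some application
        System.out.println("-XX:StringTableSize=" + suggestedSize(expected));
    }
}
```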
In addition, the -XX:+PrintStringTableStatistics flag prints statistics for the StringTable of the current JVM, for example:
StringTable statistics:
Number of buckets : 60013 = 480104 bytes, avg 8.000
Number of entries : 1543 = 37032 bytes, avg 24.000
Number of literals : 1543 = 144088 bytes, avg 93.382
Total footprint : = 661224 bytes
Average bucket size : 0.026
Variance of bucket size : 0.026
Std. dev. of bucket size: 0.161
Maximum bucket size : 2
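For more tests of the StringTable, see the linked reference.
Dumping the contents of the string constant pool
This can be done with the HotSpot SA (Serviceability Agent). The HotSpot SA is a set of internal code for debugging the HotSpot VM, and common debugging tools such as jstack and jmap depend on it.
Here is the code: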
import sun.jvm.hotspot.memory.SystemDictionary;
import sun.jvm.hotspot.oops.InstanceKlass;
import sun.jvm.hotspot.oops.OopField;
import sun.jvm.hotspot.runtime.VM;
import sun.jvm.hotspot.tools.Tool;
public class StringPoolDumpTool extends Tool {
@Override
public void run() {
// Use Reflection-like API to reference String class and String.value field
SystemDictionary dict = VM.getVM().getSystemDictionary();
InstanceKlass stringKlass = (InstanceKlass)dict.find("java/lang/String", null, null);
OopField valueField = (OopField)stringKlass.findField("value", "[C");
// Counters
long[] stats = new long[2];
// Iterate through the String Pool printing out each String object
VM.getVM().getStringTable().stringsDo(s -> {
s.printValueOn(System.out);
System.out.println();
stats[0]++;
stats[1] += s.getObjectSize() + valueField.getValue(s).getObjectSize();
});
System.out.printf("%d strings in pool with total size %d\n", stats[0], stats[1]);
}
public static void main(String[] args) {
// Use default SA tool launcher
new StringPoolDumpTool().execute(args);
}
}
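Then run java -cp $JAVA_HOME/lib/sa-jdi.jar:. StringPoolDumpTool [PID], where PID is the process ID of the JVM whose string constant pool is to be dumped. The result shows that even a very simple program contains a large number of interned strings (the StringPoolExample program above has at least 700), including strings such as "java".
The End
The internals of the JVM change all the time, and this article almost certainly contains omissions. Corrections are welcome and much appreciated.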