前言
很久沒有寫過與JVM相關(guān)的文章了箫锤。今天搬磚有點(diǎn)累蔓榄,不太想啃源碼毅整,寫一篇實(shí)用性比較強(qiáng)的吧嗦枢。
在日常工作中灰署,我們?nèi)绻龅絁VM方面的問題泉坐,一般是采用各種現(xiàn)成的工具輔助定位解決为鳄,如VisualVM、JProfiler腕让、Eclipse MAT孤钦、Arthas等。但是纯丸,我們也有必要了解JVM原生提供的那些命令行工具偏形,本文就針對(duì)常用的jstack命令做個(gè)簡單的總結(jié),順便聊一些與Java線程與并發(fā)相關(guān)的周邊知識(shí)觉鼻。今后有時(shí)間的話俊扭,也會(huì)將jmap、jhat坠陈、jstat等命令的用法總結(jié)出來萨惑。
jstack簡介
jstack命令用來生成JVM中的線程快照(thread dump),其中包含有每個(gè)線程的方法調(diào)用棧以及其狀態(tài)仇矾、鎖信息等庸蔼。其用法說明如下所示。
~ jstack -h
Usage:
jstack [-l] <pid>
(to connect to running process)
jstack -F [-m] [-l] <pid>
(to connect to a hung process)
jstack [-m] [-l] <executable> <core>
(to connect to a core file)
jstack [-m] [-l] [server_id@]<remote server IP or hostname>
(to connect to a remote debug server)
Options:
-F to force a thread dump. Use when jstack <pid> does not respond (process is hung)
-m to print both java and native frames (mixed mode)
-l long listing. Prints additional information about locks
-h or -help to print this help message
說明一下三個(gè)參數(shù)的含義:
- -F:如果正常執(zhí)行jstack命令沒有響應(yīng)(比如進(jìn)程hung住了)贮匕,可以加上此參數(shù)強(qiáng)制執(zhí)行thread dump姐仅。
- -m:除了打印Java的方法調(diào)用棧之外,還會(huì)輸出native方法的棧幀刻盐。
- -l:打印與鎖有關(guān)的附加信息掏膏。使用此參數(shù)會(huì)導(dǎo)致JVM停止時(shí)間變長敦锌,在生產(chǎn)環(huán)境需慎用。
jstack是在線程級(jí)別定位JVM問題的利器行冰,但前提是得讀懂thread dump,我們舉例說明伶丐。
線程快照
下面是對(duì)一個(gè)正常運(yùn)行的Spark Streaming作業(yè)執(zhí)行jstack命令產(chǎn)生的線程快照的一部分。
~ jstack 18747
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode):
"SparkUI-1510250" #1510250 daemon prio=5 os_prio=0 tid=0x00007f9a6c00c800 nid=0x45cc waiting on condition [0x00007f9ce86e7000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c0420db8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at org.spark_project.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
at org.spark_project.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563)
at org.spark_project.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)
at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
at java.lang.Thread.run(Thread.java:748)
"shuffle-server-6-7" #190 daemon prio=5 os_prio=0 tid=0x00007f9b44009000 nid=0x4d80 runnable [0x00007f9ce99f8000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000c58e6498> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000c59c1528> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c59c1450> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
"SparkUI-84-acceptor-3@3b331d23-ServerConnector@9826a7d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" #84 daemon prio=3 os_prio=0 tid=0x00007f9d7decc800 nid=0x4500 waiting for monitor entry [0x00007f9d112c8000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)
- waiting to lock <0x00000000c045e868> (a java.lang.Object)
at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:373)
at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:593)
at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #25 daemon prio=5 os_prio=0 tid=0x00007f9d7d8da000 nid=0x44c2 in Object.wait() [0x00007f9d19dc0000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0x00000000c0031c98> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3213)
at java.lang.Thread.run(Thread.java:748)
"RecurringTimer - JobGenerator" #120 daemon prio=5 os_prio=0 tid=0x00007f9b04045000 nid=0x4cc3 waiting on condition [0x00007f9d10cd4000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.spark.util.SystemClock.waitTillTime(Clock.scala:63)
at org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:93)
at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106)
at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)
在每個(gè)線程的快照的第一行,包含有線程名漓雅、是否為守護(hù)線程录别、優(yōu)先級(jí)朽色、線程ID等信息组题,第二行則是線程狀態(tài),下面就是方法調(diào)用棧了崔列。下圖是Java線程狀態(tài)轉(zhuǎn)換的示意,老生常談盈咳。
jstack線程快照中的狀態(tài)與圖示相同边翼,只是沒有NEW狀態(tài)而已。我們逐一進(jìn)行分析丈积,在分析之前债鸡,先放出Java管程對(duì)象ObjectMonitor的簡圖∧锼看官也可以通過我之前寫的這篇文章來了解管程。
RUNNABLE
線程正在運(yùn)行间雀。如果在其調(diào)用棧中看到locked <地址>
的提示惹挟,表示該方法正持有鎖缝驳,即該線程位于Owner區(qū)內(nèi)。
BLOCKED
線程處于阻塞狀態(tài)用狱,即正在等待鎖被其他線程釋放。在其調(diào)用棧的棧頂方法會(huì)看到waiting to lock <地址>
的提示摇展,表示該方法試圖持有鎖溺忧,線程正在Entry Set區(qū)等待盯孙。
WAITING
線程處于無限等待的狀態(tài)祟滴。又分為兩種情況:
- on object monitor:線程已經(jīng)獲得鎖,調(diào)用了不帶超時(shí)參數(shù)的Object.wait()/Thread.join()方法骑晶,線程進(jìn)入管程的Wait Set區(qū)埠偿。在其調(diào)用棧中會(huì)看到
locked <地址>
的提示。 - parking:調(diào)用了LockSupport.park()方法羽圃,線程直接進(jìn)入掛起狀態(tài)(park是Unsafe提供的低級(jí)原語)抖剿。在其調(diào)用棧的棧頂方法會(huì)看到
parking to wait for <地址>
的提示。
TIMED_WAITING
線程處于有限等待的狀態(tài)脑融。它分為三種情況缩宜,除了與WAITING相同的on object monitor(獲得鎖并調(diào)用帶超時(shí)的Object.wait()/Thread.join()方法)和parking(調(diào)用帶超時(shí)的LockSupport.parkNanos()/parkUntil()方法)之外,還有一種sleep锻煌,即通過Thread.sleep()使線程睡眠。
通過分析線程快照的狀態(tài)和調(diào)用棧匣沼,可以讓我們快速地定位造成Java程序表現(xiàn)異常的癥結(jié)捂龄,如死鎖、熱鎖(很多線程競(jìng)爭同一塊臨界區(qū)造成大量BLOCKED)唇撬、高CPU占用展融、I/O長時(shí)間阻塞(注意此時(shí)線程狀態(tài)可能是RUNNABLE)等。下面舉兩個(gè)具體的例子耀态。
用jstack診斷死鎖
死鎖(deadlock)是操作系統(tǒng)理論中的基礎(chǔ)概念暂雹,即在并發(fā)環(huán)境下,一個(gè)或多個(gè)線程在等待資源仙逻,但該資源又被其他進(jìn)程所占用的困局涧尿。死鎖的四個(gè)必要條件是:
- 互斥(mutual exclusion)
- 不可搶占(no preemption)
- 持有并等待(hold and wait)
- 循環(huán)等待(circular wait)
下面我們用Java造一個(gè)死鎖,并用jstack來診斷它缺亮。
public class DeadlockExample {
private static final Object lock1 = new Object();
private static final Object lock2 = new Object();
public static void main(String[] args) throws Exception {
new Thread(() -> {
for (int i = 0; i < 100; i++) {
synchronized (lock1) {
System.out.println("thread1 synchronized lock1");
synchronized (lock2) {
System.out.println("thread1 synchronized lock2");
}
}
}
}, "thread1").start();
new Thread(() -> {
for (int i = 0; i < 100; i++) {
synchronized (lock2) {
System.out.println("thread2 synchronized lock2");
synchronized (lock1) {
System.out.println("thread2 synchronized lock1");
}
}
}
}, "thread2").start();
}
}
運(yùn)行之桥言,發(fā)現(xiàn)只輸出了幾句就停止了号阿。
thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1
thread2 synchronized lock2
用jstack打印線程快照,節(jié)選部分如下扔涧。
"thread2" #20 prio=5 os_prio=31 tid=0x00007fad74020800 nid=0x6203 waiting for monitor entry [0x0000700006364000]
java.lang.Thread.State: BLOCKED (on object monitor)
at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
- waiting to lock <0x00000007157d2a58> (a java.lang.Object)
- locked <0x00000007157d2a68> (a java.lang.Object)
at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
"thread1" #19 prio=5 os_prio=31 tid=0x00007fad7401c000 nid=0x9903 waiting for monitor entry [0x0000700006261000]
java.lang.Thread.State: BLOCKED (on object monitor)
at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
- waiting to lock <0x00000007157d2a68> (a java.lang.Object)
- locked <0x00000007157d2a58> (a java.lang.Object)
at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Found one Java-level deadlock:
=============================
"thread2":
waiting to lock monitor 0x00007fad65004168 (object 0x00000007157d2a58, a java.lang.Object),
which is held by "thread1"
"thread1":
waiting to lock monitor 0x00007fad650056b8 (object 0x00000007157d2a68, a java.lang.Object),
which is held by "thread2"
Java stack information for the threads listed above:
===================================================
"thread2":
at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
- waiting to lock <0x00000007157d2a58> (a java.lang.Object)
- locked <0x00000007157d2a68> (a java.lang.Object)
at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
"thread1":
at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
- waiting to lock <0x00000007157d2a68> (a java.lang.Object)
- locked <0x00000007157d2a58> (a java.lang.Object)
at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Found 1 deadlock.
可見弯汰,我們不僅能夠發(fā)現(xiàn)兩個(gè)線程都處于BLOCKED狀態(tài)卤档,并且jstack還直接給出了死鎖的詳細(xì)信息,方便我們修改代碼消除死鎖汤踏。
用jstack診斷高CPU占用
接下來造一個(gè)有死循環(huán)的程序舔腾,模擬異常的CPU占用。
public class InfiniteLoopExample {
private static final Object lock = new Object();
static class InfiniteLoopRunnable implements Runnable {
@Override
public void run() {
synchronized (lock) {
long l = 0;
while (true) {
l++;
}
}
}
}
public static void main(String[] args) throws Exception {
new Thread(new InfiniteLoopRunnable(), "thread1").start();
new Thread(new InfiniteLoopRunnable(), "thread2").start();
}
}
運(yùn)行該程序哗脖,用jps命令找出該Java進(jìn)程的PID,然后利用top -Hp <PID>
命令找出吃CPU最多的那個(gè)線程橱夭。
使用jstack導(dǎo)出線程快照到文件中桑逝。由于線程ID是十六進(jìn)制表示的,所以我們要將線程ID轉(zhuǎn)換成十六進(jìn)制再grep茬暇。
這下我們就可以定位到異常代碼的位置寡喝,并進(jìn)行修改了。
The End
晚安晚安巧骚。