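When using jstack, you will occasionally run into an exception like this: Unable to open socket file...
Below we analyze the cause based on the OpenJDK 11 source code.
Starting from the error
The "Unable to open socket file" error is reported by jstack itself, so it is client-side behavior. Under what circumstances is it raised? Let's search the source for the error message directly.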
File socket_file = findSocketFile(pid, ns_pid);
socket_path = socket_file.getPath();
if (!socket_file.exists()) {
    File f = createAttachFile(pid, ns_pid);
    try {
        sendQuitTo(pid);
        // give the target VM time to start the attach mechanism
        final int delay_step = 100;
        final long timeout = attachTimeout();
        long time_spend = 0;
        long delay = 0;
        do {
            // Increase timeout on each attempt to reduce polling
            delay += delay_step;
            try {
                Thread.sleep(delay);
            } catch (InterruptedException x) { }
            time_spend += delay;
            if (time_spend > timeout/2 && !socket_file.exists()) {
                // Send QUIT again to give target VM the last chance to react
                sendQuitTo(pid);
            }
        } while (time_spend <= timeout && !socket_file.exists());
        if (!socket_file.exists()) {
            throw new AttachNotSupportedException(
                String.format("Unable to open socket file %s: " +
                    "target process %d doesn't respond within %dms " +
                    "or HotSpot VM not loaded", socket_path, pid,
                    time_spend));
        }
    } finally {
        f.delete();
    }
}
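To get a feel for how long the client actually waits, the polling loop above can be replayed without sleeping. The following is only a sketch: the class `AttachPollSchedule`, its `Result` holder, and the 10,000 ms timeout passed in `main` are assumptions made for illustration, and the simulation covers only the case where the socket file never appears.

```java
class AttachPollSchedule {
    /** Outcome of replaying the loop when the socket file never shows up. */
    static final class Result {
        final int polls;        // number of sleep/check iterations
        final int extraQuits;   // SIGQUITs sent after the initial one
        final long timeSpent;   // accumulated sleep time in ms

        Result(int polls, int extraQuits, long timeSpent) {
            this.polls = polls;
            this.extraQuits = extraQuits;
            this.timeSpent = timeSpent;
        }
    }

    static Result simulate(long timeoutMs) {
        final int delayStep = 100;
        long timeSpent = 0;
        long delay = 0;
        int polls = 0;
        int extraQuits = 0;
        do {
            delay += delayStep;       // 100, 200, 300, ...
            timeSpent += delay;       // stands in for Thread.sleep(delay)
            polls++;
            if (timeSpent > timeoutMs / 2) {
                extraQuits++;         // the loop re-sends QUIT on every pass beyond the halfway point
            }
        } while (timeSpent <= timeoutMs);
        return new Result(polls, extraQuits, timeSpent);
    }

    public static void main(String[] args) {
        Result r = simulate(10_000);  // assumed timeout, for illustration only
        System.out.println("polls=" + r.polls
                + " extraQuits=" + r.extraQuits
                + " timeSpent=" + r.timeSpent + "ms");
        // prints: polls=14 extraQuits=5 timeSpent=10500ms
    }
}
```

Because the delay grows by 100 ms per iteration, the reported time in the exception message can exceed the configured timeout, as the 10,500 ms figure shows.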
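The method is fairly simple. It looks up the socket file; if the file does not exist, it creates an attach file, sends SIGQUIT to the target process, and then polls until the socket file appears or the timeout elapses. If the file still does not exist, it throws the exception, and in every case it deletes the attach file afterwards. The socket file is located by findSocketFile: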
private File findSocketFile(int pid, int ns_pid) {
    // A process may not exist in the same mount namespace as the caller.
    // Instead, attach relative to the target root filesystem as exposed by
    // procfs regardless of namespaces.
    String root = "/proc/" + pid + "/root/" + tmpdir;
    return new File(root, ".java_pid" + ns_pid);
}
The socket path is effectively /tmp/.java_pid${ns_pid}.
The /proc/pid/root/tmp prefix simply resolves to the target process's /tmp directory.
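The path construction can be reproduced outside the JDK to check by hand whether the socket file exists for a given process. This is a minimal sketch, assuming the temp directory is `tmp` under the target's root and that the namespace PID equals the host PID (no PID namespace involved); the class `AttachPaths` and its `TMPDIR` constant are hypothetical names.

```java
import java.io.File;

class AttachPaths {
    // Assumption: the temp directory under the target's root filesystem.
    static final String TMPDIR = "tmp";

    // Same shape as findSocketFile: go through /proc/<pid>/root so the path
    // resolves inside the target's mount namespace.
    static File socketFile(long pid, long nsPid) {
        String root = "/proc/" + pid + "/root/" + TMPDIR;
        return new File(root, ".java_pid" + nsPid);
    }

    public static void main(String[] args) {
        // Default to the current JVM when no PID is given.
        long pid = args.length > 0
                ? Long.parseLong(args[0])
                : ProcessHandle.current().pid();
        File f = socketFile(pid, pid);  // assumes ns_pid == pid
        System.out.println(f.getPath() + " exists=" + f.exists());
    }
}
```

Running it with the PID of a JVM that has never been attached to will normally report that the file does not exist, since the socket is only created on demand.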
From the flow above we can guess that the QUIT signal is what prompts the JVM to act.
JNIEXPORT void JNICALL Java_sun_tools_attach_VirtualMachineImpl_sendQuitTo
  (JNIEnv *env, jclass cls, jint pid)
{
    if (kill((pid_t)pid, SIGQUIT)) {
        JNU_ThrowIOExceptionWithLastError(env, "kill");
    }
}
The signal that gets sent is SIGQUIT.
Starting from the signal
#define SIGBREAK SIGQUIT
The JVM defines a macro so that SIGBREAK can be used in place of SIGQUIT.
switch (sig) {
  case SIGBREAK: {
    if (!DisableAttachMechanism && AttachListener::is_init_trigger()) {
      continue;
    }
    VM_PrintThreads op;
    ...
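When the signal received is SIGQUIT, the JVM first checks DisableAttachMechanism. If -XX:+DisableAttachMechanism is set, the condition short-circuits, AttachListener::is_init_trigger is never called, and the handler falls through to VM_PrintThreads. The socket initialization itself lives further down, inside AttachListener::is_init_trigger.
So once -XX:+DisableAttachMechanism is added, the socket file that jstack waits for can never be created, and jstack is guaranteed to fail.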
bool AttachListener::is_init_trigger() {
  if (init_at_startup() || is_initialized()) {
    return false;               // initialized at startup or already initialized
  }
  ...
  if (ret == -1) {
    log_trace(attach)("Failed to find attach file: %s, trying alternate", fn);
    snprintf(fn, sizeof(fn), "%s/.attach_pid%d",
             os::get_temp_directory(), os::current_process_id());
    RESTARTABLE(::stat64(fn, &st), ret);
    if (ret == -1) {
      log_debug(attach)("Failed to find attach file: %s", fn);
    }
  }
  if (ret == 0) {
    // simple check to avoid starting the attach mechanism when
    // a bogus non-root user creates the file
    if (os::Posix::matches_effective_uid_or_root(st.st_uid)) {
      init();
      log_trace(attach)("Attach triggered by %s", fn);
      return true;
    } else {
      log_debug(attach)("File %s has wrong user id %d (vs %d). Attach is not triggered", fn, st.st_uid, geteuid());
    }
  }
  ...
}
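is_init_trigger first checks whether the attach file exists. Initialization only proceeds if the file exists and is owned by the JVM's effective user or by root.
The init method starts the Attach Listener thread, which eventually calls AttachListener::pd_init(), which in turn calls LinuxAttachListener::init() to set up the socket. Finally, AttachListener::set_initialized() sets the flag marking initialization as done. This matters a great deal: look back at the top of is_init_trigger, where the first thing it does is check that flag and return early if it is set. In other words, the signal can trigger initialization only once; after that, it never initializes again.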
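The decision logic described so far can be condensed into a small model, which makes the "only once" property easy to see. This is a simplified sketch, not HotSpot code: the class `SigquitModel` and its `Action` enum are invented for illustration, `init_at_startup()` is assumed to be false, and `init()` is assumed to always succeed.

```java
class SigquitModel {
    enum Action { START_ATTACH_LISTENER, PRINT_THREAD_DUMP }

    private final boolean disableAttachMechanism;
    private boolean initialized = false;  // models AttachListener::is_initialized()

    SigquitModel(boolean disableAttachMechanism) {
        this.disableAttachMechanism = disableAttachMechanism;
    }

    // Models the SIGBREAK case of the signal handler.
    Action onSigquit(boolean attachFileExists, boolean ownerMatches) {
        if (!disableAttachMechanism && isInitTrigger(attachFileExists, ownerMatches)) {
            return Action.START_ATTACH_LISTENER;
        }
        return Action.PRINT_THREAD_DUMP;
    }

    // Models is_init_trigger(): a no-op once the flag has been set.
    private boolean isInitTrigger(boolean attachFileExists, boolean ownerMatches) {
        if (initialized) {
            return false;
        }
        if (attachFileExists && ownerMatches) {
            initialized = true;  // stands in for init() followed by set_initialized()
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SigquitModel vm = new SigquitModel(false);
        System.out.println(vm.onSigquit(true, true));   // first trigger starts the listener
        System.out.println(vm.onSigquit(true, true));   // later triggers fall through to the dump
        System.out.println(new SigquitModel(true).onSigquit(true, true));
    }
}
```

In this model, a second SIGQUIT after initialization always ends in a thread dump, regardless of whether the attach file is present.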
int LinuxAttachListener::init() {
  char path[UNIX_PATH_MAX];          // socket file
  char initial_path[UNIX_PATH_MAX];  // socket file during setup
  int listener;                      // listener socket (file descriptor)
  // register function to cleanup
  ::atexit(listener_cleanup);
  int n = snprintf(path, UNIX_PATH_MAX, "%s/.java_pid%d",
                   os::get_temp_directory(), os::current_process_id());
  if (n < (int)UNIX_PATH_MAX) {
    n = snprintf(initial_path, UNIX_PATH_MAX, "%s.tmp", path);
  }
  if (n >= (int)UNIX_PATH_MAX) {
    return -1;
  }
  // create the listener socket
  listener = ::socket(PF_UNIX, SOCK_STREAM, 0);
  if (listener == -1) {
    return -1;
  }
  ...
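This is where the socket file gets created.
Summary
Having walked through the flow above, we can list the situations in which the exception occurs:
- -XX:+DisableAttachMechanism is enabled.
- The socket file under /tmp was deleted after initialization. Since the initialized flag is already set, the JVM will not create it again.
- Problems in the target process (resource exhaustion, a hang, and so on) prevent the JVM code from running at all.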
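The first case can often be ruled in or out without attaching, by looking at the target's command line in procfs. The sketch below is a heuristic only: the class `AttachFlagCheck` is hypothetical, it assumes that the last occurrence of the flag on the command line wins, and it cannot see options supplied through other channels such as environment variables.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class AttachFlagCheck {
    // /proc/<pid>/cmdline holds the arguments separated by NUL bytes.
    static boolean disablesAttach(byte[] cmdline) {
        String[] args = new String(cmdline, StandardCharsets.UTF_8).split("\0");
        boolean disabled = false;
        for (String arg : args) {
            if (arg.equals("-XX:+DisableAttachMechanism")) {
                disabled = true;
            } else if (arg.equals("-XX:-DisableAttachMechanism")) {
                disabled = false;  // assumption: a later flag overrides an earlier one
            }
        }
        return disabled;
    }

    public static void main(String[] args) throws Exception {
        // Default to the current JVM when no PID is given.
        long pid = args.length > 0
                ? Long.parseLong(args[0])
                : ProcessHandle.current().pid();
        byte[] raw = Files.readAllBytes(Paths.get("/proc/" + pid + "/cmdline"));
        System.out.println("pid " + pid + " disables attach: " + disablesAttach(raw));
    }
}
```

If this reports false and the socket file is also missing, the remaining two cases (a deleted socket file or an unresponsive process) are the ones to investigate.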