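This article looks at how zombie processes arise. It first introduces the concepts behind the various process IDs, then walks through how a process exits, and finally covers how the parent process waits for its child.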
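Process relationships
The first concept to pin down is how Linux actually distinguishes threads from processes.
Threads and processes are concepts from operating system theory, and Windows and Linux may implement them differently. In the Linux kernel, both processes and threads are represented by task_struct, so at the data-structure level the kernel does not distinguish between them.
Process IDs are easy to mix up: TID, TGID, PID, PPID, PGID, SID. The example below lists these IDs for several processes: systemd (user-space process 1), kthreadd (the parent of all kernel threads), ksoftirqd (the softirq thread), user-space programs such as containerd and dockerd, and three app-test zombie processes. Looking at real output makes the relationships between these IDs easier to understand.
If a process has no other threads, its TID and PID are the same.
For a multithreaded Go program such as containerd, the TIDs in the output below are 493, 674, 675, and 676. All of them have PID 493, and the one whose TID, PID, and TGID are all 493 can be regarded as the main thread. In other words, containerd is a process with four threads, but in the kernel each of those threads is backed by its own task_struct.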
root@iZt4n1u8u50jg1r5n6myn2Z:~# ps -eLo comm:20,tid,pid,tgid,ppid,pgid,sid | column -t
COMMAND TID PID TGID PPID PGID SID
systemd 1 1 1 0 1 1
kthreadd 2 2 2 0 0 0
ksoftirqd/0 10 10 10 2 0 0
migration/1 17 17 17 2 0 0
kcompactd0 28 28 28 2 0 0
containerd 493 493 493 1 493 493
containerd 674 493 493 1 493 493
containerd 675 493 493 1 493 493
containerd 676 493 493 1 493 493
containerd-shim 31824 31824 31824 1 31824 493
containerd-shim 31825 31824 31824 1 31824 493
containerd-shim 31826 31824 31824 1 31824 493
containerd-shim 31827 31824 31824 1 31824 493
containerd-shim 31828 31824 31824 1 31824 493
containerd-shim 65521 65521 65521 1 65521 493
containerd-shim 65522 65521 65521 1 65521 493
containerd-shim 65523 65521 65521 1 65521 493
dockerd 27296 27296 27296 1 27296 27296
dockerd 27297 27296 27296 1 27296 27296
dockerd 27298 27296 27296 1 27296 27296
dockerd 27299 27296 27296 1 27296 27296
docker-proxy 28170 28170 28170 27296 27296 27296
docker-proxy 28171 28170 28170 27296 27296 27296
docker-proxy 28172 28170 28170 27296 27296 27296
docker-proxy 28173 28170 28170 27296 27296 27296
docker-proxy 28174 28170 28170 27296 27296 27296
app-test 65543 65543 65543 65521 65543 65543
app-test <defunct> 65582 65582 65582 65543 65543 65543
app-test <defunct> 65583 65583 65583 65543 65543 65543
app-test <defunct> 65584 65584 65584 65543 65543 65543
https://man7.org/linux/man-pages/man7/credentials.7.html describes what PGID and SID are used for.
Observing a zombie process
In the output below, State is Z (zombie), which shows that this is a zombie process. Threads is 1 and Pid equals Tgid, both of which show that the process is single-threaded.
root@iZt4n1u8u50jg1r5n6myn2Z:/proc# cat /proc/65582/status
Name: app-test
State: Z (zombie)
Tgid: 65582
Ngid: 0
Pid: 65582
PPid: 65543
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 0
Groups:
Threads: 1
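Next, let's trace where the State field in status comes from in the kernel source, and take the opportunity to note the entries under /proc/$pid. For example, ONE("status", ...) defines the status file under /proc/$pid.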
// fs/proc/base.c
static const struct pid_entry tgid_base_stuff[] = {
    DIR("task", S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations),
    DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
    DIR("map_files", S_IRUSR|S_IXUSR, proc_map_files_inode_operations, proc_map_files_operations),
    DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
    DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
#ifdef CONFIG_NET
    DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
#endif
    REG("environ", S_IRUSR, proc_environ_operations),
    REG("auxv", S_IRUSR, proc_auxv_operations),
    ONE("status", S_IRUGO, proc_pid_status),
    ONE("personality", S_IRUSR, proc_pid_personality),
    ONE("limits", S_IRUGO, proc_pid_limits),
    ....
};
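The status file is generated by proc_pid_status. The names of the helpers it calls largely explain what they do; task_state, for instance, writes the State field. The files under /proc/$pid describe a process in detail, and when online material is not enough to explain a field, reading the source is the way to find out what it means.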
int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
                    struct pid *pid, struct task_struct *task)
{
    struct mm_struct *mm = get_task_mm(task);

    seq_puts(m, "Name:\t");
    proc_task_name(m, task, true);
    seq_putc(m, '\n');
    task_state(m, ns, pid, task);
    if (mm) {
        task_mem(m, mm);
        task_core_dumping(m, mm);
        mmput(mm);
    }
    task_sig(m, task);
    task_cap(m, task);
    task_seccomp(m, task);
    task_cpus_allowed(m, task);
    cpuset_task_status_allowed(m, task);
    task_context_switch_counts(m, task);
    return 0;
}
Only the State field matters here, that is, the task_state function.
static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
                              struct pid *pid, struct task_struct *p)
{
    ....
    seq_puts(m, "State:\t");
    seq_puts(m, get_task_state(p));
    ....
}
As the code above shows, the return value of get_task_state is the value of the State field in /proc/$pid/status.
static const char * const task_state_array[] = {
    /* states in TASK_REPORT: */
    "R (running)",      /* 0x00 */
    "S (sleeping)",     /* 0x01 */
    "D (disk sleep)",   /* 0x02 */
    "T (stopped)",      /* 0x04 */
    "t (tracing stop)", /* 0x08 */
    "X (dead)",         /* 0x10 */
    "Z (zombie)",       /* 0x20 */
    "P (parked)",       /* 0x40 */
    /* states beyond TASK_REPORT: */
    "I (idle)",         /* 0x80 */
};

static inline const char *get_task_state(struct task_struct *tsk)
{
    return task_state_array[task_state_index(tsk)];
}
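Whenever task_state_index returns index 6, the corresponding string is Z (zombie). Moving on to task_state_index: state is tsk->state bitwise ORed with tsk->exit_state, and the result is then bitwise ANDed with TASK_REPORT.
To state the result up front (the explanation comes later): tsk->state is TASK_DEAD (0x0080) and tsk->exit_state is EXIT_ZOMBIE (0x0020). TASK_DEAD is not part of TASK_REPORT, so the mask leaves 0x0020, and fls(0x0020) returns 6.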
#define TASK_REPORT (TASK_RUNNING | TASK_INTERRUPTIBLE | \
                     TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
                     __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
                     TASK_PARKED)

static inline unsigned int task_state_index(struct task_struct *tsk)
{
    unsigned int tsk_state = READ_ONCE(tsk->state);
    unsigned int state = (tsk_state | tsk->exit_state) & TASK_REPORT;

    return fls(state);
}
Process exit
Two system calls are involved when a process exits voluntarily: exit and exit_group.
/*
 * this kills every thread in the thread group. Note that any externally
 * wait4()-ing process will get the correct exit code - even if this
 * thread is not the thread group leader.
 */
SYSCALL_DEFINE1(exit_group, int, error_code)
{
    do_group_exit((error_code & 0xff) << 8);
    /* NOTREACHED */
    return 0;
}

SYSCALL_DEFINE1(exit, int, error_code)
{
    do_exit((error_code&0xff)<<8);
}
Both exit_group and exit end up calling do_exit (exit_group reaches it through do_group_exit), so the focus below is do_exit, whose argument is the exit code.
/* Intermediate code omitted. */
void __noreturn do_exit(long code)
{
    struct task_struct *tsk = current;

    exit_signals(tsk); /* sets PF_EXITING */
    tsk->exit_code = code;
    exit_mm();
    exit_sem(tsk);
    exit_shm(tsk);
    exit_files(tsk);
    exit_fs(tsk);
    exit_notify(tsk, group_dead);
    do_task_dead();
}
exit_notify sets the process's exit state. In the usual case do_notify_parent returns false, so autoreap is false and the exit state becomes EXIT_ZOMBIE.
This also shows that if the parent ignores SIGCHLD (SIG_IGN) or sets SA_NOCLDWAIT, the child does not become a zombie even if the parent never calls wait.
static void exit_notify(struct task_struct *tsk, int group_dead)
{
    // If this task has children, find a new parent for them.
    forget_original_parent(tsk, &dead);
    if (unlikely(tsk->ptrace)) {
        int sig = thread_group_leader(tsk) &&
                thread_group_empty(tsk) &&
                !ptrace_reparented(tsk) ?
            tsk->exit_signal : SIGCHLD;
        autoreap = do_notify_parent(tsk, sig);
    } else if (thread_group_leader(tsk)) {
        autoreap = thread_group_empty(tsk) &&
            do_notify_parent(tsk, tsk->exit_signal);
    } else {
        autoreap = true;
    }
    tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
}
bool do_notify_parent(struct task_struct *tsk, int sig)
{
    if (!tsk->ptrace && sig == SIGCHLD &&
        (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
         (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
        /*
         * We are exiting and our parent doesn't care. POSIX.1
         * defines special semantics for setting SIGCHLD to SIG_IGN
         * or setting the SA_NOCLDWAIT flag: we should be reaped
         * automatically and not left for our parent's wait4 call.
         * Rather than having the parent do it as a magic kind of
         * signal handler, we just set this to tell do_exit that we
         * can be cleaned up without becoming a zombie. Note that
         * we still call __wake_up_parent in this case, because a
         * blocked sys_wait4 might now return -ECHILD.
         *
         * Whether we send SIGCHLD or not for SA_NOCLDWAIT
         * is implementation-defined: we do (if you don't want
         * it, just use SIG_IGN instead).
         */
        autoreap = true;
        if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
            sig = 0;
    }
    if (valid_signal(sig) && sig)
        __group_send_sig_info(sig, &info, tsk->parent);
    __wake_up_parent(tsk, tsk->parent);
}
do_task_dead sets the task's state field to TASK_DEAD. At this point both state and exit_state have been assigned, which matches the analysis above, so the State field in /proc/$pid/status reads Z (zombie).
void __noreturn do_task_dead(void)
{
    /* Causes final put_task_struct in finish_task_switch(): */
    set_special_state(TASK_DEAD);
}
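The parent calls wait to reap the child
After a program exits through do_exit and becomes a zombie, what remains in the kernel is its task_struct, which has not yet been reclaimed.
Logically, parent and child processes are usually not isolated from each other. The parent needs the child's exit status so that it can respond accordingly, for example by ignoring the exit or by starting a new child. Next, let's look at the wait system call.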
SYSCALL_DEFINE4(wait4, pid_t, upid, int __user *, stat_addr,
                int, options, struct rusage __user *, ru)
{
    struct rusage r;
    long err = kernel_wait4(upid, stat_addr, options, ru ? &r : NULL);
}

long kernel_wait4(pid_t upid, int __user *stat_addr, int options,
                  struct rusage *ru)
{
    ret = do_wait(&wo);
}

static long do_wait(struct wait_opts *wo)
{
    retval = do_wait_thread(wo, tsk);
}

static int wait_consider_task(struct wait_opts *wo, int ptrace,
                              struct task_struct *p)
{
    if (unlikely(exit_state == EXIT_DEAD))
        return 0;
    if (exit_state == EXIT_ZOMBIE) {
        /* we don't reap group leaders with subthreads */
        if (!delay_group_leader(p)) {
            /*
             * A zombie ptracee is only visible to its ptracer.
             * Notification and reaping will be cascaded to the
             * real parent when the ptracer detaches.
             */
            if (unlikely(ptrace) || likely(!p->ptrace))
                return wait_task_zombie(wo, p);
        }
    }
}
In wait_task_zombie, release_task reclaims the task_struct: it is cleaned up and returned to the slub allocator for reuse.
static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
{
    state = (ptrace_reparented(p) && thread_group_leader(p)) ?
        EXIT_TRACE : EXIT_DEAD;
    if (state == EXIT_DEAD)
        release_task(p);
}
Example: creating a zombie process
The sample code below is a simple way to create a zombie process. In short, the parent does not call wait for the exited child.
C version
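#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: exit immediately. */
        exit(EXIT_SUCCESS);
    } else if (pid > 0) {
        printf("Parent created child %d\n", (int)pid);
    } else {
        perror("fork");
        return EXIT_FAILURE;
    }
    /* The parent never calls wait(), so the child stays a zombie while the parent sleeps. */
    sleep(30);
    return EXIT_SUCCESS;
}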
Equivalent Go version
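package main

import (
	"os"
	"syscall"
	"time"
)

func main() {
	// Raw fork(2). SYS_FORK is not defined on every architecture (for example linux/arm64).
	id, _, errno := syscall.Syscall(syscall.SYS_FORK, 0, 0, 0)
	if errno != 0 {
		panic(errno)
	}
	if id == 0 {
		// Child: exit immediately.
		os.Exit(0)
	}
	// The parent never waits for the child, so the child stays a zombie during the sleep.
	time.Sleep(60 * time.Second)
}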