Linux Kernel Study 009: Process Management (Part 5)
Process Creation
Many operating systems provide a spawn mechanism for creating processes: a new process is created in a new address space, an executable is read in, and execution begins. Linux takes a different approach and splits these steps across two separate functions: fork() and the exec family. The sequence is as follows:
- First, fork() creates a child process by copying the current process. The child differs from the parent only in its PID, its PPID (the parent's PID), and certain resources and statistics.
- Next, one of the exec family functions reads an executable, loads it into the address space, and starts executing it.
Note: the exec family refers to the exec series of functions declared in the unistd.h header: execve(), fexecve(), execv(), execle(), execl(), execvp(), execlp(), and execvpe(). See the man pages for the differences between them and for detailed usage.
/* Replace the current process, executing PATH with arguments ARGV and
environment ENVP. ARGV and ENVP are terminated by NULL pointers. */
extern int execve (__const char *__path, char *__const __argv[],
char *__const __envp[]) __THROW __nonnull ((1, 2));
#ifdef __USE_XOPEN2K8
/* Execute the file FD refers to, overlaying the running program image.
ARGV and ENVP are passed to the new program, as for `execve'. */
extern int fexecve (int __fd, char *__const __argv[], char *__const __envp[])
__THROW __nonnull ((2));
#endif
/* Execute PATH with arguments ARGV and environment from `environ'. */
extern int execv (__const char *__path, char *__const __argv[])
__THROW __nonnull ((1, 2));
/* Execute PATH with all arguments after PATH until a NULL pointer,
and the argument after that for environment. */
extern int execle (__const char *__path, __const char *__arg, ...)
__THROW __nonnull ((1, 2));
/* Execute PATH with all arguments after PATH until
a NULL pointer and environment from `environ'. */
extern int execl (__const char *__path, __const char *__arg, ...)
__THROW __nonnull ((1, 2));
/* Execute FILE, searching in the `PATH' environment variable if it contains
no slashes, with arguments ARGV and environment from `environ'. */
extern int execvp (__const char *__file, char *__const __argv[])
__THROW __nonnull ((1, 2));
/* Execute FILE, searching in the `PATH' environment variable if
it contains no slashes, with all arguments after FILE until a
NULL pointer and environment from `environ'. */
extern int execlp (__const char *__file, __const char *__arg, ...)
__THROW __nonnull ((1, 2));
#ifdef __USE_GNU
/* Execute FILE, searching in the `PATH' environment variable if it contains
no slashes, with arguments ARGV and environment from `environ'. */
extern int execvpe (__const char *__file, char *__const __argv[],
char *__const __envp[])
__THROW __nonnull ((1, 2));
#endif
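The two-step fork()/exec sequence described earlier can be sketched in user space roughly as follows. This is a minimal illustration rather than code from the kernel or glibc: the helper name spawn_and_wait() is hypothetical, and execv() is just one arbitrary choice from the exec family.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): run the program at `path`
 * with the NULL-terminated argument vector `argv` in a child process.
 * Returns the child's exit status, or -1 if it could not be obtained. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();              /* step 1: duplicate the calling process */
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: step 2, replace the process image with the new program. */
        execv(path, argv);
        /* execv() returns only if it failed. */
        perror("execv");
        _exit(127);
    }

    /* Parent: wait for the child and report how it ended. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_wait("/bin/sh", argv) with argv set to { "sh", "-c", "exit 3", NULL } should return 3, assuming /bin/sh exists on the system. The exit code 127 for a failed exec follows the shell convention and is a choice made for this sketch.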
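Copy-on-Write
A traditional fork() system call copies all of the parent's resources to the newly created process. That implementation is simplistic and inefficient: if the new process immediately executes a new program, for example, all of the copying is wasted. Linux therefore implements fork() with copy-on-write. Instead of duplicating the whole address space, copy-on-write lets the parent and the child share a single copy of the data, and the data is duplicated only when one of them needs to write to it. When the data is never written, the cost of copying is avoided entirely. The real overhead of fork() is thus copying the parent's page tables and creating a unique process descriptor for the child.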
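Copy-on-write itself is not directly visible from user space. What can be observed is its consequence: after fork(), a write in the child lands in the child's own copy and never reaches the parent. The sketch below illustrates that behaviour. The function name is hypothetical, and the variable is declared volatile only so that the compiler actually performs the write.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child writes to a variable it inherited from the
 * parent. Returns the parent's view of the variable after the child has
 * exited, or -1 on error. */
int parent_value_after_child_write(void)
{
    volatile int value = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this write modifies only the child's copy of the data. */
        value = 42;
        _exit(value == 42 ? 0 : 1);
    }

    /* Parent: wait for the child, then read its own copy. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return value;   /* unchanged by the child's write */
}
```

The parent still sees the value 1, because the child's write went to a private copy rather than to the data the parent reads.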
fork()
Linux implements fork() through the clone() system call (library functions such as fork are provided by glibc). The call takes a set of flags that specify which resources the parent and child should share. The fork(), vfork(), and __clone() library functions each invoke clone() with the flags they require, and clone() in turn calls do_fork().
#define _GNU_SOURCE
#include <sched.h>
int clone(int (*fn)(void *), void *child_stack,
int flags, void *arg, ...
/* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );
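To make the role of the flags concrete, here is a small sketch using the glibc clone() wrapper shown above. The names run_clone_child(), child_fn(), and shared_counter are hypothetical, the 1 MiB stack size is arbitrary, and the code assumes a stack that grows downward, which is why the top of the buffer is passed.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1024 * 1024)   /* arbitrary size for this sketch */

static volatile int shared_counter;

/* Entry point of the child: add the requested amount to the counter. */
static int child_fn(void *arg)
{
    shared_counter += *(int *)arg;
    return 0;
}

/* Hypothetical helper: create a child with clone() using `extra_flags`,
 * let it add `increment` to the counter, and return the counter value the
 * parent sees afterwards (-1 on error). */
int run_clone_child(int extra_flags, int increment)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_counter = 0;

    /* Pass the top of the buffer, assuming a downward-growing stack.
     * SIGCHLD makes the child reapable with a plain waitpid(). */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, &increment);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited < 0)
        return -1;

    return shared_counter;
}
```

With CLONE_VM the child runs in the parent's address space, so run_clone_child(CLONE_VM, 5) should return 5. With no extra flags the child gets its own copy of the address space, much like fork(), and run_clone_child(0, 5) should return 0.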
do_fork() does most of the work of process creation. It is defined at Linux2.6.34/kernel/fork.c#L1351, and the code is as follows:
/*
* Ok, this is the main fork-routine.
*
* It copies the process, and if successful kick-starts
* it and waits for it to finish using the VM if required.
*/
long do_fork(unsigned long clone_flags,
unsigned long stack_start,
struct pt_regs *regs,
unsigned long stack_size,
int __user *parent_tidptr,
int __user *child_tidptr)
{
struct task_struct *p;
int trace = 0;
long nr;
/*
* Do some preliminary argument and permissions checking before we
* actually start allocating stuff
*/
if (clone_flags & CLONE_NEWUSER) {
if (clone_flags & CLONE_THREAD)
return -EINVAL;
/* hopefully this check will go away when userns support is
* complete
*/
if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
!capable(CAP_SETGID))
return -EPERM;
}
/*
* We hope to recycle these flags after 2.6.26
*/
if (unlikely(clone_flags & CLONE_STOPPED)) {
static int __read_mostly count = 100;
if (count > 0 && printk_ratelimit()) {
char comm[TASK_COMM_LEN];
count--;
printk(KERN_INFO "fork(): process `%s' used deprecated "
"clone flags 0x%lx\n",
get_task_comm(comm, current),
clone_flags & CLONE_STOPPED);
}
}
/*
* When called from kernel_thread, don't do user tracing stuff.
*/
if (likely(user_mode(regs)))
trace = tracehook_prepare_clone(clone_flags);
p = copy_process(clone_flags, stack_start, regs, stack_size,
child_tidptr, NULL, trace);
/*
* Do this prior waking up the new thread - the thread pointer
* might get invalid after that point, if the thread exits quickly.
*/
if (!IS_ERR(p)) {
struct completion vfork;
trace_sched_process_fork(current, p);
nr = task_pid_vnr(p);
if (clone_flags & CLONE_PARENT_SETTID)
put_user(nr, parent_tidptr);
if (clone_flags & CLONE_VFORK) {
p->vfork_done = &vfork;
init_completion(&vfork);
}
audit_finish_fork(p);
tracehook_report_clone(regs, clone_flags, nr, p);
/*
* We set PF_STARTING at creation in case tracing wants to
* use this to distinguish a fully live task from one that
* hasn't gotten to tracehook_report_clone() yet. Now we
* clear it and set the child going.
*/
p->flags &= ~PF_STARTING;
if (unlikely(clone_flags & CLONE_STOPPED)) {
/*
* We'll start up with an immediate SIGSTOP.
*/
sigaddset(&p->pending.signal, SIGSTOP);
set_tsk_thread_flag(p, TIF_SIGPENDING);
__set_task_state(p, TASK_STOPPED);
} else {
wake_up_new_task(p, clone_flags);
}
tracehook_report_clone_complete(trace, regs,
clone_flags, nr, p);
if (clone_flags & CLONE_VFORK) {
freezer_do_not_count();
wait_for_completion(&vfork);
freezer_count();
tracehook_report_vfork_done(p, nr);
}
} else {
nr = PTR_ERR(p);
}
return nr;
}
do_fork() calls copy_process() and then starts the new process running. copy_process() is defined at linux2.6.34/kernel/fork.c#L954.
copy_process() does the following:
- It calls dup_task_struct() to create a kernel stack, a thread_info structure, and a task_struct structure for the new process. Their values are identical to those of the current process, so at this point the parent's and the child's process descriptors are the same.
- It checks that creating this child will not push the number of processes owned by the current user over the limit.
- It modifies the child's process descriptor, mainly the statistics, although most members remain unchanged.
- The child's state is set to TASK_UNINTERRUPTIBLE to ensure that it does not run yet.
- copy_process() calls copy_flags() to update the flags member of the task_struct. The PF_SUPERPRIV flag, which indicates whether a process has used superuser privileges, is cleared, and the PF_FORKNOEXEC flag, which indicates that the process has not yet called an exec function, is set.
- It calls alloc_pid() to assign a valid PID to the new process.
- Depending on the flags passed to clone(), copy_process() either copies or shares open files, filesystem information, signal handlers, the process address space, and namespaces.
- Finally, copy_process() cleans up and returns a pointer to the child.
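The per-user process limit checked in the second step corresponds to the RLIMIT_NPROC resource limit, which can be inspected from user space. The sketch below only reads the limit and does not reproduce the kernel's check. The helper name get_process_limit() is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft per-user process limit.
 * Returns 0 on success and stores the limit in *soft_limit
 * (which may be RLIM_INFINITY); returns -1 on error. */
int get_process_limit(rlim_t *soft_limit)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NPROC, &lim) != 0)
        return -1;

    *soft_limit = lim.rlim_cur;
    return 0;
}
```

When this limit would be exceeded, fork() fails in the caller with errno set to EAGAIN.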
Back in do_fork(), if copy_process() returns successfully, the newly created child is woken up and set running.