(1) How the Linux Kernel Boots
- The machine's BIOS or boot firmware loads and runs a boot loader. (The boot loader is a small program that runs before the operating system kernel does; its implementation depends heavily on the hardware.)
- The boot loader finds the kernel image on disk, loads it into memory, and starts it. (It selects a kernel image, loads it into memory, and sets up the correct environment for handing control to the kernel.)
- The kernel initializes the devices and their drivers.
- The kernel mounts the root filesystem. (The root directory is the topmost directory of the filesystem. Like the root of a tree, it is the starting point for every branch.)
- The kernel starts a program called init with a process ID of 1. This point is the user space start. (From here on, programs run in the user-space part of virtual memory, as opposed to kernel space.)
- init sets the rest of the system processes in motion.
- At some point, init starts a process allowing you to log in, usually at the end or near the end of the boot.
Startup Messages
There are two ways to view the kernel's boot and runtime diagnostic messages:
- Read the kernel system log file. Path: /var/log/kern.log (the exact path varies by distribution)
- Run the dmesg command:
[root@li1437-101 ~]# dmesg
[ 0.000000] Linux version 4.9.7-x86_64-linode80 (maker@build) (gcc version 4.7.2 (Debian 4.7.2-5) ) #2 SMP Thu Feb 2 15:43:55 EST 2017
[ 0.000000] Command line: root=/dev/sda console=tty1 console=ttyS0 ro devtmpfs.mount=1
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] x86/fpu: Using 'eager' FPU context switches.
[ 0.000000] e820: BIOS-provided physical RAM map:
…….
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[ 0.000000] Hypervisor detected: KVM
[ 0.371925] raid6: sse2x1 gen() 7490 MB/s
[ 0.428689] raid6: sse2x1 xor() 5953 MB/s
[ 0.485463] raid6: sse2x2 gen() 9289 MB/s
[ 0.542230] raid6: sse2x2 xor() 6754 MB/s
[ 0.599013] raid6: sse2x4 gen() 10954 MB/s
[ 0.656189] raid6: sse2x4 xor() 5522 MB/s
[ 0.656943] raid6: using algorithm sse2x4 gen() 10954 MB/s
[ 0.657588] raid6: .... xor() 5522 MB/s, rmw enabled
[ 1.053697] Netfilter messages via NETLINK v0.30.
[ 1.054471] nfnl_acct: registering with nfnetlink.
[ 1.055332] nf_conntrack version 0.5.0 (8192 buckets, 32768 max)
[ 1.056324] ctnetlink v0.93: registering with nfnetlink.
[ 1.057335] nf_tables: (c) 2007-2009 Patrick McHardy <kaber@trash.net>
[ 1.058393] nf_tables_compat: (c) 2012 Pablo Neira Ayuso <pablo@netfilter.org>
[ 1.059599] xt_time: kernel timezone is -0000
[ 1.060296] ip_set: protocol 6
[ 1.060791] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[ 1.061940] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
[ 1.063162] IPVS: Creating netns size=2104 id=0
[ 1.064139] IPVS: ipvs loaded.
[ 1.744221] systemd[1]: Detected virtualization kvm.
[ 1.745058] systemd[1]: Detected architecture x86-64.
[ 1.747402] systemd[1]: Set hostname to <localhost.localdomain>.
[ 1.834328] tsc: Refined TSC clocksource calibration: 2800.119 MHz
[ 1.835512] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x285cb16f950, max_idle_ns: 440795333193 ns
[ 1.843476] systemd[1]: Created slice Root Slice.
[ 1.844251] systemd[1]: Starting Root Slice.
[ 1.845835] systemd[1]: Created slice System Slice.
[ 1.846631] systemd[1]: Starting System Slice.
[ 1.848257] systemd[1]: Listening on udev Kernel Socket.
[ 1.849119] systemd[1]: Starting udev Kernel Socket.
[ 2.014715] EXT4-fs (sda): re-mounted. Opts: (null)
[ 2.038202] systemd-journald[2010]: Received request to flush runtime journal from PID 1
[ 2.241341] audit: type=1305 audit(1488188850.897:2): audit_pid=2215 old=0 auid=4294967295 ses=4294967295 res=1
[ 2.287758] Adding 262140k swap on /dev/sdb. Priority:-1 extents:1 across:262140k FS
[ 2.905177] IPVS: Creating netns size=2104 id=1
[ 2.954613] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 2.955987] 8021q: adding VLAN 0 to HW filter on device eth0
[ 8.009765] random: crng init done
When troubleshooting, check the dmesg output first. For example, printing the 10 most recent kernel messages can reveal the errors behind a performance problem.
$ dmesg | tail
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[...]
[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child
[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-rss:0kB
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request. Check SNMP counters.
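The same check can be scripted. The sketch below is only an illustration: the function names kernel_trouble and count_oom_kills are made up for this note, and the patterns are simply the message fragments visible in the sample output above, not a complete list of kernel error signatures.

```shell
# Print kernel log lines that match a few trouble signatures taken
# from the sample output (OOM killer activity, SYN flood warnings).
# Reads log text on stdin, so it works on live or saved output.
kernel_trouble() {
    grep -E 'oom-killer|Out of memory|Killed process|SYN flooding'
}

# Count the processes the OOM killer reports having killed
# in the log text read from stdin.
count_oom_kills() {
    grep -c 'Killed process'
}
```

Typical use would be `dmesg | kernel_trouble` on a live system, or `kernel_trouble < saved-dmesg.txt` on a captured log.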
Kernel Initialization and Boot Options
At boot, the Linux kernel initializes in the following order:
- CPU inspection
- Memory inspection
- Device bus discovery
- Device discovery
- Auxiliary kernel subsystem setup (networking, and so on)
- Root filesystem mount
- User space start
Kernel Parameters
The file /proc/cmdline records the parameters the kernel was booted with:
[root@li1437-101 ~]# cat /proc/cmdline
root=/dev/sda console=tty1 console=ttyS0 ro devtmpfs.mount=1
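To pull a single parameter out of that line in a script, something like the following sketch may help. The function name cmdline_param is hypothetical, and the sketch assumes parameters are separated by single spaces and contain no quoted spaces, which holds for the command line shown here but not for every system.

```shell
# Print the value(s) of one kernel parameter from a command line
# read on stdin. A parameter can appear more than once (console=
# does in the output above), so each value goes on its own line.
# Bare flags such as "ro" carry no value and print nothing.
cmdline_param() {
    tr ' ' '\n' |
        awk -F= -v key="$1" '$1 == key && NF > 1 { print substr($0, length(key) + 2) }'
}
```

For example, `cmdline_param root < /proc/cmdline` would print `/dev/sda` on the machine above.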
Check the current runlevel:
[root@li1437-101 ~]# who -r
run-level 3 2017-02-27 09:47
[root@li1437-101 ~]#
How User Space Starts
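User space starts in roughly this order:
- init
- Essential low-level services, such as udevd and syslog
- Network configuration
- Mid- and high-level services, such as cron and printing
- Login prompts, GUIs, and other high-level applications
Process Number One
init (short for initialization) is the program on Unix and Unix-like systems that spawns all other processes. It runs as a daemon with process ID 1. At boot, once the Linux kernel has been loaded, the kernel starts init, and init carries out the rest of the boot process: entering the runlevel, starting services, bringing up a shell or graphical interface, and so on.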
[root@li1437-101 ~]# ps -ef | grep init
root 1 0 0 Feb27 ? 00:03:05 /sbin/init
root 28683 28663 0 02:44 pts/0 00:00:00 grep --color=auto init
// Mac OS
bash-3.2$ ps -ef | grep init
0 243 1 0 15 517 ?? 0:00.74 /System/Library/CoreServices/CrashReporterSupportHelper server-init
0 533 1 0 15 517 ?? 0:02.07 /System/Library/CoreServices/SubmitDiagInfo server-init
501 52150 1 0 日01下午 ?? 0:15.49 /usr/libexec/secinitd
0 69864 1 0 11:35上午 ?? 0:00.20 /usr/libexec/secinitd
0 72830 1 0 1:51下午 ?? 0:00.19 /usr/libexec/secinitd
Darwin ACA80166.ipt.aol.com 16.5.0 Darwin Kernel Version 16.5.0: Fri Mar 3 16:52:33 PST 2017; root:xnu-3789.51.2~3/RELEASE_X86_64 x86_64
bash-3.2$
Linux distributions have three main init implementations:
- System V init: the traditional one
- systemd: the standard init on most mainstream Linux distributions
- Upstart: developed for Ubuntu
Android and BSD (which runs the initialization shell script stored in /etc/rc) also have their own versions of init, and some distributions modify System V init to resemble the BSD style. Most Linux distributions have now replaced System V init and Upstart with systemd, which remains backward compatible with System V init.
System V init: there is a single startup sequence, and only one task can start at a time. This design makes dependencies easy to resolve, at some cost to boot performance.
systemd is goal oriented: to address the shortcomings of System V init, systemd starts services in parallel. It is target-based: you define the goal you want to reach along with its dependencies. systemd abstracts everything it manages as a configuration unit, or unit. A service is one unit; a mount point is another.
Upstart is reactionary: Upstart is event-based. Its event-driven model lets it respond asynchronously to events as they are generated.
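A rough way to tell which of these a machine runs is to look at the command name of PID 1. The sketch below is a guess-level helper, not a definitive test: init_kind is a hypothetical name, and a bare "init" is ambiguous because /sbin/init can be System V init, Upstart, or a symlink to systemd.

```shell
# Map the command name of PID 1 to a human-readable guess.
# The name is passed as an argument so the mapping can be tried
# without touching the live system. Any leading path is stripped
# first, because some systems report the full path.
init_kind() {
    name=${1##*/}
    case "$name" in
        systemd) echo "systemd" ;;
        launchd) echo "launchd (macOS)" ;;
        init)    echo "init (System V init, Upstart, or a symlink to systemd)" ;;
        *)       echo "unknown: $name" ;;
    esac
}
```

On a live system the argument could come from `ps -p 1 -o comm=`, for example `init_kind "$(ps -p 1 -o comm=)"`.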
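(3) The Initial RAM Filesystem
The Linux kernel does not use the PC BIOS or EFI interfaces to read data from disk, so it needs its own drivers for the underlying storage before it can mount its root filesystem. The solution is for the boot loader to load a set of driver modules and utilities into memory before the kernel runs. At boot, the kernel reads them into a temporary RAM filesystem (initramfs) mounted at /, and the initramfs lets the kernel load the driver modules required for the real root filesystem.
Finally, the real root filesystem is mounted and init is started.
Linux needs a memory-backed filesystem in many situations, to provide a fast storage area with near-zero latency. Two main kinds of RAM disk are available, each with its own strengths and weaknesses: ramfs and tmpfs. (Note: before creating one, use the free command to check how much RAM is unused.)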
# free
total used free shared buff/cache available
Mem: 1012720 168756 23576 52024 820388 754520
Swap: 262140 88 262052
# mkdir /mnt/ramdisk
# mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
# vi /etc/fstab
#tmpfs /mnt/ramdisk tmpfs nodev,nosuid,noexec,nodiratime,size=1024M 0 0
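The "check free first" advice can be turned into a small guard. This is a sketch under stated assumptions: ramdisk_fits is a hypothetical name, the input is `free` output in its default KiB units, and the "available" figure is the seventh field of the Mem: row, as in the output above (older versions of free lack that column, in which case the check fails). Since tmpfs allocates pages on demand and can spill into swap, treat the result as a sanity check rather than a hard limit.

```shell
# Succeed (exit 0) if the "available" column of `free`, read from
# stdin in KiB, is at least the requested size in MiB; fail otherwise.
ramdisk_fits() {
    awk -v want_mib="$1" '
        /^Mem:/ { avail_kib = $7; found = 1 }
        END     { exit !(found && avail_kib >= want_mib * 1024) }
    '
}
```

It could gate the mount, for example `free | ramdisk_fits 512 && mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk`. With the numbers shown above, a 512 MiB request passes and a 1024 MiB request does not.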