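Background
While running KubeVirt, a VM's data disk was wiped out after a failure. We examined where disk.img is stored, both inside the pod and on the host, and could essentially rule out a manual deletion. At the software level, only kubelet or virt-handler could have performed this cleanup. The path shows that the data disk is backed by a Kubernetes emptyDir volume. The most likely explanation is that the pod failed and was torn down and started again, and its emptyDir was deleted along with it, since an emptyDir lives only as long as its pod. What remains is to confirm whether it really was kubelet that deleted the file.
Problem Analysis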
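How do you find out which process deleted a file? Watch for the rm command? That misses any deletion a program performs itself through its language's file API, which calls unlink directly without ever running rm.
Instead, we can watch the kernel's deletion function from inside the kernel. kprobes make this possible.
Deleting a file goes through the unlink system call, but a call to unlink does not by itself mean the file was removed. The call can fail (for example, the file does not exist or permission is denied). A file with other hard links, or one that is still open, also keeps its data after a single name is removed. So rather than the system call, we want a function that is reached only when a directory entry is actually being removed. That leads us to the unlink operation of the underlying filesystem.
On ext4 the function to probe is ext4_unlink; on XFS it is xfs_vn_unlink.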
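The link and reference cases are easy to reproduce from user space. The following sketch uses POSIX C on Linux; `demo_unlink_semantics` and the `demo_a`/`demo_b` file names are illustrative, not part of the original setup.

- It gives a file two hard links and removes one name.
- It then removes the second name while a descriptor is still open.
- It checks `st_nlink` at each step and reads the data back at the end.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Show that unlink() removes a name, not necessarily the data.
 * Creates <dir>/demo_a, hard-links it as <dir>/demo_b (both names are
 * illustrative), then removes both names while a descriptor stays open.
 * Returns 0 if every step behaves as described, -1 otherwise.
 */
int demo_unlink_semantics(const char *dir)
{
    char a[4096], b[4096], buf[4];
    struct stat st;
    int fd;

    assert(dir != NULL);
    snprintf(a, sizeof a, "%s/demo_a", dir);
    snprintf(b, sizeof b, "%s/demo_b", dir);
    unlink(a);  /* remove leftovers from an earlier run; errors ignored */
    unlink(b);

    fd = open(a, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Two names, one inode: the link count is 2. */
    if (write(fd, "data", 4) != 4 || link(a, b) != 0 ||
        fstat(fd, &st) != 0 || st.st_nlink != 2)
        goto fail;

    /* Removing one name leaves the inode alive through the other. */
    if (unlink(a) != 0 || fstat(fd, &st) != 0 || st.st_nlink != 1)
        goto fail;

    /* Removing the last name drops the link count to 0, but the data is
     * still readable through the open descriptor. */
    if (unlink(b) != 0 || fstat(fd, &st) != 0 || st.st_nlink != 0 ||
        pread(fd, buf, 4, 0) != 4 || memcmp(buf, "data", 4) != 0)
        goto fail;

    close(fd);  /* last reference gone: the filesystem may now free the data */
    return 0;

fail:
    close(fd);
    unlink(a);
    unlink(b);
    return -1;
}
```

`demo_unlink_semantics("/tmp")` should return 0. Both unlink() calls remove a name, so both would reach the filesystem's unlink function. Whether the data is actually freed still depends on the remaining links and open descriptors.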
The prototype of the unlink member of struct inode_operations:
int (*unlink) (struct inode *,struct dentry *);
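From the dentry we can get the file name (d_name, which is the last path component, not the full path). The kernel's `current` macro gives us the task that is deleting the file. dentry is the function's second argument, and the x86-64 calling convention passes the second argument in the rsi register, so that is where we read its address from. The complete code follows.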
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>	/* struct task_struct, current */
#include <linux/dcache.h>	/* struct dentry */
#include <linux/version.h>	/* LINUX_VERSION_CODE, KERNEL_VERSION() */

#define MAX_SYMBOL_LEN 64
// Symbol to probe; override at load time, e.g. insmod kprobe_test.ko symbol=ext4_unlink
static char symbol[MAX_SYMBOL_LEN] = "xfs_vn_unlink";
module_param_string(symbol, symbol, sizeof(symbol), 0644);

/* For each probe you need to allocate a kprobe structure */
static struct kprobe kp = {
	.symbol_name = symbol,
};

/* kprobe pre_handler: called just before the probed instruction is executed */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
#ifdef CONFIG_X86_64
	// The probe sits on the function's first instruction, so the argument
	// registers still hold the caller's values. x86-64 passes the first six
	// integer arguments in rdi, rsi, rdx, rcx, r8, r9, in that order;
	// dentry is the 2nd argument of unlink(dir, dentry), so it is in rsi.
	struct dentry *dentry = (struct dentry *)regs->si;
	const char *name = (const char *)dentry->d_name.name;
	struct task_struct *task = current;

	//pr_info("<%s> pre_handler: p->addr = 0x%p, ip = %lx, flags = 0x%lx\n",
	//	p->symbol_name, p->addr, regs->ip, regs->flags);
	pr_info("name : %s pid : %d execname : %s\n", name, task->pid, task->comm);
#endif
	return 0;
}

/* kprobe post_handler: called after the probed instruction is executed */
static void handler_post(struct kprobe *p, struct pt_regs *regs,
			 unsigned long flags)
{
	//pr_info("<%s> post_handler: p->addr = 0x%p, flags = 0x%lx\n",
	//	p->symbol_name, p->addr, regs->flags);
}

#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 14, 0)
/*
 * fault_handler: called if an exception is generated for any instruction
 * within the pre- or post-handler, or when Kprobes single-steps the
 * probed instruction. (struct kprobe no longer has this field in 5.14+.)
 */
static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
	pr_info("fault_handler: p->addr = 0x%p, trap #%d\n", p->addr, trapnr);
	/* Return 0 because we don't handle the fault. */
	return 0;
}
#endif

static int __init kprobe_init(void)
{
	int ret;

	kp.pre_handler = handler_pre;
	kp.post_handler = handler_post;
#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 14, 0)
	kp.fault_handler = handler_fault;
#endif
	ret = register_kprobe(&kp);
	if (ret < 0) {
		pr_err("register_kprobe failed, returned %d\n", ret);
		return ret;
	}
	pr_info("Planted kprobe at %p\n", kp.addr);
	return 0;
}

static void __exit kprobe_exit(void)
{
	unregister_kprobe(&kp);
	pr_info("kprobe at %p unregistered\n", kp.addr);
}

module_init(kprobe_init);
module_exit(kprobe_exit);
/* register_kprobe() is exported to GPL-compatible modules only */
MODULE_LICENSE("GPL");
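The module's symbol parameter defaults to xfs_vn_unlink. On a host whose kubelet directory sits on ext4, it has to be overridden when loading, for example `insmod kprobe_test.ko symbol=ext4_unlink`.

A small user-space helper can pick the value. The sketch below is an assumption, not part of the original tooling, and the function names are hypothetical. It calls statfs(2) on a path and maps the filesystem magic number to the probe symbol, using the magic values listed in the statfs(2) man page. It handles only the ext4 and XFS cases discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/vfs.h>   /* statfs(), Linux-specific */

/* Filesystem magic numbers as listed in the statfs(2) man page. */
#define FS_MAGIC_EXT4 0xEF53L      /* shared by ext2/ext3/ext4; assumes the ext4 driver */
#define FS_MAGIC_XFS  0x58465342L  /* "XFSB" */

/* Return the unlink function to probe for a filesystem magic number,
 * or NULL for filesystems other than ext4 and XFS. */
const char *probe_symbol_for_magic(long magic)
{
    switch (magic) {
    case FS_MAGIC_EXT4:
        return "ext4_unlink";
    case FS_MAGIC_XFS:
        return "xfs_vn_unlink";
    default:
        return NULL;
    }
}

/* Return the symbol for the filesystem holding `path`, or NULL if statfs()
 * fails or the filesystem is not one this helper knows how to probe. */
const char *probe_symbol_for_path(const char *path)
{
    struct statfs sfs;

    assert(path != NULL);
    if (statfs(path, &sfs) != 0) {
        fprintf(stderr, "statfs %s: %s\n", path, strerror(errno));
        return NULL;
    }
    return probe_symbol_for_magic((long)sfs.f_type);
}
```

`probe_symbol_for_path("/var/lib/kubelet/pods")` returns either the symbol to pass to insmod or NULL if that directory is on some other filesystem. That directory is where kubelet keeps emptyDir volumes by default.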
The corresponding Makefile:
# To build modules outside of the kernel tree, we run "make"
# in the kernel source tree; the Makefile there then includes this
# Makefile once again.

# Called from the kernel build system: just declare what our modules are
obj-m := kprobe_test.o

# Assume the source tree is where the running kernel was built.
# Set KERNELDIR in the environment if it's elsewhere.
KERNELDIR ?= /lib/modules/$(shell uname -r)/build
# The current directory is passed to sub-makes as argument
PWD := $(shell pwd)

all:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules

clean:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) clean
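Results
Load the kernel module:
insmod kprobe_test.ko
Each deletion on the probed filesystem then appears in the kernel log (dmesg). The line gives the file name and the PID and command name of the process that deleted it. In the output below, rm deletes abc, and vim cleans up its swap files and .viminfo: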
[256838.004884] name : kprobe_test.mod.o pid : 10749 execname : rm
[256861.029438] name : .kprobe_test.c.swx pid : 10756 execname : vim
[256861.029456] name : .kprobe_test.c.swp pid : 10756 execname : vim
[256901.147068] name : .viminfo pid : 10756 execname : vim
[256901.262790] name : .kprobe_test.c.swp pid : 10756 execname : vim
[256913.980868] name : abc pid : 10767 execname : rm
[256929.192739] name : .messages.swpx pid : 10769 execname : vim
[256929.192767] name : .messages.swp pid : 10769 execname : vim
[256933.188426] name : .viminfo pid : 10769 execname : vim
[256933.289964] name : .messages.swp pid : 10769 execname : vim
[256938.301215] name : .kprobe_test.c.swx pid : 10770 execname : vim
[256938.301236] name : .kprobe_test.c.swp pid : 10770 execname : vim
[256978.605412] name : .viminfo pid : 10770 execname : vim
[256978.714638] name : .kprobe_test.c.swp pid : 10770 execname : vim
[257128.882336] name : .kprobe_test.c.swx pid : 10798 execname : vim
[257128.882354] name : .kprobe_test.c.swp pid : 10798 execname : vim
[257382.589794] name : .viminfo pid : 10798 execname : vim
[257382.699274] name : .kprobe_test.c.swp pid : 10798 execname : vim
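The log above shows rm and vim. However, the analysis started from the concern that a program can delete a file without ever running rm. A quick way to check that the probe covers that case is a program that calls unlink() itself.

The sketch below is a hypothetical test helper in POSIX C. It creates a temporary file with mkstemp() in a directory you choose, prints its own pid, and deletes the file.

Point it at a directory on the probed filesystem (XFS when the module uses its default symbol). /tmp may be tmpfs, in which case no line will appear. With the module loaded, the kernel log should then contain a line with the temporary file's name and the printed pid.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical test helper: create a temporary file in `dir` and delete it
 * with unlink(), without running rm. The generated path is stored in `path`
 * (capacity `len`). Returns 0 if the file was created and is gone afterwards,
 * -1 otherwise.
 */
int create_and_unlink(const char *dir, char *path, size_t len)
{
    int n, fd;

    assert(dir != NULL && path != NULL);
    n = snprintf(path, len, "%s/unlink_probe_XXXXXX", dir);
    if (n < 0 || (size_t)n >= len)
        return -1;                 /* path does not fit in the buffer */

    fd = mkstemp(path);            /* replaces XXXXXX with a unique suffix */
    if (fd < 0)
        return -1;
    close(fd);

    /* Print the pid so it can be matched against the kprobe's log line. */
    printf("pid %ld unlinking %s\n", (long)getpid(), path);
    if (unlink(path) != 0)
        return -1;

    /* The name must be gone now. */
    return (access(path, F_OK) == -1 && errno == ENOENT) ? 0 : -1;
}
```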
Summary
The code is short and the feature is simple, but writing it does require some familiarity with the kernel source.
The same mechanism could also be exposed as a metric, recording which process, or even which user, deleted a file and when.
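One way to start on such a metric is to parse the lines handler_pre writes to the kernel log. This is sketched here as an assumption, not an existing tool; `struct unlink_event` and `parse_unlink_line` are hypothetical names.

The sketch reads the `name : %s pid : %d execname : %s` format and skips the bracketed prefix, which is seconds since boot rather than wall-clock time. Reporting the user as well would require the module to log it too, for example from the current task's credentials.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One deletion event, as logged by the kprobe module's handler_pre. */
struct unlink_event {
    char name[256];   /* file name (last path component, NAME_MAX is 255) */
    int pid;
    char comm[16];    /* task command name: at most 15 chars + NUL */
};

/*
 * Parse a line such as
 *   "[256913.980868] name : abc pid : 10767 execname : rm"
 * into `ev`. Returns 0 on success, -1 if the line is not in that format.
 * Limitation: %s stops at whitespace, so names containing spaces are cut short.
 */
int parse_unlink_line(const char *line, struct unlink_event *ev)
{
    const char *p;

    assert(line != NULL && ev != NULL);
    p = strstr(line, "name : ");   /* skip the "[seconds]" prefix */
    if (p == NULL)
        return -1;
    if (sscanf(p, "name : %255s pid : %d execname : %15s",
               ev->name, &ev->pid, ev->comm) != 3)
        return -1;
    return 0;
}
```

Each parsed event could then be counted per comm or per pid by whatever metrics exporter is in use.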