note01
test_ops: walkthrough of the test flow
The complete test-flow code for a single operator:
import numpy as np
import tvm
from tvm import relay
from tvm.relay import testing, transform
# batch_norm_infer, nn_compile and nn_exec are local helpers defined alongside this test.

def test_batch_norm():
    # Define the data placeholder (the "carrier" of the input, so to speak).
    data = relay.var("data", shape=(2, 7, 4, 3))
    # Build the network with relay (construct the compute graph).
    net = batch_norm_infer(data)[0]
    # Analyse the network's free variables (its input parameters).
    args = relay.analysis.free_vars(net)
    # Combine net and args into a small subgraph (a relay.Function).
    subgraph = relay.Function(args, net)
    # Run type inference (why not earlier? because data has to be part of a function first).
    subgraph = testing.run_infer_type(subgraph)
    # Transform the function so it also computes the backward pass.
    reverse = transform.gradient(subgraph, mode="first_order")
    # Compile (or rather lower) it into an executable module; vm is that executable module.
    backward, vm = nn_compile(reverse)
    # Provide actual data for the placeholders.
    data_np = np.random.uniform(0, 10, size=(2, 7, 4, 3)).astype("float32")
    label_np = np.random.uniform(0, 10, size=(2, 7, 4, 3)).astype("float32")
    # Run and collect the outputs.
    outputs = nn_exec(backward, vm, (2, 7, 4, 3), data_np=data_np, label_np=label_np)
    # Print the outputs.
    print(outputs[1])
    return outputs
Detailed walkthrough
Tracing where the params come from
I set a pdb breakpoint and stepped through; the intermediate results give a rough picture of what is going on.
-> return backward, vm
(Pdb)
(Pdb)
(Pdb) print(backward)
def @main(%param_data: Tensor[(2, 7, 4, 3), float32], %param_bn_gamma: Tensor[(7), float32], %param_bn_beta: Tensor[(7), float32], %param_bn_moving_mean: Tensor[(7), float32], %param_bn_moving_var: Tensor[(7), float32]) {
%40 = fn (%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma: Tensor[(7), float32], %bn_beta: Tensor[(7), float32], %bn_moving_mean: Tensor[(7), float32], %bn_moving_var: Tensor[(7), float32], Primitive=1, Compiler="xpucompiler", global_symbol="xpu") {
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%1 = %0.0;
We can see that the executable module is a main function that wraps the thing to be run (it wraps an fn).
Its parameters are param_data, param_bn_gamma, param_bn_beta, param_bn_moving_mean and param_bn_moving_var.
So where do these parameters come from? After some digging:
param_bn_gamma, param_bn_beta, param_bn_moving_mean and param_bn_moving_var are all added inside batch_norm_infer.
As for the remaining param_data, in short:
data comes from data = relay.var("data"), which creates a relay.var named "data";
net = batch_norm_infer(data)[0] then takes that relay.var named "data" as an input parameter, so the name is already in place.
batch_norm_infer additionally builds bn_gamma, bn_beta, bn_moving_mean and bn_moving_var;
they are also relay.var objects constructed inside batch_norm_infer and given those names.
So where does the "param_" prefix come from?
It is added later, in nn_compile(reverse):
nn_compile reorganizes the content, and part of that is renaming every parameter, i.e. taking the existing parameters and prepending the param_ prefix.
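A minimal sketch of how such a renaming could be done; this is an assumption about what nn_compile does internally, and the helper name is illustrative:

from tvm import relay

def make_param_args(expr_or_func):
    # Hypothetical illustration: collect the free vars of the expression and rebuild
    # each one with a "param_" prefix while keeping its type annotation.
    return [
        relay.var("param_" + v.name_hint, type_annotation=v.type_annotation)
        for v in relay.analysis.free_vars(expr_or_func)
    ]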
How the model IR changes along the way
# net = batch_norm_infer(data)[0]
# print(net)
free_var %data: Tensor[(2, 7, 4, 3), float32];
free_var %bn_gamma;
free_var %bn_beta;
free_var %bn_moving_mean;
free_var %bn_moving_var;
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var);
%0.0
# In Relay syntax, "xxx.0" means that xxx is a tuple (list-like) value and we take its first element.
# Also, the expression written last is the return value.
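On the Python side, that projection is expressed with relay.TupleGetItem; indexing the TupleWrapper returned by relay.nn.batch_norm creates exactly such a node. A standalone illustration (not the project's batch_norm_infer helper):

from tvm import relay

data = relay.var("data", shape=(2, 7, 4, 3))
gamma = relay.var("bn_gamma", shape=(7,))
beta = relay.var("bn_beta", shape=(7,))
mean = relay.var("bn_moving_mean", shape=(7,))
var = relay.var("bn_moving_var", shape=(7,))

# nn.batch_norm returns a 3-tuple: (normalized data, updated moving mean, updated moving var).
result = relay.nn.batch_norm(data, gamma, beta, mean, var)
# result is a TupleWrapper; result[0] lowers to TupleGetItem(..., 0), printed as "%0.0".
net = result[0]
print(net)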
# args = relay.analysis.free_vars(net)
# print(args)
[Var(data, ty=TensorType([2, 7, 4, 3], float32)), Var(bn_gamma), Var(bn_beta), Var(bn_moving_mean), Var(bn_moving_var)]
# subgraph = relay.Function(args, net)
# print(subgraph)
fn (%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) {
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var);
%0.0
}
# subgraph = testing.run_infer_type(subgraph)
# print(subgraph)
fn (%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma: Tensor[(7), float32], %bn_beta: Tensor[(7), float32], %bn_moving_mean: Tensor[(7), float32], %bn_moving_var: Tensor[(7), float32]) -> Tensor[(2, 7, 4, 3), float32] {
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%0.0
}
# The types have now been inferred correctly; they follow from the types that were already known plus the type relations between the parameters. LayerNormRel likewise derives the other parameters' shapes from data's shape.
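The same step in a self-contained form, using only upstream relay APIs (so without the local batch_norm_infer helper); the type relation registered for nn.batch_norm (BatchNormRel on the C++ side) fills in the missing shapes:

from tvm import relay
from tvm.relay import testing

data = relay.var("data", shape=(2, 7, 4, 3))
# Deliberately leave the remaining vars untyped; type inference derives their
# (7,) shapes from data's channel axis.
out = relay.nn.batch_norm(data, relay.var("bn_gamma"), relay.var("bn_beta"),
                          relay.var("bn_moving_mean"), relay.var("bn_moving_var"))[0]
func = relay.Function(relay.analysis.free_vars(out), out)
func = testing.run_infer_type(func)
print(func)   # now carries Tensor[(7), float32] annotations and a return type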
# reverse = transform.gradient(subgraph, mode="first_order")
# print(reverse)
fn (%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma: Tensor[(7), float32], %bn_beta: Tensor[(7), float32], %bn_moving_mean: Tensor[(7), float32], %bn_moving_var: Tensor[(7), float32]) -> (Tensor[(2, 7, 4, 3), float32], (Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32], Tensor[(7), float32], Tensor[(7), float32])) {
let %x = %data;
let %x1 = zeros_like(%x);
let %x2 = %bn_gamma;
let %x3 = zeros_like(%x2);
let %x4 = %bn_beta;
let %x5 = zeros_like(%x4);
let %x6 = %bn_moving_mean;
let %x7 = zeros_like(%x6);
let %x8 = %bn_moving_var;
let %x9 = zeros_like(%x8);
%0 = nn.batch_norm(%x, %x2, %x4, %x6, %x8) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
let %x10 = %0;
%1 = zeros(shape=[2, 7, 4, 3], dtype="float32");
%2 = zeros(shape=[7], dtype="float32");
%3 = zeros(shape=[7], dtype="float32");
let %x11 = (%1, %2, %3);
let %x12 = %x10.0;
let %x13 = zeros_like(%x12);
%4 = %x11.0;
%5 = ones_like(%x12);
%17 = (
let %x14 = add(%4, %5);
%6 = %x11.1;
%7 = %x11.2;
let %x15 = (%x14, %6, %7);
%8 = %x15.0;
%9 = %0.1;
%10 = %0.2;
%11 = nn.batch_norm_grad(%x, %8, %x2, %9, %10);
%12 = %11.0;
let %x16 = add(%x1, %12);
%13 = %11.1;
let %x17 = add(%x3, %13);
%14 = %11.2;
let %x18 = add(%x5, %14);
%15 = zeros_like(%x6);
let %x19 = add(%x7, %15);
%16 = zeros_like(%x8);
let %x20 = add(%x9, %16);
(%x16, %x17, %x18, %x19, %x20)
);
(%x12, %17)
}
# Note that the IR now contains calls to both nn.batch_norm and nn.batch_norm_grad.
# It has grown: the IR in the original subgraph was only the forward pass,
# whereas the IR in reverse now also carries the backward pass.
# So the "transform" in transform.gradient means "rewrite":
# the function is rewritten so that the backward-pass IR is appended.
# Where does the backward part come from?
# It corresponds to the gradient rules in _tensor_grad.py.
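For reference, gradient rules in _tensor_grad.py are attached with the register_gradient decorator. Below is a sketch of that pattern using the upstream nn.relu rule; the batch_norm rule in this project, which emits the nn.batch_norm_grad call seen above, follows the same shape but is not reproduced here, and the import path is assumed to match upstream TVM.

from tvm import relay
from tvm.relay.op import register_gradient

# level=11 only avoids clashing with the relu gradient already registered upstream
# at the default level, so the example can be pasted into a live session.
@register_gradient("nn.relu", level=11)
def relu_grad(orig, grad):
    # orig is the original call node, grad the incoming output gradient;
    # return one gradient expression per input of the original call.
    x = orig.args[0]
    zeros = relay.zeros_like(x)
    ones = relay.ones_like(x)
    return [relay.where(relay.less(x, zeros), zeros, ones) * grad]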
# backward, vm = nn_compile(reverse)
# step into nn_compile:
# param_args = ....
# print(param_args)
[Var(param_data, ty=TensorType([2, 7, 4, 3], float32)), Var(param_bn_gamma, ty=TensorType([7], float32)), Var(param_bn_beta, ty=TensorType([7], float32)), Var(param_bn_moving_mean, ty=TensorType([7], float32)), Var(param_bn_moving_var, ty=TensorType([7], float32))]
# These are indeed the same parameters, but each name now carries the param_ prefix.
# reverseIR = tvm.IRModule.from_expr(reverse)
# print(reverseIR)
def @main(%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma: Tensor[(7), float32], %bn_beta: Tensor[(7), float32], %bn_moving_mean: Tensor[(7), float32], %bn_moving_var: Tensor[(7), float32]) -> (Tensor[(2, 7, 4, 3), float32], (Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32], Tensor[(7), float32], Tensor[(7), float32])) {
let %x = %data;
let %x1 = zeros_like(%x);
let %x2 = %bn_gamma;
let %x3 = zeros_like(%x2);
let %x4 = %bn_beta;
let %x5 = zeros_like(%x4);
let %x6 = %bn_moving_mean;
let %x7 = zeros_like(%x6);
let %x8 = %bn_moving_var;
let %x9 = zeros_like(%x8);
%0 = nn.batch_norm(%x, %x2, %x4, %x6, %x8) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
let %x10 = %0;
%1 = zeros(shape=[2, 7, 4, 3], dtype="float32");
%2 = zeros(shape=[7], dtype="float32");
%3 = zeros(shape=[7], dtype="float32");
let %x11 = (%1, %2, %3);
let %x12 = %x10.0;
let %x13 = zeros_like(%x12);
%4 = %x11.0;
%5 = ones_like(%x12);
%17 = (
let %x14 = add(%4, %5);
%6 = %x11.1;
%7 = %x11.2;
let %x15 = (%x14, %6, %7);
%8 = %x15.0;
%9 = %0.1;
%10 = %0.2;
%11 = nn.batch_norm_grad(%x, %8, %x2, %9, %10);
%12 = %11.0;
let %x16 = add(%x1, %12);
%13 = %11.1;
let %x17 = add(%x3, %13);
%14 = %11.2;
let %x18 = add(%x5, %14);
%15 = zeros_like(%x6);
let %x19 = add(%x7, %15);
%16 = zeros_like(%x8);
let %x20 = add(%x9, %16);
(%x16, %x17, %x18, %x19, %x20)
);
(%x12, %17)
}
# The fn in reverse has become def @main in reverseIR.
# That is presumably what from_expr does.
# The official docs describe from_expr as: Construct a module from a standalone expression.
# Presumably @main is what makes this a module, i.e. something callable.
# reverseIR = relay.transform.ToGraphNormalForm()(reverseIR)
# print(reverseIR)
def @main(%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma: Tensor[(7), float32], %bn_beta: Tensor[(7), float32], %bn_moving_mean: Tensor[(7), float32], %bn_moving_var: Tensor[(7), float32]) -> (Tensor[(2, 7, 4, 3), float32], (Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32], Tensor[(7), float32], Tensor[(7), float32])) {
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%1 = %0.0;
%2 = zeros_like(%data) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%3 = zeros(shape=[2, 7, 4, 3], dtype="float32") /* ty=Tensor[(2, 7, 4, 3), float32] */;
%4 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%5 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%6 = (%3, %4, %5);
%7 = %6.0;
%8 = ones_like(%1) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%10 = %6.1;
%11 = %6.2;
%12 = (%9, %10, %11);
%13 = %12.0;
%14 = %0.1;
%15 = %0.2;
%16 = nn.batch_norm_grad(%data, %13, %bn_gamma, %14, %15) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%17 = %16.0;
%18 = add(%2, %17) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%19 = zeros_like(%bn_gamma) /* ty=Tensor[(7), float32] */;
%20 = %16.1;
%21 = add(%19, %20) /* ty=Tensor[(7), float32] */;
%22 = zeros_like(%bn_beta) /* ty=Tensor[(7), float32] */;
%23 = %16.2;
%24 = add(%22, %23) /* ty=Tensor[(7), float32] */;
%25 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%26 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%27 = add(%25, %26) /* ty=Tensor[(7), float32] */;
%28 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%29 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%30 = add(%28, %29) /* ty=Tensor[(7), float32] */;
%31 = (%18, %21, %24, %27, %30);
(%1, %31)
}
# Compared with the earlier reverse there are no more let bindings; it does look much more like a graph now (hence ToGraphNormalForm).
# reverse = reverseIR["main"].body
# print(reverse)
free_var %data: Tensor[(2, 7, 4, 3), float32];
free_var %bn_gamma: Tensor[(7), float32];
free_var %bn_beta: Tensor[(7), float32];
free_var %bn_moving_mean: Tensor[(7), float32];
free_var %bn_moving_var: Tensor[(7), float32];
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%1 = %0.0;
%2 = zeros_like(%data) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%3 = zeros(shape=[2, 7, 4, 3], dtype="float32") /* ty=Tensor[(2, 7, 4, 3), float32] */;
%4 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%5 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%6 = (%3, %4, %5);
%7 = %6.0;
%8 = ones_like(%1) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%10 = %6.1;
%11 = %6.2;
%12 = (%9, %10, %11);
%13 = %12.0;
%14 = %0.1;
%15 = %0.2;
%16 = nn.batch_norm_grad(%data, %13, %bn_gamma, %14, %15) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%17 = %16.0;
%18 = add(%2, %17) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%19 = zeros_like(%bn_gamma) /* ty=Tensor[(7), float32] */;
%20 = %16.1;
%21 = add(%19, %20) /* ty=Tensor[(7), float32] */;
%22 = zeros_like(%bn_beta) /* ty=Tensor[(7), float32] */;
%23 = %16.2;
%24 = add(%22, %23) /* ty=Tensor[(7), float32] */;
%25 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%26 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%27 = add(%25, %26) /* ty=Tensor[(7), float32] */;
%28 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%29 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%30 = add(%28, %29) /* ty=Tensor[(7), float32] */;
%31 = (%18, %21, %24, %27, %30);
(%1, %31)
# This is exactly the IR of the main part of reverseIR, and it carries free_vars again.
# What else carried free_vars earlier? Right: net.
# So reverse is now a "graph IR" just like net was before, except that this graph IR also contains the gradient computation.
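The round trip described here (Function, then IRModule via from_expr, then ToGraphNormalForm, then back to a bare expression via ["main"].body) can be reproduced with a tiny standalone example:

import tvm
from tvm import relay

x = relay.var("x", shape=(2, 2))
func = relay.Function([x], relay.nn.relu(relay.nn.relu(x)))

mod = tvm.IRModule.from_expr(func)              # the Function becomes def @main
mod = relay.transform.ToGraphNormalForm()(mod)  # flatten let-bindings into graph form
body = mod["main"].body                         # pull the function body back out
print(body)                                     # printed with free_var %x, like net above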
# loss = relay.TupleGetItem(reverse, 0)
# print(loss)
free_var %data: Tensor[(2, 7, 4, 3), float32];
free_var %bn_gamma: Tensor[(7), float32];
free_var %bn_beta: Tensor[(7), float32];
free_var %bn_moving_mean: Tensor[(7), float32];
free_var %bn_moving_var: Tensor[(7), float32];
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%1 = %0.0;
%2 = zeros_like(%data) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%3 = zeros(shape=[2, 7, 4, 3], dtype="float32") /* ty=Tensor[(2, 7, 4, 3), float32] */;
%4 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%5 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%6 = (%3, %4, %5);
%7 = %6.0;
%8 = ones_like(%1) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%10 = %6.1;
%11 = %6.2;
%12 = (%9, %10, %11);
%13 = %12.0;
%14 = %0.1;
%15 = %0.2;
%16 = nn.batch_norm_grad(%data, %13, %bn_gamma, %14, %15) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%17 = %16.0;
%18 = add(%2, %17) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%19 = zeros_like(%bn_gamma) /* ty=Tensor[(7), float32] */;
%20 = %16.1;
%21 = add(%19, %20) /* ty=Tensor[(7), float32] */;
%22 = zeros_like(%bn_beta) /* ty=Tensor[(7), float32] */;
%23 = %16.2;
%24 = add(%22, %23) /* ty=Tensor[(7), float32] */;
%25 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%26 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%27 = add(%25, %26) /* ty=Tensor[(7), float32] */;
%28 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%29 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%30 = add(%28, %29) /* ty=Tensor[(7), float32] */;
%31 = (%18, %21, %24, %27, %30);
%32 = (%1, %31);
%32.0
# This is a small "transform" (rewrite) of its own.
# Comparing reverse and loss shows that
# the IR of loss is simply the IR of reverse plus a tuple-projection at the end (!)
# grad = relay.TupleGetItem(reverse, 1)
# print(grad)
free_var %data: Tensor[(2, 7, 4, 3), float32];
free_var %bn_gamma: Tensor[(7), float32];
free_var %bn_beta: Tensor[(7), float32];
free_var %bn_moving_mean: Tensor[(7), float32];
free_var %bn_moving_var: Tensor[(7), float32];
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%1 = %0.0;
%2 = zeros_like(%data) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%3 = zeros(shape=[2, 7, 4, 3], dtype="float32") /* ty=Tensor[(2, 7, 4, 3), float32] */;
%4 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%5 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%6 = (%3, %4, %5);
%7 = %6.0;
%8 = ones_like(%1) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%10 = %6.1;
%11 = %6.2;
%12 = (%9, %10, %11);
%13 = %12.0;
%14 = %0.1;
%15 = %0.2;
%16 = nn.batch_norm_grad(%data, %13, %bn_gamma, %14, %15) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%17 = %16.0;
%18 = add(%2, %17) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%19 = zeros_like(%bn_gamma) /* ty=Tensor[(7), float32] */;
%20 = %16.1;
%21 = add(%19, %20) /* ty=Tensor[(7), float32] */;
%22 = zeros_like(%bn_beta) /* ty=Tensor[(7), float32] */;
%23 = %16.2;
%24 = add(%22, %23) /* ty=Tensor[(7), float32] */;
%25 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%26 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%27 = add(%25, %26) /* ty=Tensor[(7), float32] */;
%28 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%29 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%30 = add(%28, %29) /* ty=Tensor[(7), float32] */;
%31 = (%18, %21, %24, %27, %30);
%32 = (%1, %31);
%32.1
# Same idea as with loss.
# compute = []
# for idx in range(0, len(param_args)):
# compute.append(relay.TupleGetItem(grad, idx))
# compute.insert(0, loss)
# At this point the snippet above can be understood.
# The explanation goes as follows:
# First, reverse is the IR produced by transform.gradient(subgraph).
# What kind of function does transform.gradient want to build (i.e. what is the IR inside reverse actually doing)?
# It returns a tuple whose first element is the loss and whose second element is the grad.
# That is also what transform.gradient's documentation says:
# Transform the input function, returning a function that calculate the original result, paired with gradient of the input.
# In other words, the IR inside transform.gradient(subgraph) (i.e. the IR inside reverse) describes: compute the original function's result and the gradient w.r.t. the given inputs, and return them as a pair (original result, gradient).
#
# (Going by the documentation, "loss" is therefore not a great variable name; "ori_result" would fit better.)
# So loss = relay.TupleGetItem(reverse, 0) means: since the IR of reverse describes returning an (original result, gradient) pair, appending a piece of IR that does GetItem on that tuple yields IR that describes how to compute original_result.
# Likewise, grad = relay.TupleGetItem(reverse, 1) appends a tuple-projection to reverse's IR to obtain the IR that describes how to compute grad.
# compute = []
# for idx in range(0, len(param_args)):
# compute.append(relay.TupleGetItem(grad, idx))
# compute.insert(0, loss)
# This snippet follows the same logic:
# grad is itself a tuple holding the gradient w.r.t. each parameter, so for every parameter we again append a piece of IR that projects the corresponding entry out of the grad tuple, giving the IR that computes that parameter's gradient.
# Finally everything is laid out flat and concatenated together,
# and the whole thing is wrapped into a function via relay.Function(args, reverse).
# The IR of that function describes returning a tuple of the computed values (orig_result, grad_of_param_1, ..., grad_of_param_n).
# Looking further down the code,
# this relay.Function is then wrapped into relay.Call(reverse, param_args)
# and wrapped once more into backward = relay.Function(param_args, call).
# We show these steps below as well.
# reverse = relay.Tuple(compute)
# After that long detour we know that reverse is, at this point, still a graph IR
# whose return value is a tuple; its entries are the computed values (how to compute them is also described inside reverse's IR):
# original_result, grad_of_param_1, ..., grad_of_param_n.
# args = relay.analysis.free_vars(reverse)
# reverse = relay.Function(args, reverse)
# reverse = set_external_func_attr(reverse, "xpucompiler", "xpu")  -- as the print below shows, this indeed attaches the three function attributes Primitive, Compiler and global_symbol (a guess at this helper is sketched right after this block)
# Up to this point the journey has been: the graph IR net was turned into a Function IR, then (after from_expr and ToGraphNormalForm) its body was taken, which made it a graph IR again,
# and after all the reassembly above it is now a Function IR once more.
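A hedged guess at what set_external_func_attr does, based on the three attributes visible in the fn printed just below; the standard with_attr API is enough for this, though the real helper may differ in details:

from tvm import tir

def set_external_func_attr(func, compiler, symbol):
    # Mark the function as a primitive to be offloaded to the external BYOC
    # compiler `compiler`, exported under the global symbol `symbol`.
    func = func.with_attr("Primitive", tir.IntImm("int32", 1))
    func = func.with_attr("Compiler", compiler)
    func = func.with_attr("global_symbol", symbol)
    return func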
# print(reverse)
fn (%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma: Tensor[(7), float32], %bn_beta: Tensor[(7), float32], %bn_moving_mean: Tensor[(7), float32], %bn_moving_var: Tensor[(7), float32], Primitive=1, Compiler="xpucompiler", global_symbol="xpu") {
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%1 = %0.0;
%2 = zeros_like(%data) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%3 = zeros(shape=[2, 7, 4, 3], dtype="float32") /* ty=Tensor[(2, 7, 4, 3), float32] */;
%4 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%5 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%6 = (%3, %4, %5);
%7 = %6.0;
%8 = ones_like(%1) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%10 = %6.1;
%11 = %6.2;
%12 = (%9, %10, %11);
%13 = %12.0;
%14 = %0.1;
%15 = %0.2;
%16 = nn.batch_norm_grad(%data, %13, %bn_gamma, %14, %15) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%17 = %16.0;
%18 = add(%2, %17) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%19 = zeros_like(%bn_gamma) /* ty=Tensor[(7), float32] */;
%20 = %16.1;
%21 = add(%19, %20) /* ty=Tensor[(7), float32] */;
%22 = zeros_like(%bn_beta) /* ty=Tensor[(7), float32] */;
%23 = %16.2;
%24 = add(%22, %23) /* ty=Tensor[(7), float32] */;
%25 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%26 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%27 = add(%25, %26) /* ty=Tensor[(7), float32] */;
%28 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%29 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%30 = add(%28, %29) /* ty=Tensor[(7), float32] */;
%31 = (%18, %21, %24, %27, %30);
%32 = (%1, %31);
%33 = %32.0;
%34 = %32.1;
%35 = %34.0;
%36 = %34.1;
%37 = %34.2;
%38 = %34.3;
%39 = %34.4;
(%33, %35, %36, %37, %38, %39)
}
# call = relay.Call(reverse, param_args)
# print(call)
free_var %param_data: Tensor[(2, 7, 4, 3), float32];
free_var %param_bn_gamma: Tensor[(7), float32];
free_var %param_bn_beta: Tensor[(7), float32];
free_var %param_bn_moving_mean: Tensor[(7), float32];
free_var %param_bn_moving_var: Tensor[(7), float32];
%40 = fn (%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma: Tensor[(7), float32], %bn_beta: Tensor[(7), float32], %bn_moving_mean: Tensor[(7), float32], %bn_moving_var: Tensor[(7), float32], Primitive=1, Compiler="xpucompiler", global_symbol="xpu") {
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%1 = %0.0;
%2 = zeros_like(%data) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%3 = zeros(shape=[2, 7, 4, 3], dtype="float32") /* ty=Tensor[(2, 7, 4, 3), float32] */;
%4 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%5 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%6 = (%3, %4, %5);
%7 = %6.0;
%8 = ones_like(%1) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%10 = %6.1;
%11 = %6.2;
%12 = (%9, %10, %11);
%13 = %12.0;
%14 = %0.1;
%15 = %0.2;
%16 = nn.batch_norm_grad(%data, %13, %bn_gamma, %14, %15) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%17 = %16.0;
%18 = add(%2, %17) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%19 = zeros_like(%bn_gamma) /* ty=Tensor[(7), float32] */;
%20 = %16.1;
%21 = add(%19, %20) /* ty=Tensor[(7), float32] */;
%22 = zeros_like(%bn_beta) /* ty=Tensor[(7), float32] */;
%23 = %16.2;
%24 = add(%22, %23) /* ty=Tensor[(7), float32] */;
%25 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%26 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%27 = add(%25, %26) /* ty=Tensor[(7), float32] */;
%28 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%29 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%30 = add(%28, %29) /* ty=Tensor[(7), float32] */;
%31 = (%18, %21, %24, %27, %30);
%32 = (%1, %31);
%33 = %32.0;
%34 = %32.1;
%35 = %34.0;
%36 = %34.1;
%37 = %34.2;
%38 = %34.3;
%39 = %34.4;
(%33, %35, %36, %37, %38, %39)
};
%40(%param_data, %param_bn_gamma, %param_bn_beta, %param_bn_moving_mean, %param_bn_moving_var)
# Compared with reverse, content has been added on both sides: in front, free_vars carrying the param_ prefix; at the end, a call to the reverse function we just assembled. But this is again a loose, "unpackaged" graph IR, which is why the next step, backward = relay.Function(param_args, call), packs it back into a Function.
# backward = relay.Function(param_args, call)
# print(backward)
fn (%param_data: Tensor[(2, 7, 4, 3), float32], %param_bn_gamma: Tensor[(7), float32], %param_bn_beta: Tensor[(7), float32], %param_bn_moving_mean: Tensor[(7), float32], %param_bn_moving_var: Tensor[(7), float32]) {
%40 = fn (%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma: Tensor[(7), float32], %bn_beta: Tensor[(7), float32], %bn_moving_mean: Tensor[(7), float32], %bn_moving_var: Tensor[(7), float32], Primitive=1, Compiler="xpucompiler", global_symbol="xpu") {
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%1 = %0.0;
%2 = zeros_like(%data) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%3 = zeros(shape=[2, 7, 4, 3], dtype="float32") /* ty=Tensor[(2, 7, 4, 3), float32] */;
%4 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%5 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%6 = (%3, %4, %5);
%7 = %6.0;
%8 = ones_like(%1) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%10 = %6.1;
%11 = %6.2;
%12 = (%9, %10, %11);
%13 = %12.0;
%14 = %0.1;
%15 = %0.2;
%16 = nn.batch_norm_grad(%data, %13, %bn_gamma, %14, %15) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%17 = %16.0;
%18 = add(%2, %17) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%19 = zeros_like(%bn_gamma) /* ty=Tensor[(7), float32] */;
%20 = %16.1;
%21 = add(%19, %20) /* ty=Tensor[(7), float32] */;
%22 = zeros_like(%bn_beta) /* ty=Tensor[(7), float32] */;
%23 = %16.2;
%24 = add(%22, %23) /* ty=Tensor[(7), float32] */;
%25 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%26 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%27 = add(%25, %26) /* ty=Tensor[(7), float32] */;
%28 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%29 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%30 = add(%28, %29) /* ty=Tensor[(7), float32] */;
%31 = (%18, %21, %24, %27, %30);
%32 = (%1, %31);
%33 = %32.0;
%34 = %32.1;
%35 = %34.0;
%36 = %34.1;
%37 = %34.2;
%38 = %34.3;
%39 = %34.4;
(%33, %35, %36, %37, %38, %39)
};
%40(%param_data, %param_bn_gamma, %param_bn_beta, %param_bn_moving_mean, %param_bn_moving_var)
}
# backward = tvm.IRModule.from_expr(backward)
# This keeps going: the Function assembled in the previous line is turned into a module, i.e. a def @main.
# print(backward)
def @main(%param_data: Tensor[(2, 7, 4, 3), float32], %param_bn_gamma: Tensor[(7), float32], %param_bn_beta: Tensor[(7), float32], %param_bn_moving_mean: Tensor[(7), float32], %param_bn_moving_var: Tensor[(7), float32]) {
%40 = fn (%data: Tensor[(2, 7, 4, 3), float32], %bn_gamma: Tensor[(7), float32], %bn_beta: Tensor[(7), float32], %bn_moving_mean: Tensor[(7), float32], %bn_moving_var: Tensor[(7), float32], Primitive=1, Compiler="xpucompiler", global_symbol="xpu") {
%0 = nn.batch_norm(%data, %bn_gamma, %bn_beta, %bn_moving_mean, %bn_moving_var) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%1 = %0.0;
%2 = zeros_like(%data) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%3 = zeros(shape=[2, 7, 4, 3], dtype="float32") /* ty=Tensor[(2, 7, 4, 3), float32] */;
%4 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%5 = zeros(shape=[7], dtype="float32") /* ty=Tensor[(7), float32] */;
%6 = (%3, %4, %5);
%7 = %6.0;
%8 = ones_like(%1) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%10 = %6.1;
%11 = %6.2;
%12 = (%9, %10, %11);
%13 = %12.0;
%14 = %0.1;
%15 = %0.2;
%16 = nn.batch_norm_grad(%data, %13, %bn_gamma, %14, %15) /* ty=(Tensor[(2, 7, 4, 3), float32], Tensor[(7), float32], Tensor[(7), float32]) */;
%17 = %16.0;
%18 = add(%2, %17) /* ty=Tensor[(2, 7, 4, 3), float32] */;
%19 = zeros_like(%bn_gamma) /* ty=Tensor[(7), float32] */;
%20 = %16.1;
%21 = add(%19, %20) /* ty=Tensor[(7), float32] */;
%22 = zeros_like(%bn_beta) /* ty=Tensor[(7), float32] */;
%23 = %16.2;
%24 = add(%22, %23) /* ty=Tensor[(7), float32] */;
%25 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%26 = zeros_like(%bn_moving_mean) /* ty=Tensor[(7), float32] */;
%27 = add(%25, %26) /* ty=Tensor[(7), float32] */;
%28 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%29 = zeros_like(%bn_moving_var) /* ty=Tensor[(7), float32] */;
%30 = add(%28, %29) /* ty=Tensor[(7), float32] */;
%31 = (%18, %21, %24, %27, %30);
%32 = (%1, %31);
%33 = %32.0;
%34 = %32.1;
%35 = %34.0;
%36 = %34.1;
%37 = %34.2;
%38 = %34.3;
%39 = %34.4;
(%33, %35, %36, %37, %38, %39)
};
%40(%param_data, %param_bn_gamma, %param_bn_beta, %param_bn_moving_mean, %param_bn_moving_var)
}
# target = "llvm"
# ctx = tvm.cpu()
# with tvm.transform.PassContext(opt_level=3, disabled_pass=["AlterOpLayout", "SimplifyInference"]):
# exe = relay.vm.compile(backward, target=target)
# code, lib = exe.save()
# lib = update_lib(lib)
# exe = runtime.vm.Executable.load_exec(code, lib)
# vm = runtime.vm.VirtualMachine(exe, ctx)
# print(backward)
# pdb.set_trace()
# The code above compiles the module we just built (backward) into exe, the executable module,
# and vm is the "virtual machine" loaded with that executable.
# From here on, running it only requires handing vm the actual data for each parameter:
# see outputs = vm.run(**mod_params), which unpacks the dict so that each concrete, data-carrying argument is passed in by name.
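A hedged sketch of what nn_exec essentially does; the dict keys must match the parameter names of @main (note the param_ prefix), and the gamma/beta/mean/var values here are only plausible placeholders:

import numpy as np

def run_backward(vm, shape=(2, 7, 4, 3)):
    # vm is the runtime.vm.VirtualMachine built in the steps above.
    mod_params = {
        "param_data": np.random.uniform(0, 10, size=shape).astype("float32"),
        "param_bn_gamma": np.ones((7,), dtype="float32"),
        "param_bn_beta": np.zeros((7,), dtype="float32"),
        "param_bn_moving_mean": np.zeros((7,), dtype="float32"),
        "param_bn_moving_var": np.ones((7,), dtype="float32"),
    }
    # Unpacking the dict feeds each named parameter into the VM by name.
    return vm.run(**mod_params)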
A further observation: subgraph = relay.Function(args, net) converts a graph IR into a Function IR,
and only then can subgraph = testing.run_infer_type(subgraph) analyse the parameter shapes and pin down the parameter types;
conversely, reverseIR = tvm.IRModule.from_expr(reverse) followed by reverseIR = relay.transform.ToGraphNormalForm()(reverseIR) breaks the Function IR back down into a graph IR. The reason for using from_expr is to obtain a def @main function, so that reverse = reverseIR["main"].body can conveniently grab the IR of main's body, which in turn makes the later IR surgery (appending the TupleGetItem IR and so on) convenient.
There is not much left inside nn_exec: essentially the single line outputs = vm.run(**mod_params).
Everything else in it just fills in the entries of the mod_params dict.
Note also that nn_exec adds a "param_label" entry to mod_params even though, much of the time, no such parameter exists; printing backward makes that obvious.
Another thing worth noting: transform.gradient seeds the backward pass with ones_like, i.e. the root gradient is all ones, so when comparing against PyTorch we have to do the same.
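For the PyTorch side of such a comparison, the seed therefore has to be an all-ones tensor as well; roughly like this (a sketch, not the project's comparison code):

import torch

x = torch.randn(2, 7, 4, 3, requires_grad=True)
bn = torch.nn.BatchNorm2d(7)
y = bn(x)
# Seed the backward pass with all ones, matching the ones_like seed that
# transform.gradient uses, so the gradients are directly comparable.
y.backward(torch.ones_like(y))
print(x.grad.shape, bn.weight.grad.shape, bn.bias.grad.shape)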
Changes made while debugging
我因?yàn)閷憀ayer_norm_grad的需要,需要從layer_norm獲取另外兩個(gè)計(jì)算結(jié)果姨拥,就是mean和var
而本來的layer_norm是只有一個(gè)結(jié)果的绅喉,就是被Normalized的數(shù)據(jù),現(xiàn)在要從layer_norm獲取
normalized_data, mean, var這三個(gè)數(shù)據(jù)
需要做的(在源碼上改動(dòng)的)有:一是C++那邊類型推導(dǎo)的時(shí)候把返回值的類型改了叫乌,即LayerNormRel
二是relay.nn.layer_norm在包裝_make.layer_norm(即C++那邊的編譯后的函數(shù)的包裝)時(shí)需要加
result = expr.TupleWrapper(result, 3)把結(jié)果包裝起來柴罐。像relay.nn.batch_norm所做的包裝一樣。
三是pytorch.py里面在進(jìn)行圖算子轉(zhuǎn)換的時(shí)候憨奸,對(duì)_op.nn.layer_norm(xxx)應(yīng)改為_op.nn.layer_norm(xxx)[0]
因?yàn)楸緛韑ayer_norm計(jì)算結(jié)果只有一個(gè)的革屠,現(xiàn)在被包裝成三個(gè)了。所以應(yīng)該用下標(biāo)[0]指明取原來的那個(gè)
Separately, the dropout written by a senior labmate was causing problems: his dropout added an extra parameter, which apparently clashed with other parts when importing bert, so I reverted it to the way dropout is written in upstream tvm.
In test_batch_norm I also changed the shape (making all four dimensions different, so that the inferred shapes are easy to check when printing the IR)
and added a pdb breakpoint, which is what produced the line-by-line IR trace in the notes above.
Notes on the bert source code