Conv+BN+ReLU is a common structure in mainstream convolutional neural network models. In model inference and training, the BN layer is often merged with adjacent layers to reduce the amount of computation.
Model parsing
node_of_325
[TRT] Parsing node: node_of_325 [Conv]
[TRT] Searching for input: 324
[TRT] Searching for input: layer1.0.conv1.weight
[TRT] node_of_325 [Conv] inputs: [324 -> (-1, 64, 56, 56)[FLOAT]], [layer1.0.conv1.weight -> (128, 64, 1, 1)[FLOAT]],
[TRT] Convolution input dimensions: (-1, 64, 56, 56)
[TRT] Registering layer: node_of_325 for ONNX node: node_of_325
[TRT] Using kernel: (1, 1), strides: (1, 1), prepadding: (0, 0), postpadding: (0, 0), dilations: (1, 1), numOutputs: 128
[TRT] Convolution output dimensions: (-1, 128, 56, 56)
[TRT] Registering tensor: 325 for ONNX tensor: 325
[TRT] node_of_325 [Conv] outputs: [325 -> (-1, 128, 56, 56)[FLOAT]],
node_of_326
[TRT] Parsing node: node_of_326 [BatchNormalization]
[TRT] Searching for input: 325
[TRT] Searching for input: layer1.0.bn1.weight
[TRT] Searching for input: layer1.0.bn1.bias
[TRT] Searching for input: layer1.0.bn1.running_mean
[TRT] Searching for input: layer1.0.bn1.running_var
[TRT] node_of_326 [BatchNormalization] inputs: [325 -> (-1, 128, 56, 56)[FLOAT]], [layer1.0.bn1.weight -> (128)[FLOAT]], [layer1.0.bn1.bias -> (128)[FLOAT]], [layer1.0.bn1.running_mean -> (128)[FLOAT]], [layer1.0.bn1.running_var -> (128)[FLOAT]],
[TRT] Registering layer: node_of_326 for ONNX node: node_of_326
[TRT] Registering tensor: 326 for ONNX tensor: 326
[TRT] node_of_326 [BatchNormalization] outputs: [326 -> (-1, 128, 56, 56)[FLOAT]],
node_of_327
[TRT] Parsing node: node_of_327 [Relu]
[TRT] Searching for input: 326
[TRT] node_of_327 [Relu] inputs: [326 -> (-1, 128, 56, 56)[FLOAT]],
[TRT] Registering layer: node_of_327 for ONNX node: node_of_327
[TRT] Registering tensor: 327 for ONNX tensor: 327
[TRT] node_of_327 [Relu] outputs: [327 -> (-1, 128, 56, 56)[FLOAT]],
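The log above shows the TensorRT ONNX parser reading a Conv -> BatchNormalization -> Relu chain whose parameter names come from a PyTorch export. As a point of reference only (this is an illustrative assumption, not the actual model behind the log), the sketch below builds a Conv+BN+ReLU block with matching shapes (1x1 kernel, 64 -> 128 channels, 56x56 feature map, dynamic batch) and exports it to ONNX, producing the same three node types for the parser to consume.

```python
import torch
import torch.nn as nn

# A minimal Conv+BN+ReLU block whose shapes match the log above
# (1x1 conv, 64 -> 128 channels, 56x56 feature map).
block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
).eval()

dummy = torch.randn(1, 64, 56, 56)
# "cbr.onnx" is just a placeholder file name for this sketch.
torch.onnx.export(
    block, dummy, "cbr.onnx", opset_version=11,
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
# The exported graph contains Conv -> BatchNormalization -> Relu nodes,
# which the TensorRT ONNX parser reads as shown in the log above.
```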
Optimization
In TensorRT, the network structure is vertically fused: the Conv, BN, and ReLU layers are merged into a single layer, known as CBR fusion.
Scale fusion
[TRT] Fusing convolution weights from node_of_325 with scale node_of_326
In the BN layer, the input is first normalized using the mean $\mu$ and variance $\sigma^2$ of the input tensor, and the normalized result is then scaled by $\gamma$ and shifted by $\beta$ [1][2]:

$$y = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

Expanding this gives:

$$y = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \cdot x + \left( \beta - \frac{\gamma \mu}{\sqrt{\sigma^2 + \epsilon}} \right)$$

Substituting the convolution output $x = W_{conv} \ast x_{in} + b_{conv}$ gives:

$$y = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \cdot W_{conv} \ast x_{in} + \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \cdot (b_{conv} - \mu) + \beta$$

At this point the BN layer can be viewed as a 1x1 convolution.

The input feature of the BN layer (the output feature of the Conv layer) has shape $(N, C, H, W)$. For the Conv layer:
- the kernel size is $k \times k$
- the weights are $W_{conv}$ and the bias is $b_{conv}$
- the number of output channels is $C$

so after fusing BN into the Conv layer:
- the fused weights are $W_{fused} = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \cdot W_{conv}$
- the fused bias is $b_{fused} = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \cdot (b_{conv} - \mu) + \beta$
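The folding above can be checked numerically. The following is a minimal PyTorch sketch (not TensorRT's internal implementation) that folds an `nn.BatchNorm2d` into the preceding `nn.Conv2d` using the fused weight and bias formulas; the layer sizes mirror the `node_of_325` convolution in the log.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d that follows a Conv2d into the conv's weights and bias.

    W_fused = gamma / sqrt(var + eps) * W_conv
    b_fused = gamma / sqrt(var + eps) * (b_conv - mean) + beta
    """
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels,
        kernel_size=conv.kernel_size, stride=conv.stride,
        padding=conv.padding, dilation=conv.dilation,
        groups=conv.groups, bias=True,
    )
    # Per-output-channel scale factor gamma / sqrt(var + eps).
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    # Scale the conv weights channel-wise.
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    # Fold the running mean and beta into the bias.
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = scale * (conv_bias - bn.running_mean) + bn.bias.data
    return fused

# Numerical check with the same shapes as node_of_325 (1x1 conv, 64 -> 128, 56x56 input).
conv = nn.Conv2d(64, 128, kernel_size=1, bias=False)
bn = nn.BatchNorm2d(128).eval()          # eval mode: BN uses running statistics
bn.weight.data.uniform_(0.5, 1.5)        # gamma
bn.bias.data.uniform_(-0.5, 0.5)         # beta
bn.running_mean.uniform_(-1.0, 1.0)
bn.running_var.uniform_(0.5, 1.5)

x = torch.randn(1, 64, 56, 56)
with torch.no_grad():
    ref = bn(conv(x))
    out = fuse_conv_bn(conv, bn)(x)
print(torch.allclose(ref, out, atol=1e-5))  # True
```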
ConvRelu fusion
[TRT] ConvReluFusion: Fusing node_of_325 with node_of_327
The Rectified Linear Unit (ReLU), $f(x) = \max(0, x)$, is an activation function commonly used in artificial neural networks.
In a neural network, ReLU acts as the activation function of a neuron and defines the neuron's non-linear output after a linear transformation. In other words, for an input $x$ coming from the preceding convolution layer, applying ReLU gives the output $y = \max(0, x)$.
In Int8-quantized models, Conv+ReLU can generally also be merged into a single Conv operation [3].
For Int8 ReLU, the computation can be written as:

$$y_q = \max(x_q, ZeroPoint_{out})$$

Since the numerical ranges of ReLU's input (which can be negative) and output (which is non-negative) differ, it must be ensured that $Scale_{in}$ and $Scale_{out}$, and $ZeroPoint_{in}$ and $ZeroPoint_{out}$, are consistent. Because ReLU only performs truncation, $Scale_{out}$ and $ZeroPoint_{out}$ are used: the ReLU input is quantized with the output's $Scale_{out}$ and $ZeroPoint_{out}$, so that inputs below 0 are truncated to $ZeroPoint_{out}$ and inputs greater than or equal to 0 are mapped into the [0, 255] range.
In the Int8 Conv computation, the input and the weights are first quantized with the quantization formula $x_q = \mathrm{round}(x / Scale) + ZeroPoint$, converting them into integers in the (0, 255) range; once the convolution has been computed, the result is dequantized again. ReLU itself performs no arithmetic and is only a truncation function. Suppose the Int8 Conv produces the integer output 122, which dequantizes to -0.3 under the Conv output's scale and zero point; passing this value through Int8 ReLU and quantizing it again gives 0. Truncating the input at the ReLU layer therefore yields exactly the values that need to be truncated.
Therefore, by dequantizing with the post-ReLU scale and zero point directly after the convolution, Conv and ReLU are fused into a single operation.
- [1] BatchNormalization, ONNX Operators documentation: https://hub.fastgit.org/onnx/onnx/blob/master/docs/Operators.md#BatchNormalization
- [2] Speeding up model with fusing batch normalization and convolution: http://learnml.today/speeding-up-model-with-fusing-batch-normalization-and-convolution-3
- [3] 神經(jīng)網(wǎng)絡(luò)量化入門--Folding BN ReLU: https://zhuanlan.zhihu.com/p/176982058