MNN is a lightweight deep neural network inference engine that loads deep learning models and runs inference on device. It is currently used in more than 20 Alibaba apps, including Mobile Taobao, Mobile Tmall, and Youku, covering scenarios such as live streaming, short video, search and recommendation, product image search, interactive marketing, coupon issuing, and security risk control. There are also a number of applications in IoT scenarios.
Documents: https://www.yuque.com/mnn/en/about
Github: https://github.com/alibaba/MNN
Github: https://github.com/xiaochus/DeepModelDeploy/tree/main/MNN/cpp
Environment
- Python 3.7
- Pytorch 1.7
- GCC 9.3
- CMAKE 3.9.1
- Protobuf 3.14
- OpenCV 4.5 + Contrib
- MNN 1.1.0
Installing the C++ dependencies:
Since the Raspberry Pi platform is based on an ARM chip, many libraries have to be built from source instead of installed from prebuilt binaries.
CMAKE
Download CMake from https://cmake.org/
Before installing CMake, make sure make, gcc, and g++ are already installed; you can check with make -v, gcc -v, and g++ -v. If any of them is missing, install it with apt-get:
sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install make
Install OpenSSL:
sudo apt-get install openssl
sudo apt-get install libssl-dev
Build and install:
./bootstrap
make -j4
sudo make install
Check the version to confirm the installation:
cmake --version
Protobuf
Install the dependencies:
sudo apt-get install autoconf automake libtool curl unzip
Build and install:
./autogen.sh
./configure
make -j4
make check
sudo make install
sudo ldconfig # refresh shared library cache.
Check the version to confirm the installation:
protoc --version
OpenCV
Install the dependencies:
sudo apt-get install build-essential
sudo apt-get install git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
sudo apt-get install libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev
If installing libjasper-dev fails, add the following source and try again:
sudo add-apt-repository "deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ xenial main multiverse restricted universe"
sudo apt update
sudo apt install libjasper1 libjasper-dev
Build and install:
mkdir build
cd build
sudo cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local -D OPENCV_EXTRA_MODULES_PATH=/home/ubuntu/opencv-4.5.0/opencv_contrib-4.5.0/modules/ ..
sudo make install -j4
MNN
The Raspberry Pi 4B has a 500 MHz Broadcom VideoCore VI GPU, so MNN can try to use OpenCL for acceleration:
sudo apt-get install ocl-icd-opencl-dev
Then turn on the OpenCL option in MNN's CMakeLists.txt:
option(MNN_OPENCL "Enable OpenCL" ON)
Build MNN:
./schema/generate.sh
mkdir build
cd build
cmake .. -DMNN_BUILD_CONVERTER=true -DMNN_SEP_BUILD=false && make -j4
Model inference
MNN supports mainstream model file formats such as TensorFlow, Caffe, and ONNX, and common network types such as CNNs, RNNs, and GANs.
It supports 149 TensorFlow ops, 47 Caffe ops, and 74 ONNX ops. The number of MNN ops supported by each compute backend is: 110 on CPU, 55 on Metal, 29 on OpenCL, and 31 on Vulkan.
Export the ONNX model
Because networks built from depthwise separable convolutions have far fewer parameters, they are better suited to edge computing, so the MobileNetV2 model for ImageNet 1000-class classification is used as the example here.
import torch
import torch.onnx as onnx
import torchvision.models as models


if __name__ == '__main__':
    # build MobileNetV2 and switch it to inference mode
    net = models.mobilenet_v2()
    net.eval()

    # the dummy input fixes the input shape to NCHW 1x3x224x224
    dummy_input = torch.zeros((1, 3, 224, 224))
    input_names = ["input"]
    output_names = ["output"]

    onnx.export(net,
                dummy_input,
                "mobilenet_v2.onnx",
                verbose=True,
                opset_version=11,
                input_names=input_names,
                output_names=output_names,
                dynamic_axes=None)
Convert to an MNN model
Convert the ONNX model to an MNN model:
./build/MNNConvert -f ONNX --modelFile mobilenet_v2.onnx --MNNModel mobilenet_v2.mnn --bizCode biz
Parameter description:
Usage:
  MNNConvert [OPTION...]

  -h, --help                    Convert Other Model Format To MNN Model
  -v, --version                 show the current converter version
  -f, --framework arg           type of the model to convert, ex: [TF,CAFFE,ONNX,TFLITE,MNN]
      --modelFile arg           model file to convert, ex: *.pb,*caffemodel
      --prototxt arg            Caffe network description file, ex: *.prototxt
      --MNNModel arg            file name of the converted MNN model, ex: *.mnn
      --fp16                    store the float32 parameters of conv/matmul/LSTM as float16;
                                the model shrinks to about half the size with essentially no accuracy loss
      --benchmarkModel          strip the parameters of conv/matmul/BN layers; for benchmark use only
      --bizCode arg             MNN model flag, ex: MNN
      --debug                   enable debug mode to print more conversion information
      --forTraining             keep training-related ops such as BN/Dropout, default: false
      --weightQuantBits arg     arg=2~8; quantizes only the float32 weights of conv/matmul/LSTM to
                                reduce model size; the weights are decoded back to float32 when the
                                model is loaded, so runtime speed is the same as the float32 model.
                                With 8 bits the accuracy is essentially unchanged and the model is
                                about 4x smaller; default: 0, i.e. no weight quantization
      --compressionParamsFile arg
                                compression parameter file generated by the MNN model compression toolkit
      --saveStaticModel         fix the input shapes and save a static model, default: false
      --inputConfigFile arg     config file needed when saving a static model, ex: ~/config.txt
C++ API inference
Code:
#include <iostream>
#include <string>
#include <vector>
#include <cstring>
#include <chrono>
#include <cmath>
#include <opencv2/opencv.hpp>
#include "MNN/Interpreter.hpp"
using namespace std;
using namespace MNN;
int main() {
    string testImagePath = "/home/ubuntu/MNN/test.png";
    string modelFile = "/home/ubuntu/mobilenet_v2.mnn";
    string mode = "fp16";
    string deviceType = "gpu";
    int numThread = 1;
    // build network
    Interpreter* net = Interpreter::createFromFile(modelFile.c_str());
    // build config
    ScheduleConfig config;
    // set the number of CPU threads
    config.numThread = numThread;
    // select the compute device
    if (deviceType == "cpu")
        config.type = static_cast<MNNForwardType>(MNN_FORWARD_CPU);
    if (deviceType == "gpu")
        config.type = static_cast<MNNForwardType>(MNN_FORWARD_OPENCL);
    // set precision
    BackendConfig backendConfig;
    if (mode == "fp16")
        backendConfig.precision = static_cast<BackendConfig::PrecisionMode>(BackendConfig::Precision_Low);
    if (mode == "half")
        backendConfig.precision = static_cast<BackendConfig::PrecisionMode>(BackendConfig::Precision_Normal);
    if (mode == "fp32")
        backendConfig.precision = static_cast<BackendConfig::PrecisionMode>(BackendConfig::Precision_High);
    // set power use
    backendConfig.power = static_cast<BackendConfig::PowerMode>(BackendConfig::Power_Normal);
    // set memory use
    backendConfig.memory = static_cast<BackendConfig::MemoryMode>(BackendConfig::Memory_Normal);
    config.backendConfig = &backendConfig;
    // build session with the config
    Session* session = net->createSession(config);
    // get input and output nodes of the network
    Tensor* modelInputTensor = net->getSessionInput(session, NULL);
    Tensor* modelOutputTensor = net->getSessionOutput(session, NULL);
    // image preprocess: BGR -> RGB, resize to 224x224, scale to [0, 1], then ImageNet mean/std normalization
    cv::Scalar mean = {0.485, 0.456, 0.406};
    cv::Scalar stdv = {0.229, 0.224, 0.225};
    cv::Mat img = cv::imread(testImagePath);
    cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
    cv::resize(img, img, cv::Size(224, 224));
    img.convertTo(img, CV_32F, 1 / 255.0);
    // normalize each channel: (x - mean) / std
    vector<cv::Mat> channels;
    cv::split(img, channels);
    for (int c = 0; c < 3; c++)
        channels[c] = (channels[c] - mean[c]) / stdv[c];
    cv::merge(channels, img);
    Tensor* inputTensor = Tensor::create<float>({1, 224, 224, 3}, NULL, Tensor::TENSORFLOW);
    Tensor* outputTensor = Tensor::create<float>({1, 1000}, NULL, Tensor::CAFFE);
    memcpy(inputTensor->host<float>(), img.data, inputTensor->size());
    // inference
    auto start = chrono::high_resolution_clock::now();
    modelInputTensor->copyFromHostTensor(inputTensor);
    net->runSession(session);
    modelOutputTensor->copyToHostTensor(outputTensor);
    auto end = chrono::high_resolution_clock::now();
    double cost = chrono::duration<double, milli>(end - start).count();
    cout << "device: " << deviceType << ", mode: " << mode << endl;
    cout << "inference time: " << to_string(cost) << "ms" << endl;
    // post-process: copy the 1000 class scores out of the output tensor
    vector<float> confidence;
    confidence.resize(1000);
    memcpy(confidence.data(), outputTensor->host<float>(), outputTensor->size());
    // release the tensors and the interpreter
    delete inputTensor;
    delete outputTensor;
    delete net;
    return 0;
}
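The confidence vector holds the raw scores for the 1000 ImageNet classes but is not used further in the program above. Below is a minimal sketch of how the top-1 prediction could be read out of it; the softmax step and the printTop1 helper are illustrative additions, not part of the original code (they are written to live in the same file, so they rely on its includes and using namespace std, plus an extra #include <algorithm>):

// Sketch: turn the raw scores into probabilities and report the top-1 class.
// Requires #include <algorithm> for max_element; `confidence` is the vector
// filled from the output tensor in main() above.
void printTop1(const vector<float>& confidence) {
    // softmax over the scores (subtract the max first for numerical stability)
    float maxScore = *max_element(confidence.begin(), confidence.end());
    vector<float> probs(confidence.size());
    float sum = 0.f;
    for (size_t i = 0; i < confidence.size(); i++) {
        probs[i] = exp(confidence[i] - maxScore);
        sum += probs[i];
    }
    // argmax gives the predicted ImageNet class index (0-999)
    size_t top1 = max_element(probs.begin(), probs.end()) - probs.begin();
    cout << "class id: " << top1 << ", score: " << probs[top1] / sum << endl;
}

Calling printTop1(confidence) right after the post-processing memcpy would print the predicted class index and its probability.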
CMake:
cmake_minimum_required(VERSION 3.17)
project(MNN)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
find_package(OpenCV REQUIRED)
if(OpenCV_FOUND)
    message(STATUS "OpenCV library: ${OpenCV_INSTALL_PATH}")
    message(STATUS " version: ${OpenCV_VERSION}")
    message(STATUS " libraries: ${OpenCV_LIBS}")
    message(STATUS " include path: ${OpenCV_INCLUDE_DIRS}")
else()
    message(FATAL_ERROR "Error! OpenCV not found!")
    set(OpenCV_INCLUDE_DIRS "/usr/local/include/opencv4")
    set(OpenCV_LIBS "/usr/local/lib")
endif()
set(MNN_INCLUDE_DIR "/home/ubuntu/MNN-1.1.0/include")
set(MNN_LIBRARIES "/home/ubuntu/MNN-1.1.0/build/libMNN.so")
message(STATUS " MNN libraries: ${MNN_LIBRARIES}")
message(STATUS " MNN include path: ${MNN_INCLUDE_DIR}")
add_executable(MNN main.cpp)
target_include_directories(MNN PUBLIC
    ${OpenCV_INCLUDE_DIRS}
    ${MNN_INCLUDE_DIR}
)
target_link_libraries(MNN PUBLIC
    ${MNN_LIBRARIES}
    ${OpenCV_LIBS}
)
Build:
mkdir build
cd build
sudo cmake .. && make -j4
Performance comparison:
The ARMv8.2 architecture introduces new fp16 arithmetic and int8 dot-product instructions, and with proper use of them a deep learning framework's inference can be sped up substantially. The Raspberry Pi 4B's CPU cores are Cortex-A72, which implement ARMv8-A, so fp16 and int8 acceleration are not available on this CPU. MNN's fp16 mode only affects how the parameters are stored; at compute time they are converted back to fp32 before being fed to the CPU. The GPU path fails with ERROR CODE : -1001 when the OpenCL backend is invoked, so it could not be tested for now. As the table shows, mobilenet_v2 runs in roughly 95 ms per inference on a single thread, and multi-threading can speed this up further (see the sketch after the table).
Device | FP32 | HALF | FP16 |
---|---|---|---|
cpu | 95.56ms | 96.74ms | 95.18ms |
gpu | - | - | - |
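The multi-threaded case mentioned above is not shown in the program, which uses numThread = 1. Below is a rough sketch of how the session could be rebuilt with more threads and timed over several runs; the 4-thread setting, the warm-up run, and the averaging loop are illustrative assumptions, not measurements from this post:

// Sketch: average the latency of several runs with 4 CPU threads.
// `net` is the MNN::Interpreter created as in the full program above;
// needs the same includes (<chrono>, <iostream>, "MNN/Interpreter.hpp").
double benchmarkCpu(MNN::Interpreter* net, int runs = 10) {
    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;
    config.numThread = 4;          // the Pi 4B has four Cortex-A72 cores
    MNN::Session* session = net->createSession(config);

    net->runSession(session);      // warm-up run, not timed

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < runs; i++)
        net->runSession(session);
    auto end = std::chrono::high_resolution_clock::now();

    double ms = std::chrono::duration<double, std::milli>(end - start).count() / runs;
    std::cout << "average inference time: " << ms << " ms" << std::endl;
    net->releaseSession(session);
    return ms;
}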