Building and Deploying the Tengine Inference Engine and Testing MobileNet_SSD Inference

Introduction to Tengine

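OAID/Tengine | GitHub
Tengine is a lightweight, high-performance, modular inference engine developed by OPEN AI LAB for embedded devices.
On embedded devices, Tengine provides: a computing framework that supports heterogeneous computing across CPU, GPU, DLA/NPU, and DSP; a scheduler for heterogeneous computing; an efficient compute library for the ARM platform; performance optimizations for specific hardware platforms; dynamic planning of the computation graph's memory usage; access to remote AI computing capability over the network; multi-level parallelism; system modules that can be detached; an event-driven computing model; and a newly designed computation graph representation that draws on the strengths of existing AI frameworks.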

Installing Tengine

Official installation guide

Setting up the build environment

sudo apt install git cmake
sudo apt install libprotobuf-dev protobuf-compiler libboost-all-dev libgoogle-glog-dev

Downloading the source and configuring the build options

git clone https://github.com/OAID/tengine.git
cd tengine
cp makefile.config.example makefile.config
vim makefile.config

Change it to the following:

# Set the target arch
CONFIG_ARCH_ARM64=y
# Enable Compiling Optimization
CONFIG_OPT_CFLAGS=-O2
# Use BLAS as the operator implementation
CONFIG_ARCH_BLAS=y
# Enable GPU support by Arm Computing Library
# CONFIG_ACL_GPU=y
# Set the path of ACL
# ACL_ROOT=/home/firefly/ComputeLibrary
# Enable other serializers
CONFIG_CAFFE_SERIALIZER=y
CONFIG_MXNET_SERIALIZER=y
CONFIG_ONNX_SERIALIZER=y
CONFIG_TF_SERIALIZER=y
CONFIG_TENGINE_SERIALIZER=y

Linking OpenCV

Tengine needs to link against OpenCV. First, add the OpenCV library path to the dynamic linker configuration:

sudo vim /etc/ld.so.conf

In the file that opens, add the OpenCV installation path:

/usr/local/lib

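Run sudo ldconfig to apply the configuration.
If the OpenCV libraries under /usr/local/lib exist only as versioned ".so" files (for example ".so.3.4.5"), Tengine will fail to link against them. The following script creates unversioned ".so" symlinks to fix this: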

# coding=utf-8
# Create unversioned ".so" symlinks for the versioned OpenCV libraries
# found in the current working directory.
import os

VERSION_SUFFIX = ".so.3.4.5"   # adjust to the installed OpenCV version

file_list = []

def listdir(folder, file_list):
    # Collect the names of all regular files (and symlinks to files) in folder.
    for name in os.listdir(folder):
        filepath = os.path.join(folder, name)
        if os.path.isfile(filepath):
            file_list.append(name)

# For every libfoo.so.3.4.5, create the symlink libfoo.so -> libfoo.so.3.4.5
def ChangeFileName(folder, file_list):
    for old_file_name in file_list:
        new_file_name = old_file_name.replace(VERSION_SUFFIX, ".so")
        if new_file_name != old_file_name:
            if os.path.exists(os.path.join(folder, new_file_name)):
                print("file exist: " + new_file_name)
            else:
                cmd = "sudo ln -s " + old_file_name + " " + new_file_name
                print(cmd)
                os.system(cmd)

folder = os.getcwd()
print(folder)
listdir(folder, file_list)
ChangeFileName(folder, file_list)

Save the script as "filename.py", copy it to /usr/local/lib, and run it there with sudo python3 filename.py to create the symlinks. As the listing below shows, the library files ending in .so now link to the .so.3.4.5 files:

-rw-r--r--  1 root root     365064 1月  31 06:28 libopencv_surface_matching.so.3.4.5
lrwxrwxrwx  1 root root         23 2月  19 00:09 libopencv_text.so -> libopencv_text.so.3.4.5
lrwxrwxrwx  1 root root         23 2月  15 06:00 libopencv_text.so.3.4 -> libopencv_text.so.3.4.5
-rw-r--r--  1 root root     428928 1月  31 07:06 libopencv_text.so.3.4.5
lrwxrwxrwx  1 root root         27 2月  19 00:08 libopencv_tracking.so -> libopencv_tracking.so.3.4.5
lrwxrwxrwx  1 root root         27 2月  15 06:00 libopencv_tracking.so.3.4 -> libopencv_tracking.so.3.4.5
-rw-r--r--  1 root root    2336240 1月  31 07:38 libopencv_tracking.so.3.4.5
lrwxrwxrwx  1 root root         26 2月  19 00:33 libopencv_videoio.so -> libopencv_videoio.so.3.4.5
lrwxrwxrwx  1 root root         26 2月  15 06:00 libopencv_videoio.so.3.4 -> libopencv_videoio.so.3.4.5
-rw-r--r--  1 root root     369296 1月  31 05:55 libopencv_videoio.so.3.4.5
lrwxrwxrwx  1 root root         24 2月  19 00:10 libopencv_video.so -> libopencv_video.so.3.4.5
lrwxrwxrwx  1 root root         24 2月  15 06:00 libopencv_video.so.3.4 -> libopencv_video.so.3.4.5
-rw-r--r--  1 root root     423112 1月  31 06:29 libopencv_video.so.3.4.5
lrwxrwxrwx  1 root root         28 2月  19 03:53 libopencv_videostab.so -> libopencv_videostab.so.3.4.5
lrwxrwxrwx  1 root root         28 2月  15 06:00 libopencv_videostab.so.3.4 -> libopencv_videostab.so.3.4.5
-rw-r--r--  1 root root     365104 1月  31 07:40 libopencv_videostab.so.3.4.5
lrwxrwxrwx  1 root root         30 2月  18 15:08 libopencv_xfeatures2d.so -> libopencv_xfeatures2d.so.3.4.5
lrwxrwxrwx  1 root root         30 2月  15 06:00 libopencv_xfeatures2d.so.3.4 -> libopencv_xfeatures2d.so.3.4.5
-rw-r--r--  1 root root    2836736 1月  31 07:43 libopencv_xfeatures2d.so.3.4.5
lrwxrwxrwx  1 root root         27 2月  19 00:11 libopencv_ximgproc.so -> libopencv_ximgproc.so.3.4.5
lrwxrwxrwx  1 root root         27 2月  15 06:00 libopencv_ximgproc.so.3.4 -> libopencv_ximgproc.so.3.4.5
-rw-r--r--  1 root root    1299184 1月  31 07:48 libopencv_ximgproc.so.3.4.5
lrwxrwxrwx  1 root root         29 2月  19 00:11 libopencv_xobjdetect.so -> libopencv_xobjdetect.so.3.4.5
lrwxrwxrwx  1 root root         29 2月  15 06:00 libopencv_xobjdetect.so.3.4 -> libopencv_xobjdetect.so.3.4.5
-rw-r--r--  1 root root      99232 1月  31 07:51 libopencv_xobjdetect.so.3.4.5
lrwxrwxrwx  1 root root         25 2月  19 03:53 libopencv_xphoto.so -> libopencv_xphoto.so.3.4.5
lrwxrwxrwx  1 root root         25 2月  15 06:00 libopencv_xphoto.so.3.4 -> libopencv_xphoto.so.3.4.5
-rw-r--r--  1 root root     242880 1月  31 06:32 libopencv_xphoto.so.3.4.5

Building and testing

Build in the tengine directory, then run the bundled benchmark programs to test the build:

sudo make
sudo make install
./build/tests/bin/bench_sqz
run-time library version: 1.0.0-github
REPEAT COUNT= 100
Repeat [100] time 55990.35 us per RUN. used 5599035 us
0.2763 - "n02123045 tabby, tabby cat"
0.2673 - "n02123159 tiger cat"
0.1766 - "n02119789 kit fox, Vulpes macrotis"
0.0827 - "n02124075 Egyptian cat"
0.0777 - "n02085620 Chihuahua"
--------------------------------------
ALL TEST DONE
./build/tests/bin/bench_mobilenet
run-time library version: 1.0.0-github
REPEAT COUNT= 100
Repeat [100] time 56649.14 us per RUN. used 5664914 us
8.5976 - "n02123159 tiger cat"
7.9550 - "n02119022 red fox, Vulpes vulpes"
7.8679 - "n02119789 kit fox, Vulpes macrotis"
7.4274 - "n02113023 Pembroke, Pembroke Welsh corgi"
6.3646 - "n02123045 tabby, tabby cat"
ALL TEST DONE
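
The "us per RUN" figure in the benchmark output is the total elapsed time divided by the repeat count. A minimal sketch of that arithmetic, using the totals printed above (the function name avg_time_us is made up for illustration):

```python
def avg_time_us(total_us: int, repeat_count: int) -> float:
    """Average time per run in microseconds."""
    if repeat_count <= 0:
        raise ValueError("repeat_count must be positive")
    return total_us / repeat_count

# Totals copied from the bench_sqz and bench_mobilenet output above.
for name, total_us in (("bench_sqz", 5599035), ("bench_mobilenet", 5664914)):
    per_run = avg_time_us(total_us, 100)
    print(f"{name}: {per_run:.2f} us per run ({per_run / 1000:.1f} ms)")
```

Both networks take roughly 56 ms per inference in this configuration.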

Running the MobileNet SSD example bundled with Tengine

Preparing the example code

The examples folder in the tengine directory contains a mobilenet_ssd subdirectory. Open its CMakeLists.txt and, before the line set( INSTALL_DIR ${TENGINE_DIR}/install/), add a statement that sets TENGINE_DIR:

set( TENGINE_DIR ~/work/Tengine ) 

Downloading the model files

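Tengine provides the models for download: Tengine_models | Baidu Cloud (extraction code: 57vb)
Find the mobilenet_ssd folder, download MobileNetSSD_deploy.prototxt and MobileNetSSD_deploy.caffemodel from it, and place them in the ./models directory. The following commands copy the models from an FTP folder, where they were downloaded first, to the board: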

wget ftp://192.168.199.1/sda1/MobileNetSSD_deploy.prototxt --ftp-user=root --ftp-password="password"
wget ftp://192.168.199.1/sda1/MobileNetSSD_deploy.caffemodel --ftp-user=root --ftp-password="password"

Building

Build and run the MobileNet SSD example:

cd ~/work/Tengine/examples/mobilenet_ssd
cmake .
make
./MSSD -i test.jpg
/home/dolphin/work/tengine/examples/mobilenet_ssd/MSSD
proto file not specified,using /home/dolphin/work/tengine/models/MobileNetSSD_deploy.prototxt by default
model file not specified,using /home/dolphin/work/tengine/models/MobileNetSSD_deploy.caffemodel by default
--------------------------------------
repeat 1 times, avg time per run is 118.913 ms
detect result num: 6
dog     :99%
BOX:( 322.588 , 232.231 ),( 455.996 , 330.833 )
person  :99%
BOX:( 213.043 , 153.082 ),( 310.846 , 322.655 )
person  :96%
BOX:( 536.058 , 76.8777 ),( 709.835 , 391.781 )
dog     :90%
BOX:( 177.256 , 296.386 ),( 258.995 , 461.81 )
person  :89%
BOX:( 499.474 , 72.645 ),( 619.208 , 369.286 )
person  :74%
BOX:( 149.663 , 130.89 ),( 217.314 , 245.324 )
======================================
[DETECTED IMAGE SAVED]: save.jpg
======================================

The run saves a copy of the image with the detection boxes drawn on it as "save.jpg".


Running MobileNet SSD on video

Modify mssd.cpp as follows:

#include <unistd.h>
#include <cstdlib>    // getenv, strtoul, malloc, free
#include <cstring>    // memcpy
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "tengine_c_api.h"
#include <sys/time.h>
#include <stdio.h>
#include "common.hpp"
#include <pthread.h>
#include <sched.h>

#define DEF_PROTO "../../models/MobileNetSSD_deploy.prototxt"
#define DEF_MODEL "../../models/MobileNetSSD_deploy.caffemodel"

struct Box
{
    float x0;
    float y0;
    float x1;
    float y1;
    int class_idx;
    float score;
};

void get_input_data_ssd(cv::Mat img, float* input_data, int img_h,  int img_w){
    if (img.empty()){
        std::cerr << "Failed to read image from camera.\n";
        return;
    }
   
    cv::resize(img, img, cv::Size(img_w, img_h));    // cv::Size takes (width, height)
    img.convertTo(img, CV_32FC3);
    float *img_data = (float *)img.data;
    int hw = img_h * img_w;

    float mean[3]={127.5,127.5,127.5};
    for (int h = 0; h < img_h; h++){
        for (int w = 0; w < img_w; w++){
            for (int c = 0; c < 3; c++){
                input_data[c * hw + h * img_w + w] = 0.007843* (*img_data - mean[c]);
                img_data++;
            }
        }
    }
}

void post_process_ssd(cv::Mat img, float threshold,float* outdata,int num){
#if 0
    const char* class_names[] = {"background",
                    "airplane", "bicycle", "bird", "boat",
                    "bus", "car", "chair", "dog", "motorcycle",
                    "panther", "tiger"};
#else
    const char* class_names[] = {"background",   "aeroplane",   "bicycle",   "bird",   "boat",   "bottle",
                     "bus",   "car",   "cat",   "chair",  "cow",   "diningtable",
                     "dog",   "horse",   "motorbike",   "person",   "pottedplant",   "sheep",
                     "sofa",   "train",   "tvmonitor"};
#endif  
    int raw_h = img.size().height;
    int raw_w = img.size().width;
    std::vector<Box> boxes;
    int line_width=raw_w*0.002;
    printf("detect result num: %d \n",num);
    for (int i=0;i<num;i++){
        if(outdata[1]>=threshold){
            Box box;
            box.class_idx=outdata[0];
            box.score=outdata[1];
            box.x0=outdata[2]*raw_w;
            box.y0=outdata[3]*raw_h;
            box.x1=outdata[4]*raw_w;
            box.y1=outdata[5]*raw_h;
            boxes.push_back(box);
            printf("%s\t:%.0f%%\n", class_names[box.class_idx], box.score * 100);
            printf("BOX:( %g , %g ),( %g , %g )\n",box.x0,box.y0,box.x1,box.y1);
        }
        outdata+=6;
    }
#if 0
    for(int i=0;i<(int)boxes.size();i++){
        Box box=boxes[i];
        cv::rectangle(img, cv::Rect(box.x0, box.y0,(box.x1-box.x0),(box.y1-box.y0)),cv::Scalar(255, 255, 0),line_width);
        std::ostringstream score_str;
        score_str<<box.score;
        std::string label = std::string(class_names[box.class_idx]) + ": " + score_str.str();
        int baseLine = 0;
        cv::Size label_size = cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
        cv::rectangle(img, cv::Rect(cv::Point(box.x0,box.y0- label_size.height),
                                  cv::Size(label_size.width, label_size.height + baseLine)),
                      cv::Scalar(255, 255, 0), CV_FILLED);
        cv::putText(img, label, cv::Point(box.x0, box.y0),
                    cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 0));
    }
#endif
}

float outdata[15*6];
cv::Mat frame;
int detect_num;
bool quit_flag = false;
graph_t graph;

pthread_mutex_t m_frame, m_outdata, m_quit;
void *th_vedio(void *){
    //cv::VideoCapture capture(0);     // USB camera
    cv::VideoCapture capture("test.mp4");    // video file
    capture.set(CV_CAP_PROP_FRAME_WIDTH, 960);
    capture.set(CV_CAP_PROP_FRAME_HEIGHT, 540);
#if 0
    cv::namedWindow("MSSD", CV_WINDOW_NORMAL);
    cvResizeWindow("MSSD", 1280, 720);
#endif
    while(1){
        float show_threshold=0.25;
        pthread_mutex_lock(&m_frame);
        capture >> frame;
        // Report the most recent detections produced by the detection thread.
        pthread_mutex_lock(&m_outdata);
        post_process_ssd(frame, show_threshold, outdata, detect_num);
        pthread_mutex_unlock(&m_outdata);
#if 0
        cv::imshow("MSSD", frame);
#endif
        pthread_mutex_unlock(&m_frame);
        // Pressing 'q' tells the detection thread to stop as well.
        if( cv::waitKey(10) == 'q' ){
            pthread_mutex_lock(&m_quit);
            quit_flag = true;
            pthread_mutex_unlock(&m_quit);
            break;
        }
        usleep(500000);    // grab about two frames per second
    }
    return NULL;
}

void *th_detect(void*){
    // input
    int img_h = 300;
    int img_w = 300;
    int img_size = img_h * img_w * 3;
    float *input_data = (float *)malloc(sizeof(float) * img_size);

    int node_idx=0;
    int tensor_idx=0;
    tensor_t input_tensor = get_graph_input_tensor(graph, node_idx, tensor_idx);
    if(!check_tensor_valid(input_tensor)){
        printf("Get input node failed : node_idx: %d, tensor_idx: %d\n",node_idx,tensor_idx);
        return NULL;
    }

    int dims[] = {1, 3, img_h, img_w};
    set_tensor_shape(input_tensor, dims, 4);
    prerun_graph(graph);

    int repeat_count = 1;
    const char *repeat = std::getenv("REPEAT_COUNT");

    if (repeat)
        repeat_count = std::strtoul(repeat, NULL, 10);

    int out_dim[4];
    tensor_t out_tensor;
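    while(1){
        // Read the quit flag under its mutex, and release the lock before leaving the loop.
        pthread_mutex_lock(&m_quit);
        bool quit = quit_flag;
        pthread_mutex_unlock(&m_quit);
        if(quit)  break;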

        struct timeval t0, t1;
        float total_time = 0.f;

        for (int i = 0; i < repeat_count; i++){
            pthread_mutex_lock(&m_frame);
            get_input_data_ssd(frame, input_data, img_h,  img_w);
            pthread_mutex_unlock(&m_frame);

            gettimeofday(&t0, NULL);
            set_tensor_buffer(input_tensor, input_data, img_size * 4);
            run_graph(graph, 1);

            gettimeofday(&t1, NULL);
            float mytime = (float)((t1.tv_sec * 1000000 + t1.tv_usec) - (t0.tv_sec * 1000000 + t0.tv_usec)) / 1000;
            total_time += mytime;
        }
        std::cout << "--------------------------------------\n";
        std::cout << "repeat " << repeat_count << " times, avg time per run is " << total_time / repeat_count << " ms\n";

        out_tensor = get_graph_output_tensor(graph, 0,0);
        get_tensor_shape( out_tensor, out_dim, 4);
        pthread_mutex_lock(&m_outdata);
        detect_num = out_dim[1] <= 15 ? out_dim[1] : 15;
        memcpy(outdata, get_tensor_buffer(out_tensor), sizeof(float)*detect_num*6);
        pthread_mutex_unlock(&m_outdata);
    }

    free(input_data);
    return NULL;
}

int main(int argc, char *argv[])
{
    const std::string root_path = get_root_path();
    std::string proto_file;
    std::string model_file;

    int res;
    while( ( res=getopt(argc,argv,"p:m:h"))!= -1){
        switch(res){
            case 'p':
                proto_file=optarg;
                break;
            case 'm':
                model_file=optarg;
                break;
            case 'h':
                std::cout << "[Usage]: " << argv[0] << " [-h]\n"
                          << "   [-p proto_file] [-m model_file]\n";
                return 0;
            default:
                break;
        }
    }

    const char *model_name = "mssd_300";
    if(proto_file.empty()){
        proto_file = DEF_PROTO;
        std::cout<< "proto file not specified,using "<< proto_file << " by default\n";

    }
    if(model_file.empty()){
        model_file = DEF_MODEL;
        std::cout<< "model file not specified,using "<< model_file << " by default\n";
    }

    // init tengine
    init_tengine_library();
    if (request_tengine_version("0.1") < 0)
        return 1;
    if (load_model(model_name, "caffe", proto_file.c_str(), model_file.c_str()) < 0)
        return 1;
    std::cout << "load model done!\n";
   
    // create graph
    graph = create_runtime_graph("graph", model_name, NULL);
    if (!check_graph_valid(graph)){
        std::cout << "create graph0 failed\n";
        return 1;
    }

    pthread_mutex_init(&m_frame, NULL);
    pthread_mutex_init(&m_outdata, NULL);
    pthread_mutex_init(&m_quit, NULL);


    pthread_t id1, id2;
    pthread_create(&id1, NULL, th_vedio, NULL);
    pthread_create(&id2, NULL, th_detect, NULL);

    pthread_join(id1, NULL);
    pthread_join(id2, NULL);

    pthread_mutex_destroy(&m_frame);
    pthread_mutex_destroy(&m_outdata);
    pthread_mutex_destroy(&m_quit);

    postrun_graph(graph);
    destroy_runtime_graph(graph);
    remove_model(model_name);

    return 0;
}
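
The preprocessing step in get_input_data_ssd is easy to get wrong, so here is a small sketch of the same index arithmetic in pure Python. It assumes the input is a flat list of interleaved pixel values (height, width, channel order, as in a cv::Mat) and reuses the mean 127.5 and scale 0.007843 from the C++ code; the function name preprocess_hwc_to_chw is made up for illustration.

```python
def preprocess_hwc_to_chw(pixels, img_h, img_w, mean=127.5, scale=0.007843):
    """Convert interleaved HWC pixel values to planar CHW floats.

    pixels: flat sequence of length img_h * img_w * 3, stored row by row
    with the three channel values of each pixel next to each other.
    """
    hw = img_h * img_w
    if len(pixels) != hw * 3:
        raise ValueError("unexpected pixel count")
    out = [0.0] * (hw * 3)
    for h in range(img_h):
        for w in range(img_w):
            for c in range(3):
                src = (h * img_w + w) * 3 + c           # interleaved source index
                dst = c * hw + h * img_w + w            # planar destination index
                out[dst] = scale * (pixels[src] - mean)
    return out

# Tiny 1x2 test image: pixel 0 = (0, 127.5, 255), pixel 1 = (255, 255, 255).
chw = preprocess_hwc_to_chw([0, 127.5, 255, 255, 255, 255], img_h=1, img_w=2)
print([round(v, 3) for v in chw])
```

With these constants, pixel values in the range 0..255 are mapped to approximately -1..1, and each channel ends up in its own contiguous plane.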

After making these changes, build and run:

./MSSD
/home/dolphin/work/tengine/examples/test_mssd/MSSD
proto file not specified,using ../../models/MobileNetSSD_deploy.prototxt by default
model file not specified,using ../../models/MobileNetSSD_deploy.caffemodel by default
load model done!
detect result num: 0
--------------------------------------
repeat 1 times, avg time per run is 134.089 ms
--------------------------------------
repeat 1 times, avg time per run is 120.139 ms
--------------------------------------
repeat 1 times, avg time per run is 124.079 ms
detect result num: 7
aeroplane       :85%
BOX:( 454.456 , 65.2393 ),( 647.656 , 150.583 )
motorbike       :82%
BOX:( 769.058 , 193.725 ),( 1017.29 , 404.13 )
diningtable     :80%
BOX:( -10.7789 , 273.882 ),( 1049.22 , 603.307 )
chair   :70%
BOX:( 235.559 , 190.266 ),( 401.388 , 467.526 )
bird    :69%
BOX:( 564.433 , 367.69 ),( 808.597 , 480.245 )
person  :36%
BOX:( 796.151 , 190.622 ),( 985.793 , 380.198 )
pottedplant     :31%
BOX:( 2.61609 , 2.01261 ),( 278.975 , 341.759 )
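
The box coordinates printed above come from scaling the normalized network output to the frame size, as post_process_ssd does. A minimal Python sketch of that decoding, assuming the same layout of six floats per detection (class index, score, x0, y0, x1, y1); the input values in the example are made up and are not real network output:

```python
VOC_CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
               "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
               "sofa", "train", "tvmonitor"]

def decode_detections(outdata, num, raw_w, raw_h, threshold):
    """Keep detections with score >= threshold and scale boxes to pixels."""
    boxes = []
    for i in range(num):
        cls, score, x0, y0, x1, y1 = outdata[i * 6:(i + 1) * 6]
        if score >= threshold:
            boxes.append((VOC_CLASSES[int(cls)], score,
                          x0 * raw_w, y0 * raw_h, x1 * raw_w, y1 * raw_h))
    return boxes

# Two hypothetical detections on a 960x540 frame; the second is below threshold.
fake_output = [12, 0.9, 0.25, 0.5, 0.75, 1.0,
               15, 0.1, 0.0, 0.0, 1.0, 1.0]
for name, score, x0, y0, x1, y1 in decode_detections(fake_output, 2, 960, 540, 0.25):
    print(f"{name}\t:{score * 100:.0f}%  BOX:( {x0:g} , {y0:g} ),( {x1:g} , {y1:g} )")
```

Because the network output is not clamped, scaled coordinates can fall slightly outside the frame, which is why a box such as the diningtable one above starts at a negative x value.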

GPU/CPU heterogeneous scheduling in Tengine

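Tengine partitions the computation graph by operator, and the scheduler then assigns each resulting subgraph to a suitable device. Because GPU programming is more complex, the GPU backend supports the most common neural-network operators first (for example CONV, POOL, and FC). Operators that are specific to certain networks (for example PRIORBOX in the SSD detection network) are assigned to the CPU.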
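
The partitioning idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Tengine's actual scheduler: the graph is treated as a linear list of operator names rather than a DAG, and the GPU_SUPPORTED set is a hypothetical support list based on the examples named above.

```python
GPU_SUPPORTED = {"CONV", "POOL", "FC"}  # hypothetical GPU operator support list

def partition_by_device(ops):
    """Group a linear operator sequence into contiguous (device, ops) subgraphs."""
    subgraphs = []
    for op in ops:
        device = "GPU" if op in GPU_SUPPORTED else "CPU"
        if subgraphs and subgraphs[-1][0] == device:
            subgraphs[-1][1].append(op)        # extend the current subgraph
        else:
            subgraphs.append((device, [op]))   # device changed: start a new subgraph
    return subgraphs

for device, ops in partition_by_device(["CONV", "POOL", "CONV", "PRIORBOX", "CONV", "FC"]):
    print(device, ops)
```

Every switch between devices implies a data transfer between CPU and GPU memory, so a network with many unsupported operators scattered through it gains less from the GPU.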


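Tengine implements heterogeneous execution on the RK3399, which makes fuller use of the chip's compute capability and speeds up inference.
The RK3399 has a Mali-T860 GPU, and its CPU consists of a dual-core Cortex-A72 plus a quad-core Cortex-A53.
To get the best GPU performance, set the GPU to its highest frequency: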

sudo su
echo "performance" > /sys/devices/platform/ff9a0000.gpu/devfreq/ff9a0000.gpu/governor
cat /sys/devices/platform/ff9a0000.gpu/devfreq/ff9a0000.gpu/cur_freq
800000000
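
The cur_freq value is reported in Hz, so 800000000 corresponds to 800 MHz. A small sketch for checking it from a script; the helper name read_freq_mhz is made up, and the sysfs path is the one used above, which may differ on other boards or kernels:

```python
from pathlib import Path

def read_freq_mhz(cur_freq_path):
    """Read a devfreq cur_freq file (value in Hz) and return the value in MHz."""
    hz = int(Path(cur_freq_path).read_text().strip())
    return hz / 1_000_000

# Path taken from the commands above; adjust it if the board exposes the GPU elsewhere.
gpu_path = "/sys/devices/platform/ff9a0000.gpu/devfreq/ff9a0000.gpu/cur_freq"
if Path(gpu_path).exists():
    print(f"GPU frequency: {read_freq_mhz(gpu_path):.0f} MHz")
else:
    print("cur_freq not found at the expected sysfs path")
```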

Building the required projects

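Tengine uses the Arm Compute Library (ACL) for GPU acceleration, and the ACL version used here is 18.05. Fetch the code with git and build it. Note the directory it is built in, because the next step refers to that path: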

git clone https://github.com/ARM-software/ComputeLibrary.git
cd ComputeLibrary
git checkout v18.05
scons Werror=1 -j4 debug=0 asserts=1 neon=0 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a

Download the Tengine project:

git clone https://github.com/OAID/Tengine.git
cd Tengine
cp makefile.config.example makefile.config
vim makefile.config

In the configuration file, enable the ACL option and set ACL_ROOT to the ACL path from the previous step:

CONFIG_ACL_GPU=y
ACL_ROOT=/home/dolphin/ComputeLibrary

Build and install:

make -j4
make install

Download the MobilenetSSD model from Tengine_models | Baidu Cloud (extraction code: 57vb) and place it under tengine/models/.
Then go to the mobilenet_ssd subdirectory in the examples folder of the tengine directory, open CMakeLists.txt, and before the line set( INSTALL_DIR ${TENGINE_DIR}/install/) add a statement that sets TENGINE_DIR:

set( TENGINE_DIR ~/work/Tengine ) 

After cmake finishes the configuration, run make to build:

cmake . 
make

A few environment variables need to be set before running:

export GPU_CONCAT=0      # do not run concat on the GPU, to avoid frequent data transfers between CPU and GPU
export ACL_FP16=1        # let the GPU run inference in float16
export REPEAT_COUNT=100  # repeat inference 100 times and report the average time as the performance figure
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ ./MSSD
ACL Graph Initialized
Driver: ACLGraph probed 1 devices
repeat 100 times, avg time per run is 196.927 ms
detect result num: 3
dog     :100%
BOX:( 138.509 , 209.394 ),( 324.57 , 541.314 )
car     :100%
BOX:( 467.315 , 72.8045 ),( 687.269 , 171.128 )
bicycle :100%
BOX:( 107.395 , 140.657 ),( 574.212 , 415.188 )
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ export TENGINE_CPU_LIST=4
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ ./MSSD
ENV SET: [4]
ACL Graph Initialized
Driver: ACLGraph probed 1 devices
repeat 100 times, avg time per run is 313.66 ms
detect result num: 3
dog     :100%
BOX:( 138.509 , 209.394 ),( 324.57 , 541.314 )
car     :100%
BOX:( 467.315 , 72.8045 ),( 687.269 , 171.128 )
bicycle :100%
BOX:( 107.395 , 140.657 ),( 574.212 , 415.188 )
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ export TENGINE_CPU_LIST=4,5
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ ./MSSD
ENV SET: [4,5]
ACL Graph Initialized
Driver: ACLGraph probed 1 devices
repeat 100 times, avg time per run is 241.372 ms
detect result num: 3
dog     :100%
BOX:( 138.509 , 209.394 ),( 324.57 , 541.314 )
car     :100%
BOX:( 467.315 , 72.8045 ),( 687.269 , 171.128 )
bicycle :100%
BOX:( 107.395 , 140.657 ),( 574.212 , 415.188 )
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ export TENGINE_CPU_LIST=0,1,2,3
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ ./MSSD
ENV SET: [0,1,2,3]
ACL Graph Initialized
Driver: ACLGraph probed 1 devices
repeat 100 times, avg time per run is 221.02 ms
detect result num: 3
dog     :100%
BOX:( 138.509 , 209.394 ),( 324.57 , 541.314 )
car     :100%
BOX:( 467.315 , 72.8045 ),( 687.269 , 171.128 )
bicycle :100%
BOX:( 107.395 , 140.657 ),( 574.212 , 415.188 )
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ unset TENGINE_CPU_LIST
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ export GPU_CONCAT=0
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ export ACL_FP16=1
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ taskset 0x4 ./MSSD -d acl_opencl
/home/pi/work/Tengine/examples/mobilenet_ssd/MSSD
ACL Graph Initialized
Driver: ACLGraph probed 1 devices
repeat 100 times, avg time per run is 202.103 ms
detect result num: 3
dog     :100%
BOX:( 138.419 , 209.091 ),( 324.504 , 541.568 )
car     :100%
BOX:( 467.356 , 72.9224 ),( 687.269 , 171.123 )
bicycle :100%
BOX:( 107.053 , 140.221 ),( 574.472 , 415.248 )
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ unset ACL_FP16
pi@NanoPi-NEO4:~/work/Tengine/examples/mobilenet_ssd$ taskset 0x4 ./MSSD -d acl_opencl
/home/pi/work/Tengine/examples/mobilenet_ssd/MSSD
ACL Graph Initialized
Driver: ACLGraph probed 1 devices
repeat 100 times, avg time per run is 272.369 ms
detect result num: 3
dog     :100%
BOX:( 138.509 , 209.394 ),( 324.57 , 541.314 )
car     :100%
BOX:( 467.315 , 72.8045 ),( 687.269 , 171.128 )
bicycle :100%
BOX:( 107.395 , 140.657 ),( 574.212 , 415.188 )
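
The runs above select CPUs in two different ways: TENGINE_CPU_LIST takes a comma-separated list of CPU indices, while taskset takes a hexadecimal bitmask in which bit n stands for CPU n. A short sketch for converting between the two forms (the function names are made up for illustration). Which indices belong to the A72 cluster and which to the A53 cluster depends on the board and kernel, and is not encoded in the mask itself.

```python
def cpus_from_mask(mask):
    """Return the CPU indices selected by a taskset-style affinity bitmask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

def mask_from_cpus(cpus):
    """Build a taskset-style affinity bitmask from a list of CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

print(cpus_from_mask(0x4))           # CPU indices selected by the mask used above
print(hex(mask_from_cpus([4, 5])))   # mask equivalent to the CPU list 4,5
```

The core layout of a given board can be checked with lscpu or /proc/cpuinfo before choosing a mask.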

To use the GPU, add -d acl_opencl when running the program.
As the figure below shows, the detection results are correct when the GPU runs in half precision (float16).



The following table compares these results with MobilenetSSD inference running on the CPU only in Tengine:

Runtime environment          Time per run (ms)   Change vs. 4x A53
CPU: 2x A72 + 4x A53         190.927             14%
CPU: 1x A72                  313.66              -42%
CPU: 2x A72                  241.372             -9%
CPU: 4x A53                  221.02              0%
GPU: FP16 + CPU: 1x A72      202.103             9%
GPU: FP32 + CPU: 1x A72      272.369             -23%

The results show that GPU/CPU heterogeneous scheduling with FP16 performs roughly on par with two A72 big cores or four A53 little cores, while running on all six CPU cores is the fastest.
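
The percentage column in the table is consistent with taking the 4x A53 run (221.02 ms) as the baseline and computing (baseline - time) / baseline. A small sketch that reproduces the column from the times in the table; the baseline choice is inferred from the 0% row rather than stated by the measurements themselves:

```python
BASELINE_MS = 221.02  # 4x A53 run, the 0% row of the table

RUNS = [
    ("CPU: 2x A72 + 4x A53", 190.927),
    ("CPU: 1x A72", 313.66),
    ("CPU: 2x A72", 241.372),
    ("CPU: 4x A53", 221.02),
    ("GPU: FP16 + CPU: 1x A72", 202.103),
    ("GPU: FP32 + CPU: 1x A72", 272.369),
]

def relative_change_pct(time_ms, baseline_ms=BASELINE_MS):
    """Positive values are faster than the baseline, negative values slower."""
    return (baseline_ms - time_ms) / baseline_ms * 100

for name, time_ms in RUNS:
    print(f"{name:<26}{time_ms:>9.3f} ms {relative_change_pct(time_ms):>6.0f}%")
```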
