In the previous post we covered how to convert a PyTorch model to ONNX, the general TensorRT inference workflow, and an example of running an ONNX model with TensorRT through the Python API. This post continues with TensorRT inference of an ONNX model using the C++ API.
The workflow is the same as described in the previous post:
- create the `builder` from a `logger`;
- the `builder` creates the `INetworkDefinition`, i.e. the computation graph;
- use the `onnxParser` to parse the ONNX model and populate the graph;
- build the `CudaEngine` from the `INetworkDefinition`;
- create the inference context `IExecutionContext` from the `cudaEngine`.
The inference procedure:
- allocate the input and output buffers
- copy the input into the device buffer
- run inference
- copy the output back to the CPU
- post-process the output
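For reference, here is a minimal sketch of what these inference steps look like with the raw CUDA runtime API; the MNIST sample below hides all of this inside samplesCommon::BufferManager, and the binding names follow that sample's ONNX model:

```cpp
// Minimal sketch (assumes a built engine/context and the MNIST binding names).
#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <vector>

bool runOnce(nvinfer1::ICudaEngine* engine, nvinfer1::IExecutionContext* context,
             const std::vector<float>& hostInput, std::vector<float>& hostOutput)
{
    const int inputIndex  = engine->getBindingIndex("Input3");           // input tensor name in mnist.onnx
    const int outputIndex = engine->getBindingIndex("Plus214_Output_0"); // output tensor name

    void* bindings[2]{nullptr, nullptr};
    cudaMalloc(&bindings[inputIndex],  hostInput.size()  * sizeof(float));  // allocate device input
    cudaMalloc(&bindings[outputIndex], hostOutput.size() * sizeof(float));  // allocate device output

    cudaMemcpy(bindings[inputIndex], hostInput.data(),
               hostInput.size() * sizeof(float), cudaMemcpyHostToDevice);   // host -> device
    bool ok = context->executeV2(bindings);                                 // run inference
    cudaMemcpy(hostOutput.data(), bindings[outputIndex],
               hostOutput.size() * sizeof(float), cudaMemcpyDeviceToHost);  // device -> host

    cudaFree(bindings[inputIndex]);
    cudaFree(bindings[outputIndex]);
    return ok;
}
```

Error checking of the CUDA calls is omitted here for brevity.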
Below we work directly through ~/TensorRT-7.0.0.11/samples/sampleMnist/
as an example and record the problems encountered along the way together with their solutions.
// Header files
#include "argsParser.h" // TensorRT-7.0.0.11/samples/common
#include "buffers.h" // TensorRT-7.0.0.11/samples/common
#include "common.h" // TensorRT-7.0.0.11/samples/common
#include "logger.h" // TensorRT-7.0.0.11/samples/common
#include "parserOnnxConfig.h" // TensorRT-7.0.0.11/samples/common
#include "NvInfer.h" // TensorRT-7.0.0.11/include
#include <cuda_runtime_api.h>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
In the sample code, the MNIST model conversion and inference steps are wrapped in a single class.
class SampleOnnxMNIST
{
template <typename T>
using SampleUniquePtr = std::unique_ptr<T, samplesCommon::InferDeleter>;
// alias for a unique_ptr to a TensorRT object with the matching deleter; on smart pointers see
//http://senlinzhan.github.io/2015/04/20/%E8%B0%88%E8%B0%88C-%E7%9A%84%E6%99%BA%E8%83%BD%E6%8C%87%E9%92%88/
public:
SampleOnnxMNIST(const samplesCommon::OnnxSampleParams& params)
: mParams(params)
, mEngine(nullptr)
{// constructor; params stores the configuration values the sample needs
//mEngine is the pointer to the cudaEngine
}
bool build(); // builds the network engine (cudaEngine)
bool infer(); // runs the TensorRT inference engine for this sample
private:
samplesCommon::OnnxSampleParams mParams; // parameters of this sample
nvinfer1::Dims mInputDims; // dimensions of the network input, similar to a Python tuple
nvinfer1::Dims mOutputDims; //!< dimensions of the network output
int mNumber{0}; //!< The number to classify
std::shared_ptr<nvinfer1::ICudaEngine> mEngine; // CudaEngine的指針
bool constructNetwork(SampleUniquePtr<nvinfer1::IBuilder>& builder,
SampleUniquePtr<nvinfer1::INetworkDefinition>& network,
SampleUniquePtr<nvinfer1::IBuilderConfig>& config,
SampleUniquePtr<nvonnxparser::IParser>& parser);
// roughly the counterpart of the "with" blocks in the Python API: populate the network definition from the onnxParser output so the cudaEngine can be built
bool processInput(const samplesCommon::BufferManager& buffers);
// preprocesses the input and writes it into the buffer
bool verifyOutput(const samplesCommon::BufferManager& buffers);
// the output is stored in the buffer; read and check it from there
};
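The SampleUniquePtr alias relies on samplesCommon::InferDeleter from the common headers; it is roughly equivalent to the following sketch, since TensorRT 7 objects are released through destroy() rather than delete:

```cpp
// Rough equivalent of samplesCommon::InferDeleter (sketch): TensorRT objects expose
// destroy(), so unique_ptr needs a custom deleter that calls it.
struct InferDeleter
{
    template <typename T>
    void operator()(T* obj) const
    {
        if (obj)
            obj->destroy();
    }
};

template <typename T>
using SampleUniquePtr = std::unique_ptr<T, InferDeleter>;
```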
The OnnxSampleParams type from the samplesCommon namespace is defined in argsParser.h:
struct SampleParams
{
int batchSize{1}; //!< Number of inputs in a batch
int dlaCore{-1}; //!< Specify the DLA core to run network on.
bool int8{false}; //!< Allow running the network in Int8 mode.
bool fp16{false}; //!< Allow running the network in FP16 mode.
std::vector<std::string> dataDirs; //!< Directory paths where sample data files are stored
std::vector<std::string> inputTensorNames;
std::vector<std::string> outputTensorNames;
};
struct OnnxSampleParams : public SampleParams
{
std::string onnxFileName; //!< Filename of ONNX file of a network
};
As you can see, the main fields of SampleParams are the names of the network inputs and outputs, the input batch size, the precision flags, and the candidate data directories (more than one path may be given). OnnxSampleParams adds the file name of the ONNX model on top of that.
Next, let us set aside the implementation of the class methods for a moment and look at the code in the main function:
void printHelpInfo()
{
std::cout
<< "Usage: ./sample_onnx_mnist [-h or --help] [-d or --datadir=<path to data directory>] [--useDLACore=<int>]"
<< std::endl;
std::cout << "--help Display help information" << std::endl;
std::cout << "--datadir Specify path to a data directory, overriding the default. This option can be used "
"multiple times to add multiple directories. If no data directories are given, the default is to use "
"(data/samples/mnist/, data/mnist/)"
<< std::endl;
std::cout << "--useDLACore=N Specify a DLA engine for layers that support DLA. Value can range from 0 to n-1, "
"where n is the number of DLA engines on the platform."
<< std::endl;
std::cout << "--int8 Run in Int8 mode." << std::endl;
std::cout << "--fp16 Run in FP16 mode." << std::endl;
}
int main(int argc, char** argv)
{
samplesCommon::Args args;
bool argsOK = samplesCommon::parseArgs(args, argc, argv); // parse the command-line arguments
if (!argsOK)
{
gLogError << "Invalid arguments" << std::endl;
printHelpInfo(); // print the usage information
return EXIT_FAILURE;
}
if (args.help)
{
printHelpInfo();
return EXIT_SUCCESS;
}
auto sampleTest = gLogger.defineTest(gSampleName, argc, argv);
gLogger.reportTestStart(sampleTest);
SampleOnnxMNIST sample(initializeSampleParams(args));
// build a samplesCommon::OnnxSampleParams from the command-line arguments and construct the SampleOnnxMNIST object from it
gLogInfo << "Building and running a GPU inference engine for Onnx MNIST" << std::endl;
if (!sample.build()) // build the cudaEngine
{
return gLogger.reportFail(sampleTest);
}
if (!sample.infer()) // run inference
{
return gLogger.reportFail(sampleTest);
}
return gLogger.reportPass(sampleTest);
}
gLogger.defineTest here defines a test unit and has no direct bearing on inference. initializeSampleParams(args) builds the parameter struct from the parsed arguments:
samplesCommon::OnnxSampleParams initializeSampleParams(const samplesCommon::Args& args)
{
samplesCommon::OnnxSampleParams params;
if (args.dataDirs.empty()) //!< Use default directories if user hasn't provided directory paths
{ // args.dataDirs holds the directories that will be searched for the sample data
params.dataDirs.push_back("data/mnist/");
params.dataDirs.push_back("data/samples/mnist/");
}
else //!< Use the data directory provided by the user
{
params.dataDirs = args.dataDirs;
}
params.onnxFileName = "mnist.onnx"; //onnx模型名
params.inputTensorNames.push_back("Input3"); //網(wǎng)絡(luò)輸入變量名
params.batchSize = 1; // 批大小
params.outputTensorNames.push_back("Plus214_Output_0"); //網(wǎng)絡(luò)的輸出名
params.dlaCore = args.useDLACore; // 是否使用DLA 深度學(xué)習(xí)加速器鸥拧,對(duì)網(wǎng)絡(luò)進(jìn)行硬件加速
params.int8 = args.runInInt8;
params.fp16 = args.runInFp16;
return params;
}
Now let us dive into the actual implementation of the SampleOnnxMNIST methods.
1. First, the part that creates the CUDA engine, i.e. the build() method; it is best read side by side with the Python API example from the previous post.
bool SampleOnnxMNIST::build()
{
auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(gLogger.getTRTLogger()));
// create the builder from the logger
if (!builder) return false;
const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));
// create the network definition; as with the Python API, TensorRT only supports full-dimension (explicit batch) inputs for ONNX
if (!network) return false;
auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
if (!config) return false;
// unlike the Python API, populating the network from the onnxparser also requires an IBuilderConfig
auto parser = SampleUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, gLogger.getTRTLogger()));
if (!parser) return false;
auto constructed = constructNetwork(builder, network, config, parser); // populate the network definition; this member function is defined below
if (!constructed) return false;
mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(
builder->buildEngineWithConfig(*network, *config), samplesCommon::InferDeleter());
if (!mEngine) return false; // build the CudaEngine from the network definition
assert(network->getNbInputs() == 1); // exactly one input
mInputDims = network->getInput(0)->getDimensions();
assert(mInputDims.nbDims == 4);
assert(network->getNbOutputs() == 1); // exactly one output
mOutputDims = network->getOutput(0)->getDimensions();
assert(mOutputDims.nbDims == 2);
return true;
}
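The asserts above rely on knowing the model's bindings in advance. If you are unsure what an ONNX model's input/output tensors are called, the built engine can be queried directly; a small sketch using TensorRT's binding-introspection calls:

```cpp
// Sketch: list all bindings of the built engine instead of hard-coding their names.
for (int i = 0; i < mEngine->getNbBindings(); ++i)
{
    nvinfer1::Dims dims = mEngine->getBindingDimensions(i);
    std::string shape;
    for (int d = 0; d < dims.nbDims; ++d)
        shape += std::to_string(dims.d[d]) + (d + 1 < dims.nbDims ? "x" : "");
    gLogInfo << (mEngine->bindingIsInput(i) ? "input  " : "output ") << i << ": "
             << mEngine->getBindingName(i) << " [" << shape << "]" << std::endl;
}
```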
The code that populates the network definition is as follows:
bool SampleOnnxMNIST::constructNetwork(SampleUniquePtr<nvinfer1::IBuilder>& builder,
SampleUniquePtr<nvinfer1::INetworkDefinition>& network,
SampleUniquePtr<nvinfer1::IBuilderConfig>& config,
SampleUniquePtr<nvonnxparser::IParser>& parser)
{
auto parsed = parser->parseFromFile( locateFile(mParams.onnxFileName, mParams.dataDirs).c_str(), static_cast<int>(gLogger.getReportableSeverity()));
if (!parsed) return false; // locateFile() finds the file in the given directory list; parseFromFile() parses the ONNX model from that file
builder->setMaxBatchSize(mParams.batchSize); // set the maximum batch size
config->setMaxWorkspaceSize(16_MiB);
if (mParams.fp16)
{
config->setFlag(BuilderFlag::kFP16);
}
if (mParams.int8)
{
config->setFlag(BuilderFlag::kINT8);
samplesCommon::setAllTensorScales(network.get(), 127.0f, 127.0f);
}
samplesCommon::enableDLA(builder.get(), config.get(), mParams.dlaCore);
return true;
}
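Since building the engine from ONNX can take a while, a common extension (not part of this sample) is to serialize the engine to disk once and deserialize it on later runs; a hedged sketch, using the headers already included above:

```cpp
// Sketch (not in the sample): cache the built engine on disk and reload it later.
void saveEngine(nvinfer1::ICudaEngine& engine, const std::string& path)
{
    nvinfer1::IHostMemory* blob = engine.serialize();  // serialized engine plan
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(blob->data()), blob->size());
    blob->destroy();
}

std::shared_ptr<nvinfer1::ICudaEngine> loadEngine(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger.getTRTLogger());
    auto engine = std::shared_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr),
        samplesCommon::InferDeleter());
    // runtime cleanup is omitted in this sketch
    return engine;
}
```

Note that an engine plan is specific to the GPU and TensorRT version it was built with, so a cached plan file is not portable across machines.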
- Next comes the inference stage, infer(), which covers input preprocessing, the forward pass and postprocessing.
bool SampleOnnxMNIST::infer()
{
// Create RAII buffer manager object
samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);
// defined in samples/common/buffers.h; serves the same purpose as allocating HOST and DEVICE memory in the Python API
auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(mEngine->createExecutionContext());
if (!context) return false; // create the execution context
// Read the input data into the managed buffers
assert(mParams.inputTensorNames.size() == 1);
if (!processInput(buffers)) return false; // read and preprocess the input
// Memcpy from host input buffers to device input buffers
buffers.copyInputToDevice();
bool status = context->executeV2(buffers.getDeviceBindings().data()); // run inference
if (!status) return false;
// Memcpy from device output buffers to host output buffers
buffers.copyOutputToHost(); // copy from the GPU back to the CPU
// Verify results
if (!verifyOutput(buffers)) return false; // check the output
return true;
}
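If you also want to time the forward pass (the sample does not), CUDA events around executeV2 give the GPU-side latency; a sketch that assumes the context and buffers from infer() above:

```cpp
// Sketch: measure the GPU execution time of one forward pass with CUDA events.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
bool status = context->executeV2(buffers.getDeviceBindings().data());
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);              // wait until the recorded work has finished

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
gLogInfo << "executeV2 took " << ms << " ms" << std::endl;

cudaEventDestroy(start);
cudaEventDestroy(stop);
```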
- Finally, the input and output handling. processInput() preprocesses the input and copies it into the designated buffer, while verifyOutput() checks whether the classification is correct.
bool SampleOnnxMNIST::processInput(const samplesCommon::BufferManager& buffers)
{
const int inputH = mInputDims.d[2];
const int inputW = mInputDims.d[3];
// Read a random digit file
srand(unsigned(time(nullptr)));
std::vector<uint8_t> fileData(inputH * inputW);
mNumber = rand() % 10;
readPGMFile(locateFile(std::to_string(mNumber) + ".pgm", mParams.dataDirs), fileData.data(), inputH, inputW); // defined in samples/common/common.h: reads the PGM file into the inputH*inputW block starting at fileData.data()
// Print an ascii representation
gLogInfo << "Input:" << std::endl;
for (int i = 0; i < inputH * inputW; i++)
{
gLogInfo << (" .:-=+*#%@"[fileData[i] / 26]) << (((i + 1) % inputW) ? "" : "\n");
}
gLogInfo << std::endl;
float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
for (int i = 0; i < inputH * inputW; i++)
{
hostDataBuffer[i] = 1.0 - float(fileData[i] / 255.0);
}
return true;
}
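processInput() reads PGM files that are already 28x28. For an arbitrary image you would have to resize and normalize it yourself first; since the CMakeLists below already links OpenCV, here is a hedged sketch of that preprocessing (the image path is a placeholder):

```cpp
// Sketch: preprocess an arbitrary image with OpenCV into the layout the MNIST
// network expects (white digit on black background, values in [0,1]).
#include <opencv2/opencv.hpp>

bool loadImageToHostBuffer(const std::string& path, float* hostDataBuffer, int inputH, int inputW)
{
    cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);  // "path" is a placeholder
    if (img.empty())
        return false;
    cv::resize(img, img, cv::Size(inputW, inputH));
    for (int i = 0; i < inputH * inputW; ++i)
        hostDataBuffer[i] = 1.0f - img.data[i] / 255.0f;    // same normalization as processInput()
    return true;
}
```

Note that cv::imread lives in opencv_imgcodecs, which would also have to be added to OpenCV_LIBS in the CMakeLists below.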
bool SampleOnnxMNIST::verifyOutput(const samplesCommon::BufferManager& buffers)
{
const int outputSize = mOutputDims.d[1];
float* output = static_cast<float*>(buffers.getHostBuffer(mParams.outputTensorNames[0]));
float val{0.0f};
int idx{0};
// Calculate Softmax
float sum{0.0f};
for (int i = 0; i < outputSize; i++)
{
output[i] = exp(output[i]);
sum += output[i];
}
gLogInfo << "Output:" << std::endl;
for (int i = 0; i < outputSize; i++)
{
output[i] /= sum;
val = std::max(val, output[i]);
if (val == output[i])
{
idx = i;
}
gLogInfo << " Prob " << i << " " << std::fixed << std::setw(5) << std::setprecision(4) << output[i] << " "
<< "Class " << i << ": " << std::string(int(std::floor(output[i] * 10 + 0.5f)), '*') << std::endl;
}
gLogInfo << std::endl;
return idx == mNumber && val > 0.9f;
}
- Writing the CMakeLists.txt
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(FP_TEST)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
set(CUDA_HOST_COMPILER ${CMAKE_CXX_COMPILER})
find_package(CUDA)
set(
CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS}
-O3
-gencode arch=compute_70,code=sm_70
)
find_package(Protobuf)
if(PROTOBUF_FOUND)
message(STATUS " version: ${Protobuf_VERSION}")
message(STATUS " libraries: ${PROTOBUF_LIBRARIES}")
message(STATUS " include path: ${PROTOBUF_INCLUDE_DIR}")
else()
message(WARNING "Protobuf not found, onnx model convert tool won't be built")
endif()
set(TENSORRT_ROOT /home/zwzhou/packages/TensorRT-7.0.0.11)
find_path(TENSORRT_INCLUDE_DIR NvInfer.h
HINTS ${TENSORRT_ROOT} ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES include)
MESSAGE(STATUS "Found TensorRT headers at ${TENSORRT_INCLUDE_DIR}")
find_library(TENSORRT_LIBRARY_INFER nvinfer
HINTS ${TENSORRT_ROOT} ${TENSORRT_BUILD} ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 lib/x64)
find_library(TENSORRT_LIBRARY_INFER_PLUGIN nvinfer_plugin
HINTS ${TENSORRT_ROOT} ${TENSORRT_BUILD} ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 lib/x64)
set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER} ${TENSORRT_LIBRARY_INFER_PLUGIN})
MESSAGE(STATUS "Find TensorRT libs at ${TENSORRT_LIBRARY}")
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(
TENSORRT DEFAULT_MSG TENSORRT_INCLUDE_DIR TENSORRT_LIBRARY)
if(NOT TENSORRT_FOUND)
message(FATAL_ERROR "Cannot find TensorRT library.")
endif()
LINK_LIBRARIES("/home/zwzhou/packages/TensorRT-7.0.0.11/lib/libnvonnxparser.so")
LINK_LIBRARIES("/home/zwzhou/packages/TensorRT-7.0.0.11/lib/libnvinfer.so")
INCLUDE_DIRECTORIES("/home/zwzhou/packages/TensorRT-7.0.0.11/samples/common")
# opencv
set(OpenCV_DIR /home/zwzhou/opencv4/lib/cmake/opencv4/)
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
# OpenCV_INCLUDE_DIRS holds the OpenCV header directories
set(OpenCV_LIBS opencv_core opencv_imgproc opencv_objdetect )
##############################################
set(gLogger /home/zwzhou/packages/TensorRT-7.0.0.11/samples/common/logger.cpp)
##############################################
cuda_add_executable(mtest ./onnx2trt_test.cpp ${gLogger})
target_include_directories(mtest PUBLIC ${CUDA_INCLUDE_DIRS} ${TENSORRT_INCLUDE_DIR})
target_link_libraries(mtest ${CUDA_LIBRARIES} ${OpenCV_LIBS} ${TENSORRT_LIBRARY} ${CUDA_CUBLAS_LIBRARIES} ${CUDA_cudart_static_LIBRARY})
- Problems encountered:
- The cuda 9.0 versions of the various shared libraries could not be found.
Solution: the machine's environment provides CUDA 9.2, while the TensorRT package used here targets CUDA 9.0, so CUDA 9.0 was reinstalled under the user's home directory; see the reference on installing CUDA and cuDNN without root privileges.
After installation, set the environment variables in .bashrc:
# CUDA9.0
export CUDA_HOME=/home/zwzhou/cuda-9.0
export PATH=$PATH:$CUDA_HOME/bin
# export PATH=$CUDA_HOME/bin:$PATH
#export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/zwzhou/cuda-9.0/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/zwzhou/packages/TensorRT-7.0.0.11/lib
export PKG_CONFIG_PATH=/home/zwzhou/opencv4/lib/pkgconfig:$PKG_CONFIG_PATH
After running source ~/.bashrc, nvcc -V still reported CUDA 9.2. Printing echo $PATH showed:
/home/zwzhou/bin:/home/zwzhou/.local/bin:/home/zwzhou/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/zwzhou/.dotnet/tools:/usr/local/cuda-9.2/bin:/home/zwzhou/cuda-9.0/bin:/home/zwzhou/cuda-9.0/bin:/usr/local/cuda/bin
i.e. the CUDA 9.2 nvcc and paths are picked up first, so the PATH export was changed to export PATH=$CUDA_HOME/bin:$PATH; after that nvcc -V reports CUDA 9.0.
- gLogger and the functions associated with it could not be found (undefined references).
Solution: add logger.cpp to the build dependencies, i.e. modify CMakeLists.txt as follows:
set(gLogger /home/zwzhou/packages/TensorRT-7.0.0.11/samples/common/logger.cpp)
cuda_add_executable(mtest ./onnx2trt_test.cpp ${gLogger})
- The *.pgm files, i.e. the MNIST image data, could not be found.
Solution: run TensorRT-7.0.0.11/data/mnist/download_pgms.py, which downloads and extracts ten .pgm files into the corresponding data directory.
- 嘗試bs>1的情形茬末。
- maxBatchSize>1,但輸入batchsize=1時(shí)依然能正確運(yùn)行盖矫。
- 大batchsize的輸入輸出团南。修改部分如下:
const int BATCHSIZE = 2; // global variable
...
bool SampleOnnxMNIST::processInput(const samplesCommon::BufferManager& buffers)
{ // read a batch of images
const int inputH = mInputDims.d[2];
const int inputW = mInputDims.d[3];
int batch_size = BATCHSIZE;
srand(unsigned(time(nullptr)));
std::vector<uint8_t> fileData(batch_size * inputH * inputW);
for(int i=0; i<batch_size; ++i)
{
mNumber = rand() % 10;
readPGMFile(locateFile(std::to_string(mNumber) + ".pgm", mParams.dataDirs), fileData.data()+i*(inputH*inputW), inputH, inputW);
std::cout<<std::to_string(mNumber) + ".pgm"<<"\n";
}
// Print an ascii representation
gLogInfo << "Input:" << std::endl;
for (int i = 0; i < batch_size* inputH * inputW; i++)
{
gLogInfo << (" .:-=+*#%@"[fileData[i] / 26]) << (((i + 1) % inputW) ? "" : "\n");
}
gLogInfo << std::endl;
float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
for (int i = 0; i < batch_size * inputH * inputW; i++)
{
hostDataBuffer[i] = 1.0 - float(fileData[i] / 255.0);
}
return true;
}
bool SampleOnnxMNIST::verifyOutput(const samplesCommon::BufferManager& buffers)
{ // verify a batch of outputs
const int outputSize = mOutputDims.d[1];
float* output = static_cast<float*>(buffers.getHostBuffer(mParams.outputTensorNames[0]));
float val{0.0f};
int idx{0};
// Calculate Softmax
float sum{0.0f};
for(int b=0; b<BATCHSIZE; ++b)
{
sum = 0.0f; // reset the softmax normalizer for each sample in the batch
for (int i = b*outputSize; i < (b+1)*outputSize; i++)
{
output[i] = exp(output[i]);
sum += output[i];
}
gLogInfo << "Output:" << std::endl;
for (int i = b*outputSize; i < (b+1)*outputSize; i++)
{
output[i] /= sum;
val = std::max(val, output[i]);
if (val == output[i])
{
idx = i;
}
gLogInfo << " Prob " << i << " " << std::fixed << std::setw(5) << std::setprecision(4) << output[i] << " "
<< "Class " << i << ": " << std::string(int(std::floor(output[i] * 10 + 0.5f)), '*') << std::endl;
}
gLogInfo << std::endl;
}
return idx == mNumber && val > 0.9f;
}
samplesCommon::OnnxSampleParams initializeSampleParams(const samplesCommon::Args& args)
{ // set the maximum batch size
...
params.batchSize = BATCHSIZE;
...
return params;
}
Looking at the output, TensorRT has the same problem with larger ONNX batch sizes as the Python API did: because the ONNX model was saved with bs=1, only the first sample's output is correct and the remaining outputs are all 0.
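If the ONNX model is re-exported with a dynamic batch axis (dynamic_axes in torch.onnx.export), the explicit-batch network can handle bs>1 through an optimization profile. A hedged sketch of the extra build-time and run-time steps, keeping the MNIST input name and 1x28x28 shape and assuming the dynamic export:

```cpp
// Sketch, build time (inside constructNetwork): declare the allowed batch range
// for the dynamic input. Assumes the ONNX model was exported with a dynamic batch axis.
auto profile = builder->createOptimizationProfile();
profile->setDimensions("Input3", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 1, 28, 28});
profile->setDimensions("Input3", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{4, 1, 28, 28});
profile->setDimensions("Input3", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{8, 1, 28, 28});
config->addOptimizationProfile(profile);

// Sketch, run time (inside infer, before executeV2): tell the context which
// batch size this particular call uses (binding 0 is the input).
context->setBindingDimensions(0, nvinfer1::Dims4{BATCHSIZE, 1, 28, 28});
```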
References:
Fixing nvcc not being found / the wrong nvcc being picked up
Fixing the undefined gLogger references
Installing multiple CUDA and cuDNN versions without root privileges
Accelerating deep learning inference with TensorRT
Speeding up neural networks with TensorRT (loading and running an ONNX model)
Nvidia/TensorRT docs
Dynamic batch size