1. Install the Nvidia Driver and CUDA Toolkit
Reference: https://blog.csdn.net/xueshengke/article/details/78134991
Nvidia Driver download link: https://www.nvidia.cn/Download/index.aspx?lang=cn
cuda-toolkit download link: https://developer.nvidia.com/cuda-toolkit-archive
Installing dkms: https://blog.csdn.net/qq_41613251/article/details/108488681
1.1 Install the Nvidia Driver
1) Check the system kernel version
# uname -r
# 3.10.0-1062.el7.x86_64 ; kernel versions differ between operating systems, so make a note of yours
# df -h ; confirm that the boot directory has no less than 300 MB of free space
2) Blacklist the nouveau driver
nouveau is the display driver that ships with the system. It must be disabled before moving on to the next step; otherwise the NVIDIA driver installation will fail with the message "You appear to be running an X server …". Open each of the two files below (create them if they do not exist), add the following two lines to each, and save.
# vim /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
# vim /lib/modprobe.d/nvidia-installer-disable-nouveau.conf
...
blacklist nouveau
options nouveau modeset=0
3) Rebuild the initramfs image
The driver is only blacklisted once the image has been rebuilt and the system restarted; otherwise the change has no effect. When rebuilding, rm the existing image first, or the tool complains that it cannot overwrite it.
This step requires enough free space in the boot directory, otherwise it fails. More than 400 MB is recommended.
# cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# dracut /boot/initramfs-$(uname -r).img $(uname -r) --force
# rm /boot/initramfs-$(uname -r).img.bak ; this step is optional
4) Reboot
# systemctl set-default multi-user.target
# init 3
# reboot
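After the reboot, a quick way to confirm that nouveau really is no longer loaded (this check is an addition, not part of the original steps):
# lsmod | grep nouveau ; no output means nouveau has been successfully disabled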
5) Install prerequisite components
Install some required packages first (network access is needed)
# yum install gcc kernel-devel kernel-headers
Run the installation as follows. The kernel source path must be specified, otherwise the installer reports an error; the kernel version depends on your system kernel and may differ from the one shown here.
# cd /to/your/directory/ ; change to the directory containing the driver
# ./NVIDIA-Linux-x86_64-510.47.03.run --kernel-source-path=/usr/src/kernels/3.10.0-1062.el7.x86_64 -k $(uname -r)
Once started, the installer unpacks the driver package and walks through the installation steps; some warnings may appear along the way, but they do not affect the result.
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 ******.......................................
..................................................................
..................................................................
License: accept
Install 32-bit compatibility libraries: yes
Installation completes
Check the driver installation
Run the following two commands; if the GPU model information is shown, the driver was installed successfully.
# lspci |grep NVIDIA
03:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 30% 20C P0 84W / 350W | 0MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:25:00.0 Off | N/A |
| 31% 22C P0 91W / 350W | 0MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:81:00.0 Off | N/A |
| 32% 24C P0 88W / 350W | 0MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... Off | 00000000:C1:00.0 Off | N/A |
| 33% 22C P0 85W / 350W | 0MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
1.2 Install the CUDA Toolkit
Installation Instructions:
wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda-repo-rhel7-11-4-local-11.4.0_470.42.01-1.x86_64.rpm
sudo rpm -i cuda-repo-rhel7-11-4-local-11.4.0_470.42.01-1.x86_64.rpm
sudo yum clean all
# sudo yum -y install nvidia-driver-latest-dkms cuda
# sudo yum -y install cuda-drivers
sudo yum -y install cuda # only this command needs to be run
There is no need to specify an exact cuda version number: the installation already creates a link cuda -> cuda-11. If it is missing, run:
ln -s /usr/local/cuda-11 /usr/local/cuda # create the symlink
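A quick sanity check that the link resolves as expected (an added check, not in the original steps):
ls -l /usr/local/cuda # should show the symlink pointing at /usr/local/cuda-11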
Configure the environment variables
# vim /etc/profile
...
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# source /etc/profile ; make the environment variables take effect immediately
Test CUDA
First, test whether the cuda and nvcc commands are available
# cuda ; press the Tab key twice to list completions
cuda cuda-gdb cuda-install-samples-11.4.sh
cudafe++ cuda-gdbserver cuda-memcheck
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
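Optionally, compile and run a minimal CUDA program to confirm the toolchain and driver work together end to end. This is a sketch of an extra check; the file name hello.cu is arbitrary:
cat > /tmp/hello.cu <<'EOF'
#include <cstdio>
// runs on the GPU and prints one line per thread
__global__ void hello() { printf("hello from the GPU\n"); }
int main() {
    hello<<<1, 1>>>();        // launch 1 block x 1 thread
    cudaDeviceSynchronize();  // wait for the kernel (and its printf) to finish
    return 0;
}
EOF
nvcc /tmp/hello.cu -o /tmp/hello && /tmp/hello
If the driver and toolkit are consistent, this prints "hello from the GPU"; a failure here usually points to a driver/toolkit version mismatch.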
1.3 Install the NVIDIA Container Toolkit
Reference: https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md
Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum clean expire-cache
sudo yum install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
nvidia-docker registry mirror
After installing nvidia-docker, add a registry mirror to speed up image pulls; you can add your own Aliyun mirror according to your needs
sudo vim /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "registry-mirrors": ["https://cbrok4rc.mirror.aliyuncs.com"]
}
sudo systemctl daemon-reload
sudo systemctl restart docker
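To confirm that the nvidia runtime from daemon.json was picked up after the restart (an added check):
sudo docker info | grep -i runtimes # the output should list nvidia alongside runc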
2. Generate the model.plan (engine) file with the TensorRT Container and tensorrtx
Reference: https://github.com/wang-xinyu/tensorrtx/tree/master/yolov5
2.1 Generate the wts file from the pt file
The following is an example
# cp {tensorrtx}/yolov5/gen_wts.py {ultralytics}/yolov5
# cd {ultralytics}/yolov5
# python gen_wts.py -w yolov5s.pt -o yolov5s.wts
Enter the conda virtual environment: conda activate python38
1) helmet
cp ~/tensorrtx/yolov5/gen_wts.py ~/yolov5_5.0_helmet/ # copy gen_wts.py into the yolov5 directory
cd ~/yolov5_5.0_helmet/
python gen_wts.py -w yolov5s.pt -o yolov5s.wts
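As a quick sanity check on the conversion (an added step): gen_wts.py writes a plain-text file whose first line should be the number of weight tensors. The same check applies to the mask and fire models below.
head -n 1 yolov5s.wts # should print a single integer: the number of weight entries
ls -lh yolov5s.wts # the text encoding makes the .wts noticeably larger than the .pt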
2) mask
3) fire
2.2 Deploy the TensorRT Container
Reference: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorrt
Reference: https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/index.html
Key reference: https://medium.com/@penolove15/yolov4-with-triton-inference-server-and-client-6b02f085c622
sudo docker pull nvcr.io/nvidia/tensorrt:21.09-py3
Start the tensorrt container:
sudo docker run --gpus all -it \
--name trt_transfer \
-v /home/huaxi:/huaxi nvcr.io/nvidia/tensorrt:21.09-py3 \
/bin/bash
Enter an already running container
sudo docker ps -a # look up the ID of the container to enter
sudo docker start -ia 02e73b37db71 # container ID
sudo docker exec -it 02e73b37db71 /bin/bash
After entering the docker container, update the apt sources:
sed -i s:/archive.ubuntu.com:/mirrors.tuna.tsinghua.edu.cn/ubuntu:g /etc/apt/sources.list
cat /etc/apt/sources.list
apt-get clean
apt-get -y update --fix-missing # apt-get clean emptied the package cache, so libopencv-dev and its dependencies will be downloaded afresh
Install opencv:
cmake --version # if cmake is missing, run: apt install cmake
apt-get install libopencv-dev # install opencv; if the install reports "connection failed", rerun the command until it succeeds
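To verify the OpenCV development package inside the container (an added check; the 21.09 image is Ubuntu 20.04, where the pkg-config module is named opencv4):
pkg-config --modversion opencv4 # prints the OpenCV version, e.g. 4.2.0, once libopencv-dev is installed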
2.3 Generate the model.plan (engine) file in the tensorrt container
The following is a reference example and does not need to be executed:
cd {tensorrtx}/yolov5/
// update CLASS_NUM in yololayer.h if your model is trained on custom dataset
mkdir build
cd build
cp {ultralytics}/yolov5/yolov5s.wts {tensorrtx}/yolov5/build
cmake ..
make
sudo ./yolov5 -s [.wts] [.engine] [n/s/m/l/x/n6/s6/m6/l6/x6 or c/c6 gd gw] // serialize model to plan file
sudo ./yolov5 -d [.engine] [image folder] // deserialize and run inference, the images in [image folder] will be processed.
// For example yolov5s
sudo ./yolov5 -s yolov5s.wts yolov5s.engine s
sudo ./yolov5 -d yolov5s.engine ../samples
// For example Custom model with depth_multiple=0.17, width_multiple=0.25 in yolov5.yaml
sudo ./yolov5 -s yolov5_custom.wts yolov5.engine c 0.17 0.25
sudo ./yolov5 -d yolov5.engine ../samples
The following are the scripts actually run (covering the helmet, mask, and fire models)
1) Generate the helmet engine
cd /huaxi/tensorrtx/yolov5_helmet/
vim yololayer.h # already modified following the tensorrtx GitHub instructions
vim yolov5.cpp # already modified following the tensorrtx GitHub instructions
mkdir build
cd build
cp /huaxi/yolov5_5.0_helmet/yolov5s.wts /huaxi/tensorrtx/yolov5_helmet/build
cmake ..
make
./yolov5 -s yolov5s.wts yolov5s.engine s # serialize the corresponding .engine file
./yolov5 -d yolov5s.engine ../images/ # test the .engine file
2) Generate the mask engine
3) Generate the fire engine
3. Deploy the engine file on the NVIDIA TRITON INFERENCE SERVER
Key references:
https://medium.com/@penolove15/yolov4-with-triton-inference-server-and-client-6b02f085c622
https://blog.csdn.net/JulyLi2019/article/details/119875633
3.1 Install the Triton Server Docker Image
Note: TensorRT 8.0.3
# docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3
sudo docker pull nvcr.io/nvidia/tritonserver:21.09-py3 # the tag must match the version of the tensorrt container used above
The model repository is the directory where you place the models that you want Triton to serve.
Reference: https://blog.csdn.net/JulyLi2019/article/details/119875633
mkdir ~/Triton/model_repository/helmet_detection/1/
mkdir ~/Triton/plugins/helmet_detection/
cp -R ~/tensorrtx/yolov5_helmet/build/yolov5s.engine ~/Triton/model_repository/helmet_detection/1/model.plan
cp -R ~/tensorrtx/yolov5_helmet/build/libmyplugins.so ~/Triton/plugins/helmet_detection/
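Triton usually also expects a config.pbtxt next to the version directory describing the model's inputs and outputs. Below is a minimal sketch; the tensor names (data/prob) and shapes assume the tensorrtx yolov5 defaults (608x608 input, 6001-float output) and must be adjusted if yololayer.h was changed:
cat > ~/Triton/model_repository/helmet_detection/config.pbtxt <<'EOF'
name: "helmet_detection"
platform: "tensorrt_plan"
max_batch_size: 1
input [
  {
    name: "data"            # tensorrtx default input blob name (assumption)
    data_type: TYPE_FP32
    dims: [ 3, 608, 608 ]   # tensorrtx default INPUT_H/INPUT_W (assumption)
  }
]
output [
  {
    name: "prob"            # tensorrtx default output blob name (assumption)
    data_type: TYPE_FP32
    dims: [ 6001, 1, 1 ]    # 1000 boxes x 6 floats + 1 count (assumption)
  }
]
EOF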
Create start_server.sh (the paths must match the volume mapping used when the docker container is created)
LD_PRELOAD=/Triton/plugins/helmet_detection/libmyplugins.so tritonserver --model-repository=/Triton/model_repository/
Start the Triton Server (container) for the first time
# docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models
sudo docker run \
--gpus all \
--shm-size=1g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
--name trt_serving \
-v /home/huaxi/Triton:/Triton \
-itd \
nvcr.io/nvidia/tritonserver:21.09-py3 \
/bin/bash /Triton/start_server.sh
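Once the container is up, server readiness can be checked over Triton's standard HTTP endpoint (an added check; port 8000 matches the -p mapping above):
curl -v localhost:8000/v2/health/ready # HTTP/1.1 200 OK means the server and its models are ready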
Enter the already started docker container
sudo docker stop 515f33be25b6
sudo docker ps -a # look up the ID of the container to enter
sudo docker start -ia 515f33be25b6 # start the stopped container and attach (use the container ID)
sudo docker exec -it 515f33be25b6 /bin/bash # enter the started container
3.2 Deploy the Triton Server Client and test the deployed model
Reference: https://blog.csdn.net/JulyLi2019/article/details/119875633
Reference: https://github.com/JulyLi2019/tensorrt-yolov5 (download the source code)
Enter the conda virtual environment: conda activate python38
conda install -c conda-forge python-rapidjson
pip install tritonclient==2.18.0 # pip uninstall tritonclient
cd ~/Triton_client/triton_client_yolov5
python client_image.py
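Besides the image test, the deployed model can be queried directly over HTTP (an added check); this confirms helmet_detection is loaded and shows the input/output metadata the client script relies on:
curl localhost:8000/v2/models/helmet_detection # model metadata (inputs/outputs)
curl localhost:8000/v2/models/helmet_detection/stats # per-model inference statistics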