I. Install Docker
Reference: "NVidia-Docker2 installation and common commands" - jimchen1218 - cnblogs (cnblogs.com)
1. Back up sources.list
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
2. Edit sources.list
sudo gedit /etc/apt/sources.list
3. Replace the entries with the Aliyun cloud mirror
If the system is Ubuntu 20.04 (focal):
deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
If the system is Ubuntu 22.04 (jammy):
deb http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
# stable add by , in order to install g++7
deb [arch=amd64] http://archive.ubuntu.com/ubuntu focal main universe
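The mirror lists above follow one pattern: the same five pockets (release, -security, -updates, -proposed, -backports), each as a deb and a deb-src line. A minimal sketch that generates them for any codename, so the list cannot get out of step with the installed release; the function name gen_aliyun_sources is hypothetical, and the mirror URL and components are copied from the lists above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Aliyun mirror entries for an Ubuntu codename
# (e.g. focal, jammy) in the same deb / deb-src layout as the 22.04 list.
gen_aliyun_sources() {
  local codename="$1"
  local mirror="http://mirrors.aliyun.com/ubuntu/"
  local components="main restricted universe multiverse"
  local suite
  # The empty suffix yields the release pocket itself (e.g. "jammy").
  for suite in "" -security -updates -proposed -backports; do
    echo "deb ${mirror} ${codename}${suite} ${components}"
    echo "deb-src ${mirror} ${codename}${suite} ${components}"
  done
}
```

Usage, after taking the backup from step 1: `gen_aliyun_sources "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list > /dev/null`. Any extra lines, such as the focal entry for g++7, still have to be appended by hand.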
4. Update the package index
sudo apt update
5. Remove any Docker packages already on the system
sudo apt-get remove docker docker-engine docker.io
6. Update the package index again
sudo apt update
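7. Install dependencies
# If software-properties-common fails to install, it can be skipped; note that it provides add-apt-repository (step 9), so without it the Docker repository line must be added to a file under /etc/apt/sources.list.d/ by hand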
sudo apt install apt-transport-https ca-certificates curl software-properties-common
8. Add Docker's official GPG key to the system
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
If this command fails with: curl: (35) gnutls_handshake() failed: Error in the push function, followed by gpg: no valid OpenPGP data found
Fix:
sudo apt-get install build-essential fakeroot dpkg-dev libcurl4-openssl-dev
9. Add the Docker repository
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
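In step 9, `$(lsb_release -cs)` expands to the release codename (for example jammy), while the architecture is hard-coded to amd64. A small sketch that builds the same repository line from the local machine instead; the function name docker_repo_line is hypothetical, and it assumes dpkg and lsb_release are present, as they are on a stock Ubuntu install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Docker apt repository line from step 9.
# Arguments are optional; defaults come from the running system.
docker_repo_line() {
  local arch="${1:-$(dpkg --print-architecture)}"   # e.g. amd64, arm64
  local codename="${2:-$(lsb_release -cs)}"          # e.g. focal, jammy
  echo "deb [arch=${arch}] https://download.docker.com/linux/ubuntu ${codename} stable"
}
```

Usage: `sudo add-apt-repository "$(docker_repo_line)"`. On an amd64 machine this produces the same line as step 9.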
10. Update the package sources
sudo apt update
11. List the Docker versions available for installation
apt-cache policy docker-ce
If a list of versions is shown, Docker can be installed normally.
12. Install Docker
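Instead of reading the version list by eye, the `Candidate:` line of `apt-cache policy` can be checked: it shows `(none)` when the Docker repository was not picked up. A minimal sketch; the function name docker_ce_candidate is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `apt-cache policy docker-ce` output on stdin,
# print the Candidate version, and fail if there is no installable candidate.
docker_ce_candidate() {
  local candidate
  candidate="$(awk '/^[[:space:]]*Candidate:/ {print $2; exit}')"
  [ -n "$candidate" ] && [ "$candidate" != "(none)" ] || return 1
  echo "$candidate"
}
```

Usage: `apt-cache policy docker-ce | docker_ce_candidate || echo "docker-ce not available, re-check steps 8-10"`. The printed string can also pin an exact version, as in `sudo apt install docker-ce=<version>`.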
sudo apt install docker-ce
13. Test
docker --version
sudo docker run hello-world
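On the first run the message Unable to find image 'hello-world:latest' locally is expected: Docker then pulls the image, and the line "Hello from Docker!" in the output confirms that the installation works.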
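Because the "Unable to find image" line appears before the real result, a scripted check should look for the greeting instead. A minimal sketch; the function name hello_world_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `docker run hello-world` output on stdin and
# succeed only if the container actually ran and printed its greeting.
hello_world_ok() {
  grep -q 'Hello from Docker!'
}
```

Usage: `sudo docker run hello-world 2>&1 | hello_world_ok && echo "Docker works"`.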
II. Install nvidia-container-runtime
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install nvidia-container-runtime
sudo apt install libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
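Two lines above do more than they appear to: `distribution=...` sources /etc/os-release in a subshell and joins ID and VERSION_ID (e.g. ubuntu22.04) to pick the repository list, and the `sed` command uncomments every line containing "experimental" in that list. A sketch that wraps both as functions so they can be tried on test files; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same expression as above, but the os-release path can
# be overridden. Prints e.g. "ubuntu22.04".
os_distribution() {
  local os_release="${1:-/etc/os-release}"
  ( . "$os_release"; echo "${ID}${VERSION_ID}" )
}

# Hypothetical helper: strip the leading "#" from every line that mentions
# "experimental", which enables those repository entries in the given file.
enable_experimental() {
  sed -i -e '/experimental/ s/^#//g' "$1"
}
```

Usage: `os_distribution` on the target machine shows which list URL the curl command will request; `sudo bash -c "$(declare -f enable_experimental); enable_experimental /etc/apt/sources.list.d/nvidia-container-runtime.list"` is equivalent to the sed line above.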
III. Install nvidia-docker2
3.1 Install nvidia-docker2
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
If zlib is missing or too old, run the following commands:
# Install
sudo apt-get install zlib1g-dev
# Upgrade only this package
sudo apt-get install --only-upgrade zlib1g-dev
3.2 Add the nvidia runtime
Register nvidia as a Docker runtime. Once this is done, applications in containers can use the GPU:
sudo nvidia-ctk runtime configure --runtime=docker
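`nvidia-ctk runtime configure --runtime=docker` records the runtime in Docker's daemon config, /etc/docker/daemon.json by default, as a `runtimes` entry named nvidia whose path is nvidia-container-runtime; exact formatting may differ between toolkit versions. A rough sketch of a check for that entry before restarting; the function name is hypothetical, and it is a text search, not a real JSON parse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough text check (not a JSON parse) that the Docker
# daemon config declares a runtime named "nvidia" using nvidia-container-runtime.
has_nvidia_runtime_config() {
  local cfg="${1:-/etc/docker/daemon.json}"
  [ -r "$cfg" ] || return 1
  grep -q '"nvidia"' "$cfg" && grep -q 'nvidia-container-runtime' "$cfg"
}
```

Usage: `has_nvidia_runtime_config || echo "nvidia runtime not in daemon.json, re-run nvidia-ctk"`.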
3.3 Restart
sudo systemctl daemon-reload
sudo systemctl restart docker
Once the service has restarted, list Docker's runtimes; nvidia should now appear:
docker info | grep Runtimes
Runtimes: nvidia runc io.containerd.runc.v2
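`grep Runtimes` also matches unrelated text and leaves the judgement to the reader. A minimal sketch that checks whether a named runtime appears on the `Runtimes:` line of `docker info`; the function name docker_has_runtime is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `docker info` output on stdin and succeed if the
# "Runtimes:" line lists the runtime named in $1 (default: nvidia).
docker_has_runtime() {
  local want="${1:-nvidia}"
  awk -v want="$want" '
    /^[[:space:]]*Runtimes:/ {
      for (i = 2; i <= NF; i++) if ($i == want) found = 1
    }
    END { exit (found ? 0 : 1) }'
}
```

Usage: `docker info 2>/dev/null | docker_has_runtime nvidia && echo "nvidia runtime registered"`. Matching whole fields avoids false positives from names that merely contain "nvidia".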
3.4 Verify nvidia-docker
nvidia-docker -v
Output:
Docker version 24.0.6, build ed223bc
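nvidia-docker is a thin wrapper around the docker CLI, so it prints the Docker version; getting this output means nvidia-docker is installed.
IV. Pull the image and run a container
4.1 Pull the image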
sudo docker pull nvidia/cudagl:11.4.0-runtime-ubuntu20.04
If a proxy error is reported:
Error response from daemon: Get "https://registry-1.docker.io/v2/": proxyconnect tcp: dial tcp 192.168.8.12:7890: connect: no route to host
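Remove the proxy: in the drop-in file below, delete or comment out the HTTP_PROXY/HTTPS_PROXY settings (or delete the file), then reload and restart Docker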
sudo vim /etc/systemd/system/docker.service.d/http-proxy.conf
sudo systemctl daemon-reload
sudo systemctl restart docker
# Check that the proxy was removed: no HTTP Proxy / HTTPS Proxy lines should remain
sudo docker info | grep -i proxy
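The manual vim edit and the grep check above can be scripted. A sketch under the assumption that the drop-in sets the proxy through `Environment=` lines, which is the usual layout of http-proxy.conf; both function names are hypothetical, and the first must run as root for the real file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out proxy Environment= lines in a systemd
# drop-in (a .bak copy is kept). Defaults to the file edited above.
disable_proxy_dropin() {
  local f="${1:-/etc/systemd/system/docker.service.d/http-proxy.conf}"
  sed -i.bak -E 's/^([[:space:]]*Environment=.*(HTTP_PROXY|HTTPS_PROXY|http_proxy|https_proxy))/#\1/' "$f"
}

# Hypothetical helper: read `docker info` output on stdin and succeed if an
# HTTP or HTTPS proxy is still reported by the daemon.
docker_proxy_configured() {
  grep -Eiq '^[[:space:]]*HTTPS? Proxy:[[:space:]]*[^[:space:]]'
}
```

Usage: after running `disable_proxy_dropin` as root and then the daemon-reload and restart commands above, `sudo docker info | docker_proxy_configured && echo "proxy still set"` should print nothing.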
4.2 Run a container
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cudagl:11.4.0-runtime-ubuntu20.04 nvidia-smi
or
sudo nvidia-docker run --rm --gpus all nvidia/cudagl:11.4.0-runtime-ubuntu20.04 nvidia-smi
Output like the following indicates success:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 30%   37C    P5    32W / 320W |   7452MiB / 16376MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
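To confirm the GPU is visible from a script, the header line of nvidia-smi can be parsed. Note that "CUDA Version" there is the highest CUDA version the host driver supports (12.0 above), not the CUDA version inside the image (11.4). A minimal sketch; the function name nvidia_smi_versions is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi output on stdin and print
# "driver=<version> cuda=<version>" from the header; fail if it is missing.
nvidia_smi_versions() {
  sed -n -E 's/.*Driver Version: *([0-9.]+).*CUDA Version: *([0-9.]+).*/driver=\1 cuda=\2/p' | grep .
}
```

Usage: `sudo docker run --rm --gpus all nvidia/cudagl:11.4.0-runtime-ubuntu20.04 nvidia-smi | nvidia_smi_versions`. For the output shown above this prints `driver=525.60.11 cuda=12.0`.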