1. Software versions and other basic information
Bare-metal environment
NVIDIA driver version: 390.77
cuDNN version: 7.0.5
CUDA version: 9.0
tensorflow-gpu version: 1.12.0
Container environment
nvidia-docker version: 1.0.1
Download locations
TensorFlow Docker image:
https://hub.docker.com/r/tensorflow/tensorflow/
Image name: tensorflow:latest-gpu-py320181126
cuDNN: 7.0 https://anaconda.org/anaconda/cudnn/files?version=7.0.5
7.3 https://www.archlinux.org/packages/community/x86_64/cudnn/
CUDA: version archive https://developer.nvidia.com/cuda-toolkit-archive
Version compatibility table https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
nvidia-docker: https://github.com/NVIDIA/nvidia-docker/releases (download and install the rpm package)
Related documents
1 https://blog.csdn.net/A632189007/article/details/78801166 (first-version installation procedure)
2 https://liqiang311.github.io/docker/nvidia-docker%E5%91%BD%E4%BB%A4%E8%AF%A6%E8%A7%A3/ (blog post explaining nvidia-docker usage)
3 https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html (NVIDIA cuDNN installation guide)
4 https://www.tensorflow.org/guide/using_gpu?hl=zh-cn (TensorFlow guide)
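2. Installation process
1) Install the NVIDIA GPU driver first
Once the driver is installed, the nvidia-smi command is available. It shows GPU information and the driver version.
2) Install CUDA, following related document 1 above
3) Install cuDNN: after downloading, extract the archive and copy it to /usr/local/cudnn
4) Install TensorFlow on bare metal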
pip install tensorflow-gpu
5) Set the environment variable so that the CUDA and cuDNN libraries are on the library path
export LD_LIBRARY_PATH="/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cudnn/lib:/usr/local/cuda/lib64"
6) The bare-metal test succeeded
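The bare-metal setup above can be sanity-checked before importing TensorFlow. The following is a minimal sketch, assuming Python 3; the helper name check_library_path is hypothetical and is not part of any tool mentioned in this post. It splits an LD_LIBRARY_PATH value, reports entries that are empty or are not existing directories, and looks for nvidia-smi on PATH.

```python
import os
import shutil


def check_library_path(value):
    """Return the entries of an LD_LIBRARY_PATH-style string that look wrong.

    An entry is reported if it is empty (the dynamic linker treats an empty
    entry as the current directory) or if it is not an existing directory.
    An unset or empty variable yields no entries at all.
    """
    if not value:
        return []
    problems = []
    for entry in value.split(":"):
        if entry == "":
            problems.append("<empty entry>")
        elif not os.path.isdir(entry):
            problems.append(entry)
    return problems


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    if not current:
        print("LD_LIBRARY_PATH is not set")
    for problem in check_library_path(current):
        print("suspicious LD_LIBRARY_PATH entry:", problem)
    # shutil.which returns None when the command is not found on PATH.
    print("nvidia-smi found at:", shutil.which("nvidia-smi"))
```

Running the script in the same shell where the variable was exported shows whether every listed directory exists on this machine.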
3. Container isolation and testing
1) Install the nvidia-docker plugin and start the service
Method:
rpm -ivh nvidia-docker-1.0.1-1.x86_64.rpm
systemctl start nvidia-docker
Starting a container:
nvidia-docker run --rm -it tensorflow:latest-gpu-py320181126 /bin/bash
The test succeeded
GPU isolation for containers
# Select the GPUs through the NV_GPU variable
NV_GPU=0,1 nvidia-docker run --rm -it tensorflow:latest-gpu-py320181126 /bin/bash
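The NV_GPU isolation above can also be driven from a script. This is a minimal sketch, assuming Python 3 and nvidia-docker 1.x on PATH; build_gpu_container_command is a hypothetical helper name, and the image tag is the one used in this post.

```python
import os
import subprocess

IMAGE = "tensorflow:latest-gpu-py320181126"  # image tag used in this post


def build_gpu_container_command(gpu_ids, image=IMAGE, command="/bin/bash"):
    """Return (env, argv) for running `image` with only `gpu_ids` exposed.

    nvidia-docker 1.x reads the NV_GPU environment variable to decide which
    GPUs are made available to the container; here it is set to a
    comma-separated list of GPU indices.
    """
    env = dict(os.environ)
    env["NV_GPU"] = ",".join(str(i) for i in gpu_ids)
    argv = ["nvidia-docker", "run", "--rm", "-it", image, command]
    return env, argv


if __name__ == "__main__":
    env, argv = build_gpu_container_command([0, 1])
    # Equivalent to: NV_GPU=0,1 nvidia-docker run --rm -it <image> /bin/bash
    subprocess.call(argv, env=env)
```

Inside the container, nvidia-smi should then list only the selected GPUs.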
4. Test case
import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
Known issue:
https://github.com/tensorflow/tensorflow/issues/609
The process can hit OOM when GPU memory is insufficient.
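One commonly used mitigation for GPU out-of-memory errors in TensorFlow 1.x is to stop a process from reserving nearly all GPU memory at start-up. The following is a minimal sketch, assuming the TensorFlow 1.x API (tf.ConfigProto, tf.Session) used in the test case; make_session_config is a hypothetical helper name and the 0.4 fraction is an arbitrary illustration. Whether this resolves the linked issue depends on the actual workload.

```python
def make_session_config(allow_growth=True, memory_fraction=None):
    """Build a tf.ConfigProto that limits how much GPU memory is reserved.

    TensorFlow is imported inside the function so that this file can be
    loaded on machines where TensorFlow 1.x is not installed.
    """
    import tensorflow as tf

    config = tf.ConfigProto(log_device_placement=True)
    # Allocate GPU memory on demand instead of reserving most of it up front.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        # Cap this process at a fraction of each visible GPU's memory.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


if __name__ == "__main__":
    import tensorflow as tf

    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name="a")
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name="b")
    with tf.Session(config=make_session_config(memory_fraction=0.4)) as sess:
        print(sess.run(tf.matmul(a, b)))
```

When several containers share one GPU, a per-process memory fraction keeps one job from starving the others; allow_growth alone only delays the allocation.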