Installing TensorFlow 1.6.0 on Ubuntu 16.04
This guide installs TensorFlow through Anaconda.
Environment: Ubuntu 16.04 + Python 3.5 + TensorFlow 1.6 + CUDA 9.0 + cuDNN 7.0 + Anaconda3-5.1.0 + NVIDIA driver 384 + GeForce GTX 1060 3GB
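A few things to note
Installing TensorFlow with GPU support also requires CUDA and cuDNN, so the versions have to match one another. Check the required combination beforehand in the release notes of the TensorFlow GitHub repository: https://github.com/tensorflow/tensorflow/releases
Do not pick the newest versions; pick a combination that is known to work. With TensorFlow 1.6.0 under Python 3.6 I kept getting errors: the package was installed and conda list showed tensorflow-gpu, yet import tensorflow failed every time.
Determining the versions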
cuda_9.0.176_384.81_linux.run # CUDA 9.0, NVIDIA driver 384
cudnn-9.0-linux-x64-v7.tgz # cuDNN 7.0 built for CUDA 9.0
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp35-cp35m-linux_x86_64.whl #python 3.5
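The file names above already encode the versions that have to agree. As a rough sketch, the helpers below split them apart so the CUDA version of the runfile can be compared with the one the cuDNN archive targets. The name patterns are an assumption inferred only from the two file names listed here, and both function names are made up for illustration.

```python
import re

def parse_cuda_runfile(name):
    """Split e.g. cuda_9.0.176_384.81_linux.run into
    (CUDA version, bundled driver version); None if the name does not match."""
    m = re.match(r"cuda_(\d+\.\d+\.\d+)_(\d+\.\d+)_linux\.run$", name)
    return (m.group(1), m.group(2)) if m else None

def parse_cudnn_archive(name):
    """Split e.g. cudnn-9.0-linux-x64-v7.tgz into
    (targeted CUDA version, cuDNN version); None if the name does not match."""
    m = re.match(r"cudnn-(\d+\.\d+)-linux-x64-v(\d+(?:\.\d+)*)\.tgz$", name)
    return (m.group(1), m.group(2)) if m else None

cuda = parse_cuda_runfile("cuda_9.0.176_384.81_linux.run")
cudnn = parse_cudnn_archive("cudnn-9.0-linux-x64-v7.tgz")
print(cuda, cudnn)
# The cuDNN archive should target the same CUDA major.minor as the runfile.
print(cuda[0].startswith(cudnn[0] + "."))
```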
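Installing the NVIDIA driver
- Disable the driver bundled with the system (skip this step if the NVIDIA driver is already installed)
[Screenshot: 2018-03-13 16-21-14.png]
The screenshot shows the state after the NVIDIA driver has been installed. If the second driver in the list is the one in use, this step is required.
The graphics driver bundled with Ubuntu is nouveau, an open-source driver for NVIDIA cards developed by a third party. It has to be blacklisted before the official NVIDIA driver can be installed.
Add the driver to the blacklist file blacklist.conf. The file's permissions do not allow editing, so change the permissions first.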
Check the permissions
$ sudo ls -lh /etc/modprobe.d/blacklist.conf
Change the permissions
$ sudo chmod 666 /etc/modprobe.d/blacklist.conf
Open the file with gedit
$ sudo gedit /etc/modprobe.d/blacklist.conf
Append the following lines to the file:
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist nvidiafb
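After editing, it can be worth confirming that every entry really ended up in the file. The following is only a sketch: `missing_blacklist_entries` is a hypothetical helper, and the sample text is invented to show that a commented-out line does not count.

```python
REQUIRED = ["vga16fb", "nouveau", "rivafb", "rivatv", "nvidiafb"]

def missing_blacklist_entries(conf_text, required=REQUIRED):
    """Return the modules from `required` that have no active `blacklist` line."""
    listed = set()
    for line in conf_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(parts) == 2 and parts[0] == "blacklist":
            listed.add(parts[1])
    return [m for m in required if m not in listed]

# Invented sample: rivafb is commented out, rivatv and nvidiafb are absent.
sample = "# framebuffer drivers\nblacklist vga16fb\nblacklist nouveau\n#blacklist rivafb\n"
print(missing_blacklist_entries(sample))
```

To check the real file, pass in open("/etc/modprobe.d/blacklist.conf").read(); an empty list means all five entries are present.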
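- Start the installation
Remove any existing NVIDIA driver by running this in a terminal (the pattern is quoted so the shell does not expand it against files in the current directory):
sudo apt-get remove --purge 'nvidia*'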
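When removal has finished, press Ctrl+Alt+F1 to switch to a text console and stop the graphical system
$ sudo service lightdm stop
Install the NVIDIA driver (nvidia-384 is the one recommended for my card; you can look it up under System Settings -> Software & Updates -> Additional Drivers)
$ sudo apt-get install nvidia-384
When the installation has finished, start the graphical system
$ sudo service lightdm start
This command switches back to the graphical interface automatically. Because the bundled nouveau driver has been disabled, the desktop may fail to appear at this point. If so, press Ctrl+Alt+F1 again to return to the text console and enter reboot to restart the computer.
Run nvidia-smi to check whether the installation succeeded. If it did, the output looks similar to this: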
(tensorflow) ajm@ajm-zju:~$ nvidia-smi
Tue Mar 13 13:21:14 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111 Driver Version: 384.111 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:01:00.0 On | N/A |
| 27% 27C P8 9W / 120W | 408MiB / 3012MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1074 G /usr/lib/xorg/Xorg 151MiB |
| 0 1933 G compiz 138MiB |
| 0 2071 G fcitx-qimpanel 9MiB |
| 0 2359 G ...-token=F8442C0855613A1C9ED488250D0EE24D 107MiB |
+-----------------------------------------------------------------------------+
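If you would rather check the driver version in a script than read the table by eye, a minimal sketch could look like this. It parses the banner line shown above; the threshold of 384 is an assumption taken from the driver bundled in the cuda_9.0.176_384.81 runfile, and both function names are hypothetical.

```python
import re

def driver_version(smi_text):
    """Extract the 'Driver Version' field from nvidia-smi's banner, or None."""
    m = re.search(r"Driver Version:\s*([\d.]+)", smi_text)
    return m.group(1) if m else None

def driver_branch_ok(version, required_branch=384):
    """True if the major part of the version is at least the required branch."""
    return version is not None and int(version.split(".")[0]) >= required_branch

banner = "| NVIDIA-SMI 384.111 Driver Version: 384.111 |"
v = driver_version(banner)
print(v, driver_branch_ok(v))
```

On a real machine the text can be obtained with subprocess.check_output(["nvidia-smi"]).decode() and passed to driver_version.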
Installing CUDA
At https://developer.nvidia.com/cuda-downloads select your machine's environment and download the runfile (local) installer
Change to the download directory and run in a terminal
sudo sh cuda_9.0.176_384.81_linux.run
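When asked whether to install the NVIDIA driver, choose no, because the driver was already installed in the previous step. The installer ends with a message that no NVIDIA driver was installed; this can be ignored. Answer yes to all the other questions.
After the installation, environment variables have to be added.
Most guides online recommend the following: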
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
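and then running source /etc/profile
to apply them.
However, variables set this way are only temporary and are lost again after a reboot or similar. To set them permanently:
sudo gedit /etc/profile # permanent setting for all users
# append the following two lines to the end of the file
export PATH="$PATH:/usr/local/cuda/bin" # entries are separated by ':'; if the file already has such a line, append :/usr/local/cuda/bin to it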
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64"
Then run
source /etc/profile
to apply the change; otherwise it only takes effect after a reboot.
Test whether the installation succeeded:
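Two details of colon-separated search paths are easy to miss. If LD_LIBRARY_PATH was empty beforehand, "$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64" starts with a colon, and the dynamic loader treats an empty entry as the current directory. Sourcing /etc/profile repeatedly also appends the same directory again each time. The sketch below shows the intended result; `append_dir` is a hypothetical helper, not part of any tool.

```python
def append_dir(existing, new_dir):
    """Append new_dir to a colon-separated search path without creating
    empty entries or duplicates."""
    parts = [p for p in existing.split(":") if p]  # drop empty entries
    if new_dir not in parts:
        parts.append(new_dir)
    return ":".join(parts)

# Previously empty variable: no leading colon is produced.
print(append_dir("", "/usr/local/cuda-9.0/lib64"))
# Directory already present: the path is left unchanged.
print(append_dir("/usr/lib:/usr/local/cuda-9.0/lib64", "/usr/local/cuda-9.0/lib64"))
```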
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
If the installation succeeded, the output looks similar to this:
(tensorflow) ajm@ajm-zju:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 3GB"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 3013 MBytes (3158900736 bytes)
( 9) Multiprocessors, (128) CUDA Cores/MP: 1152 CUDA Cores
GPU Max Clock rate: 1734 MHz (1.73 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
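The two lines that matter in this output are the summary line and Result = PASS. A small sketch, assuming the summary line has the same wording as in the output above, could pull them out like this (`check_device_query` is a made-up name):

```python
import re

def check_device_query(output):
    """Return (passed, driver_version, runtime_version) from deviceQuery output."""
    passed = re.search(r"^Result = PASS\s*$", output, re.MULTILINE) is not None
    m = re.search(
        r"CUDA Driver Version = ([\d.]+), CUDA Runtime Version = ([\d.]+)", output)
    driver, runtime = (m.group(1), m.group(2)) if m else (None, None)
    return passed, driver, runtime

sample = (
    "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, "
    "CUDA Runtime Version = 9.0, NumDevs = 1\n"
    "Result = PASS\n"
)
print(check_device_query(sample))
```

A runtime version newer than the driver version would suggest the installed driver is too old for this CUDA toolkit.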
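Installing cuDNN
Click download at https://developer.nvidia.com/cudnn. You have to register and log in before the cuDNN packages can be downloaded. The package downloaded here is cuDNN v7.1.1 Library for Linux.
In a terminal, extract the downloaded cuDNN package: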
tar -xvf cudnn-9.0-linux-x64-v7.tgz
Then simply copy the header and library files into the CUDA installation directory:
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h # give all users read permission
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
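cuDNN is now installed.
Installing Anaconda
- Download
Choose the Anaconda version you need at https://www.anaconda.com/download/#linux and download the installer; it can also be downloaded from the Tsinghua University Anaconda mirror. This guide uses Anaconda3-5.1.0-Linux-x86_64.sh
- Install
# change to the directory that contains the installer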
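To confirm which cuDNN version was actually copied, you can read the version macros from the installed header. This is a sketch only: cuDNN 7 headers define CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, but the sample text below is hand-written for illustration rather than copied from a real cudnn.h, and `cudnn_version` is a hypothetical helper.

```python
import re

def cudnn_version(header_text):
    """Read CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL from cudnn.h text.
    Returns a (major, minor, patch) tuple, or None if a macro is missing."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"^#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE)
        if m is None:
            return None
        values.append(int(m.group(1)))
    return tuple(values)

# Hand-written stand-in for the relevant lines of cudnn.h.
sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 1\n#define CUDNN_PATCHLEVEL 1\n"
print(cudnn_version(sample))
```

For the real file, pass in open("/usr/local/cuda/include/cudnn.h").read().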
bash Anaconda3-5.1.0-Linux-x86_64.sh
- Add the Tsinghua mirror
Access to sites abroad can be very slow, so you can add the Tsinghua University Anaconda mirror channels to the conda configuration:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
Then open the configuration file with
gedit ~/.condarc
and delete the defaults line.
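The same edit can be done without opening an editor. The sketch below removes the defaults entry from the text of a .condarc file. The sample layout is an assumption about what the conda config commands above write, and `drop_defaults` is a hypothetical helper.

```python
def drop_defaults(condarc_text):
    """Remove the `- defaults` entry from a .condarc channels list."""
    kept = [ln for ln in condarc_text.splitlines() if ln.strip() != "- defaults"]
    return "\n".join(kept) + "\n"

# Assumed file contents after the three conda config commands.
sample = (
    "channels:\n"
    "  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/\n"
    "  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/\n"
    "  - defaults\n"
    "show_channel_urls: true\n"
)
print(drop_defaults(sample), end="")
```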
Installing TensorFlow through Anaconda
Create an environment for TensorFlow with conda
# Python 3.6 kept causing problems when I installed, so I chose Python 3.5
$ conda create -n tensorflow python=3.5 # or python=3.3,2.7 ...
Activate the environment with the following command
$ source activate tensorflow
Then install TensorFlow:
(tensorflow)$ pip install --ignore-installed --upgrade tfBinaryURL
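Here tfBinaryURL is the URL of the TensorFlow package to install, for example https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp35-cp35m-linux_x86_64.whl
Testing the installation
- After installing, run a short TensorFlow script to check that the installation works. The official TensorFlow tutorial gives two stages of tests; the first is a hello-world check: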
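The cp35 part of the wheel name is what ties the package to Python 3.5. As a sketch, the helper below reads that tag from the URL and compares it with an interpreter version. It relies on the standard wheel naming scheme (name-version-pythontag-abitag-platform.whl); the function names are made up for illustration.

```python
import sys

def wheel_python_tag(url):
    """Return the Python tag (e.g. 'cp35') from a wheel file name or URL."""
    filename = url.rsplit("/", 1)[-1]
    parts = filename[:-len(".whl")].split("-")
    return parts[-3]  # the last three fields are python tag, ABI tag, platform

def matches_interpreter(url, version_info=sys.version_info):
    """True if the wheel's CPython tag matches the given interpreter version."""
    return wheel_python_tag(url) == "cp%d%d" % (version_info[0], version_info[1])

url = ("https://storage.googleapis.com/tensorflow/linux/gpu/"
       "tensorflow_gpu-1.6.0-cp35-cp35m-linux_x86_64.whl")
print(wheel_python_tag(url))
print(matches_interpreter(url, (3, 5)), matches_interpreter(url, (3, 6)))
```

Called without a second argument inside the activated environment, matches_interpreter(url) checks the wheel against the Python that pip will install into.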
$ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print(sess.run(a + b))
42
>>>
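The hello-world test passes on a CPU-only setup as well, so it may be worth checking separately that TensorFlow sees the GPU. A possible sketch uses device_lib.list_local_devices() from TensorFlow 1.x; the two wrapper functions are hypothetical, and the demo at the end runs on stand-in device records rather than real ones.

```python
from collections import namedtuple

def gpu_device_names(devices):
    """Pick the names of GPU entries from a list of device descriptions."""
    return [d.name for d in devices if d.device_type == "GPU"]

def list_tf_gpus():
    """Ask TensorFlow which GPUs it can see; None if TensorFlow cannot be imported."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_device_names(device_lib.list_local_devices())

# Stand-in records with the two attributes the filter relies on.
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])
demo = [FakeDevice("/device:CPU:0", "CPU"), FakeDevice("/device:GPU:0", "GPU")]
print(gpu_device_names(demo))
```

Inside the tensorflow environment, print(list_tf_gpus()) should list at least one GPU device; an empty list would mean TensorFlow is running on the CPU only.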
If import tensorflow fails with the following error
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
the library file libcublas.so.9.0 could not be loaded, which means the LD_LIBRARY_PATH environment variable set earlier is wrong. Use
echo $PATH
echo $LD_LIBRARY_PATH
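to check whether the environment variables are set correctly.
- Run a convolutional neural network (CNN): MNIST handwritten digit recognition
The code comes from zouxy09, Deep Learning-TensorFlow (1): CNN convolutional neural network, MNIST handwritten digit recognition implementation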
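Rather than reading the echoed path by eye, you can also test directly whether any directory on it contains the missing library. This is a rough sketch with a hypothetical helper name; it only checks that the file exists and does not reproduce everything the dynamic loader does.

```python
import os

def find_library(lib_name, search_path):
    """Return the directories in a colon-separated search path that contain lib_name."""
    hits = []
    for d in search_path.split(":"):
        if d and os.path.exists(os.path.join(d, lib_name)):
            hits.append(d)
    return hits

hits = find_library("libcublas.so.9.0", os.environ.get("LD_LIBRARY_PATH", ""))
print(hits if hits else "libcublas.so.9.0 not found on LD_LIBRARY_PATH")
```

With the setup in this guide, the expected hit is /usr/local/cuda-9.0/lib64.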
# -*- coding: utf-8 -*-
import time
import tensorflow.examples.tutorials.mnist.input_data as input_data
import tensorflow as tf
# Weights W and biases b.
# Weights are drawn from a truncated normal distribution (stddev 0.1);
# biases start at the small positive constant 0.1.
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)  # truncated normal distribution
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)  # constant 0.1
    return tf.Variable(initial)
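# Convolution and pooling: the convolution uses stride 1 with zero padding,
# and pooling is plain 2x2 max pooling.
def conv2d(x, W):
    # strides[0] and strides[3] must be 1; the middle two values are the strides
    # along height and width. padding='SAME' zero-pads so that, with stride 1,
    # the output has the same spatial size as the input.
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

# Record the start time (time.clock() is deprecated since Python 3.3 and was
# removed in 3.8; it still works on the Python 3.5 used here)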
start = time.clock()
# Load the MNIST data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Placeholders for the input images and the target labels
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
# First layer: one convolution layer followed by one max-pooling layer.
# There are 32 kernels of size 5x5.
# The kernel tensor has shape [5, 5, 1, 32]: kernel size, 1 input channel, 32 output channels.
# Every output channel has its own bias.
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
# Reshape x into a 4-D tensor: dimensions 2 and 3 are the image height and width,
# and the last dimension is the number of color channels.
x_image = tf.reshape(x, [-1, 28, 28, 1])  # -1 lets TensorFlow infer the batch size
# Convolve x_image with the weights, add the bias, apply ReLU, then max-pool
h_conv1 = tf.nn.relu(conv2d(x_image,W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
# Second layer: same structure, 32 input channels and 64 output channels
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
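# Fully connected layer.
# The image is now 7x7 (28/2/2 = 7). Add a fully connected layer with 1024 neurons:
# reshape the pooled output tensor into vectors, multiply by the weight matrix,
# add the bias, then apply ReLU.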
W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
# Dropout, to reduce overfitting.
# It sits just before the output layer: enabled during training, disabled during testing.
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# Output layer: a softmax layer with 10 classes
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop,W_fc2) + b_fc2)
# Train and evaluate the model.
# The Adam optimizer performs the gradient-based minimization; keep_prob is passed
# through feed_dict to control the dropout rate.
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))  # cross-entropy loss
# Use the Adam optimizer with a learning rate of 0.0001
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# Check whether the predicted label matches the true label
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Create the session and initialize the variables
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Train the model for 1000 steps
for i in range(1000):
    batch = mnist.train.next_batch(50)  # batch size 50
    if i % 100 == 0:
        train_accuracy = accuracy.eval(session=sess,
            feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, train_accuracy %g" % (i, train_accuracy))
    # Train with keep_prob = 0.5 (half of the activations are kept)
    train_step.run(session=sess, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
# Test with keep_prob = 1.0 (dropout disabled)
print("test accuracy %g" % accuracy.eval(session=sess,
    feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
# Record the end time
end = time.clock()
print("running time is %g s" % (end - start))
Copy the code above into a file named test.py, then activate the tensorflow environment in Anaconda and run it
source activate tensorflow
(tensorflow) ajm@ajm-zju:~$ python test.py
If it runs correctly, the output looks like this:
(tensorflow) ajm@ajm-zju:~$ python test.py
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
2018-03-13 15:02:30.015471: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-03-13 15:02:30.134023: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-13 15:02:30.134237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1060 3GB major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 2.94GiB freeMemory: 2.49GiB
2018-03-13 15:02:30.134251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-13 15:02:30.303165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2198 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 3GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
step 0, train_accuracy 0.06
step 100, train_accuracy 0.92
step 200, train_accuracy 0.96
step 300, train_accuracy 0.92
step 400, train_accuracy 0.92
step 500, train_accuracy 1
step 600, train_accuracy 1
step 700, train_accuracy 0.96
step 800, train_accuracy 0.96
step 900, train_accuracy 0.98
2018-03-13 15:02:36.083316: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 747.68MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-03-13 15:02:36.083361: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.59GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-03-13 15:02:36.083390: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.32GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-03-13 15:02:36.343473: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.42GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
test accuracy 0.9673
running time is 7.45613 s
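The "ran out of memory" warnings appear right before the test accuracy is printed, most likely because the script feeds all 10000 MNIST test images to the 3 GB GPU in one call. They are warnings, not failures, but evaluating in slices should avoid them. The helper below is a sketch of that idea, not part of the original script; `batched_accuracy` and the batch size of 1000 are assumptions.

```python
def batched_accuracy(eval_batch, images, labels, batch_size=1000):
    """Average accuracy over a dataset evaluated slice by slice.

    eval_batch(images_slice, labels_slice) must return the mean accuracy of
    that slice. Slices are weighted by length, so a short last slice is fine.
    """
    total = 0.0
    n = len(images)
    for start in range(0, n, batch_size):
        xs = images[start:start + batch_size]
        ys = labels[start:start + batch_size]
        total += eval_batch(xs, ys) * len(xs)
    return total / n

# In test.py this could replace the single full-size evaluation:
#   eval_batch = lambda xs, ys: accuracy.eval(
#       session=sess, feed_dict={x: xs, y_: ys, keep_prob: 1.0})
#   print("test accuracy %g" % batched_accuracy(
#       eval_batch, mnist.test.images, mnist.test.labels))

# Stand-alone demo with plain lists instead of a TensorFlow session.
def fraction_equal(xs, ys):
    return sum(1 for a, b in zip(xs, ys) if a == b) / float(len(xs))

print(batched_accuracy(fraction_equal, [1, 1, 0, 1, 0], [1, 1, 1, 1, 1], batch_size=2))
```

The result equals the accuracy over the whole set, because each slice's accuracy is weighted by the number of examples in it.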
Other issues
After finishing the steps above, import tensorflow still fails in IPython and Jupyter Notebook. In that case, reinstall them inside the tensorflow environment:
conda install jupyter notebook
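A plausible cause of the import error is that IPython or Jupyter was started from the base Anaconda installation and therefore uses a Python interpreter that has no TensorFlow installed. A quick way to check, sketched below, is to look at the interpreter path from inside the notebook. `runs_in_env` is a hypothetical helper, and it assumes the usual conda layout in which environments live under an envs/ directory.

```python
import sys

def runs_in_env(executable, env_name="tensorflow"):
    """True if the interpreter path lies inside a conda env directory of that name."""
    return "/envs/%s/" % env_name in executable.replace("\\", "/")

print(sys.executable)
print(runs_in_env(sys.executable))
```

If the second line prints False inside the notebook, the notebook is not running in the tensorflow environment.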