My setup: Ubuntu 18.04 + NVIDIA driver 410.78 + CUDA 10.0 + cuDNN 7.4.2
- Download py-faster-rcnn
git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
- py-faster-rcnn uses the Caffe framework, so install Caffe's dependencies first:
sudo apt-get install python-pip
sudo pip install cython
sudo pip install easydict
sudo apt-get install python-opencv
You also need:
- boost
sudo apt-get install libboost-all-dev
- proto
sudo apt-get install libprotobuf-dev protobuf-c-compiler protobuf-compiler
- glog
sudo apt-get install libgoogle-glog-dev
- gflags
sudo apt-get install libgflags-dev
- lmdb
sudo apt-get install liblmdb-dev
- leveldb
sudo apt-get install libleveldb-dev
- snappy
sudo apt-get install libsnappy-dev
- opencv
sudo apt-get install libopencv-dev
- BLAS
sudo apt-get install libatlas-base-dev
- hdf5.h header
sudo apt-get install libhdf5-\*
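Before building, it can save time to confirm that the Python modules are importable from the same interpreter py-faster-rcnn will use (Python 2 in this setup, since everything was installed with python-pip). A minimal sketch. The module list is my own assumption: Cython, easydict and cv2 from the steps above, plus skimage and google.protobuf, which the demo later complains about:

```python
# Check that the Python modules py-faster-rcnn needs can be imported.
# Run it with the same interpreter as tools/demo.py; valid in Python 2.7 and 3.

# Import names (not apt/pip package names); this list is an assumption.
REQUIRED = ["Cython", "easydict", "cv2", "skimage.io", "google.protobuf"]


def missing_modules(names):
    """Return the names in `names` that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules(REQUIRED)
    print("missing: " + (", ".join(gaps) if gaps else "none"))
```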
- Build caffe-fast-rcnn
- Build the Cython modules
cd py-faster-rcnn/lib
make
- Build Caffe and pycaffe
First enter the caffe-fast-rcnn directory:
cd py-faster-rcnn/caffe-fast-rcnn
Copy Makefile.config.example to Makefile.config:
cp Makefile.config.example Makefile.config
Edit Makefile.config and change the corresponding lines to the following:
USE_CUDNN := 1
WITH_PYTHON_LAYER := 1
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
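If you rebuild often, the four edits can be scripted. This is a sketch under my own assumptions: it runs from caffe-fast-rcnn after the cp step, uncomments a line such as `# USE_CUDNN := 1` or overwrites an existing assignment, and appends the setting when no line matches:

```python
# Apply the Makefile.config edits above (run from caffe-fast-rcnn).
import re

SETTINGS = [
    ("USE_CUDNN", "1"),
    ("WITH_PYTHON_LAYER", "1"),
    ("INCLUDE_DIRS", "$(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial"),
    ("LIBRARY_DIRS", "$(PYTHON_LIB) /usr/local/lib /usr/lib "
                     "/usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial"),
]


def apply_settings(text, settings):
    """Set each `KEY := value`, uncommenting or overwriting the first matching line."""
    for key, value in settings:
        line = "%s := %s" % (key, value)
        # Matches "KEY := ..." or "# KEY := ..." on a single line ([ \t], not \s,
        # so a match never spills across a newline).
        pattern = re.compile(r"^#?[ \t]*%s[ \t]*:=.*$" % re.escape(key), re.MULTILINE)
        if pattern.search(text):
            # A function replacement keeps "$(...)" and backslashes literal.
            text = pattern.sub(lambda _m: line, text, count=1)
        else:
            text = text.rstrip("\n") + "\n" + line + "\n"
    return text


if __name__ == "__main__":
    with open("Makefile.config") as f:
        config = f.read()
    with open("Makefile.config", "w") as f:
        f.write(apply_settings(config, SETTINGS))
```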
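Building at this point still fails: the Caffe bundled with faster-rcnn supports cuDNN v4 by default, so the version mismatch produces errors about mismatched function arguments. Following the blog post https://blog.csdn.net/flygeda/article/details/78638824, download the latest Caffe source from https://github.com/BVLC/caffe
Replace the corresponding files in caffe-fast-rcnn with these files from the latest Caffe source: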
include/caffe/layers/cudnn_relu_layer.hpp
src/caffe/layers/cudnn_relu_layer.cpp
src/caffe/layers/cudnn_relu_layer.cu
include/caffe/layers/cudnn_sigmoid_layer.hpp
src/caffe/layers/cudnn_sigmoid_layer.cpp
src/caffe/layers/cudnn_sigmoid_layer.cu
include/caffe/layers/cudnn_tanh_layer.hpp
src/caffe/layers/cudnn_tanh_layer.cpp
src/caffe/layers/cudnn_tanh_layer.cu
include/caffe/util/cudnn.hpp
In caffe-fast-rcnn's src/caffe/layers/cudnn_conv_layer.cu, replace every occurrence of
cudnnConvolutionBackwardData_v3 with cudnnConvolutionBackwardData
cudnnConvolutionBackwardFilter_v3 with cudnnConvolutionBackwardFilter
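The two renames can also be done with a short script (sed works equally well). A sketch, assuming it is run from caffe-fast-rcnn; it rewrites only the two exact `_v3` names listed above and reports how many it changed:

```python
# Rename the cuDNN v3-era backward functions in cudnn_conv_layer.cu.
RENAMES = [
    ("cudnnConvolutionBackwardData_v3", "cudnnConvolutionBackwardData"),
    ("cudnnConvolutionBackwardFilter_v3", "cudnnConvolutionBackwardFilter"),
]
PATH = "src/caffe/layers/cudnn_conv_layer.cu"


def drop_v3_suffix(source):
    """Return `source` with every listed _v3 name replaced, and the replacement count."""
    total = 0
    for old, new in RENAMES:
        total += source.count(old)
        source = source.replace(old, new)
    return source, total


if __name__ == "__main__":
    with open(PATH) as f:
        fixed, n = drop_v3_suffix(f.read())
    with open(PATH, "w") as f:
        f.write(fixed)
    print("replaced %d occurrence(s)" % n)
```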
Then build:
cd py-faster-rcnn/caffe-fast-rcnn
make -j8 && make pycaffe
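The build then hit another error: nvcc fatal : Unsupported gpu architecture 'compute_20'
CUDA 9.0 and later no longer support compute capability 2.x (Fermi), so remove these lines from the CUDA_ARCH setting in Makefile.config: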
-gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
After that, the build completes.
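In the stock Makefile.config.example (as far as I know) the first compute_20 flag sits on the same line as `CUDA_ARCH :=`, so deleting whole matching lines would also delete the assignment. A sketch that keeps the assignment and its line continuation and drops only the compute_20 flags; the line layout it expects is an assumption based on the example file:

```python
# Remove the compute_20 entries from CUDA_ARCH in Makefile.config.
def drop_compute_20(text):
    """Drop every compute_20 -gencode flag while keeping the CUDA_ARCH assignment."""
    out = []
    for line in text.splitlines(True):
        if "arch=compute_20" not in line:
            out.append(line)
        elif line.lstrip().startswith("CUDA_ARCH"):
            # This line holds both the assignment and a compute_20 flag:
            # keep "CUDA_ARCH :=" (and the trailing backslash, if any), drop the flag.
            cont = " \\" if line.rstrip().endswith("\\") else ""
            out.append(line.split(":=")[0].rstrip() + " :=" + cont + "\n")
        # Any other line containing compute_20 is a continuation line; skip it.
    return "".join(out)


if __name__ == "__main__":
    with open("Makefile.config") as f:
        config = f.read()
    with open("Makefile.config", "w") as f:
        f.write(drop_compute_20(config))
```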
Get the Faster R-CNN models
cd py-faster-rcnn
./data/scripts/fetch_faster_rcnn_models.sh
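My server cannot get through the firewall, so I downloaded the archive locally and uploaded it to py-faster-rcnn/data on the server; the download URL is in fetch_faster_rcnn_models.sh.
Then extract it: tar -xvf faster_rcnn_models.tgz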
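If `tar` is inconvenient, Python's standard tarfile module does the same job. A minimal sketch, assuming it is run from py-faster-rcnn/data with the archive name used above:

```python
# Unpack faster_rcnn_models.tgz into the current directory (Python 2.7 or 3).
import tarfile


def extract_models(archive_path, dest_dir):
    """Extract a .tgz archive like `tar -xvf` and return its member names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and ".." members.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names


if __name__ == "__main__":
    for name in extract_models("faster_rcnn_models.tgz", "."):
        print(name)
```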
Run the demo
cd py-faster-rcnn
sudo ./tools/demo.py
- Error:
ImportError: No module named skimage.io
Fix: sudo apt-get install python-skimage
- Error:
ImportError: No module named google.protobuf.internal
Fix: pip install protobuf
- Error:
Cannot create Cublas handle. Cublas won't be available
... (several dozen lines omitted)
Check failed: status == CUDNN_STATUS_SUCCESS (1 VS. 0) CUDNN_STATUS_NOT_INITIALIZED
My machine still had CUDA 10.1 installed from earlier, which does not work with my driver 410.78 (CUDA 10.1 needs driver 418.39 or newer), and I had never removed it. I uninstalled it here and kept only CUDA 10.0: in /usr/local/cuda-10.1/bin, run
./cuda_uninstaller
- Error:
...
in <module>
from nms.gpu_nms import gpu_nms
ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory
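This is because I had previously built with CUDA 10.1; after switching to CUDA 10.0, some files were not rebuilt and still depended on CUDA 10.1. The fix is to delete all *.so files in the subdirectories of py-faster-rcnn/lib/ and then run make again.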
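The cleanup can be done with `find py-faster-rcnn/lib -name '*.so' -delete`; here is the same thing as a small Python helper that also reports what it removed (a sketch; run it from the directory containing py-faster-rcnn, then rerun make in lib):

```python
# Delete stale *.so build products under py-faster-rcnn/lib before rerunning make.
import os


def remove_shared_objects(root):
    """Delete every *.so file below `root` and return the removed paths, sorted."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".so"):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)


if __name__ == "__main__":
    for path in remove_shared_objects(os.path.join("py-faster-rcnn", "lib")):
        print("removed %s" % path)
```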
The demo now runs successfully :)
Download the ImageNet pre-trained model weights (used to initialize the network)
cd py-faster-rcnn
./data/scripts/fetch_imagenet_models.sh
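If the download fails, fetch it the same way as the Faster R-CNN models above: download it locally and upload it to the server.
Create a symbolic link to the PASCAL VOC dataset so it can be reused across multiple projects, where $VOCdevkit is the directory of the dataset you downloaded: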
cd py-faster-rcnn/data
ln -s $VOCdevkit VOCdevkit2007
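A sketch of the same step in Python that checks the layout before linking. The assumption is that $VOCdevkit contains a VOC2007 subdirectory, which is what the VOCdevkit2007 link name implies; `link_voc.py` is a hypothetical file name:

```python
# Link a PASCAL VOC devkit into py-faster-rcnn/data as VOCdevkit2007.
import os
import sys


def link_voc(devkit_dir, data_dir, year="2007"):
    """Create data_dir/VOCdevkit<year> -> devkit_dir and return the link path."""
    if not os.path.isdir(os.path.join(devkit_dir, "VOC" + year)):
        raise ValueError("%s has no VOC%s subdirectory" % (devkit_dir, year))
    link = os.path.join(data_dir, "VOCdevkit" + year)
    if not os.path.lexists(link):  # leave an existing link or directory alone
        os.symlink(os.path.abspath(devkit_dir), link)
    return link


if __name__ == "__main__":
    # Usage (hypothetical): python link_voc.py /path/to/VOCdevkit py-faster-rcnn/data
    print(link_voc(sys.argv[1], sys.argv[2]))
```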
Train on the VOC dataset
cd py-faster-rcnn
./experiments/scripts/faster_rcnn_alt_opt.sh [GPU_ID] [NET] [--set...]
For example, to train ZF on GPU 1:
./experiments/scripts/faster_rcnn_alt_opt.sh 1 ZF pascal_voc
This fails with:
File "/home/zd/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 51, in __init__
pb2.text_format.Merge(f.read(), self.solver_param)
AttributeError: 'module' object has no attribute 'text_format'
The fix is to add this line to py-faster-rcnn/lib/fast_rcnn/train.py:
import google.protobuf.text_format
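Why one import line helps: importing a package does not import its submodules, so `pb2.text_format` only exists once something has imported `google.protobuf.text_format`. The sketch below reproduces this with a throwaway stand-in package (`fake_protobuf` is hypothetical, not the real protobuf), assuming, as the traceback's `pb2.text_format` suggests, that train.py binds `google.protobuf` to the name `pb2`:

```python
# Show that importing a package does not bind its submodules as attributes.
import os
import sys
import tempfile

# Build a hypothetical stand-in for google.protobuf: an empty __init__.py
# plus a text_format submodule that __init__ never imports.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "fake_protobuf")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "text_format.py"), "w") as f:
    f.write("def Merge(text, message):\n    return message\n")
sys.path.insert(0, root)

import fake_protobuf as pb2            # like `import google.protobuf as pb2`
before = hasattr(pb2, "text_format")   # False: submodule not loaded yet

import fake_protobuf.text_format       # analogous to the one-line fix
after = hasattr(pb2, "text_format")    # True: the import bound the attribute

print("before fix: %s, after fix: %s" % (before, after))
```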
Then training starts...
But after running for a while, it failed again:
File "/home/zd/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 110, in _sample_rois
fg_inds, size=fg_rois_per_this_image, replace=False
File "mtrand.pyx", line 1176, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:18822)
TypeError: 'numpy.float64' object cannot be interpreted as an index
So I reinstalled numpy 1.11.0: sudo pip install -U numpy==1.11.0
But that caused a new error: ImportError: numpy.core.multiarray failed to import
So, following https://github.com/rbgirshick/py-faster-rcnn/issues/626, I modified lines 55, 98, 110, 124 and 175 of py-faster-rcnn/lib/roi_data_layer/minibatch.py, and upgraded numpy to 1.13.1: sudo pip install -U numpy==1.13.1
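The underlying problem is that newer numpy releases (1.12 and later, as far as I know) refuse a float such as numpy.float64 where an integer size or index is expected, while the sampling code computes these counts with float arithmetic. This is not the exact patch from issue 626; it is a simplified stand-in for the foreground sampling in `_sample_rois`, showing the kind of int cast that fixes it. `sample_foreground` and its parameters are hypothetical; `fg_inds` and `fg_rois_per_this_image` follow the traceback:

```python
# Simplified sketch of foreground RoI sampling with explicit int casts.
import numpy as np
import numpy.random as npr


def sample_foreground(fg_inds, rois_per_image, fg_fraction):
    """Randomly keep up to fg_fraction * rois_per_image foreground RoI indices."""
    # np.round returns numpy.float64; cast so it can be used as a count.
    fg_rois_per_image = int(np.round(fg_fraction * rois_per_image))
    # Never ask for more foreground RoIs than there are.
    fg_rois_per_this_image = int(min(fg_rois_per_image, fg_inds.size))
    if fg_inds.size > 0:
        # `size` must be an integer; a float64 here is what raised the TypeError above.
        fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False)
    return fg_inds
```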