Introduction
U-Net is a CNN segmentation model published in 2015 that achieved great success in cell segmentation on microscopy tissue sections. It built on the then recently proposed FCN, made further effective use of the information carried by context feature maps at each scale, and exploited every feasible kind of data augmentation, obtaining strong results on biological tissue-section images, data that demand high precision and contain densely packed objects.
For the details of FCN, see my earlier post: Semantic segmentation series, part 1: FCN.
U-Net is, at heart, just an FCN. But it has some commendable touches: by using valid padding in its convolutions, each conv layer trims two border pixels off the feature map (one from each side per spatial dimension), so that after repeated cropping and downsampling the final feature map arrives at exactly the mask-map size we want. Its full network structure is shown in the figure below.
[Figure: the U-Net network architecture]
Perhaps the most noteworthy things about U-Net are its effective way of handling data such as biological tissue sections and its somewhat distinctive loss function.
Data Input
筆者因?yàn)榭蛻?hù)項(xiàng)目需求也增接觸過(guò)某種類(lèi)型的人體顯微組織切片數(shù)據(jù)影兽。項(xiàng)目需求是找到一種好的方案來(lái)定位出切片上的有問(wèn)題細(xì)胞區(qū)域并給出初步判斷,標(biāo)明此切片的陰莱革、陽(yáng)性類(lèi)別峻堰。
At first hearing, this does not sound hard: a typical object detection problem or, a bit more involved, a semantic segmentation one. But the data is nothing like the 'well-behaved' datasets we have all played to death, such as ImageNet, CIFAR-10/100, or COCO/VOC. Even the annotations in the client's dataset had problems: some images were under-annotated (missing labels), others were annotated wrongly (incorrect labels). (Perhaps the senior experts who strained over the slides were not paid enough.) Worse, each image has millions of pixels, and a single image file runs to several GB.
Faced with such a requirement, what would you do? Rather than dwell on it, let us first see how the U-Net authors handled it; their approach may offer some lessons.
First, they cut the large image into many patches in a tiling fashion, and the patches become the network's input. Because segmentation is extremely strict about the positions of objects in the image, the convolutions over the feature maps use valid padding. This in turn means each input patch must carry a larger surrounding context, so that even as successive conv layers shave the borders away (a consequence of valid padding), the positions in the final output stay correct; in the paper, a 572x572 input tile yields a 388x388 output segmentation.
The figure below shows how U-Net prepares its input with this overlap-tile strategy.
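A minimal sketch of the overlap-tile idea (not the authors' code; the 572/388 tile sizes and the mirror extrapolation at image borders come from the paper, while the function and its assumptions are mine, e.g. that the image dimensions are multiples of the output tile size):

```python
import numpy as np

def overlap_tiles(image, in_size=572, out_size=388):
    """Yield (input_tile, y, x) triples: each mirrored-context input tile
    produces the out_size x out_size output patch anchored at (y, x)."""
    margin = (in_size - out_size) // 2               # context eaten by valid convs
    padded = np.pad(image, margin, mode='reflect')   # mirror missing border context
    h, w = image.shape                               # assumed multiples of out_size
    for y in range(0, h, out_size):
        for x in range(0, w, out_size):
            yield padded[y:y + in_size, x:x + in_size], y, x
```

Each predicted 388x388 patch is then written back at (y, x); tiles near the image border get their missing context from the mirrored padding, which is exactly the extrapolation trick the paper describes.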
Data Preprocessing
Slide datasets of this biological kind are generally costly to build (privacy, annotation expense, and so on), so they are rarely large. One therefore has to lean heavily on data augmentation for the model to be trained adequately on the limited data.
The augmentations used in this paper include shifts and rotations (e.g., rotations through a variety of angles), some smoothing filtering, and elastic deformations (the last proved extremely effective when working with small datasets).
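A minimal sketch of elastic deformation (the paper uses random displacements on a coarse 3x3 grid with bicubic interpolation; the Gaussian-smoothed per-pixel field below is a common approximation, and all names and parameter values here are my own):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=34.0, sigma=4.0, rng=None):
    """Warp `image` with a random Gaussian-smoothed displacement field.
    alpha scales displacement magnitude; sigma controls its smoothness."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    return map_coordinates(image, [ys + dy, xs + dx], order=1, mode='reflect')
```

The same displacement field must be applied to the image and to its label mask (with order=0, i.e. nearest-neighbor, for the mask) so the two stay aligned.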
Network Architecture
As the figure at the top shows, the name U-Net is well deserved. The network consists of two paths: a contracting path (left side), which is essentially a typical CNN feature extractor (progressively distilling and extracting features), and an expansive path, which takes the features of different scales and levels produced by the contracting path and upsamples, reshapes, and converts them into a high-resolution mask map.
In it one can already glimpse the shape of the structures later used in the DSSD and FPN models.
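The characteristic step on the expansive path is the crop-and-concat skip: the matching contracting-path feature map, which is larger because of valid padding, is center-cropped and concatenated with the upsampled map. A minimal PyTorch sketch of one such step (my own naming, not the authors' code):

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One expansive-path step: 2x2 up-conv, crop-and-concat, two 3x3 valid convs."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3), nn.ReLU(inplace=True),   # pad=0: valid
            nn.Conv2d(out_ch, out_ch, 3), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)
        # center-crop the larger contracting-path map to match x
        dh = (skip.size(2) - x.size(2)) // 2
        dw = (skip.size(3) - x.size(3)) // 2
        skip = skip[:, :, dh:dh + x.size(2), dw:dw + x.size(3)]
        return self.conv(torch.cat([skip, x], dim=1))
```

With in_ch = 2 * out_ch, the concatenation restores in_ch channels, matching the channel counts in the paper's figure (e.g. UpBlock(1024, 512) at the deepest level).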
Model Training
During training, the loss is obtained by summing an element-wise cross-entropy between the input label mask map and the mask map the network finally outputs, and backpropagation is computed from it.
The loss is computed by the formula below.
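In the paper's notation, a pixel-wise softmax over the final feature map is combined with the cross-entropy, weighted per pixel:

$$E = \sum_{x \in \Omega} w(x)\, \log\bigl(p_{\ell(x)}(x)\bigr)$$

where $p_k(x)$ is the softmax probability of channel $k$ at pixel $x$ and $\ell(x)$ is the true label of pixel $x$.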
Interestingly, to better learn the thin separations between touching cells (and thereby represent cell edges well), the authors attach to the loss a weight map defined at the pixel level (w(x), with x ranging over every pixel of the image).
The value of w(x) is derived from the class balance and the pixel layout of the segmentation maps already known in the training set. It is computed as follows:
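From the paper:

$$w(x) = w_c(x) + w_0 \cdot \exp\!\left(-\frac{(d_1(x) + d_2(x))^2}{2\sigma^2}\right)$$

where $w_c$ is the class-frequency balancing weight map, $d_1(x)$ is the distance to the border of the nearest cell, and $d_2(x)$ is the distance to the border of the second-nearest cell; the paper sets $w_0 = 10$ and $\sigma \approx 5$ pixels.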
At first glance this feels like exploiting prior annotation information, which smells a bit like cheating. But on reflection it fits machine learning's usual practice: YOLOv2's clustering of all ground-truth boxes in the training set to obtain sensible prior-box widths/heights is the same kind of move.
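A minimal sketch of how such a weight map could be computed from an instance-labeled mask (the Euclidean distance transform is the natural tool; the function and parameter names are mine, not the paper's):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def unet_weight_map(instances, wc, w0=10.0, sigma=5.0):
    """instances: (H, W) int array, 0 = background, 1..N = cell ids.
    wc: (H, W) class-balancing weights. Returns per-pixel loss weights."""
    ids = np.unique(instances)
    ids = ids[ids > 0]
    if len(ids) < 2:                 # border term needs at least two cells
        return wc
    # distance from every pixel to each individual cell region
    dists = np.stack([distance_transform_edt(instances != i) for i in ids])
    dists.sort(axis=0)
    d1, d2 = dists[0], dists[1]      # nearest and second-nearest cell
    return wc + w0 * np.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))
```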
Experimental Results
Below are its results on cell tissue sections; the quality is easy to appreciate at a glance.
Finally, the table below shows how it compared with other models in the ISBI 2015 cell challenge.
Code Analysis
Below is the prototxt that makes up U-Net. One can see that it uses valid convolutions (pad: 0) throughout, and that the upsampling on the expansive path is done with deconvolution layers.
```
name: 'phseg_v5'
force_backward: true
layers { top: 'data' top: 'label' name: 'loaddata' type: HDF5_DATA hdf5_data_param { source: 'aug_deformed_phseg_v5.txt' batch_size: 1 } include: { phase: TRAIN }}
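# Contracting path (d0..d4): blocks of two 3x3 valid convolutions + ReLU, joined by 2x2 max pooling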
layers { bottom: 'data' top: 'd0b' name: 'conv_d0a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd0b' top: 'd0b' name: 'relu_d0b' type: RELU }
layers { bottom: 'd0b' top: 'd0c' name: 'conv_d0b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd0c' top: 'd0c' name: 'relu_d0c' type: RELU }
layers { bottom: 'd0c' top: 'd1a' name: 'pool_d0c-1a' type: POOLING pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layers { bottom: 'd1a' top: 'd1b' name: 'conv_d1a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd1b' top: 'd1b' name: 'relu_d1b' type: RELU }
layers { bottom: 'd1b' top: 'd1c' name: 'conv_d1b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd1c' top: 'd1c' name: 'relu_d1c' type: RELU }
layers { bottom: 'd1c' top: 'd2a' name: 'pool_d1c-2a' type: POOLING pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layers { bottom: 'd2a' top: 'd2b' name: 'conv_d2a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd2b' top: 'd2b' name: 'relu_d2b' type: RELU }
layers { bottom: 'd2b' top: 'd2c' name: 'conv_d2b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd2c' top: 'd2c' name: 'relu_d2c' type: RELU }
layers { bottom: 'd2c' top: 'd3a' name: 'pool_d2c-3a' type: POOLING pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layers { bottom: 'd3a' top: 'd3b' name: 'conv_d3a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd3b' top: 'd3b' name: 'relu_d3b' type: RELU }
layers { bottom: 'd3b' top: 'd3c' name: 'conv_d3b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd3c' top: 'd3c' name: 'relu_d3c' type: RELU }
layers { bottom: 'd3c' top: 'd3c' name: 'dropout_d3c' type: DROPOUT dropout_param { dropout_ratio: 0.5 } include: { phase: TRAIN }}
layers { bottom: 'd3c' top: 'd4a' name: 'pool_d3c-4a' type: POOLING pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layers { bottom: 'd4a' top: 'd4b' name: 'conv_d4a-b' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 1024 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd4b' top: 'd4b' name: 'relu_d4b' type: RELU }
layers { bottom: 'd4b' top: 'd4c' name: 'conv_d4b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 1024 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'd4c' top: 'd4c' name: 'relu_d4c' type: RELU }
layers { bottom: 'd4c' top: 'd4c' name: 'dropout_d4c' type: DROPOUT dropout_param { dropout_ratio: 0.5 } include: { phase: TRAIN }}
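# Expansive path (u3..u0): 2x2 deconvolution upsampling, crop-and-concat skip from the contracting path, then two 3x3 valid convolutions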
layers { bottom: 'd4c' top: 'u3a' name: 'upconv_d4c_u3a' type: DECONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 2 stride: 2 weight_filler { type: 'xavier' }} }
layers { bottom: 'u3a' top: 'u3a' name: 'relu_u3a' type: RELU }
layers { bottom: 'd3c' bottom: 'u3a' top: 'd3cc' name: 'crop_d3c-d3cc' type: CROP }
layers { bottom: 'u3a' bottom: 'd3cc' top: 'u3b' name: 'concat_d3cc_u3a-b' type: CONCAT }
layers { bottom: 'u3b' top: 'u3c' name: 'conv_u3b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u3c' top: 'u3c' name: 'relu_u3c' type: RELU }
layers { bottom: 'u3c' top: 'u3d' name: 'conv_u3c-d' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 512 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u3d' top: 'u3d' name: 'relu_u3d' type: RELU }
layers { bottom: 'u3d' top: 'u2a' name: 'upconv_u3d_u2a' type: DECONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 2 stride: 2 weight_filler { type: 'xavier' }} }
layers { bottom: 'u2a' top: 'u2a' name: 'relu_u2a' type: RELU }
layers { bottom: 'd2c' bottom: 'u2a' top: 'd2cc' name: 'crop_d2c-d2cc' type: CROP }
layers { bottom: 'u2a' bottom: 'd2cc' top: 'u2b' name: 'concat_d2cc_u2a-b' type: CONCAT }
layers { bottom: 'u2b' top: 'u2c' name: 'conv_u2b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u2c' top: 'u2c' name: 'relu_u2c' type: RELU }
layers { bottom: 'u2c' top: 'u2d' name: 'conv_u2c-d' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u2d' top: 'u2d' name: 'relu_u2d' type: RELU }
layers { bottom: 'u2d' top: 'u1a' name: 'upconv_u2d_u1a' type: DECONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 2 stride: 2 weight_filler { type: 'xavier' }} }
layers { bottom: 'u1a' top: 'u1a' name: 'relu_u1a' type: RELU }
layers { bottom: 'd1c' bottom: 'u1a' top: 'd1cc' name: 'crop_d1c-d1cc' type: CROP }
layers { bottom: 'u1a' bottom: 'd1cc' top: 'u1b' name: 'concat_d1cc_u1a-b' type: CONCAT }
layers { bottom: 'u1b' top: 'u1c' name: 'conv_u1b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u1c' top: 'u1c' name: 'relu_u1c' type: RELU }
layers { bottom: 'u1c' top: 'u1d' name: 'conv_u1c-d' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u1d' top: 'u1d' name: 'relu_u1d' type: RELU }
layers { bottom: 'u1d' top: 'u0a' name: 'upconv_u1d_u0a' type: DECONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 pad: 0 kernel_size: 2 stride: 2 weight_filler { type: 'xavier' }} }
layers { bottom: 'u0a' top: 'u0a' name: 'relu_u0a' type: RELU }
layers { bottom: 'd0c' bottom: 'u0a' top: 'd0cc' name: 'crop_d0c-d0cc' type: CROP }
layers { bottom: 'u0a' bottom: 'd0cc' top: 'u0b' name: 'concat_d0cc_u0a-b' type: CONCAT }
layers { bottom: 'u0b' top: 'u0c' name: 'conv_u0b-c' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u0c' top: 'u0c' name: 'relu_u0c' type: RELU }
layers { bottom: 'u0c' top: 'u0d' name: 'conv_u0c-d' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 pad: 0 kernel_size: 3 engine: CAFFE weight_filler { type: 'xavier' }} }
layers { bottom: 'u0d' top: 'u0d' name: 'relu_u0d' type: RELU }
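# 1x1 convolution down to 2 class channels, followed by softmax loss (pixels labeled 2 are ignored)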
layers { bottom: 'u0d' top: 'score' name: 'conv_u0d-score' type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { engine: CAFFE num_output: 2 pad: 0 kernel_size: 1 weight_filler { type: 'xavier' }} }
layers { bottom: 'score' bottom: 'label' top: 'loss' name: 'loss' type: SOFTMAX_LOSS loss_param { ignore_label: 2 } include: { phase: TRAIN }}
```
References
- U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, Thomas Brox, 2015
- https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/