Deep Depth Completion of a Single RGB-D Image
Homepage: http://deepcompletion.cs.princeton.edu/
Github:https://github.com/yindaz/DeepCompletionRelease
Paper:http://deepcompletion.cs.princeton.edu/paper.pdf
Abstract
Goal -- complete the depth channel of an RGB-D image
Problem -- commodity-grade depth cameras often fail to sense depth for shiny, bright, transparent, and distant surfaces
Method -- take an RGB image as input and predict dense surface normals and occlusion boundaries; these predictions are then combined with the raw depth observations provided by the RGB-D camera to solve for the depths of all pixels, including those missing in the original observation
Introduction
Goal: to complete the depth channel of an RGB-D image captured with a commodity camera, i.e., fill all the holes in the depth map
Previous depth inpainting methods fill the holes with hand-tuned approaches, e.g., extrapolating boundary surfaces or Markov image synthesis.
Deep networks have been used for depth estimation, but not yet for depth completion, because of the following difficulties:
- Training data
Large-scale training sets are not readily available for captured RGB-D images paired with "completed" depth images (e.g., where ground-truth depth is provided for holes)
As a result, depth estimation networks trained on raw captures can only reproduce the observed depth and cannot estimate the unobserved depth.
This paper introduces a new dataset of 105,432 RGB-D images, each paired with a completed depth image computed from large-scale surface reconstructions in 72 real environments.
- Depth representation
Directly regressing depth with an FCN does not work well, especially for regions as large as the missing area in Figure 1, because estimating accurate depth from a monocular color image is hard even for humans.
So this paper first uses a network to predict local differential properties of the depth: surface normals and occlusion boundaries.
- Deep network design
No previous work has trained an end-to-end network to complete depth from an RGB-D image.
One idea is to extend earlier color-to-depth networks, but they generally learn only to copy and interpolate the input depth.
It is also challenging for the network to learn how to adapt for misalignments of color and depth.
(What exactly does "misalignment" refer to here -- spatial misalignment? A pixel with color information does not necessarily have depth information?)
This paper takes only the color image as input and first predicts local surface normals and occlusion boundaries with supervision, since predicting local features from color is something deep networks handle well. The depth is then completed by solving a global optimization problem that combines these predictions with the input depth.
The coarse-scale structure of the scene is reconstructed through global optimization with regularization from the input depth.
Main Insight
- prediction of surface normals and occlusion boundaries only from color
- optimization of global surface structure from those predictions with soft constraints provided by observed depths
Benefits: smaller relative error, and the network is independent of the observed depth, so it does not need to be retrained for new depth sensors.
Related work
- Depth estimation
depth estimation from a monocular color image
- Classic methods:
Shape-from-shading
R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah. Shape-from-shading: a survey. IEEE transactions on pattern analysis and machine intelligence, 21(8):690–706, 1999.
Shape-from-defocus
S. Suwajanakorn, C. Hernandez, and S. M. Seitz. Depth from focus with your mobile phone. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3497–3506, 2015. 2
Others
based on hand-tuned models and/or assumptions about surface orientations
D. Hoiem, A. A. Efros, and M. Hebert. Automatic photo pop-up. ACM transactions on graphics (TOG), 24(3):577– 584, 2005.
A. Saxena, S. H. Chung, and A. Y. Ng. Learning depth from single monocular images. In Advances in neural information processing systems, pages 1161–1168, 2006.
A. Saxena, M. Sun, and A. Y. Ng. Make3d: Learning 3d scene structure from a single still image. IEEE transactions on pattern analysis and machine intelligence, 31(5):824– 840, 2009. 2
- Newer methods
based on DL
D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, pages 2650–2658, 2015.
D. Eigen, C. Puhrsch, and R. Fergus. Depth map prediction from a single image using a multi-scale deep network. In Advances in neural information processing systems, pages 2366–2374, 2014.
I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab. Deeper depth prediction with fully convolutional residual networks. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 239–248. IEEE, 2016.
F. Liu, C. Shen, G. Lin, and I. Reid. Learning depth from single monocular images using deep convolutional neural fields. IEEE transactions on pattern analysis and machine intelligence, 38(10):2024–2039, 2016.
A. Roy and S. Todorovic. Monocular depth estimation using neural regression forest. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5506–5514, 2016.
we focus on depth completion, where the explicit goal is to make novel predictions for pixels where the depth sensor has no return.
The methods above only aim to reproduce the raw depth acquired by commodity RGB-D cameras; this paper focuses on depth completion, which makes novel depth predictions for the pixels where the depth sensor has no return.
- Depth inpainting
filling holes in depth channels of RGB-D images
- Old methods
D. Herrera, J. Kannala, J. Heikkilä, et al. Depth map inpainting under a second-order smoothness prior. In Scandinavian Conference on Image Analysis, pages 555–566. Springer, 2013.
X. Gong, J. Liu, W. Zhou, and J. Liu. Guided depth enhancement via a fast marching method. Image and Vision Computing, 31(10):695–703, 2013.
J. Liu, X. Gong, and J. Liu. Guided inpainting and filtering for kinect depth maps. In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 2055–2058. IEEE, 2012.
M. Bertalmio, A. L. Bertozzi, and G. Sapiro. Navier-stokes, fluid dynamics, and image and video inpainting. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–I. IEEE, 2001.
J. Liu and X. Gong. Guided depth enhancement via anisotropic diffusion. In Pacific-Rim Conference on Multimedia, pages 408–417. Springer, 2013.
K. Matsuo and Y. Aoki. Depth image enhancement using local tangent plane approximations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3574–3583, 2015.
S. M. Muddala, M. Sjostrom, and R. Olsson. Depth-based inpainting for disocclusion filling. In 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), 2014, pages 1–4. IEEE, 2014.
A. K. Thabet, J. Lahoud, D. Asmar, and B. Ghanem. 3d aware correction and completion of depth maps in piecewise planar scenes. In Asian Conference on Computer Vision, pages 226–241. Springer, 2014.
W. Chen, H. Yue, J. Wang, and X. Wu. An improved edge detection algorithm for depth map inpainting. Optics and Lasers in Engineering, 55:69–77, 2014.
H.-T. Zhang, J. Yu, and Z.-F. Wang. Probability contour guided depth map inpainting and superresolution using nonlocal total generalized variation. Multimedia Tools and Applications, pages 1–18, 2017.
Y. Zuo, Q. Wu, J. Zhang, and P. An. Explicit edge inconsistency evaluation model for color-guided depth map enhancement. IEEE Transactions on Circuits and Systems for Video Technology, 2016.
H. Xue, S. Zhang, and D. Cai. Depth image inpainting: Improving low rank matrix completion with low gradient regularization. IEEE Transactions on Image Processing, 26(9):4311–4320, 2017.
M. Kulkarni and A. N. Rajagopalan. Depth inpainting by tensor voting. JOSA A, 30(6):1155–1165, 2013.
M. Liu, X. He, and M. Salzmann. Building scene models by completing and hallucinating depth and semantics. In European Conference on Computer Vision, pages 258–274. Springer, 2016
J. T. Barron and J. Malik. Intrinsic scene properties from a single rgb-d image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 17–24, 2013.
M. Ciotta and D. Androutsos. Depth guided image completion for structure and texture synthesis. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 1199–1203.
D. Doria and R. J. Radke. Filling large holes in lidar data by inpainting depth gradients. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pages 65–72.
J. Gautier, O. Le Meur, and C. Guillemot. Depth-based image completion for view synthesis. In 3DTV Conference: The True Vision-capture, Transmission and Display of 3D Video (3DTV-CON), 2011, pages 1–4.
- DL methods:
Auto-encoder
A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al. Conditional image generation with pixel-cnn decoders. In Advances in Neural Information Processing Systems, pages 4790–4798, 2016
GAN
D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2536–2544, 2016.
Prior work has not studied inpainting of depth images, which is a harder problem because depth images lack strong features and large-scale training data.
- Depth super-resolution
to improve the spatial resolution of depth images using high-resolution color
Markov random fields
[44] [12] [42] [51] [58]
O. Mac Aodha, N. D. Campbell, A. Nair, and G. J. Brostow. Patch based synthesis for single depth image super-resolution. In European Conference on Computer Vision, pages 71–84. Springer, 2012.
J. Diebel and S. Thrun. An application of markov random fields to range sensing. In Advances in neural information processing systems, pages 291–298, 2006
J. Lu, D. Min, R. S. Pahwa, and M. N. Do. A revisit to mrf-based depth map super-resolution and enhancement. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 985–988. IEEE, 2011
J. Park, H. Kim, Y.-W. Tai, M. S. Brown, and I. Kweon. High quality depth map upsampling for 3d-tof cameras. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1623–1630. IEEE, 2011
E. Shabaninia, A. R. Naghsh-Nilchi, and S. Kasaei. High-order markov random field for single depth image super-resolution. IET Computer Vision, 2017.
Shape-from-shading
[23] [71]
Y. Han, J.-Y. Lee, and I. So Kweon. High quality shape from a single rgb-d image under uncalibrated natural illumination. In Proceedings of the IEEE International Conference on Computer Vision, pages 1617–1624, 2013.
L.-F. Yu, S.-K. Yeung, Y.-W. Tai, and S. Lin. Shading-based shape refinement of rgb-d images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1415–1422, 2013
Segmentation
[41]
J. Lu and D. Forsyth. Sparse depth super resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2245–2253, 2015
Dictionary methods
[18] [30] [45] [63]
W. T. Freeman, T. R. Jones, and E. C. Pasztor. Example-based super-resolution. IEEE Computer graphics and Applications, 22(2):56–65, 2002
M. Kiechle, S. Hawe, and M. Kleinsteuber. A joint intensity and depth co-sparse analysis model for depth map super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, pages 1545–1552, 2013.
M. Mahmoudi and G. Sapiro. Sparse representations for range data restoration. IEEE Transactions on Image Processing, 21(5):2909–2915, 2012
I. Tosic and S. Drewes. Learning joint intensity-depth sparse representations. IEEE Transactions on Image Processing, 23(5):2122–2132, 2014
Although some of these methods could be used for depth completion, the focus of the two problems is different:
In super-resolution, the low-resolution measurements are assumed to be complete and regularly sampled. In contrast, our focus is on filling holes, which can be quite large and complex and thus require synthesis of large-scale content.
- Depth reconstruction from sparse samples
Other work has studied depth reconstruction from a color image augmented with a sparse set of depth measurements.
S. Hawe, M. Kleinsteuber, and K. Diepold. Dense disparity maps from sparse disparity measurements. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2126–2133
L.-K. Liu, S. H. Chan, and T. Q. Nguyen. Depth reconstruction from sparse samples: Representation, algorithm, and sampling. IEEE Transactions on Image Processing, 24(6):1983–1996, 2015
F. Ma and S. Karaman. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. arXiv preprint arXiv:1709.07492, 2017
However, the motivation of that work is to reduce sensing cost in certain settings (e.g., to save cost on a robot), not depth completion.
Method
Corresponding to the three difficulties mentioned in the Introduction, this work focuses on three questions:
- how can we get training data for depth completion
- what depth representation should we use
- how should cues from color and depth be combined
Dataset
to create a dataset of RGB-D images paired with completed depth images
- Straightforward approach:
to capture images with a low-cost RGB-D camera and align them to images captured simultaneously with a higher-cost depth sensor
However, this approach is expensive and time-consuming; public datasets of this type contain only a small number of indoor scenes.
- This paper's approach
utilize existing surface meshes reconstructed from multi-view RGB-D scans of large environments
for example: Matterport3D [6], ScanNet [10], SceneNN [28], and SUN3D [22, 67]
a) For each scene, we extract a triangle mesh M with ~1-6 million triangles per room from a global surface reconstruction using screened Poisson surface reconstruction.
b) for a sampling of RGB-D images in the scene, we render the reconstructed mesh M from the camera pose of the image viewpoint to acquire a completed depth image D*
This yields a dataset of paired RGB-D and D* images (a minimal rendering sketch follows the reference below).
Question: combining the multi-view RGB-D images requires registration between images, right? Or is the reconstructed mesh already provided by the original dataset? (The global surface reconstruction is indeed part of the existing dataset.)
See:
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang. Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV), 2017.
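A minimal sketch of the render-to-complete step, assuming a trimesh/pyrender toolchain; the mesh file name, intrinsics, and camera pose below are placeholders, and the actual dataset pipeline may differ:

```python
# Sketch: render a completed depth image D* from a reconstructed scene mesh
# at a known camera pose. Assumes trimesh and pyrender are installed;
# "mesh.ply", the intrinsics, and the pose are hypothetical placeholders.
import numpy as np
import trimesh
import pyrender

mesh = trimesh.load("mesh.ply", force="mesh")        # reconstructed scene mesh M
scene = pyrender.Scene()
scene.add(pyrender.Mesh.from_trimesh(mesh))

# Intrinsics of the RGB-D frame and its 4x4 camera-to-world pose
# (pyrender uses the OpenGL convention: camera looks down -z).
camera = pyrender.IntrinsicsCamera(fx=577.0, fy=577.0, cx=320.0, cy=240.0)
camera_pose = np.eye(4)                              # placeholder pose
scene.add(camera, pose=camera_pose)

renderer = pyrender.OffscreenRenderer(viewport_width=640, viewport_height=480)
_, completed_depth = renderer.render(scene)          # D*: depth in meters, 0 where no surface is hit
renderer.delete()
```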
- Advantages of the completed depth images obtained this way (these properties help train the network):
a)
have fewer holes. On average, 64.6% of the pixels missing from the raw depth images are filled in by our reconstruction process
b)
the completed depth images generally replicate the resolution of the originals for close-up surfaces, but provide far better resolution for distant surfaces
Because the surface reconstruction is built at a 3D grid size comparable to the depth camera's resolution, the completed depth images usually lose no resolution. When projected onto the view plane, however, the same 3D resolution provides an effectively higher pixel resolution for surfaces far from the camera. The completed depth images can therefore leverage sub-pixel anti-aliasing when rendering the high-resolution mesh, obtaining finer resolution than the originals (note the details in the furniture in Figure 3).
c)
completed depth images generally have far less noise than the originals
Because the surface reconstruction algorithm combines noisy depth samples from many camera views by filtering and averaging, it essentially denoises the surfaces. This is especially important for distant observations (e.g., > 4 m), where the raw depth measurements are quantized and noisy.
The dataset contains 117,516 RGB-D images with rendered completions:
Training set: 105,432; Test set: 12,084
Depth Representation
what geometric representation is best for deep depth completion
- Straightforward approach:
to design a network that regresses completed depth from raw depth and color
However, it is difficult to predict absolute depth from monocular images, as it may require knowledge of object sizes, scene categories, etc.
This paper instead predicts per-pixel local properties: surface normals and occlusion boundaries.
- Indirect representation of depth
a) relative depth [7]
b) depth derivatives [5] -- evaluated in experiments, but not the best
c) depth derivatives & depth [35]
Why use surface normals and occlusion boundaries?
normals are differential surface properties, so they depend only on local neighborhoods of pixels;
they also relate strongly to local lighting variations directly observable in a color image
So prior work on dense prediction of surface normals from color images has worked well [1, 15, 34, 66, 75].
occlusion boundaries produce local patterns in pixels (e.g., edges), and so they usually can be robustly detected with a deep network [14, 75]
Then, how can depth be computed from surface normals and occlusion boundaries?
- In theory it is impossible: the depth relationships between different parts of the image cannot be inferred from normals alone (see Figure 4a).
- In real scenes, however, an image region is unlikely to be both enclosed by occlusion boundaries and contain no raw depth observations at all (Figure 4b).
So even large holes can be completed by solving for depths that are coherent with the predicted surface normals, where the normal constraints are weighted by the predicted occlusion boundaries and regularized by the observed raw depth.
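A short sketch of why normals constrain depth (my own notation, not the paper's exact formulation): with known camera intrinsics, each pixel's depth determines a 3D point, and requiring the vector between neighboring points to be orthogonal to the predicted normal ties neighboring depths together.

```latex
% Back-projection of pixel p = (u_p, v_p) with unknown depth D(p) and intrinsics K:
%   P(p) = D(p)\, K^{-1} (u_p, v_p, 1)^T
% Tangent between neighboring pixels p, q and the normal constraint:
%   v(p, q) = P(q) - P(p), \qquad \langle v(p, q),\, N(p) \rangle \approx 0
% Each such equation couples D(p) and D(q); chaining them across the image,
% anchored by the observed raw depths, propagates depth into the holes.
```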
Network Architecture and training
what is the best way to train a deep network to predict surface normals and occlusion boundaries for depth completion?
- Network architecture:
[75]
Y. Zhang, S. Song, E. Yumer, M. Savva, J.-Y. Lee, H. Jin, and T. Funkhouser. Physically-based rendering for indoor scene understanding using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
The model is a fully convolutional neural network built on a VGG-16 backbone with a symmetric encoder and decoder.
The network is trained with surface normals and occlusion boundaries computed from the reconstructed meshes.
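A minimal sketch of a VGG-style symmetric encoder-decoder with two output heads (normals and boundary probability). This is not the authors' exact architecture from [75], which uses the full VGG-16 backbone; the channel sizes, depth, and shared trunk here are illustrative assumptions:

```python
import torch
import torch.nn as nn

def block(cin, cout):
    # two 3x3 convs, VGG-style
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class NormalBoundaryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = block(3, 64), block(64, 128), block(128, 256)
        self.dec3, self.dec2, self.dec1 = block(256, 128), block(128, 64), block(64, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.normal_head = nn.Conv2d(64, 3, 1)    # surface normal (3 channels)
        self.boundary_head = nn.Conv2d(64, 1, 1)  # occlusion boundary probability

    def forward(self, rgb):
        x = self.pool(self.enc1(rgb))             # 1/2 resolution
        x = self.pool(self.enc2(x))               # 1/4 resolution
        x = self.enc3(x)
        x = self.dec3(self.up(x))                 # back to 1/2
        x = self.dec2(self.up(x))                 # back to full resolution
        x = self.dec1(x)
        normals = nn.functional.normalize(self.normal_head(x), dim=1)  # unit-length normals
        boundary = torch.sigmoid(self.boundary_head(x))                # in [0, 1]
        return normals, boundary

# usage: normals, boundary = NormalBoundaryNet()(torch.randn(1, 3, 256, 320))
```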
- How to train it
The discussion below uses normal estimation as the example; occlusion boundary detection is handled analogously.
a) What loss should be used to train the network
Two choices: train only on holes vs. on all pixels:
- loss over all pixels (observed and hole pixels)
- loss for only unobserved pixels (holes) by masking out the gradients on other pixels during the back-propagation
Also: trained with rendered normals vs. raw normals?
See the paper's supplementary material for details.
Comparison results:
the models trained with all pixels perform better than the ones using only observed or only unobserved pixels, and ones trained with rendered normals perform better than with raw normals
b) What image channels should be input to the network
Experiments show that when RGB-D is used as input to predict normals, the predictions are poor for pixels in the holes (although they work for observed pixels). The conjecture is that such a network mostly predicts normals from the depth channel of the RGB-D input, so it fails where depth is missing.
The results in Figure 5 motivated the authors to predict surface normals from the color image only.
Separating "prediction without depth" from "optimization with depth" is compelling for two reasons:
- The network does not need to be retrained for different depth sensors. (==I don't see why changing the sensor would otherwise require retraining?==)
- The optimization generalizes to ==various kinds of depth observations as regularization==, including sparse depth samples [43].
Optimizations
The network above outputs a surface normal image N and an occlusion boundary image B (==what do they look like?==).
Depth completion is then posed as solving a system of equations.
The objective function is a weighted sum of squared errors with four terms:
- $E_D$: the distance between the estimated depth and the raw observed depth
- $E_N$: the consistency between the estimated depth and the predicted surface normals, measured by the dot product of the surface tangent and the normal
- $E_S$: encourages adjacent pixels to have similar depth values
- B: $B \in [0, 1]$ down-weights the normal terms based on the predicted probability that a pixel is on an occlusion boundary ($B(p)$)
==Question: at an occlusion boundary the tangent is in fact not perpendicular to the normal, so its weight is reduced? In the extreme case, the $E_N$ terms at occlusion boundaries are ignored entirely?==
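Putting the terms together, a plausible written-out form of the objective, reconstructed from the descriptions above (the exact form of the boundary weight $b(p)$ is an assumption, not taken from the paper):

```latex
E \;=\; \lambda_D E_D \;+\; \lambda_N E_N \;+\; \lambda_S E_S

E_D = \sum_{p \in T_{\mathrm{obs}}} \bigl( D(p) - D_0(p) \bigr)^2
      \quad\text{(stay close to the observed raw depth)}

E_N = \sum_{(p,q) \in \mathcal{N}} b(p)\,\bigl\langle v(p,q),\, N(p) \bigr\rangle^2
      \quad\text{(surface tangents orthogonal to predicted normals)}

E_S = \sum_{(p,q) \in \mathcal{N}} \bigl( D(p) - D(q) \bigr)^2
      \quad\text{(smoothness between neighboring pixels)}

% b(p) \in [0,1] decreases as the predicted boundary probability B(p) grows,
% e.g. b(p) = 1 - B(p) (assumed form).
```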
This objective function is non-linear, due to the normalization of the tangent vector $v(p, q)$ required for the dot product with the surface normal in $E_N$.
==Question: isn't a squared error already non-linear anyway?== (Note: a squared error of terms that are linear in the unknown depths still gives a linear least-squares system; it is the normalization of $v(p, q)$, which divides by the unknown depths, that makes the problem truly non-linear.)
The paper approximates this error term with ==a linear formulation by foregoing the vector normalization==, as suggested in [50].
目標(biāo)函數(shù)的矩陣形式 是 稀疏 且 對稱正定的浊服,所以可使用==a sparse Cholesky factorization [11] 稀疏 Cholesky 分解== 來求解 近似的目標(biāo)含函數(shù)
Surface normals and occlusion boundaries (and optionally depth derivatives) capture only ==local properties of the surface geometry==, which makes them relatively easy to estimate. Only through global optimization can they be combined to complete the depths of all pixels in a consistent solution.
Experimental Results
Unless otherwise specified, networks were pretrained on the SUNCG dataset [60, 75] and fine-tuned on the training split of our new dataset, using only color as input and a loss computed over all rendered pixels.
Optimizations were performed with $\lambda_D = 10^3$, $\lambda_N = 1$, and $\lambda_S = 10^{-3}$. Evaluations were performed on the test split of the new dataset.
Time cost:
| Task | Time | Hardware |
|---|---|---|
| normals & occlusion boundaries | ~0.3 s | NVIDIA TITAN X |
| solving equations | ~1.5 s | Intel Xeon 2.4 GHz CPU |
Ablation Studies
Evaluation metrics
- median error relative to the rendered depth (Rel)
- the Root Mean Squared Error in meters (RMSE)
- percentages of pixels with predicted depths falling within a threshold, $\delta = |d_{pred} - d_{true}| / d_{true}$, for thresholds $1.05, 1.10, 1.25, 1.25^2, 1.25^3$
(the metrics above measure depth error; the following measure surface normals)
- mean and median errors in degrees
- the percentages of pixels with predicted normal error within thresholds of 11.25, 22.5, and 30 degrees
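A small sketch of the depth metrics as listed above; the threshold metric follows the formula as written here (many depth papers instead use the max-ratio variant $\max(d_{pred}/d_{true}, d_{true}/d_{pred}) < t$), and the function and variable names are mine:

```python
import numpy as np

def depth_metrics(pred, gt, valid=None):
    """pred, gt: depth maps in meters; valid: boolean mask of pixels to evaluate."""
    if valid is None:
        valid = gt > 0
    p, g = pred[valid], gt[valid]
    rel_err = np.abs(p - g) / g                       # per-pixel relative error
    metrics = {
        "Rel": float(np.median(rel_err)),             # median relative error
        "RMSE": float(np.sqrt(np.mean((p - g) ** 2))),
    }
    for t in [1.05, 1.10, 1.25, 1.25 ** 2, 1.25 ** 3]:
        # fraction of pixels whose predicted depth falls within the threshold
        metrics[f"delta<{t:.4g}"] = float(np.mean(rel_err < t))
    return metrics
```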
The ablations examine how different test inputs, training data, loss functions, depth representations, and optimization methods affect the depth prediction results.
1. What data should be input to the network
Table 1 shows results with different inputs (in the table, an upward arrow means higher is better; a downward arrow means lower is better).
For example, the median normal error is 17.28 vs. 23.59, and the depth Rel is 0.089 vs. 0.09.
The ==supplementary material== also shows that this advantage persists under different loss settings (observed only vs. unobserved only).
作者認(rèn)為當(dāng)為observed depth時,網(wǎng)絡(luò)會學(xué)習(xí)進(jìn)行插值而不是在holes合成新的depth棉磨。
++This experimental result is what motivated splitting the whole method into a two-stage system!++
2. What depth representation is best
Train networks separately to predict depths (D), surface normals (N), and depth derivatives (DD), then use different combinations to complete the depth by optimizing Equation 1.
Table 2 -- note that the D here means depth predicted from depth.
Taking Rel as an example: N 0.089 < N+DD 0.092 < DD 0.100 < D 0.167.
作者認(rèn)為由于表面法線只代表了orientation of surfaces ,比較好預(yù)測乘瓤,詳見[31]环形;而==且他不隨深度的變化而變化,在不同的視圖里更一致==
3. Does prediction of occlusion boundaries help
Whether down-weighting the effect of surface normals near predicted occlusion boundaries helps the optimizer solve for better depths
In Table 2, "Yes" means the boundary weighting B is used and "No" means no down-weighting; comparing 0.089 vs. 0.110, an improvement of about 19%.
Are the surface normals ==in occlusion boundary regions noisy and inaccurate?== See Figure 6.
In Figure 6, the 2nd column shows the normals and occlusion boundaries output by the network; in the 2nd row, columns 3-4 compare results with and without the boundary weight. In the 1st row, columns 3-4 show surface normals computed from the output depth maps. The occlusion boundaries ==provide depth-discontinuity information and help keep boundaries sharp==, as seen in the normal maps computed from the depth.
4. How much observed depth is necessary
to test how much the depth completion method depends on the quantity of input depth
The input depth images are degraded by ==randomly masking different numbers of pixels== before they are given to the optimizer, which solves for completed depths from the predicted normals and boundaries.
Figure 7
The horizontal axis is the number of pixels in the image that have depth (i.e., not masked). The left plot shows the predicted depth accuracy for observed pixels, and the right plot for unobserved pixels.
As expected, the accuracy for unobserved pixels is lower than for observed pixels, but only a small fraction of input depth is needed (==2000 depth samples, about 2.5% of all pixels==) to get good results. This indirectly suggests that even other depth sensor designs with sparse measurements could obtain reasonably good predictions, ==without retraining the network (the network input is only color)==. But the ground-truth normals used during training come from rendered depth images, right??? At test time, at least, the method indeed does not depend heavily on the number of raw depth samples.
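A sketch of the degradation used in this experiment, assuming NumPy depth maps: keep a fixed number of randomly chosen valid depth pixels and zero out the rest before handing the map to the optimizer. Function and variable names are mine.

```python
import numpy as np

def keep_n_random_depths(depth, n_keep, rng=np.random.default_rng(0)):
    """Return a copy of `depth` with only n_keep randomly chosen valid pixels retained."""
    sparse = np.zeros_like(depth)
    valid = np.flatnonzero(depth > 0)                        # flat indices of pixels that have depth
    keep = rng.choice(valid, size=min(n_keep, valid.size), replace=False)
    sparse.flat[keep] = depth.flat[keep]
    return sparse

# e.g., simulate a sparse sensor: ~2000 samples out of a 320x256 depth image
# sparse_depth = keep_n_random_depths(raw_depth, 2000)
```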
Comparison to Baseline Methods
compares to baseline depth inpainting and depth estimation methods.
1. Comparison to Inpainting Methods
non-data-driven alternatives for depth inpainting
The focus of this study is to establish how well-known methods perform, providing a baseline for how hard the problem is on this new dataset.
Table 3
The compared methods in the table are joint bilateral filtering, the fast bilateral solver, and global edge-aware energy optimization.
The proposed method has the smallest Rel of all methods.
Figure 8 shows a comparison with joint bilateral filtering: the proposed method produces depth maps with more accurate boundaries.
2. Comparison to Depth Estimation Methods
Comparison with color-to-depth estimation methods:
[33] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab. Deeper depth prediction with fully convolutional residual networks. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 239–248. IEEE, 2016
[5] A. Chakrabarti, J. Shao, and G. Shakhnarovich. Depth from a single image by harmonizing overcomplete local network predictions. In Advances in Neural Information Processing Systems, pages 2658–2666, 2016
Table 4
The proposed method is the best on every metric, with improvements of 23-40%. Y means evaluation on observed depth; N means unobserved.
This also suggests that predicting normals is a good approach for the depth estimation problem as well.
Note that not only is the predicted depth more accurate, but comparing the surface normals computed from it also shows that this method learns better scene geometry.
Conclusion
Two main research contributions:
First, it proposes to complete depth with a two stage process where surface normals and occlusion boundaries are predicted from color, and then completed depths are solved from those predictions.
Second, it learns to complete depth images by supervised training on data rendered from large-scale surface reconstructions.
It builds a bridge between the color image and the depth information -- and the bridge is the normals!
Clearly, this is a game of trading time for image quality:
1. It is slow: for a 320x256 image, it takes about 0.3 s on an NVIDIA TITAN X GPU plus about 1.5 s on an Intel Xeon 2.4 GHz CPU.
2. It relies on high-performance hardware, making the cost hard to control.