Abstract
Specifically, an orientation-encoding unit is designed to describe eight crucial orientations, and multi-scale representation is achieved by stacking several orientation-encoding units.
1. Introduction
Among 3D object classification, 3D object detection, and 3D semantic segmentation, the last, which assigns a semantic label to each point in a point cloud, is relatively challenging.
Firstly, the sparseness of point clouds in 3D space makes most spatial operators inefficient.
Moreover, the relationship between points is implicit and difficult to represent because point clouds are unordered and unstructured.
Several solutions exist: handcrafted voxel features and 2D CNN features from RGB-D images.
Additionally, there is a dilemma between 2D convolution and 3D convolution: 2D convolution fails to capture 3D geometric information such as normals and shape, while 3D convolution requires heavy computation.
Recently, the PointNet architecture [22] operates directly on point clouds instead of 3D voxel grids or meshes. It not only accelerates computation but also notably improves segmentation performance.
We get inspiration from the successful feature detection algorithm scale-invariant feature transform (SIFT) [15], which involves two key properties: scale-awareness and orientation-encoding.
We design a novel module called PointSIFT for 3D understanding that possesses both properties.
The basic building block of our PointSIFT module is an orientation-encoding (OE) unit that convolves the features of the nearest points in 8 orientations.
In comparison to the K-nearest-neighbor search in PointNet++ [24], where all K neighbors may fall in one orientation, our OE unit captures information from all orientations. We further stack several OE units in one PointSIFT module to represent different scales.
In order to make the whole architecture scale-aware, we connect these OE units by shortcuts and jointly optimize for adaptive scales.
We further build a hierarchical architecture that recursively applies the PointSIFT module as a local feature descriptor.
Resembling conventional segmentation frameworks in 2D [26] and 3D [24], our two-stage network first downsamples the point cloud for efficient computation and then upsamples to get dense predictions.
3. Problem Statement
4. Our Method
The network follows an encode-decode (downsample-upsample) framework.
In the downsampling stage, we recursively apply our proposed PointSIFT module combined with the set abstraction (SA) module introduced in [24] for hierarchical feature embedding.
For the upsampling stage, dense features are enabled by effectively interleaving the feature propagation (FP) module [24] with the PointSIFT module.
4.1. PointSIFT Module
Given an n * d matrix as input, which describes a point set of size n with a d-dimensional feature for every point, the PointSIFT module outputs an n * d matrix that assigns a new d-dimensional feature to every point.
4.1.1 Orientation-encoding
Local descriptors in previous methods typically apply unordered operations (e.g., max pooling [24, 32]) based on the observation that point clouds are unordered and unstructured.
However, using an ordered operator could be much more informative (max pooling discards all inputs except for the maximum) while still preserving invariance to the order of input points.
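The remark about max pooling can be illustrated with a toy example. This is only a sketch: `max_pool` and `ordered_sum` are hypothetical helper names, and the fixed weights stand in for a learned ordered operator such as a convolution.

```python
def max_pool(features):
    """Channel-wise maximum over a list of feature vectors."""
    return [max(channel) for channel in zip(*features)]


def ordered_sum(features, weights):
    """Position-dependent weighted sum: the vector in slot i is scaled by weights[i]."""
    return [sum(w * x for w, x in zip(weights, channel))
            for channel in zip(*features)]


# Two different neighbor sets that share the same channel-wise maxima.
a = [[1.0, 5.0], [3.0, 2.0]]
b = [[3.0, 5.0], [0.0, 0.0]]

print(max_pool(a), max_pool(b))                 # identical outputs
print(ordered_sum(a, [1.0, 2.0]), ordered_sum(b, [1.0, 2.0]))  # different outputs
```

Max pooling cannot tell the two sets apart, while the slot-weighted sum can. The price is that the result depends on which slot each neighbor occupies, so the slots must be filled by a fixed rule rather than by the arbitrary order in which points are stored.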
One natural ordering for a point cloud is the one induced by the ordering of the three coordinates.
This observation leads us to the orientation-encoding (OE) unit, which is a point-wise local feature descriptor that encodes information of eight orientations.
The first stage of OE embedding is the Stacked 8-neighborhood (S8N) Search, which finds the nearest neighbor in each of the eight octants partitioned by the ordering of the three coordinates.
Since distant points provide little information for describing local patterns, when no point exists within the searching radius r in some octant, we duplicate p0 as its own nearest neighbor in that octant.
We further process the features of those neighbors, which reside in a 2 * 2 * 2 cube centered at p0, for local pattern description.
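The octant search described above could be sketched in pure Python as follows. Several details are assumptions rather than anything stated in these notes: the function name `s8n_search` is hypothetical, the octant index is built from the signs of the coordinate offsets, and a point lying exactly on a dividing plane is assigned to the non-negative side. The function returns point indices, using the index of p0 itself for any octant that is empty within the radius.

```python
import math


def s8n_search(points, center_idx, radius):
    """For each of the 8 octants around points[center_idx], return the index
    of the nearest point within `radius`, or center_idx if the octant is empty."""
    cx, cy, cz = points[center_idx]
    nearest = [center_idx] * 8      # fallback: duplicate p0
    best_dist = [math.inf] * 8
    for i, (x, y, z) in enumerate(points):
        if i == center_idx:
            continue
        dx, dy, dz = x - cx, y - cy, z - cz
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if dist > radius:
            continue                # distant points are ignored
        # Assumed convention: bit 2 = x side, bit 1 = y side, bit 0 = z side.
        octant = 4 * (dx >= 0) + 2 * (dy >= 0) + (dz >= 0)
        if dist < best_dist[octant]:
            best_dist[octant] = dist
            nearest[octant] = i
    return nearest


pts = [(0, 0, 0), (1, 1, 1), (0.5, 0.5, 0.5), (-1, -1, -1), (5, 5, 5)]
print(s8n_search(pts, 0, 2.0))  # [3, 0, 0, 0, 0, 0, 0, 2]
```

This brute-force loop is linear in the number of points for each query. A practical implementation would presumably use a spatial index or a GPU kernel, which is outside the scope of this sketch.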
Many previous works ignore the structure of the data and do max pooling on feature vectors along the d dimensions to get new features. However, we believe that ordered operators such as convolution can better exploit the structure of the data. Thus we propose orientation-encoding convolution, a three-stage operator that convolves the 2 * 2 * 2 cube along the X, Y, and Z axes successively.
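The three-stage operator can be sketched as below. The notes do not give the kernel shapes or the nonlinearity, so this sketch assumes each stage is a kernel of size 2 along one axis that maps d channels to d channels and is followed by ReLU, with any normalization omitted. The names `conv_pair` and `oe_convolution` and the weight layout `weight[slot][in_channel][out_channel]` are hypothetical.

```python
def conv_pair(a, b, weight):
    """Kernel-size-2 convolution over two feature vectors, followed by ReLU.
    weight[s][j][k] maps input channel j of slot s to output channel k."""
    d_out = len(weight[0][0])
    out = []
    for k in range(d_out):
        total = sum(a[j] * weight[0][j][k] + b[j] * weight[1][j][k]
                    for j in range(len(a)))
        out.append(max(0.0, total))
    return out


def oe_convolution(cube, w_x, w_y, w_z):
    """cube[i][j][k] is the feature vector of the neighbor on x-side i,
    y-side j, z-side k. Collapses X, then Y, then Z: 2x2x2 -> 2x2 -> 2 -> 1."""
    v_x = [[conv_pair(cube[0][j][k], cube[1][j][k], w_x) for k in range(2)]
           for j in range(2)]
    v_xy = [conv_pair(v_x[0][k], v_x[1][k], w_y) for k in range(2)]
    return conv_pair(v_xy[0], v_xy[1], w_z)


# Toy check with identity weights, so each stage just adds the two inputs.
identity = [[1.0, 0.0], [0.0, 1.0]]
w = [identity, identity]
cube = [[[[1.0, 2.0] for _ in range(2)] for _ in range(2)] for _ in range(2)]
print(oe_convolution(cube, w, w, w))  # [8.0, 16.0]
```

With learned weights, the two slots of each stage are scaled differently, so the output depends on which octant each neighbor came from. That is the property max pooling lacks.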
4.1.2 Scale-awareness
Scale-awareness is obtained by stacking several orientation-encoding (OE) units in the PointSIFT module.
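One plausible reading of this stacking, combined with the shortcut connections mentioned in the introduction, is sketched below. It is an assumption, not a confirmed description of the module: the outputs of all stacked units are concatenated per point and passed through a point-wise fusion function. The names `pointsift_module`, `oe_units`, and `fuse` are hypothetical, and the OE units are treated as opaque callables that map n feature vectors to n feature vectors.

```python
def pointsift_module(features, oe_units, fuse):
    """Apply OE units in sequence, keep every intermediate output via a
    shortcut, and fuse the per-point concatenation back to one vector."""
    outputs = []
    current = features
    for unit in oe_units:
        current = unit(current)   # deeper units see a larger neighborhood
        outputs.append(current)   # shortcut: keep this scale for fusion
    fused = []
    for i in range(len(features)):
        concat = [x for out in outputs for x in out[i]]
        fused.append(fuse(concat))
    return fused


# Stand-in unit and fusion used only to exercise the wiring.
def double(feats):
    return [[2.0 * x for x in p] for p in feats]


def total(v):
    return [sum(v)]


print(pointsift_module([[1.0], [2.0]], [double, double], total))
# [[6.0], [12.0]]
```

Because the fusion step sees the output of every unit, training can weight shallow and deep units differently, which is one way to interpret jointly optimizing for adaptive scales.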