1. Official annotation file
<annotation>
<folder>VOC2007</folder>
<filename>000007.jpg</filename>
<source>
<database>The VOC2007 Database</database>
<annotation>PASCAL VOC2007</annotation>
<image>flickr</image>
<flickrid>194179466</flickrid>
</source>
<owner>
<flickrid>monsieurrompu</flickrid>
<name>Thom Zemanek</name>
</owner>
<size>
<width>500</width>
<height>333</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>car</name>
<pose>Unspecified</pose>
<truncated>1</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>141</xmin>
<ymin>50</ymin>
<xmax>500</xmax>
<ymax>330</ymax>
</bndbox>
</object>
</annotation>
- bndbox: [xmin, ymin, xmax, ymax], given as absolute pixel values. For how to read them, see the SSD PyTorch implementation on GitHub.
- In typical detection setups, a bndbox with difficult==1 is by default not used as training ground truth.
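The two notes above can be sketched with the standard library's XML parser. This is a minimal illustration, not the SSD PyTorch code itself: the function name `parse_voc_objects` and the `keep_difficult` flag are assumptions made for this example, and the XML string is a trimmed copy of the annotation shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the annotation above (only the fields used here).
VOC_XML = """<annotation>
  <size><width>500</width><height>333</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <difficult>0</difficult>
    <bndbox><xmin>141</xmin><ymin>50</ymin><xmax>500</xmax><ymax>330</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_objects(xml_text, keep_difficult=False):
    """Return a list of (name, [xmin, ymin, xmax, ymax]) in absolute pixels.

    Hypothetical helper: objects with difficult == 1 are skipped unless
    keep_difficult is True.
    """
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        difficult = int(obj.findtext("difficult", default="0"))
        if difficult == 1 and not keep_difficult:
            continue  # not used as training ground truth
        box = obj.find("bndbox")
        coords = [int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.findtext("name").strip(), coords))
    return objects


objects = parse_voc_objects(VOC_XML)
print(objects)  # [('car', [141, 50, 500, 330])]
```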
SSD PyTorch
1.1 Format of the data fed into data augmentation
image:
- dtype=np.uint8
- shape = (height, width, channel)
height, width: kept at the original image size
color channel : RGB
code:
img = cv2.imread()
img = img[:, :, (2, 1, 0)]
Default format of the data returned by cv2.imread():
- dtype=np.uint8 (0-255)
- img: (height, width, channels)
- color format: BGR, channels ordered (b, g, r)
Applying
img = img[:, :, (2, 1, 0)]
converts the color channels from BGR to RGB.
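The channel reordering can be checked without reading a file. The sketch below uses a small synthetic NumPy array as a stand-in for the result of cv2.imread(); the array contents are made up for illustration.

```python
import numpy as np

# Stand-in for the array returned by cv2.imread(): a 2x2 image in BGR order
# whose pixels are pure blue.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[:, :, 0] = 255  # channel 0 is blue in BGR order

# Reorder the last axis: (b, g, r) -> (r, g, b)
rgb = bgr[:, :, (2, 1, 0)]

print(rgb.dtype, rgb.shape)  # uint8 (2, 2, 3)
print(rgb[0, 0].tolist())    # [0, 0, 255]: blue now sits in the last channel
```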
Box:
[
[xmin1, ymin1, xmax1, ymax1, label1],
[xmin2, ymin2, xmax2, ymax2, label2],
......
]
value:
- coordinate : [0~1]
- label: [0~19]
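One way to get from absolute VOC boxes to this layout is to divide x by the image width, divide y by the image height, and map the class name to an index. The helper `to_target` below is hypothetical, and the class order is assumed to be the usual alphabetical list of the 20 VOC classes; the actual SSD PyTorch transform may differ in details.

```python
# Assumed class order: the 20 VOC classes, alphabetical.
VOC_CLASSES = (
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)


def to_target(objects, width, height):
    """Convert (name, [xmin, ymin, xmax, ymax]) pairs in pixels into
    [xmin, ymin, xmax, ymax, label] rows with coordinates in [0, 1]."""
    target = []
    for name, (xmin, ymin, xmax, ymax) in objects:
        label = VOC_CLASSES.index(name)  # 0..19
        target.append([xmin / width, ymin / height,
                       xmax / width, ymax / height, label])
    return target


# The car box from the annotation above, image size 500 x 333.
target = to_target([("car", [141, 50, 500, 330])], width=500, height=333)
print(target[0][0], target[0][2], target[0][4])  # 0.282 1.0 6
```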
SSD PyTorch data augmentation
ConvertFromInts(),
ToAbsoluteCoords(),
PhotometricDistort(),
Expand(self.mean),
RandomSampleCrop(),
RandomMirror(),
ToPercentCoords(),
Resize(self.size),
SubtractMeans(self.mean)
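To show how such a chain passes (image, boxes, labels) from step to step, here is a simplified sketch. The classes reuse names from the list above but are stand-ins written for this note, not the repository's code; the photometric, expand, crop, mirror and resize steps are left out, and the mean (104, 117, 123) is an assumed value.

```python
import numpy as np


class Compose:
    """Apply each transform in order to (image, boxes, labels)."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, boxes, labels):
        for t in self.transforms:
            image, boxes, labels = t(image, boxes, labels)
        return image, boxes, labels


class ConvertFromInts:
    def __call__(self, image, boxes, labels):
        return image.astype(np.float32), boxes, labels


class ToAbsoluteCoords:
    """Scale [0, 1] box coordinates to pixels."""
    def __call__(self, image, boxes, labels):
        height, width, _ = image.shape
        boxes = boxes.copy()
        boxes[:, [0, 2]] *= width
        boxes[:, [1, 3]] *= height
        return image, boxes, labels


class ToPercentCoords:
    """Scale pixel box coordinates back to [0, 1]."""
    def __call__(self, image, boxes, labels):
        height, width, _ = image.shape
        boxes = boxes.copy()
        boxes[:, [0, 2]] /= width
        boxes[:, [1, 3]] /= height
        return image, boxes, labels


class SubtractMeans:
    def __init__(self, mean):
        self.mean = np.array(mean, dtype=np.float32)

    def __call__(self, image, boxes, labels):
        return image.astype(np.float32) - self.mean, boxes, labels


image = np.full((333, 500, 3), 128, dtype=np.uint8)  # synthetic gray image
boxes = np.array([[0.282, 0.15, 1.0, 0.99]], dtype=np.float32)
labels = np.array([6])

pipeline = Compose([ConvertFromInts(), ToAbsoluteCoords(),
                    ToPercentCoords(), SubtractMeans((104, 117, 123))])
out_image, out_boxes, out_labels = pipeline(image, boxes, labels)
print(out_image.dtype, out_image.shape)  # float32 (333, 500, 3)
```

Geometric steps such as cropping and mirroring work on pixel coordinates, which is why the conversion to absolute coordinates comes early and the conversion back to percent coordinates comes near the end.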
Output data format of data augmentation
image:
- dtype=np.float32
- shape = (height, width, channel)
height, width = (300, 300) or (512, 512), the input image size specified for the network
color channel : RGB
Box:
[
[xmin1, ymin1, xmax1, ymax1, label1],
[xmin2, ymin2, xmax2, ymax2, label2],
......
]
value:
- coordinate : [0~1]
- label: [0~19]
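A small sanity check can catch samples that do not match the output format listed above. The function `check_augmented_sample` is a hypothetical helper written for this note; it returns a list of problems and does not raise.

```python
import numpy as np


def check_augmented_sample(image, target, size=300, num_classes=20):
    """Return a list of format problems; an empty list means the sample
    matches the output format described above."""
    problems = []
    if image.dtype != np.float32:
        problems.append(f"image dtype is {image.dtype}, expected float32")
    if image.shape != (size, size, 3):
        problems.append(f"image shape is {image.shape}, expected {(size, size, 3)}")
    target = np.asarray(target, dtype=np.float32)
    if target.ndim != 2 or target.shape[1] != 5:
        problems.append(f"target shape is {target.shape}, expected (N, 5)")
        return problems
    coords, labels = target[:, :4], target[:, 4]
    if ((coords < 0) | (coords > 1)).any():
        problems.append("box coordinates fall outside [0, 1]")
    if ((labels < 0) | (labels >= num_classes)).any():
        problems.append(f"labels fall outside [0, {num_classes - 1}]")
    return problems


good_image = np.zeros((300, 300, 3), dtype=np.float32)
good_target = [[0.282, 0.15, 1.0, 0.99, 6]]
print(check_augmented_sample(good_image, good_target))  # []
```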