2 - YOLO
2.1 - Model Details
First things to know:
- The input is a batch of images of shape (m, 608, 608, 3)
- The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers (pc, bx, by, bh, bw, c) as explained above. If you expand c into an 80-dimensional vector, each bounding box is then represented by 85 numbers.
We will use 5 anchor boxes. So you can think of the YOLO architecture as the following: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).
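In other words:
- The input is a batch of data of shape (m, 608, 608, 3), where m is the number of samples, each image is 608x608 pixels, and 3 is the number of channels (RGB).
- The output is a list of box vectors with class labels. Each bounding box vector has 6 elements; if you expand the parameter c into an 80-dimensional vector, the box vector has 85 numbers.
The 6 elements are the probability pc that an object is present, the x and y coordinates of the object's center (bx, by), the height and width of the bounding box (bh, bw), and the class code (c).
We use 5 anchor boxes, so the YOLO network finally outputs a tensor of shape (m, 19, 19, 5, 85).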
Now, for each box (of each cell) we will compute the following elementwise product and extract a probability that the box contains a certain class.
In short, multiplying the probability pc by the 80 class probabilities gives a score for each class, which is how likely the algorithm considers that class to be present in the box.
The box is located by (bx, by, bh, bw), and the value of c identifies the object class.
Here's one way to visualize what YOLO is predicting on an image:
- For each of the 19x19 grid cells, find the maximum of the probability scores (taking a max across both the 5 anchor boxes and across different classes).
- Color that grid cell according to what object that grid cell considers the most likely.
Doing this results in this picture:
Note that this visualization isn't a core part of the YOLO algorithm itself for making predictions; it's just a nice way of visualizing an intermediate result of the algorithm.
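The per-cell coloring described above can be sketched in NumPy. This is only an illustration under assumptions: the random `box_scores` array is a hypothetical stand-in for the real (19, 19, 5, 80) scores of one image, and `cell_max_score` and `cell_best_class` are names introduced here, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the box scores of one image: (19, 19, 5, 80).
box_scores = rng.random((19, 19, 5, 80))

# Maximum score per grid cell, taken across the 5 anchor boxes and the 80 classes.
cell_max_score = box_scores.max(axis=(2, 3))        # shape (19, 19)

# Class that achieves that maximum. Flattening (anchor, class) gives an index
# equal to anchor * 80 + class, so the remainder modulo 80 is the class index.
flat_scores = box_scores.reshape(19, 19, 5 * 80)
cell_best_class = flat_scores.argmax(axis=-1) % 80  # shape (19, 19)
```

Coloring each cell by `cell_best_class` would then reproduce the kind of picture the text refers to.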
Another way to visualize YOLO's output is to plot the bounding boxes that it outputs. Doing that results in a visualization like this:
In the figure above, we plotted only boxes that the model had assigned a high probability to, but this is still too many boxes. You'd like to filter the algorithm's output down to a much smaller number of detected objects. To do so, you'll use non-max suppression. Specifically, you'll carry out these steps:
- Get rid of boxes with a low score (meaning, the box is not very confident about detecting a class)
- Select only one box when several boxes overlap with each other and detect the same object.
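To summarize, there are two ways to visualize the predicted boxes:
- First, for each of the 19x19 grid cells, take the highest-scoring of its 5 candidate boxes and color the cell according to that class.
- Second, draw the bounding box of every detected object.
With the second approach, even though we only drew the boxes with a high probability, there are still too many boxes, so we do the following:
- Discard the boxes with a low score.
- When several boxes overlap and mark the same object, keep only one.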
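The two filtering steps above can be sketched as a greedy procedure in NumPy. This is a minimal sketch under several assumptions, not the assignment's implementation: boxes are given as corners (x1, y1, x2, y2) with positive area, overlap is measured by intersection over union, the thresholds 0.6 and 0.5 are illustrative defaults, the sketch ignores class labels, and `nms_sketch` is a hypothetical name.

```python
import numpy as np

def nms_sketch(scores, boxes, score_threshold=0.6, iou_threshold=0.5):
    """Greedy non-max suppression on corner-format boxes (x1, y1, x2, y2)."""
    # Step 1: drop boxes whose score is below the threshold.
    keep_mask = scores >= score_threshold
    scores, boxes = scores[keep_mask], boxes[keep_mask]

    # Step 2: repeatedly keep the best remaining box and drop boxes that
    # overlap it too much.
    order = np.argsort(-scores)  # indices sorted by descending score
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(best)

        # Intersection of the best box with every remaining box.
        xi1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        yi1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        xi2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        yi2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(xi2 - xi1, 0) * np.maximum(yi2 - yi1, 0)

        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)

        # Only boxes that do not overlap the best box too much survive.
        order = rest[iou <= iou_threshold]

    kept = np.array(kept, dtype=int)
    return scores[kept], boxes[kept]

# Small hand-made example: two near-duplicate boxes, one weak box, one separate box.
example_scores = np.array([0.9, 0.8, 0.3, 0.7])
example_boxes = np.array([
    [0.0, 0.0, 2.0, 2.0],
    [0.1, 0.0, 2.0, 2.0],
    [0.0, 0.0, 1.0, 1.0],
    [3.0, 3.0, 4.0, 4.0],
])
kept_scores, kept_boxes = nms_sketch(example_scores, example_boxes)
```

In this example the box with score 0.3 is removed by the score threshold, and the box with score 0.8 is removed because it overlaps the 0.9 box almost entirely, leaving two boxes.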
2.2 - Filtering with a threshold on class scores
You are going to apply a first filter by thresholding. You would like to get rid of any box for which the class "score" is less than a chosen threshold.
The model gives you a total of 19x19x5x85 numbers, with each box described by 85 numbers. It'll be convenient to rearrange the (19,19,5,85) (or (19,19,425)) dimensional tensor into the following variables:
- box_confidence: tensor of shape (19 x 19, 5, 1) containing pc (confidence probability that there's some object) for each of the 5 boxes predicted in each of the 19x19 cells.
- boxes: tensor of shape (19 x 19, 5, 4) containing (b_x, b_y, b_h, b_w) for each of the 5 boxes per cell.
- box_class_probs: tensor of shape (19 x 19, 5, 80) containing the detection probabilities (c_1, c_2, ... c_{80}) for each of the 80 classes for each of the 5 boxes per cell.
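As a rough illustration of this rearrangement, the sketch below slices a single-image encoding into the three variables. It assumes the 85 numbers are laid out in the order used in the text (pc first, then the 4 box coordinates, then the 80 class probabilities); the actual model output may order them differently, and the random `encoding` array is only a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoding for one image, in the flattened (19, 19, 425) form.
flat_encoding = rng.random((19, 19, 425))

# 425 = 5 anchor boxes x 85 numbers per box.
encoding = flat_encoding.reshape(19, 19, 5, 85)

# Assumed layout per box: [pc, b_x, b_y, b_h, b_w, c_1, ..., c_80].
box_confidence = encoding[..., 0:1]   # shape (19, 19, 5, 1)
boxes = encoding[..., 1:5]            # shape (19, 19, 5, 4)
box_class_probs = encoding[..., 5:]   # shape (19, 19, 5, 80)

total_numbers = encoding.size         # 19 * 19 * 5 * 85
```

Slicing with `0:1` rather than `0` keeps the trailing dimension of size 1, which is what later allows `box_confidence` to broadcast against `box_class_probs`.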
Exercise: Implement yolo_filter_boxes().
- Compute box scores by doing the elementwise product as described in Figure 4. The following code may help you choose the right operator:
a = np.random.randn(19*19, 5, 1)
b = np.random.randn(19*19, 5, 80)
c = a * b # shape of c will be (19*19, 5, 80)
- For each box, find:
  - the index of the class with the maximum box score
  - the corresponding box score
- Create a mask by using a threshold. As a reminder: ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) returns: [False, True, False, False, True]. The mask should be True for the boxes you want to keep.
- Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes to filter out the boxes we don't want. You should be left with just the subset of boxes you want to keep.
Reminder: to call a Keras function, you should use K.function(...).
In translation:
- box_confidence: for each of the 19x19 cells, 5 anchor boxes are generated, and each anchor box has one pc.
- boxes: the bounding box (bx, by, bh, bw) of each of the 5 anchor boxes in each of the 19x19 cells; the meaning of the 4 parameters was explained above.
- box_class_probs: the probabilities of the 80 classes for each of the 19x19x5 anchor boxes.
Implementing yolo_filter_boxes():
- Implement the operation in Figure 4 with a plain multiplication, box_confidence * box_class_probs. The last dimension of box_confidence is broadcast automatically, so the result is a tensor of shape (19, 19, 5, 80).
- For each anchor box (19x19x5 in total), find:
  - the index of the class with the highest score (the maximum over the 80 classes)
  - the score of that class
- Create a mask that is True for the anchor boxes you want to keep.
- Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes, filtering out the boxes we don't want and leaving only the subset you want to keep.
Note: to call a Keras function, use K.function(...).
# Imports this function relies on
import tensorflow as tf
from keras import backend as K

# GRADED FUNCTION: yolo_filter_boxes
def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.

    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box

    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """
    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line)
    # Per-class score of every box; broadcasting gives shape (19, 19, 5, 80)
    box_scores = box_confidence * box_class_probs
    ### END CODE HERE ###

    # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    # Index of the highest-scoring class for each box, shape (19, 19, 5)
    box_classes = K.argmax(box_scores, axis=-1)
    # Value of that highest score for each box, shape (19, 19, 5)
    box_class_scores = K.max(box_scores, axis=-1, keepdims=False)
    ### END CODE HERE ###

    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    # Mark boxes whose best score is at least the threshold as True
    filtering_mask = box_class_scores >= threshold
    ### END CODE HERE ###

    # Step 4: Apply the mask to scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    # Keep the best score, the bounding box and the class index of every box that passes the mask
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###

    return scores, boxes, classes
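To sanity-check the shapes without building a TensorFlow graph, the same four steps can be mirrored in plain NumPy. This is a sketch for checking only, not a replacement for the graded function: `filter_boxes_np` is a hypothetical name, and the random inputs are stand-ins for real model outputs.

```python
import numpy as np

def filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """NumPy mirror of the four steps: score, argmax/max, mask, apply mask."""
    box_scores = box_confidence * box_class_probs   # (19, 19, 5, 80)
    box_classes = box_scores.argmax(axis=-1)        # (19, 19, 5)
    box_class_scores = box_scores.max(axis=-1)      # (19, 19, 5)
    mask = box_class_scores >= threshold            # (19, 19, 5), boolean
    # Boolean indexing flattens the masked leading dimensions, like tf.boolean_mask.
    return box_class_scores[mask], boxes[mask], box_classes[mask]

rng = np.random.default_rng(0)
box_confidence = rng.random((19, 19, 5, 1))
boxes = rng.random((19, 19, 5, 4))
box_class_probs = rng.random((19, 19, 5, 80))

scores, kept_boxes, classes = filter_boxes_np(box_confidence, boxes, box_class_probs)
```

The three outputs share the same first dimension, the number of boxes that passed the threshold, which is the "None" dimension mentioned in the docstring.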
2.3 - Non-max suppression
Even after filtering by thresholding over the class scores, you still end up with a lot of overlapping boxes. A second filter for selecting the right boxes is called non-maximum suppression (NMS).
Non-max suppression uses the very important function called "Intersection over Union", or IoU.
Exercise: Implement iou(). Some hints:
- In this exercise only, we define a box using its two corners (upper left and lower right): (x1, y1, x2, y2) rather than the midpoint and height/width.
- To calculate the area of a rectangle you need to multiply its height (y2 - y1) by its width (x2 - x1).
- You'll also need to find the coordinates (xi1, yi1, xi2, yi2) of the intersection of two boxes. Remember that:
  - xi1 = maximum of the x1 coordinates of the two boxes
  - yi1 = maximum of the y1 coordinates of the two boxes
  - xi2 = minimum of the x2 coordinates of the two boxes
  - yi2 = minimum of the y2 coordinates of the two boxes
- In order to compute the intersection area, you need to make sure the height and width of the intersection are positive, otherwise the intersection area should be zero. Use max(height, 0) and max(width, 0).
In this code, we use the convention that (0,0) is the top-left corner of an image, (1,0) is the upper-right corner, and (1,1) the lower-right corner.
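The hints above translate fairly directly into code. The following is one possible sketch, assuming both boxes are tuples in corner format (x1, y1, x2, y2) with positive area; it is an illustration of the hints rather than the graded solution.

```python
def iou(box1, box2):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    b1_x1, b1_y1, b1_x2, b1_y2 = box1
    b2_x1, b2_y1, b2_x2, b2_y2 = box2

    # Corners of the intersection rectangle.
    xi1 = max(b1_x1, b2_x1)
    yi1 = max(b1_y1, b2_y1)
    xi2 = min(b1_x2, b2_x2)
    yi2 = min(b1_y2, b2_y2)

    # Clamp at zero so that non-overlapping boxes give an empty intersection.
    inter_width = max(xi2 - xi1, 0)
    inter_height = max(yi2 - yi1, 0)
    inter_area = inter_width * inter_height

    # Union = sum of both areas minus the part counted twice.
    box1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1)
    box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    union_area = box1_area + box2_area - inter_area

    return inter_area / union_area

overlap = iou((2, 1, 4, 3), (1, 2, 3, 4))   # intersection 1, union 7
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))  # boxes do not touch
```

Here `overlap` is 1/7 because the two 2x2 boxes share a 1x1 square, and `disjoint` is 0 because the clamped width and height are both zero.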
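Andrew Ng explains non-max suppression very clearly in the course, so here is just a brief translation:
Even after filtering the class scores with the mask, you still have many overlapping bounding boxes (see Figure 7). The next filter used to select the right boxes is called non-max suppression (NMS).
Non-max suppression relies on a very important function, Intersection over Union (IoU), shown in Figure 8.
Exercise: implement the iou() function.
- In this exercise only, we define a bounding box by two corners (x1, y1, x2, y2) rather than by its midpoint and width/height.
- Compute the area of a rectangle by multiplying its height (y2 - y1) by its width (x2 - x1).
- You also need to find the corners (xi1, yi1, xi2, yi2) of the intersection of the two bounding boxes:
  - xi1 = maximum of the x1 coordinates (upper-left corners) of the two boxes
  - yi1 = maximum of the y1 coordinates (upper-left corners) of the two boxes
  - xi2 = minimum of the x2 coordinates (lower-right corners) of the two boxes
  - yi2 = minimum of the y2 coordinates (lower-right corners) of the two boxes
- To compute the intersection area, make sure the width and height of the intersection are positive; otherwise the intersection area is zero. Use max(height, 0) and max(width, 0).
In this code, (0,0) is the top-left corner of the image and (1,1) is the lower-right corner.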