1. Overall pipeline
Resize the image by a fixed ratio into images at multiple scales (an image pyramid); a sketch of how the scale list is typically built follows.
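As a reference, here is a minimal sketch of how such a scale list is usually built for MTCNN. The parameters `minSize = 40` and `factor = 0.709` are common defaults and are assumptions here, not values taken from this post:

```cpp
#include <algorithm>
#include <vector>

// Minimal sketch of building the scale list for the image pyramid.
// minSize = 40 and factor = 0.709 are typical MTCNN defaults (assumed).
std::vector<float> computeScales(int imgWidth, int imgHeight,
                                 int minSize = 40, float factor = 0.709f)
{
    const int netInput = 12;                              // P-Net's nominal cell size
    std::vector<float> scales;

    float scale   = static_cast<float>(netInput) / minSize;   // largest scale
    float minSide = std::min(imgWidth, imgHeight) * scale;

    // keep shrinking until the shorter side would drop below 12 pixels
    while (minSide >= netInput)
    {
        scales.push_back(scale);
        scale   *= factor;
        minSide *= factor;
    }
    return scales;
}
```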
-
P-Net(Proposal Net)
For each scale produced in step 1, the image is fed into P-Net, which outputs a grid downsampled (stride 2) with respect to its input. Each cell of the grid carries a possible bounding-box proposal for that location, i.e. a face / non-face score and box-regression values. The original paper also outputs facial-landmark proposals at this stage, but later implementations move that part into the last network.
Take an original image of 200x400 as an example: it is first resized by a scale factor of 0.5 to a 100x200 input, and after P-Net the output grid is roughly 50x100. Each cell outputs whether a face is present at that position (a 2-way one-hot score) and the regression offsets of the box associated with that cell.
These offsets are relative to the cell's coordinates mapped back onto the original image (200x400). Every cell carries a default box whose size is tied to the image's scale factor, e.g. set to 12 / scale.
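As a concrete check of this mapping, using the stride of 2 and cell size of 12 that appear in the code below, the cell at row 10, column 20 of the 0.5-scale grid maps back onto the original image as:

$$
\begin{aligned}
x_1 &= \mathrm{round}\!\left(\frac{2\cdot 20 + 1}{0.5}\right) = 82, \qquad
y_1 = \mathrm{round}\!\left(\frac{2\cdot 10 + 1}{0.5}\right) = 42,\\
x_2 &= \mathrm{round}\!\left(\frac{2\cdot 20 + 1 + 12}{0.5}\right) = 106, \qquad
y_2 = \mathrm{round}\!\left(\frac{2\cdot 10 + 1 + 12}{0.5}\right) = 66,
\end{aligned}
$$

i.e. an anchor of side roughly cellsize / scale = 24 before the regression offsets are applied.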
-
The code that maps each cell and its offsets back to box coordinates on the original image is as follows:
```cpp
void MTCNN::generateBbox(cv::Mat score, cv::Mat location, std::vector<Bbox>& boundingBox_, float scale)
{
    const int stride = 2;    // effective downsampling of P-Net w.r.t. its input
    const int cellsize = 12; // default box size carried by each cell

    int sc_rows = 0, sc_cols = 0;
    if (4 == score.dims)
    {
        sc_rows = score.size[2]; // grid rows
        sc_cols = score.size[3]; // grid cols
    }

    // score is 1x2xHxW; skip the first channel so p points at the face-probability map
    float* p = (float*)score.data + sc_rows * sc_cols;
    float inv_scale = 1.0f / scale;
    for (int row = 0; row < sc_rows; row++)
    {
        for (int col = 0; col < sc_cols; col++)
        {
            Bbox bbox;
            if (*p > threshold[0])
            {
                bbox.score = *p;
                // the next four lines form the cell's default (anchor) box on the original image
                bbox.x1 = round((stride * col + 1) * inv_scale);
                bbox.y1 = round((stride * row + 1) * inv_scale);
                bbox.x2 = round((stride * col + 1 + cellsize) * inv_scale);
                bbox.y2 = round((stride * row + 1 + cellsize) * inv_scale);
                const int index = row * sc_cols + col;
                for (int channel = 0; channel < 4; channel++)
                {
                    float* tmp = (float*)(location.data) + channel * sc_rows * sc_cols;
                    bbox.regreCoord[channel] = tmp[index]; // regression offset, applied to the anchor later in refine()
                }
                boundingBox_.push_back(bbox);
            }
            p++;
        }
    }
}
```
- Bounding-box regression is then applied to refine the coordinates of all the BBoxes, and collecting them yields the first-stage proposals. The refine code is as follows:
```cpp
void MTCNN::refine(std::vector<Bbox>& vecBbox, const int& height, const int& width, bool square)
{
    if (vecBbox.empty()) return;

    float bbw = 0, bbh = 0;
    float h = 0, w = 0;
    float x1 = 0, x2 = 0, y1 = 0, y2 = 0;

    for (auto it = vecBbox.begin(); it != vecBbox.end(); it++)
    {
        bbw = it->x2 - it->x1 + 1;
        bbh = it->y2 - it->y1 + 1;

        // apply the regression offsets, scaled by the box width/height
        x1 = it->x1 + bbw * it->regreCoord[1];
        y1 = it->y1 + bbh * it->regreCoord[0];
        x2 = it->x2 + bbw * it->regreCoord[3];
        y2 = it->y2 + bbh * it->regreCoord[2];

        if (square)
        {
            // expand the box to a square around its center (for the next stage's fixed-size input)
            w = x2 - x1 + 1;
            h = y2 - y1 + 1;
            float maxSide = (h > w) ? h : w;
            x1 = x1 + w * 0.5f - maxSide * 0.5f;
            y1 = y1 + h * 0.5f - maxSide * 0.5f;
            x2 = round(x1 + maxSide - 1);
            y2 = round(y1 + maxSide - 1);
            x1 = round(x1);
            y1 = round(y1);
        }

        // clip to the image boundary
        it->x1 = x1 < 0 ? 0 : x1;
        it->y1 = y1 < 0 ? 0 : y1;
        it->x2 = x2 >= width ? width - 1 : x2;
        it->y2 = y2 >= height ? height - 1 : y2;
    }
}
```
- The overall detection flow:
```cpp
void MTCNN::detectInternal(cv::Mat& img_, std::vector<Bbox>& finalBbox_)
{
    const float nms_threshold[3] = {0.7f, 0.7f, 0.7f};

    img = img_;
    PNet();                                               // stage 1: proposals from the image pyramid
    if (!firstBbox_.empty())
    {
        nms(firstBbox_, nms_threshold[0]);
        refine(firstBbox_, img_.rows, img_.cols, true);   // square the boxes for R-Net's fixed-size input

        RNet();                                           // stage 2: reject false proposals, refine the rest
        if (!secondBbox_.empty())
        {
            nms(secondBbox_, nms_threshold[1]);
            refine(secondBbox_, img_.rows, img_.cols, true);

            ONet();                                       // stage 3: final classification, regression and landmarks
            if (!thirdBbox_.empty())
            {
                refine(thirdBbox_, img_.rows, img_.cols, false);

                std::string ts = "Min";
                nms(thirdBbox_, nms_threshold[2], ts);    // final NMS in "Min" (intersection-over-minimum) mode
            }
        }
    }
    finalBbox_ = thirdBbox_;
    thirdBbox_.clear();
}
```
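detectInternal() relies on an nms() member that is not shown in this post. For completeness, here is a minimal sketch of what such a function typically looks like, assuming the conventional greedy NMS with an IoU ("Union") mode and an intersection-over-minimum ("Min") mode for the final stage; the Bbox struct below is a stand-in for the project's own, and the code is an assumed implementation rather than the author's exact one:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the project's Bbox (only the fields needed here).
struct Bbox { float score, x1, y1, x2, y2; float regreCoord[4]; };

// Sort by score, greedily keep the highest-scoring box, and drop boxes that
// overlap it too much. "Union" uses IoU; "Min" divides the intersection by
// the smaller box's area.
static void nms(std::vector<Bbox>& boxes, float threshold,
                const std::string& mode = "Union")
{
    std::sort(boxes.begin(), boxes.end(),
              [](const Bbox& a, const Bbox& b) { return a.score > b.score; });

    std::vector<Bbox> kept;
    std::vector<bool> suppressed(boxes.size(), false);
    for (size_t i = 0; i < boxes.size(); ++i)
    {
        if (suppressed[i]) continue;
        kept.push_back(boxes[i]);
        float areaI = (boxes[i].x2 - boxes[i].x1 + 1) * (boxes[i].y2 - boxes[i].y1 + 1);
        for (size_t j = i + 1; j < boxes.size(); ++j)
        {
            if (suppressed[j]) continue;
            float w = std::min(boxes[i].x2, boxes[j].x2) - std::max(boxes[i].x1, boxes[j].x1) + 1;
            float h = std::min(boxes[i].y2, boxes[j].y2) - std::max(boxes[i].y1, boxes[j].y1) + 1;
            if (w <= 0 || h <= 0) continue;           // no overlap
            float inter = w * h;
            float areaJ = (boxes[j].x2 - boxes[j].x1 + 1) * (boxes[j].y2 - boxes[j].y1 + 1);
            float overlap = (mode == "Min") ? inter / std::min(areaI, areaJ)
                                            : inter / (areaI + areaJ - inter);
            if (overlap > threshold) suppressed[j] = true;
        }
    }
    boxes.swap(kept);
}
```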
-
R-Net(Refine Net)
For each proposal obtained in step 2, the corresponding region is cropped out of the original image according to its bbox, resized to R-Net's fixed input size, and fed into R-Net, which outputs a face classification score, bounding-box regression values and 5 facial landmark coordinates (a crop-and-resize sketch is given at the end of this section).
A point that can be confusing: the input is only a patch, while the bounding-box coordinates are global, full-image information, so how can the network regress them? The answer is that the regression targets are offsets normalized by the box width and height; they are applied to the known crop box on the original image (exactly what refine() above does), so the network only needs local information.
After NMS, the bboxes are refined with this network's output.
Note that the bounding boxes obtained at each scale are first NMS-ed separately, and the merged results from all scales are then NMS-ed again before this stage.
-
O-Net(Output Net)
The coarse bboxes obtained in step 3 are fed into this network to obtain a refined face classification, bounding-box regression and landmark coordinates. The bboxes are refined with this output, and a final NMS produces the detection result.
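For reference, a minimal sketch of the crop-and-resize step mentioned in the R-Net part above, assuming OpenCV and the input sizes from the original MTCNN paper (24x24 for R-Net, 48x48 for O-Net); cropPatch and its Bbox stand-in are illustrative helpers, not the author's code:

```cpp
#include <opencv2/opencv.hpp>

// Hypothetical stand-in for the project's Bbox (only the fields needed here).
struct Bbox { float score, x1, y1, x2, y2; float regreCoord[4]; };

// Crop the (squared, clipped) proposal out of the original image and resize
// it to the refinement network's fixed input size.
static cv::Mat cropPatch(const cv::Mat& img, const Bbox& b, int netSize)
{
    cv::Rect roi(static_cast<int>(b.x1), static_cast<int>(b.y1),
                 static_cast<int>(b.x2 - b.x1 + 1),
                 static_cast<int>(b.y2 - b.y1 + 1));
    roi &= cv::Rect(0, 0, img.cols, img.rows);   // guard against out-of-image boxes

    cv::Mat patch;
    cv::resize(img(roi), patch, cv::Size(netSize, netSize));
    return patch;
}

// e.g. cv::Mat rnetInput = cropPatch(img, proposal, 24);
//      cv::Mat onetInput = cropPatch(img, proposal, 48);
```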