使用OpenCv+ENet實(shí)現(xiàn)語義分割

效果圖

轉(zhuǎn)載自 https://www.pyimagesearch.com/2018/09/03/semantic-segmentation-with-opencv-and-deep-learning/

介紹

在本教程中挡逼，您將學(xué)習(xí)如何使用OpenCV族阅，深度學(xué)習(xí)和ENet架構(gòu)執(zhí)行語義分割乌逐。閱讀本教程后峦朗，您將能夠使用OpenCV對(duì)圖像和視頻應(yīng)用語義分割公给。深度學(xué)習(xí)有助于增加計(jì)算機(jī)視覺的前所未有的準(zhǔn)確性，包括圖像分類姐帚，對(duì)象檢測衷旅，現(xiàn)在甚至是分割器一。傳統(tǒng)分割涉及將圖像分割成若干模塊（Normalized Cuts, Graph Cuts, Grab Cuts, superpixels,等）; 但是课锌，算法并沒有真正理解這些部分代表什么。

另一方面盹舞，語義分割算法會(huì)作如下的工作：

1产镐、分割圖像劃分成有意義的部分
2、同時(shí)踢步，將輸入圖像中的每個(gè)像素與類標(biāo)簽（即人癣亚，道路，汽車获印，公共汽車等）相關(guān)聯(lián)述雾。

語義分割算法很強(qiáng)大，有很多用例兼丰，包括自動(dòng)駕駛汽車 - 在今天的文章中玻孟，我將向您展示如何將語義分割應(yīng)用于道路場景圖像以及視頻尊惰！

OpenCV和深度學(xué)習(xí)的語義分割

在這篇文章中蛹锰，我們將討論ENet深度學(xué)習(xí)框架，并且演示如何使用ENet對(duì)圖像和視頻流進(jìn)行語義分割腻要。

ENet語義分割框架

在本教程中我們將使用的語義分割框架是ENet艳丛，他是基于Paszke等人的2016年的論文Net：ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation：

Abstract: ...In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low latency operation. ENet is up to 18× faster, requires 75× less FLOPs, has 79× less parameters, and provides similar or better accuracy to existing models. ...（大概就是速度提高了18倍匣掸，參數(shù)減少了79倍，然后精度更高速度更快）氮双。

一個(gè)正向傳播在我的（垃圾）筆記本CPU（i5-6200）上花費(fèi)了0.5S左右的時(shí)間碰酝，如果使用GPU將更快。Paszke等人在The Cityscapes Dataset訓(xùn)練了他們的數(shù)據(jù)集戴差，你可以根據(jù)需求選擇你需要的數(shù)據(jù)集進(jìn)行訓(xùn)練送爸。并且這個(gè)數(shù)據(jù)集還帶有用于城市場景理解的圖像示例。

我們使用訓(xùn)練了20種類的模型暖释，包括：

Unlabeled (i.e., background)
Road
Sidewalk
Building
Wall
Fence
Pole
TrafficLight
TrafficSign
Vegetation
Terrain
Sky
Person
Rider
Car
Truck
Bus
Train
Motorcycle
Bicycle

接下來袭厂，您將學(xué)習(xí)如何應(yīng)用語義分段來提取圖像和視頻流中每個(gè)類別，像素之間的映射關(guān)系饭入。如果您有興趣訓(xùn)練自己的ENet模型以便在自己的自定義數(shù)據(jù)集上進(jìn)行分割嵌器，可以參考此頁面，作者已提供了有關(guān)如何進(jìn)行訓(xùn)練的教程谐丢。

工程結(jié)構(gòu)

若需要工程源碼可以直接在下方留言郵箱或者公眾號(hào)留言郵箱爽航。
下面讓我們在工程目錄下面運(yùn)行 tree：

.
├── enet-cityscapes
│   ├── enet-classes.txt
│   ├── enet-colors.txt
│   └── enet-model.net
├── images
│   ├── example_01.png
│   ├── example_02.jpg
│   ├── example_03.jpg
│   └── example_04.png
├── output
│   └── massachusetts_output.avi
├── segment.py
├── segment.pyc
├── segment_video.py
└── videos
    ├── massachusetts.mp4
    └── toronto.mp4

4 directories, 13 files

工程包括四個(gè)目錄：

enet-cityscapes/: 包含了訓(xùn)練好了的深度學(xué)習(xí)模型，顏色列表乾忱，顏色labels讥珍。
images/: 包含四個(gè)測試用的圖片。
output/: 生成的輸出視頻窄瘟。
videos/: 包含了兩個(gè)示例視頻用于測試我們的程序衷佃。

接下來，我們將分析兩個(gè)python腳本：

segment.py: 對(duì)單個(gè)圖片進(jìn)行深度學(xué)習(xí)語義分割蹄葱，我們將首先在單個(gè)圖像進(jìn)行測試然后再將其運(yùn)用到視頻中氏义。
segment_video.py: 對(duì)視頻進(jìn)行語義分割锄列。

使用OpenCv對(duì)圖像進(jìn)行語義分割：

# import the necessary packages

import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True, 
    help="path to deep learning segmentation model")
ap.add_argument("-c", "--classes", required=True, 
    help="path to .txt file containing class labels")
ap.add_argument("-i", "--image", required=True,
    help="path to input image")
ap.add_argument("-l", "--colors", type=str,
    help="path to .txt file containing colors for labels")
ap.add_argument("-w", "--width", type=int, default=500,
    help="desired width (in pixels) of input image")
args = vars(ap.parse_args())

首先我們需要導(dǎo)入相應(yīng)的包，并且設(shè)置相應(yīng)的參數(shù)：

numpy Python 科學(xué)計(jì)算基礎(chǔ)包惯悠。
argparse: python的一個(gè)命令行解析包邻邮。
imutils: Python圖像操作函數(shù)庫,提供一系列的便利功能。
time: Time access and conversions克婶。
cv2 ：建議安裝3.4+的版本筒严。

接下來讓我們解析類標(biāo)簽文件和顏色：

# load the class label names
CLASSES = open(args["classes"]).read().strip().split("\n")
 
# if a colors file was supplied, load it from disk
if args["colors"]:
    COLORS = open(args["colors"]).read().strip().split("\n")
    COLORS = [np.array(c.split(",")).astype("int") for c in COLORS]
    COLORS = np.array(COLORS, dtype="uint8")
 
# otherwise, we need to randomly generate RGB colors for each class
# label
else:
    # initialize a list of colors to represent each class label in
    # the mask (starting with 'black' for the background/unlabeled
    # regions)
    np.random.seed(42)
    COLORS = np.random.randint(0, 255, size=(len(CLASSES) - 1, 3),
        dtype="uint8")
    COLORS = np.vstack([[0, 0, 0], COLORS]).astype("uint8")

首先將CLASSES加載到內(nèi)存中，如果我們提供了每一個(gè)類別的標(biāo)簽的COLORS情萤，那么我們就將其加載到內(nèi)存; 若沒有則為每一個(gè)標(biāo)簽隨機(jī)生成 COLORS鸭蛙。
為了更好的可視化，我們使用OpenCv繪制一個(gè)顏色和類別的圖列（legend）：

# initialize the legend visualization
legend = np.zeros(((len(CLASSES) * 25) + 25, 300, 3), dtype="uint8")
 
# loop over the class names + colors
for (i, (className, color)) in enumerate(zip(CLASSES, COLORS)):
    # draw the class name + color on the legend
    color = [int(c) for c in color]
    cv2.putText(legend, className, (5, (i * 25) + 17),
        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
    cv2.rectangle(legend, (100, (i * 25)), (300, (i * 25) + 25),
        tuple(color), -1)

如圖左邊所示為所繪制的legend 的效果圖：

效果圖

然后我們將深度學(xué)習(xí)分割應(yīng)用于圖像：

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNet(args["model"])
 
# load the input image, resize it, and construct a blob from it,
# but keeping mind mind that the original input image dimensions
# ENet was trained on was 1024x512
image = cv2.imread(args["image"])
image = imutils.resize(image, width=args["width"])
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (1024, 512), 0,
    swapRB=True, crop=False)
 
# perform a forward pass using the segmentation model
net.setInput(blob)
start = time.time()
output = net.forward()
end = time.time()
 
# show the amount of time inference took
print("[INFO] inference took {:.4f} seconds".format(end - start))

上面這段代碼筋岛，使用Python和OpenCv對(duì)圖像進(jìn)行語義分割：

cv2.dnn.readNet(): 記載模型娶视。
構(gòu)建一個(gè) blob：由于我們訓(xùn)練的ENet模型的輸入圖像的大小為1024X512因此這里應(yīng)該使用相同的大小。
將blob輸入到網(wǎng)絡(luò)中睁宰，并且通過這個(gè)神經(jīng)網(wǎng)絡(luò)執(zhí)行一個(gè) forward pass歇万，并且輸出使用的時(shí)間。

可視化我們的結(jié)果

最后我們需要可視化我們的結(jié)果：

在程序的其余行中勋陪，我們將生成一個(gè)顏色蒙層以覆蓋原始圖像贪磺。每個(gè)像素都有一個(gè)相應(yīng)的類標(biāo)簽索引，使我們可以看到屏幕上的語義分割結(jié)果诅愚。

# infer the total number of classes along with the spatial dimensions
# of the mask image via the shape of the output array
(numClasses, height, width) = output.shape[1:4]

# our output class ID map will be num_classes x height x width in
# size, so we take the argmax to find the class label with the
# largest probability for each and every (x, y)-coordinate in the
# image

classMap = np.argmax(output[0], axis=0)

# given the class ID map, we can map each of the class IDs to its
# corresponding color

mask = COLORS[classMap]
cv2.imshow("mask", mask)

# resize the mask and class map such that its dimensions match the
# original size of the input image (we're not using the class map
# here for anything else but this is how you would resize it just in
# case you wanted to extract specific pixels/classes)
mask = cv2.resize(mask, (image.shape[1], image.shape[0]),
    interpolation=cv2.INTER_NEAREST)
classMap = cv2.resize(classMap, (image.shape[1], image.shape[0]),
    interpolation=cv2.INTER_NEAREST)

# perform a weighted combination of the input image with the mask to
# form an output visualization
output = ((0.4 * image) + (0.6 * mask)).astype("uint8")

# show the input and output images
cv2.imshow("Legend", legend)
cv2.imshow("Input", image)
cv2.imshow("Output", output)
cv2.waitKey(0)
if cv2.waitKey(1) & 0xFF == ord('q'):
    exit

我們首先是從output中提取出 numClasses, height, width寒锚，然后計(jì)算 classMap和mask。其中 classMap是output的每個(gè)（x违孝，y）坐標(biāo)的最大概率的類標(biāo)簽索引（class label index）刹前。通過 calssMap作為Numpy的數(shù)組索引來找到每個(gè)像素相對(duì)應(yīng)的可視化顏色。
之后就是簡單的尺寸變換以使得尺寸相同雌桑，之后進(jìn)行疊加喇喉。

單個(gè)圖像的結(jié)果：

根據(jù)用法輸入相應(yīng)的命令行參數(shù)，運(yùn)行程序校坑，以下是一個(gè)示例：


python3 segment.py --model enet-cityscapes/enet-model.net --classes enet-cityscapes/enet-classes.txt --colors enet-cityscapes/enet-colors.txt --image images/example_03.jpg

最終得到的結(jié)果：

效果圖

很容易發(fā)現(xiàn)拣技，它可以清晰地分類并準(zhǔn)確識(shí)別人和自行車。確定了道路耍目，人行道膏斤，汽車。邪驮。

在視頻中執(zhí)行語義分割：

這個(gè)部分的代碼位于 segment_video.py中莫辨，首先加載模型，初始化視頻流：

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNet(args["model"])

# initialize the video stream and pointer to output video file
vs = cv2.VideoCapture(args["video"])
writer = None

# try to determine the total number of frames in the video file
try:
    prop =  cv2.cv.CV_CAP_PROP_FRAME_COUNT if imutils.is_cv2() \
        else cv2.CAP_PROP_FRAME_COUNT
    total = int(vs.get(prop))
    print("[INFO] {} total frames in video".format(total))

# an error occurred while trying to determine the total
# number of frames in the video file
except:
    print("[INFO] could not determine # of frames in video")
    total = -1

之后讀取視頻流，并且作為網(wǎng)絡(luò)的輸入沮榜，這部分和 segment.py大致相同：

# loop over frames from the video file stream
while True:
    # read the next frame from the file
    (grabbed, frame) = vs.read()

    # if the frame was not grabbed, then we have reached the end
    # of the stream
    if not grabbed:
        break

    # construct a blob from the frame and perform a forward pass
    # using the segmentation model
    frame = imutils.resize(frame, width=args["width"])
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (1024, 512), 0,
        swapRB=True, crop=False)
    net.setInput(blob)
    start = time.time()
    output = net.forward()
    end = time.time()

    # infer the total number of classes along with the spatial
    # dimensions of the mask image via the shape of the output array
    (numClasses, height, width) = output.shape[1:4]

    # our output class ID map will be num_classes x height x width in
    # size, so we take the argmax to find the class label with the
    # largest probability for each and every (x, y)-coordinate in the
    # image
    classMap = np.argmax(output[0], axis=0)

    # given the class ID map, we can map each of the class IDs to its
    # corresponding color
    mask = COLORS[classMap]

    # resize the mask such that its dimensions match the original size
    # of the input frame
    mask = cv2.resize(mask, (frame.shape[1], frame.shape[0]),
        interpolation=cv2.INTER_NEAREST)

    # perform a weighted combination of the input frame with the mask
    # to form an output visualization
    output = ((0.3 * frame) + (0.7 * mask)).astype("uint8")

然后我們將輸出的視頻流寫入到文件中：

    # check if the video writer is None
    if writer is None:
        # initialize our video writer
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (output.shape[1], output.shape[0]), True)

        # some information on processing single frame
        if total > 0:
            elap = (end - start)
            print("[INFO] single frame took {:.4f} seconds".format(elap))
            print("[INFO] estimated total time: {:.4f}".format(
                elap * total))

    # write the output frame to disk
    writer.write(output)

    # check to see if we should display the output frame to our screen
    if args["show"] > 0:
        cv2.imshow("Frame", output)
        key = cv2.waitKey(1) & 0xFF
 
        # if the `q` key was pressed, break from the loop
        if key == ord("q"):
            break

最終的視頻演示可以查看下面的視頻：


python3 segment_video.py --model enet-cityscapes/enet-model.net \
    --classes enet-cityscapes/enet-classes.txt \
    --colors enet-cityscapes/enet-colors.txt \
    --video videos/massachusetts.mp4 \
    --output output/massachusetts_output.avi

最后如何訓(xùn)練自己的模型：

如果你想訓(xùn)練自己的模型盘榨，可以參考 ENet作者提供的教程

總結(jié)

在這個(gè)文章中，我們學(xué)習(xí)了如何使用OpenCV蟆融，深度學(xué)習(xí)和ENet架構(gòu)來應(yīng)用語義分割较曼。在Cityscapes數(shù)據(jù)集上使用預(yù)先訓(xùn)練的ENet模型，我們能夠在自動(dòng)駕駛汽車和道路場景分割的背景下將圖像和視頻流分成20個(gè)類別振愿，包括人（步行和騎自行車），車輛（汽車弛饭，卡車冕末，公共汽車，摩托車等）侣颂，建筑（建筑物档桃，墻壁，圍欄等）憔晒，以及植被藻肄，地形和地面本身。如果您喜歡今天的文章拒担，請(qǐng)分享哦嘹屯！

需要源代碼的朋友可以在下方留言郵箱即可。

參考文獻(xiàn)

十分鐘看懂圖像語義分割技術(shù)
[論文]ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
訓(xùn)練自己的ENet模型
什么是神經(jīng)網(wǎng)絡(luò)中的前向和后向傳遞从撼？

不足之處州弟，敬請(qǐng)斧正; 若你覺得文章還不錯(cuò)，請(qǐng)關(guān)注微信公眾號(hào)“SLAM 技術(shù)交流”繼續(xù)支持我們低零，筆芯：D婆翔。

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

人面猴
序言：七十年代末，一起剝皮案震驚了整個(gè)濱河市掏婶，隨后出現(xiàn)的幾起案子啃奴，更是在濱河造成了極大的恐慌，老刑警劉巖雄妥，帶你破解...
沈念sama閱讀 217,509評(píng)論 6贊 504
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件最蕾，死亡現(xiàn)場離奇詭異，居然都是意外死亡老厌，警方通過查閱死者的電腦和手機(jī)揖膜，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 92,806評(píng)論 3贊 394
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門，熙熙樓的掌柜王于貴愁眉苦臉地迎上來壹粟，“玉大人趁仙，你說我怎么就攤上這事「缮荩” “怎么了忿峻？”我有些...
開封第一講書人閱讀 163,875評(píng)論 0贊 354
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵逛尚，是天一觀的道長刁愿。經(jīng)常有香客問我铣口，道長脑题，這世上最難降的妖魔是什么？我笑而不...
開封第一講書人閱讀 58,441評(píng)論 1贊 293
?港島之戀（遺憾婚禮）
正文為了忘掉前任停团，我火速辦了婚禮佑稠，結(jié)果婚禮上舌胶，老公的妹妹穿的比我還像新娘幔嫂。我一直安慰自己履恩，他們只是感情好切心，可當(dāng)我...
茶點(diǎn)故事閱讀 67,488評(píng)論 6贊 392
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布。她就那樣靜靜地躺著协屡，像睡著了一般肤晓。火紅的嫁衣襯著肌膚如雪认然。梳的紋絲不亂的頭發(fā)上卷员，一...
開封第一講書人閱讀 51,365評(píng)論 1贊 302
城市分裂傳說
那天，我揣著相機(jī)與錄音挺峡，去河邊找鬼担钮。笑死，一個(gè)胖子當(dāng)著我的面吹牛狭姨，可吹牛的內(nèi)容都是我干的饼拍。我是一名探鬼主播师抄，決...
沈念sama閱讀 40,190評(píng)論 3贊 418
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼叨吮，長吁一口氣：“原來是場噩夢啊……” “哼瞬矩！你這毒婦竟也來了景用？” 一聲冷哼從身側(cè)響起，我...
開封第一講書人閱讀 39,062評(píng)論 0贊 276
萬榮殺人案實(shí)錄
序言：老撾萬榮一對(duì)情侶失蹤剿干，失蹤者是張志新（化名）和其女友劉穎置尔，沒想到半個(gè)月后氢伟，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體谬盐，經(jīng)...
沈念sama閱讀 45,500評(píng)論 1贊 314
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡飞傀，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 37,706評(píng)論 3贊 335
?白月光啟示錄
正文我和宋清朗相戀三年砸烦，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片家破。...
茶點(diǎn)故事閱讀 39,834評(píng)論 1贊 347
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡门粪，死狀恐怖庄拇，靈堂內(nèi)的尸體忽然破棺而出措近，到底是詐尸還是另有隱情女淑，我是刑警寧澤鸭你，帶...
沈念sama閱讀 35,559評(píng)論 5贊 345
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布，位于F島的核電站碳抄，受9級(jí)特大地震影響场绿，放射性物質(zhì)發(fā)生泄漏剖效。R本人自食惡果不足惜璧尸，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 41,167評(píng)論 3贊 328
男人毒藥：我在死后第九天來索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望蛀序。院中可真熱鬧徐裸，春花似錦倦逐、人聲如沸宫补。這莊子的主人今日做“春日...
開封第一講書人閱讀 31,779評(píng)論 0贊 22
一樁弒父案曾我，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽粉怕。三九已至，卻和暖如春抒巢，著一層夾襖步出監(jiān)牢的瞬間贫贝，已是汗流浹背。一陣腳步聲響...
開封第一講書人閱讀 32,912評(píng)論 1贊 269
情欲美人皮
我被黑心中介騙來泰國打工蛉谜，沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留稚晚，地道東北人。一個(gè)月前我還...
沈念sama閱讀 47,958評(píng)論 2贊 370
代替公主和親
正文我出身青樓型诚，卻偏偏與公主長得像客燕，于是被迫代替她去往敵國和親。傳聞我的和親對(duì)象是個(gè)殘疾皇子狰贯，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 44,779評(píng)論 2贊 354

使用OpenCv+ENet實(shí)現(xiàn)語義分割