Counting the objects in an image is a very common task, especially counting crowds or vehicles, since the count tells us how busy or congested the current environment is. Existing crowd-counting methods generally fall into two categories: detection-based methods and regression-based methods. Object-detection-based methods perform poorly on dense, small targets, so many studies instead count by regressing a per-pixel density map. This post implements the MSCNN crowd-counting model in Keras.
GitHub: https://github.com/xiaochus/MSCNN
Paper: Multi-scale convolutional neural network for crowd counting
Environment
- Python 3.6
- Keras 2.2.2
- Tensorflow-gpu 1.8.0
- OpenCV 3.4
Data
The experiments use the Mall Dataset for crowd counting. The dataset provides the video frames as JPEG files, the ground truth, perspective-normalized features and a perspective map, as shown below:
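The data-processing code is shown below:
1. Read the images and annotations according to the annotation file.
2. Rescale the annotations according to the network's input and output sizes.
3. Map the crowd positions to a density map and smooth it with a Gaussian filter.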
```python
import os

import cv2
import numpy as np
import scipy.io as sio


def read_annotations():
    """Read the annotation data.

    Returns:
        count: ndarray, head count per frame.
        position: ndarray, head coordinates per frame.
    """
    data = sio.loadmat(os.path.join('data', 'mall_dataset', 'mall_gt.mat'))
    count = data['count']
    position = data['frame'][0]
    return count, position


def map_pixels(img, image_key, annotations, size):
    """Map the annotations of one frame to a density map.

    Arguments:
        img: ndarray, original image.
        image_key: int, index of the frame.
        annotations: ndarray, annotations.
        size: int, side length of the square density map.

    Returns:
        pixels: ndarray, density map.
    """
    gaussian_kernel = 15
    h, w = img.shape[:-1]
    # Scale factors from the original frame to the size x size density map
    sh, sw = size / h, size / w
    pixels = np.zeros((size, size))
    for a in annotations[image_key][0][0][0]:
        x, y = int(a[0] * sw), int(a[1] * sh)
        if y >= size or x >= size:
            print("{},{} is out of range, skipping annotation for {}".format(x, y, image_key))
        else:
            # One unit of mass per annotated head
            pixels[y, x] += 1
    # Spread each head over a 15x15 Gaussian (sigma=0: derived from kernel size)
    pixels = cv2.GaussianBlur(pixels, (gaussian_kernel, gaussian_kernel), 0)
    return pixels


def get_data(i, size, annotations):
    """Get the data according to the image_key.

    Arguments:
        i: int, image_key.
        size: int, input shape of network.
        annotations: ndarray, annotations.

    Returns:
        img: ndarray, img.
        density_map: ndarray, density map.
    """
    name = os.path.join('data', 'mall_dataset', 'frames',
                        'seq_{}.jpg'.format(str(i + 1).zfill(6)))
    img = cv2.imread(name)
    # The network downsamples by 4 (two 2x2 max-pooling layers),
    # so the target density map is size // 4 on each side
    density_map = map_pixels(img, i, annotations, size // 4)
    img = cv2.resize(img, (size, size))
    img = img / 255.
    density_map = np.expand_dims(density_map, axis=-1)
    return img, density_map
```
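Step 2 above (rescaling the annotations to the network's output size) comes down to a small piece of arithmetic: the two 2×2 max-pooling layers shrink a 224×224 input to a 56×56 output, so `get_data` builds the density map at `size // 4`, and `map_pixels` multiplies each head coordinate by `size / w` and `size / h` and truncates it. The sketch below isolates only that scaling and bounds check in plain Python so it can be checked without OpenCV or the dataset; `scale_points` is a hypothetical helper, and the 640×480 frame size is an illustrative assumption.

```python
def scale_points(points, img_h, img_w, size):
    """Map (x, y) head positions from an img_h x img_w frame onto a size x size grid.

    Mirrors the scaling in map_pixels: coordinates are multiplied by the
    per-axis scale factor and truncated with int(). Returns the kept
    (row, col) cells and the points that fell outside the grid.
    """
    sh, sw = size / img_h, size / img_w
    kept, skipped = [], []
    for px, py in points:
        x, y = int(px * sw), int(py * sh)
        if y >= size or x >= size:
            skipped.append((px, py))
        else:
            kept.append((y, x))  # array indexing is [row, col] = [y, x]
    return kept, skipped


if __name__ == "__main__":
    net_input = 224
    out_size = net_input // 4  # two 2x2 max-pooling layers -> 56
    kept, skipped = scale_points([(330, 250), (639, 479), (650, 100)],
                                 img_h=480, img_w=640, size=out_size)
    print(kept)     # [(29, 28), (55, 55)]
    print(skipped)  # [(650, 100)]
```

Note that the function returns `(row, col)` pairs, matching the `pixels[y, x]` indexing in `map_pixels`.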
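The density map is Gaussian-filtered because each person occupies only a single pixel, so the raw density map is extremely sparse, which can cause the model to converge to predicting all zeros. After Gaussian smoothing the density map looks like a heat map, which largely alleviates the sparsity problem. Moreover, because the Gaussian kernel sums to 1, smoothing does not change the total count: the map still sums to the number of heads, exactly for heads at least half a kernel width (7 pixels for the 15×15 kernel) from the border and approximately for heads closer to the edge.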
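A minimal NumPy sketch of the count-preservation argument, assuming a 56×56 map and the 15×15 kernel used in `map_pixels`. The helpers `gaussian_kernel_2d` and `blur_zero_pad` are hypothetical stand-ins for `cv2.GaussianBlur`: the sigma rule is the one OpenCV documents for `sigma=0`, but this sketch pads with zeros, whereas OpenCV's default border mode reflects, so the two differ near the edges.

```python
import numpy as np


def gaussian_kernel_2d(ksize=15, sigma=0.0):
    """Normalized 2-D Gaussian kernel (sums to 1).

    When sigma <= 0 it is derived from ksize with the rule OpenCV's
    getGaussianKernel documents: sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8.
    """
    if sigma <= 0:
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    r = ksize // 2
    ax = np.arange(-r, r + 1)
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)  # separable: outer product of two 1-D kernels


def blur_zero_pad(img, kernel):
    """Apply a symmetric kernel with zero padding (pure NumPy, no OpenCV)."""
    r = kernel.shape[0] // 2
    padded = np.pad(img, r, mode="constant")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


if __name__ == "__main__":
    k = gaussian_kernel_2d(15)
    density = np.zeros((56, 56))
    density[20, 30] += 1  # two heads well away from the border
    density[40, 10] += 1
    print(np.isclose(blur_zero_pad(density, k).sum(), 2.0))  # True
    density[0, 0] += 1    # a head in the corner
    print(blur_zero_pad(density, k).sum())  # noticeably below 3.0
```

The interior heads each keep exactly one unit of mass, while the corner head keeps only about a third of its Gaussian inside the map, which is why the border caveat matters.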
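The processed input image and its corresponding density map are shown below:
Model
The overall model is shown in the figure below; it is a relatively simple end-to-end network.
Because the targets in these images are all small, the authors borrow from the Inception model and propose a Multi-Scale Blob (MSB) structure to increase the diversity of the features.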
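To make the cost of the multi-scale design concrete, the plain-Python helper below (hypothetical, not part of the repository) counts the convolution weights and biases of one MSB as it is written in the Keras code in the Implementation section: four parallel branches with 9×9, 7×7, 5×5 and 3×3 kernels, each producing `filters` channels, concatenated along the channel axis. BatchNormalization parameters are left out.

```python
def msb_conv_params(in_channels, filters, kernel_sizes=(9, 7, 5, 3)):
    """Weights + biases of the parallel Conv2D branches in one MSB.

    Each branch is a k x k convolution from in_channels to `filters`
    channels; the branch outputs are concatenated along the channel axis.
    """
    per_branch = [k * k * in_channels * filters + filters for k in kernel_sizes]
    out_channels = filters * len(kernel_sizes)
    return sum(per_branch), out_channels


if __name__ == "__main__":
    # First MSB in the Keras code: MSB(4 * 16) after the 64-channel 9x9 conv
    params, channels = msb_conv_params(in_channels=64, filters=4 * 16)
    print(params, channels)  # 672000 256
```

For comparison, `msb_conv_params(64, 256, kernel_sizes=(9,))`, a single 9×9 convolution producing the same 256 channels, returns `(1327360, 256)`, roughly twice the weights of the mixed-kernel blob.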
The network architecture given in the paper is shown below:
Implementation
The network implemented in Keras:
```python
# -*- coding: utf-8 -*-
from keras.layers import Input, Conv2D, MaxPooling2D, concatenate, Activation
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.regularizers import l2
from keras.utils.vis_utils import plot_model


def MSB(filters):
    """Multi-Scale Blob.

    Arguments:
        filters: int, number of filters in each branch.

    Returns:
        f: function, layer function.
    """
    params = {'activation': 'relu', 'padding': 'same',
              'kernel_regularizer': l2(5e-4)}

    def f(x):
        # Four parallel branches with different receptive fields
        x1 = Conv2D(filters, 9, **params)(x)
        x2 = Conv2D(filters, 7, **params)(x)
        x3 = Conv2D(filters, 5, **params)(x)
        x4 = Conv2D(filters, 3, **params)(x)
        # Stack the multi-scale features along the channel axis
        x = concatenate([x1, x2, x3, x4])
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        return x

    return f


def MSCNN(input_shape):
    """Multi-scale convolutional neural network for crowd counting.

    Arguments:
        input_shape: tuple, image shape as (h, w, c).

    Returns:
        model: Model, keras model.
    """
    inputs = Input(shape=input_shape)
    x = Conv2D(64, 9, activation='relu', padding='same')(inputs)
    x = MSB(4 * 16)(x)
    x = MaxPooling2D()(x)  # 1/2 resolution
    x = MSB(4 * 32)(x)
    x = MSB(4 * 32)(x)
    x = MaxPooling2D()(x)  # 1/4 resolution
    x = MSB(3 * 64)(x)
    x = MSB(3 * 64)(x)
    # 1x1 convolutions regress the single-channel density map
    x = Conv2D(1000, 1, activation='relu', kernel_regularizer=l2(5e-4))(x)
    x = Conv2D(1, 1, activation='relu')(x)
    model = Model(inputs=inputs, outputs=x)
    return model


if __name__ == '__main__':
    model = MSCNN((224, 224, 3))
    model.summary()  # summary() prints the table itself and returns None
    plot_model(model, to_file='images/model.png', show_shapes=True)
```
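As a sanity check on the implementation, the plain-Python sketch below traces the tensor shape through the layers of `MSCNN` under Keras's default layer behavior (`padding='same'` preserves height and width, the 1×1 convolutions also keep them, and `MaxPooling2D()` uses a 2×2 pool with stride 2). The layer table is transcribed by hand from the code above and `mscnn_shapes` is a hypothetical helper; `model.summary()` should report the same shapes.

```python
# (layer name, operation, argument passed in the Keras code)
LAYERS = [
    ("Conv2D 9x9", "conv", 64),
    ("MSB(4*16)", "msb", 4 * 16),
    ("MaxPooling2D", "pool", None),
    ("MSB(4*32)", "msb", 4 * 32),
    ("MSB(4*32)", "msb", 4 * 32),
    ("MaxPooling2D", "pool", None),
    ("MSB(3*64)", "msb", 3 * 64),
    ("MSB(3*64)", "msb", 3 * 64),
    ("Conv2D 1x1", "conv", 1000),
    ("Conv2D 1x1", "conv", 1),
]


def mscnn_shapes(h, w, layers=LAYERS, msb_branches=4):
    """Return (name, H, W, C) after each layer for an h x w input."""
    c = None
    trace = []
    for name, op, arg in layers:
        if op == "conv":
            # 9x9 'same' or 1x1 convs: spatial size unchanged
            c = arg
        elif op == "msb":
            # MSB concatenates four branches of `arg` filters each
            c = arg * msb_branches
        elif op == "pool":
            # 2x2 pooling, stride 2, 'valid' padding: floor halving
            h, w = h // 2, w // 2
        trace.append((name, h, w, c))
    return trace


if __name__ == "__main__":
    for row in mscnn_shapes(224, 224):
        print(row)
```

The final 56×56×1 output is why `get_data` builds the density map at `size // 4`.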
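Experiments
Train the model from the project directory with the following command:
```
python train.py --size 224 --batch 16 --epochs 10
```
Because we did not have enough computing resources, we only trained the model briefly as a preliminary test of its performance.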
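Below is the result on a test-set image from the same scene as the training data: the true count is 30 and the predicted count is 27, which is reasonably close. The predicted density map also matches the actual distribution of people in the image.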
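In the regression approach the predicted count is simply the sum of the output density map. The NumPy sketch below shows that reduction together with the mean absolute error commonly reported for crowd counting, using the counts from this test image; the uniform toy map stands in for a real model output, and both helper names are hypothetical.

```python
import numpy as np


def count_from_density(density_map):
    """The estimated count is the integral (sum) of the density map."""
    return float(np.sum(density_map))


def count_errors(true_counts, pred_counts):
    """Mean absolute error and mean relative error over a set of images."""
    t = np.asarray(true_counts, dtype=float)
    p = np.asarray(pred_counts, dtype=float)
    mae = float(np.mean(np.abs(t - p)))
    rel = float(np.mean(np.abs(t - p) / t))
    return mae, rel


if __name__ == "__main__":
    # Toy 56x56 map whose values sum to 27, standing in for a model output
    pred = np.full((56, 56), 27 / (56 * 56))
    print(round(count_from_density(pred), 3))  # 27.0
    # Test image above: ground truth 30, prediction 27
    print(count_errors([30], [27]))  # (3.0, 0.1)
```

An error of 3 heads out of 30 is a 10% relative error on this single image; a meaningful evaluation would average these metrics over the whole test set.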
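Below is a randomly chosen crowd image with a different background and camera angle; here the prediction deviates considerably. This is because the training set is rather homogeneous (all frames come from the same scene). Obtaining a model that works on real-world scenes would require a more diverse training set.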
count:24
count:31