Face recognition is an important area of computer vision. This post implements a face recognition program based on a convolutional neural network that can recognize a specified face in a camera feed.
References:
How I implemented iPhone X's FaceID using Deep Learning in Python
GitHub: https://github.com/xiaochus/FaceRecognition
Environment
- Python 3.6
- Tensorflow-gpu 1.5.0
- Keras 2.1.3
- OpenCV 3.4
- Scikit-learn 0.19
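Model
Feature extraction
The training model consists of two main parts, as shown in the figure below. The core part is the feature extraction network (model_1), which takes a (64, 64, 3) tensor and outputs a (128,) tensor. We implement it with a simplified MobileNetV2. Its job is to extract the features of a face.
The second part is the siamese network. On top of the feature extraction network, we feed in paired data (input_1 and input_2), compute the features of each input, and finally compute the Euclidean distance between the two feature vectors (lambda_1). Its job is to make similar inputs produce similar features.
Note that although there are two inputs, they are not connected to each other, and they do not adjust the network parameters separately. You can think of it as each input passing through the same network once to obtain its features, with the loss then computed from the distance between the two feature vectors.
Keras implements this with the concept of shared layers, which rests on layer nodes. Whenever you call a layer on an input, you create a new tensor (the output of the layer) and you also add a "(computation) node" to that layer. The node maps the input tensor to the output tensor. When you call the layer several times, it has several nodes, indexed 0, 1, 2...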
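To make the idea of shared weights and per-call nodes concrete, here is a minimal toy sketch in plain Python. It does not use Keras: `SharedLayer` and `euclidean_distance` are hypothetical names invented for illustration, and the "layer" is just a small matrix multiplication.

```python
import math


class SharedLayer:
    """Toy stand-in for a shared layer: one weight set, one node per call."""

    def __init__(self, weights):
        self.weights = weights   # a single set of parameters used by every call
        self.nodes = []          # (input, output) pairs, indexed 0, 1, 2, ...

    def __call__(self, x):
        # Each output unit is the dot product of the input with one weight row.
        out = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        self.nodes.append((x, out))
        return out


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


extractor = SharedLayer([[1.0, 0.0], [0.0, 2.0]])
feat_1 = extractor([1.0, 1.0])   # creates node 0
feat_2 = extractor([1.0, 3.0])   # creates node 1, using the same weights
distance = euclidean_distance(feat_1, feat_2)
print(len(extractor.nodes), distance)
```

Both calls go through the same weights, so a training step driven by `distance` would update one set of parameters rather than two separate copies. In Keras, `get_input_at(0)` and `get_output_at(0)` play the role of indexing into the node list.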
The later feature extraction task does not need the comparison or the distance, only the feature extraction model in the middle, so we can pull that sub-model out on its own.
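from keras.models import Model


def get_feature_model():
    """Get face features extraction model.

    # Returns
        feat_model: Model, face features extraction model.
    """
    # get_model() is not shown in this post; it builds the full siamese network.
    model = get_model((64, 64, 3))
    model.load_weights('model/weight.h5')
    # Node 0 is the first call of the shared sub-model 'model_1'.
    feat_model = Model(inputs=model.get_layer('model_1').get_input_at(0),
                       outputs=model.get_layer('model_1').get_output_at(0))
    return feat_model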
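Contrastive loss
To make the model extract features effectively, we use the contrastive loss as the loss function. This loss handles relationships between paired data well. Its expression is as follows (y indicates whether the pair is similar, d is the Euclidean distance output by the network):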
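The formula image is not reproduced here. As a reconstruction, assuming the standard form from the Hadsell, Chopra and LeCun paper, with a margin m (a symbol not named in this post) and N pairs in a batch, the loss can be written as:

```latex
L = \frac{1}{2N} \sum_{n=1}^{N} \left[ y_n \, d_n^{2} + (1 - y_n) \, \max(m - d_n,\, 0)^{2} \right]
```

Here y_n = 1 marks a similar pair and y_n = 0 a dissimilar pair, so the left term acts on similar pairs and the right term on dissimilar pairs.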
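This loss function originates from "Dimensionality Reduction by Learning an Invariant Mapping" by Yann LeCun and co-authors, where it is used for dimensionality reduction: samples that were similar should remain similar in the feature space after reduction, and samples that were dissimilar should remain dissimilar.
When y=1 (the samples are similar), only the left-hand term remains, which is the mean of the squared Euclidean distances of similar samples. A large loss means the features of similar samples are far apart. When y=0 (the samples are dissimilar), only the right-hand term remains, which shrinks as the Euclidean distance between dissimilar samples grows. A large loss means the features of dissimilar samples are close together. This combination is exactly what our task needs.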
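The behaviour of the two terms can be checked with a small plain-Python sketch. This is an illustration under stated assumptions, not the repository's training loss: the function name `contrastive_loss`, the margin of 1.0, and the 1/2 factor are all assumed here.

```python
def contrastive_loss(labels, distances, margin=1.0):
    """Mean contrastive loss; label 1 = similar pair, 0 = dissimilar pair."""
    total = 0.0
    for y, d in zip(labels, distances):
        similar_term = y * d ** 2                              # penalizes distant similar pairs
        dissimilar_term = (1 - y) * max(margin - d, 0.0) ** 2  # penalizes close dissimilar pairs
        total += similar_term + dissimilar_term
    return total / (2 * len(labels))


print(contrastive_loss([1], [0.0]))   # similar and identical features
print(contrastive_loss([1], [0.5]))   # similar but somewhat apart
print(contrastive_loss([0], [0.5]))   # dissimilar but inside the margin
print(contrastive_loss([0], [2.0]))   # dissimilar and beyond the margin
```

Note that get_paris() in the data code marks same-person pairs with 0 and different-person pairs with 1, the opposite of the convention used in this explanation, so either the labels or the two terms would need to be swapped when the two are combined.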
Data processing
We use the Face Recognition Data - grimace (University of Essex, UK) database as training and test data.
read_img() reads the image data for each person.
get_paris() pairs up the loaded faces, producing pairs of the same person and pairs of different people.
create_generator() turns the input data into a generator used for training.
get_train_test() shuffles the data and splits it into a training set and a test set. With test_size=0.33 the split is roughly 2:1.
"""Data process.
Data process and generation.
"""
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split


def read_img(path):
    """Read image

    This function reads images from folders for different people.

    # Arguments
        path: String, path of database.

    # Returns
        res: List, images for different people.
    """
    res = []

    for (root, dirs, files) in os.walk(path):
        if files:
            tmp = []
            # Pick 4 file names per folder (np.random.choice samples
            # with replacement by default, so repeats are possible).
            files = np.random.choice(files, 4)
            for f in files:
                img = os.path.join(root, f)
                image = cv2.imread(img)
                image = cv2.resize(image, (64, 64),
                                   interpolation=cv2.INTER_CUBIC)
                # Convert to float and scale pixel values to [0, 1].
                image = np.array(image, dtype='float32')
                image /= 255.
                tmp.append(image)

            res.append(tmp)

    return res


def get_paris(path):
    """Make pairs.

    This function makes pairs for the same person and for different people.

    # Arguments
        path: String, path of database.

    # Returns
        sm1: List, first object in pairs.
        sm2: List, second object in pairs.
        y1: List, pairs mark (same: 0, different: 1).
    """
    sm1, sm2, df1, df2 = [], [], [], []
    res = read_img(path)

    persons = len(res)

    for i in range(persons):
        for j in range(i, persons):
            p1 = res[i]
            p2 = res[j]

            if i == j:
                # Same person: pair every image with every image.
                for pi in p1:
                    for pj in p2:
                        sm1.append(pi)
                        sm2.append(pj)
            else:
                # Different people: pair only their first images.
                df1.append(p1[0])
                df2.append(p2[0])

    # Keep at most as many different pairs as same pairs.
    df1 = df1[:len(sm1)]
    df2 = df2[:len(sm2)]
    y1 = list(np.zeros(len(sm1)))
    y2 = list(np.ones(len(df1)))

    sm1.extend(df1)
    sm2.extend(df2)
    y1.extend(y2)

    return sm1, sm2, y1


def create_generator(x, y, batch):
    """Create data generator.

    This function is a data generator.

    # Arguments
        x: List, Input data (pairs of images).
        y: List, Data label.
        batch: Integer, batch size for data generator.

    # Returns
        [x1, x2]: List, pairs data with batch size.
        yb: ndarray, Data label.
    """
    while True:
        # Draw a random batch of pair indices.
        index = np.random.choice(len(y), batch)
        x1, x2, yb = [], [], []
        for i in index:
            x1.append(x[i][0])
            x2.append(x[i][1])
            yb.append(y[i])
        x1 = np.array(x1)
        x2 = np.array(x2)

        # Return labels as an array so they have the same type as the inputs.
        yield [x1, x2], np.array(yb)


def get_train_test(path):
    """Get train and test data

    This function shuffles the data and splits it into train and test sets.

    # Arguments
        path: String, path of database.

    # Returns
        X_train: List, Input data for train.
        X_test: List, Input data for test.
        y_train: List, Data label for train.
        y_test: List, Data label for test.
    """
    im1, im2, y = get_paris(path)
    im = list(zip(im1, im2))

    # train_test_split shuffles by default; 33% of the pairs go to the test set.
    X_train, X_test, y_train, y_test = train_test_split(
        im, y, test_size=0.33)

    return X_train, X_test, y_train, y_test
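Experiments
Run the following command to train the model.
python train.py
Run the following command to visualize the experiment.
python vis.py
Because the dataset is small and has little variation in pose and similar factors, the training loss and the evaluation loss had largely levelled off after 50 epochs.
We randomly pick several people from the dataset, extract features from 20 photos of each person, and map them to 2-D space with t-SNE. The result is shown in the figure below. Each color represents one person. The features of photos of the same person clearly cluster together, which shows that the model pulls the face features of one person close to each other.
We evaluate the model with data that is not in the training set. Image 0 is the reference, image 1 is another photo of the reference person, and the remaining images are different people.
The Euclidean distances between them are listed below. The feature distance between different people is larger than the feature distance between photos of the same person.
Feature distances: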
[0.05845242, 0.44077098, 0.1820661, 0.6669458, 0.090522714]
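Recognizing a specified face from the camera
The program has two main parts: face detection and recognition of the specified face.
Face detection
We use two of OpenCV's built-in models for face detection: the haar cascade classifier and SSD 300. The type argument passed when constructing the detector class selects which detector to use (the code spells the haar option 'harr'). In our tests, SSD was more effective.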
"""Face detection model.
"""
import cv2
import numpy as np
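

class FaceDetector:
    def __init__(self, type, threshold=0.5):
        """Init.

        # Arguments
            type: String, 'harr' (haar cascade) or 'ssd'.
            threshold: Float, minimum confidence for ssd detections.
        """
        self.type = type
        self.t = threshold

        if type == 'harr':
            # The method is named _create_haar_detector.
            self.detector = self._create_haar_detector()
        elif type == 'ssd':
            self.detector = self._create_ssd_detector()
        else:
            # Raising a plain string is not valid in Python 3.
            raise ValueError('You must select a FaceDetector type!')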

    def _create_haar_detector(self):
        """Create haar cascade classifier.

        # Returns
            face_cascade: haar cascade classifier.
        """
        path = 'data/haarcascades/haarcascade_frontalface_default.xml'
        face_cascade = cv2.CascadeClassifier(path)

        return face_cascade

    def _create_ssd_detector(self):
        """Create ssd face classifier.

        # Returns
            ssd: ssd 300 * 300 face classifier.
        """
        prototxt = 'data/ssd/deploy.prototxt.txt'
        model = 'data/ssd/ssd300.caffemodel'
        ssd = cv2.dnn.readNetFromCaffe(prototxt, model)

        return ssd

    def _ssd_box(self, detections, h, w):
        """Resize the detection boxes of ssd.

        # Arguments
            detections: ndarray, raw output of the ssd network.
            h: Integer, original height of frame.
            w: Integer, original width of frame.

        # Returns
            rects: List, detection boxes as (x, y, w, h).
        """
        rects = []
        for i in range(0, detections.shape[2]):
            confidence = detections[0, 0, i, 2]
            # Skip detections below the confidence threshold.
            if confidence < self.t:
                continue
            # Scale the normalized box back to frame coordinates.
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (x1, y1, x2, y2) = box.astype("int")
            rects.append((x1, y1, x2 - x1, y2 - y1))

        return rects

    def detect(self, frame):
        """Detect faces with the selected detector.

        # Arguments
            frame: ndarray(n, n, 3), video frame.

        # Returns
            faces: List, faces rectangles in the frame.
        """
        pic = frame.copy()

        if self.type == 'harr':
            # The haar cascade works on grayscale images.
            gray = cv2.cvtColor(pic, cv2.COLOR_BGR2GRAY)
            faces = self.detector.detectMultiScale(gray, 1.3, 5)
        elif self.type == 'ssd':
            h, w = pic.shape[:2]
            # Resize to the 300 x 300 network input and subtract the mean values.
            blob = cv2.dnn.blobFromImage(
                cv2.resize(pic, (300, 300)), 1.0,
                (300, 300), (104.0, 177.0, 123.0))
            self.detector.setInput(blob)
            detections = self.detector.forward()
            faces = self._ssd_box(detections, h, w)

        return faces
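The box conversion done in _ssd_box() can be followed by hand with a small plain-Python sketch. `scale_box` is a hypothetical helper, and the clamping step is an addition that the original code does not perform. It is included on the assumption that a normalized box can fall slightly outside the frame, which would otherwise lead to negative slice indices when cropping the face.

```python
def scale_box(box, frame_w, frame_h):
    """Convert a normalized (x1, y1, x2, y2) box into a pixel (x, y, w, h) rect."""
    x1 = int(box[0] * frame_w)
    y1 = int(box[1] * frame_h)
    x2 = int(box[2] * frame_w)
    y2 = int(box[3] * frame_h)
    # Clamp to the frame (not done in the original _ssd_box).
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, frame_w), min(y2, frame_h)
    return (x1, y1, x2 - x1, y2 - y1)


print(scale_box((0.25, 0.5, 0.75, 1.0), 640, 480))
print(scale_box((-0.1, 0.0, 0.5, 1.2), 100, 100))
```

The first call maps a box covering the lower middle of a 640 x 480 frame to pixel coordinates. The second shows an out-of-range box being clipped to the frame.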
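Face recognition
Below is the main face recognition program.
- First, run face detection on each frame.
- If key features have been loaded, extract features from the detected faces. Otherwise just display the detection result.
- Compute the Euclidean distance between the extracted features and each saved feature, and take the smallest value.
- If it is below the threshold, this is the person we want to recognize. Otherwise it is not.
- Display the detection result, marking the detected faces in different colors.
Press the space key several times to enroll the face ID, each time with a different pose of the same face. The features are saved at the end.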
"""Face recognition of PC camera.
"""
import os
import cv2
import numpy as np
import utils.utils as u
from utils.window_manager import WindowManager
from utils.face_detector import FaceDetector


class Face:
    def __init__(self, threshold):
        """Init.

        # Arguments
            threshold: Float, threshold for specific face.
        """
        self._t = threshold
        self._key = self._load_key()
        self._key_cache = []
        self._model = u.get_feature_model()
        self._windowManager = WindowManager('Face', self.on_keypress)
        self._faceDetector = FaceDetector('ssd', 0.5)

    def run(self):
        """Run the main loop.
        """
        capture = cv2.VideoCapture(0)
        faces = []  # stays defined even if the first frame cannot be read

        self._windowManager.create_window()
        while self._windowManager.is_window_created:
            success = capture.grab()
            _, frame = capture.retrieve()

            if frame is not None and success:
                faces = self._faceDetector.detect(frame)

                if self._key is not None and faces is not None:
                    # Compare every detected face with the enrolled key features.
                    label = self._compare_distance(frame, faces)
                    f = self._draw(frame, faces, label)
                else:
                    f = self._draw(frame, faces)

                self._windowManager.show(f)
            self._windowManager.process_events(frame, faces)

        # Free the camera once the window has been closed.
        capture.release()

    def _load_key(self):
        """Load the key feature.
        """
        kpath = 'data/key.npy'

        if os.path.exists(kpath):
            key = np.load(kpath)
        else:
            key = None

        return key

    def _get_feat(self, frame, face):
        """Get face feature from frame.

        # Arguments
            frame: ndarray, video frame.
            face: tuple, coordinates of face in the frame.

        # Returns
            feat: ndarray (128, ), face feature.
        """
        x, y, w, h = face
        # Crop the face region and run it through the feature model.
        img = frame[y: y + h, x: x + w, :]
        image = u.process_image(img)
        feat = self._model.predict(image)[0]

        return feat

    def _compare_distance(self, frame, faces):
        """Compare faces feature in the frame with key.

        # Arguments
            frame: ndarray, video frame.
            faces: List, coordinates of faces in the frame.

        # Returns
            label: list, if match the key.
        """
        label = []

        for (x, y, w, h) in faces:
            feat = self._get_feat(frame, (x, y, w, h))

            # Distance to the closest enrolled key feature.
            dist = []
            for k in self._key:
                dist.append(np.linalg.norm(k - feat))
            dist = min(dist)
            print(dist)
            if dist < self._t:
                label.append(1)
            else:
                label.append(0)
        print(label)
        return label

    def _draw(self, frame, faces, label=None):
        """Draw the rectangles in the frame.

        # Arguments
            frame: ndarray, video frame.
            faces: List, coordinates of faces in the frame.
            label: List, if match the key.

        # Returns
            f: ndarray, frame with rectangles.
        """
        f = frame.copy()
        # BGR colors: index 0 (no match) is red, index 1 (match) is blue.
        color = [(0, 0, 255), (255, 0, 0)]
        if label is None:
            label = [0 for _ in range(len(faces))]

        for rect, i in zip(faces, label):
            (x, y, w, h) = rect
            f = cv2.rectangle(f, (x, y),
                              (x + w, y + h),
                              color[i], 2)

        return f

    def on_keypress(self, keycode, frame, faces):
        """Handle a keypress event.
        Press esc to quit window.
        Press space 5 times to record different poses of the face,
        then once more to save them.

        # Arguments
            keycode: Integer, keypress event.
            frame: ndarray, video frame.
            faces: List, coordinates of faces in the frame.
        """
        if keycode == 32:  # space -> save face id.
            nums = len(self._key_cache)

            if nums < 5:
                # Nothing to record if no face was detected in this frame.
                if faces is None or len(faces) == 0:
                    print('No face detected, nothing recorded.')
                    return
                feat = self._get_feat(frame, faces[0])
                self._key_cache.append(feat)
                print('Face id {0} recorded!'.format(nums + 1))
            else:
                np.save('data/key.npy', np.array(self._key_cache))
                print('All face ID recorded!')
                self._key = self._key_cache
                self._key_cache = []
        elif keycode == 27:  # escape -> quit
            self._windowManager.destroy_window()


if __name__ == '__main__':
    face = Face(0.3)
    face.run()
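The matching rule used by the main program (take the smallest distance to any enrolled feature, then compare it with the threshold) can be sketched without a camera or a model. The helper names `euclidean` and `matches_key` and the 2-D toy features below are assumptions for illustration. The real features are 128-dimensional.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def matches_key(feature, keys, threshold):
    """Return (is_match, best_distance) for one face feature against enrolled keys."""
    best = min(euclidean(feature, k) for k in keys)
    return best < threshold, best


keys = [[0.0, 0.0], [1.0, 0.0]]              # two enrolled poses (toy 2-D features)
print(matches_key([0.9, 0.0], keys, 0.3))    # close to the second key
print(matches_key([0.5, 0.5], keys, 0.3))    # far from both keys
```

Using the minimum over several enrolled poses means a face only has to resemble one of the recorded poses to be accepted, which is why recording varied poses during enrollment helps.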
No result screenshots here, since I would rather not show my face~