Abstract
The Stacked Hourglass (Stacked HG) network first appeared in the 2016 paper "Stacked Hourglass Networks for Human Pose Estimation", whose authors are from the University of Michigan. The architecture was proposed to localize human-body keypoints and thereby estimate human pose: the network predicts a heatmap for each keypoint and the keypoint locations are read off these heatmaps. Variants of this architecture currently account for a large share of pose-estimation methods, and Stacked HG is also increasingly used for facial-landmark localization (e.g., FAN in 2017 and LAB in 2018).
Origins of the network
Multi-level features in CNNs
Deep CNNs such as VGG16 and ResNet have contributed enormously to the recent progress of AI, because a CNN automatically learns the features that are useful for classification/detection/recognition tasks, so hand-designed features such as SIFT and HOG are no longer needed.
A CNN usually has many layers, and the output of each layer is a feature map that describes the image. As the network gets deeper, pooling or stride-2 convolutions gradually shrink the feature maps, producing feature maps at different scales; at the same time the extracted features move from low-level descriptions to increasingly abstract high-level descriptions. Earlier pose-estimation networks (such as DeepPose) mostly used only the last convolutional layer, so keypoint localization relied on features at a single scale and information was lost.
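A minimal sketch of how repeated stride-2 operations produce feature maps at several scales (it uses the same tf.contrib.layers API as the implementation later in this article; the layer widths and kernel sizes here are arbitrary illustrative choices):
import tensorflow as tf
import tensorflow.contrib.layers as tcl

x = tf.placeholder(tf.float32, [None, 256, 256, 3])
f1 = tcl.conv2d(x, 64, kernel_size=7, stride=2)    # 128x128: low-level edges and textures
f2 = tcl.max_pool2d(f1, kernel_size=2, stride=2)   # 64x64
f3 = tcl.conv2d(f2, 128, kernel_size=3, stride=2)  # 32x32: mid-level parts
f4 = tcl.conv2d(f3, 256, kernel_size=3, stride=2)  # 16x16: high-level, more abstract features
print(f1.shape, f2.shape, f3.shape, f4.shape)      # one feature map per scale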
Multi-scale features
For a structured task such as human pose estimation, the different joints of the body are not all recognized best on the same feature map. For example, an arm may be easiest to identify on the 3rd-layer feature map, while the head is easier to identify on the 5th layer (see the figure below). A network architecture is therefore needed that can use several feature maps at the same time.
Hourglass: capturing information at every scale
As the paper explains, the hourglass design is motivated by the need to capture information at every scale. Local evidence is essential for identifying features such as faces and hands, but the final pose estimate requires a coherent understanding of the full body. The person's orientation, the arrangement of the limbs, and the relationships between adjacent joints are cues that are best recognized at different scales.
Network architecture
- Hourglass
Overall, the hourglass is a simple, minimal design with the capacity to capture information at every scale. It also keeps "bottom-up" processing (high resolution to low resolution) and "top-down" processing (low back to high) symmetric (an FCN, by contrast, is a heavily bottom-up design). Structurally, an hourglass can be seen as a conv-deconv or encoder-decoder: starting from the input, the feature map is downsampled repeatedly to 4x4 and then upsampled the same number of times until it is restored to the input size.
More concretely, the hourglass is a recursive structure. The input of an n-th-order hourglass is 64x64 (a 256x256 image is first reduced to 64x64 by a 7x7 stride-2 convolution and a max-pool; 64 is chosen to save computation). The input then passes through two branches: a low-resolution branch and a high-resolution branch. The high-resolution branch is a single residual block (the up1 module); the low-resolution branch is a maxpool-residual block (the low1 module) followed by either a residual block (when n = 1) or an (n-1)-th-order hourglass (the low2 module). Finally, the low-resolution branch goes through a further residual block and upsampling (the up0 module) and is added element-wise to the output of the high-resolution branch. Every residual block in the network keeps the spatial size of its input unchanged.
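A minimal sketch of this recursion, tracing only the spatial resolutions of a 4th-order hourglass on a 64x64 input (plain Python, for illustration; the module names match the implementation below):
def hg_resolutions(n, size):
    # up1 keeps `size`; low1 halves it; low2 is hg(n-1) (or a residual block when n == 1);
    # low3 plus upsampling bring the low branch back to `size` before the element-wise add
    print("hg(%d): high-res branch %dx%d, low-res branch %dx%d" % (n, size, size, size // 2, size // 2))
    if n > 1:
        hg_resolutions(n - 1, size // 2)

hg_resolutions(4, 64)  # resolutions 64 -> 32 -> 16 -> 8 -> 4 at the innermost level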
- Stacked HG
As shown in the figure below, in a network with a single HG unit, the hourglass output passes through residual - conv - relu - bn - 1x1 conv to produce N heatmaps of size 64x64 ([64, 64, N_Landmark]). Because each hourglass takes a 64x64 input and produces a 64x64 output, several hourglasses can be stacked one after another along the depth of the network. The input of each subsequent HG is built from three parts of the previous HG: its input, its output passed through a 1x1 convolution (Conv_2), and its second-to-last layer passed through a 1x1 convolution (Conv_3), as shown in the figure below. The output of the last HG is Conv_1. Note that this final 1x1 convolution (conv_1 in the figure, i.e., the output layer) has N_Landmark channels and a spatial size of 64x64, giving one heatmap per keypoint.
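The keypoint locations are then read off the heatmaps by taking the position of the maximum response in each channel. A minimal decoding sketch (NumPy; it assumes a single predicted heatmap array of shape [64, 64, N_Landmark] and a stride of 4 between the heatmap and the 256x256 input, which follows from the sizes above):
import numpy as np

def decode_heatmaps(heatmaps, stride=4):
    # heatmaps: [64, 64, N_Landmark] -> (x, y) per landmark in input-image coordinates
    h, w, n = heatmaps.shape
    coords = np.zeros((n, 2), dtype=np.float32)
    for k in range(n):
        y, x = np.unravel_index(np.argmax(heatmaps[:, :, k]), (h, w))
        coords[k] = (x * stride, y * stride)  # map 64x64 heatmap coordinates back to 256x256
    return coords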
Code implementation
import math
import time
from datetime import datetime

import tensorflow as tf
import tensorflow.contrib.layers as tcl
from tensorflow.contrib.framework import arg_scope

# -------------------------- Method 2 --------------------------------------------
class StackedHG2:
def __init__(self, resolution_inp=256, channel=3, name='stackedhg'):
self.name = name
self.channel = channel
self.resolution_inp = resolution_inp
def res_blk(self, x, num_outputs, kernel_size, stride=1, scope=None):
"""
        Residual unit with two branches: a regular deep branch and a shortcut branch.
        The deep branch (the bottleneck form used by deeper ResNets such as ResNet-50) is a 1x1 conv (channel reduction), a 3x3 conv, and a 1x1 conv (channel expansion) in series, each followed by ReLU and batch norm.
        The shortcut branch has two cases: when the input and output shapes differ (stride != 1 or a channel change), it is a 1x1 conv; otherwise it is simply the input x.
        The output is the element-wise sum of the two branches, followed by ReLU.
:param x: input tensor
:param num_outputs: number channels of output
:param kernel_size:
:param stride:
:param scope:
:return:
"""
with tf.variable_scope(scope, "resBlk"):
with arg_scope([tcl.conv2d],
activation_fn=tf.nn.relu,
normalizer_fn=tcl.batch_norm,
padding="SAME"):
small_ch = num_outputs // 4
conv1 = tcl.conv2d(x, small_ch, kernel_size=1, stride=stride)
conv2 = tcl.conv2d(conv1, small_ch, kernel_size=kernel_size, stride=1)
conv3 = tcl.conv2d(conv2, num_outputs, kernel_size=1, stride=1)
shortcut = x
if stride != 1 or x.get_shape()[-1] != num_outputs:
shortcut = tcl.conv2d(x, num_outputs, kernel_size=1, stride=stride, scope="shortcut")
out = tf.add(conv3, shortcut)
out = tf.nn.relu(out)
return out
def hour_glass(self, x, level, num_outputs, scope=None):
"""
        Single hourglass module, implemented as a recursion: the input x of hg(n) goes through two branches, a downsampling branch and a skip (summation) branch.
        The skip branch is one residual block; the downsampling branch is a maxpool-resblock followed by either a residual block (when n = 1) or hg(n-1).
        Its result then goes through resblock-upsampling and is added element-wise to the skip branch; the sum is the output.
:param x:input tensor
        :param level: number of downsampling steps, i.e., the order n of hg(n)
:param num_outputs: number of output channel
:param scope:
:return:
"""
with tf.variable_scope(scope, 'hourglass'):
add_branch = self.res_blk(x, num_outputs, 3, 1, scope='up1')
down_sampling = tf.contrib.layers.max_pool2d(x, [2, 2], [2, 2], 'VALID')
down_sampling = self.res_blk(down_sampling, num_outputs, 3, 1, scope='low1')
if level > 1:
                center = self.hour_glass(down_sampling, level - 1, num_outputs, scope='low2')
else:
center = self.res_blk(down_sampling, num_outputs, 3, 1, scope='low2')
up_sampling = self.res_blk(center, num_outputs, 3, 1, scope='low3')
up_sampling = tf.image.resize_nearest_neighbor(up_sampling, tf.shape(up_sampling)[1:3] * 2,
name='upsampling')
add_out = tf.add(add_branch, up_sampling)
return add_out
def __call__(self, x, stage=4, is_training=True):
"""
        Stack several hourglasses: a base network followed by `stage` hourglass modules in series.
        The base network is a 7x7 conv, one residual block, one pooling layer, and two more residual blocks.
        Each stage consists of an hourglass followed by a post network: one residual block, one conv-relu-bn, and one 1x1 conv that outputs N_Landmark heatmaps.
        The input of the i-th hourglass (i > 1) is the element-wise sum of three parts of stage i-1: its input, its output `out` passed through a 1x1 conv, and the layer before `out` passed through a 1x1 conv.
:param x: input tensor [batch, 256,256,3]
:param stage: int, number of hourglass to stack, default is 4
        :param is_training: bool, train or test
:return:
"""
with tf.variable_scope(self.name) as scope:
with arg_scope([tcl.batch_norm], is_training=is_training, scale=True):
with arg_scope([tcl.conv2d],
activation_fn=None,
padding="SAME"):
base = tcl.conv2d(x, 64, kernel_size=7, stride=2,
activation_fn=tf.nn.relu, normalizer_fn=tcl.batch_norm)
base = self.res_blk(base, 128, 3, 1)
base = tcl.avg_pool2d(base, kernel_size=2, stride=2)
base = self.res_blk(base, 128, 3, 1)
base = self.res_blk(base, 256, 3, 1)
inputs = base
for i in range(0, stage):
with tf.variable_scope('hg%d' % i):
hg = self.hour_glass(inputs, 4, 256)
# post
top_hg = self.res_blk(hg, 256, 3, 1)
previous = tcl.conv2d(top_hg, 256, kernel_size=1, stride=1,
activation_fn=tf.nn.relu, normalizer_fn=tcl.batch_norm)
                            out = tcl.conv2d(previous, 68, kernel_size=1, stride=1)  # one 64x64 heatmap per landmark (68 landmarks here)
if i < stage - 1:
al = tcl.conv2d(out, 256, kernel_size=1, stride=1)
bl = tcl.conv2d(previous, 256, kernel_size=1, stride=1)
sum_ = tf.add(bl, inputs)
sum_ = tf.add(sum_, al)
inputs = sum_
return out
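Note that the class above returns only the heatmaps of the final stage; the original paper additionally supervises every intermediate stage with the same heatmap loss. A minimal sketch of the training objective for the final stage (an illustration, not part of the original code; gt_heatmaps is an assumed placeholder for ground-truth heatmaps, typically 2D Gaussians centered on each keypoint):
images = tf.placeholder(tf.float32, [None, 256, 256, 3])
gt_heatmaps = tf.placeholder(tf.float32, [None, 64, 64, 68])   # hypothetical ground-truth heatmaps
model = StackedHG2(256, 3)
pred_heatmaps = model(images, stage=4, is_training=True)       # [batch, 64, 64, 68]
loss = tf.reduce_mean(tf.square(pred_heatmaps - gt_heatmaps))  # per-pixel MSE on the heatmaps
train_op = tf.train.RMSPropOptimizer(2.5e-4).minimize(loss)    # RMSProp, as used in the paper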
Running
# -------------------------- Demo and Test --------------------------------------------
batch_size = 16
num_batches = 100
def time_tensorflow_run(session, target, feed, info_string):
"""
calculate time for each session run
:param session: tf.Session
    :param target: operator or tensor to run with the session
:param feed: feed dict for session
:param info_string: info message for print
:return:
"""
    num_steps_burn_in = 10  # warm-up iterations, excluded from the statistics
    total_duration = 0.0  # total time
    total_duration_squared = 0.0  # sum of squared durations, used to compute the variance
for i in range(num_batches + num_steps_burn_in):
start_time = time.time()
_ = session.run(target, feed_dict=feed)
duration = time.time() - start_time
        if i >= num_steps_burn_in:  # only count iterations after the warm-up phase
if not i % 10:
print('[%s] step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
total_duration += duration
total_duration_squared += duration * duration
    mn = total_duration / num_batches  # mean time per batch
    vr = total_duration_squared / num_batches - mn * mn  # variance
    sd = math.sqrt(vr)  # standard deviation
print('[%s] %s across %d steps, %.3f +/- %.3f sec/batch' % (datetime.now(), info_string, num_batches, mn, sd))
# test demo
def run_benchmark():
"""
main function for test or demo
:return:
"""
with tf.Graph().as_default():
        image_size = 256  # input image size
images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
# method 0
# prediction = hour_glass(images, 256, "hg")
# prediction = hour_glass1(images, 3, 256, "hg")
model = StackedHG2(image_size, 3)
prediction = model(images, 4)
fc = prediction
params = tf.trainable_variables()
for v in params:
print(v)
init = tf.global_variables_initializer()
print("out shape ", prediction)
sess = tf.Session()
print("init...")
sess.run(init)
print("predict..")
writer = tf.summary.FileWriter("./logs")
writer.add_graph(sess.graph)
time_tensorflow_run(sess, prediction, {}, "Forward")
        # simulate the training process
        objective = tf.nn.l2_loss(fc)  # a dummy loss
        grad = tf.gradients(objective, params)  # gradients of the loss w.r.t. all model parameters
        print('grad backward')
time_tensorflow_run(sess, grad, {}, "Forward-backward")
writer.close()
if __name__ == '__main__':
run_benchmark()
Parameter count
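The total number of weights can be obtained by summing the sizes of all trainable variables, for example (a small helper added here for illustration; it is not part of the original code):
import numpy as np

def count_params():
    # total number of scalars across all trainable variables in the current graph
    return sum(int(np.prod(v.get_shape().as_list())) for v in tf.trainable_variables())

Calling it right after the model is built (e.g., inside run_benchmark, after `params = tf.trainable_variables()`) gives the parameter count of the 4-stack network.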
Timing