SA-Siam: A Twofold Siamese Network for Real-Time Object Tracking

Title: A Twofold Siamese Network for Real-Time Object Tracking

Authors: Anfeng He, Chong Luo, Xinmei Tian, Wenjun Zeng

Venue: CVPR 2018

Field: single-object tracking

[code]: I am trying to reproduce the paper's results; the project is in progress. Discussion and exchange are welcome.

New ideas: two SiamFC branches; channel attention.

Why does it work? 1. Combining deep representations (utilizing heterogeneous features), which is the main reason it outperforms its baseline SiamFC; 2. a large amount of training data (ImageNet); 3. large search regions.

Abstract:

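The authors observe that the semantic features learned for image classification and the appearance features learned for image similarity matching are complementary. Both branches, S-Net (semantic) and A-Net (appearance), are built on the SiamFC architecture and are trained separately. A-Net is essentially the same as SiamFC; S-Net adds a channel attention mechanism.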

1. Introduction

The key to designing a high-performance tracker is to find expressive features and corresponding classifiers that are simultaneously discriminative and generalized. Being discriminative allows the tracker to differentiate the true target from the cluttered or even deceptive background. Being generalized means that a tracker would tolerate the appearance changes of the tracked object, even when the object is not known a priori.

Discriminative power of a tracker: it can separate the target from a complex (cluttered or deceptive) background.

Generalization power of a tracker: it can cope with changes in the target's appearance.

For SiamFC, the generalization capability remains quite poor, and it encounters difficulties when the target has significant appearance change. As a result, SiamFC still has a performance gap to the best online tracker.

SiamFC generalizes poorly: when the target's appearance changes substantially, the tracker drifts. The goal of the paper is therefore to improve the generalization capability of SiamFC.

It is widely understood that, in a deep CNN trained for an image classification task, features from deeper layers contain stronger semantic information and are more invariant to object appearance changes. These semantic features are an ideal complement to the appearance features trained in a similarity learning problem.

In other words, the high-level features of a CNN pretrained on image classification carry strong semantic information and are invariant to changes in the target's appearance (when the target deforms, the feature still represents the same target).

For the semantic branch, we further propose a channel attention mechanism to achieve a minimum degree of target adaptation. The motivation is that different objects activate different sets of feature channels. We shall give higher weights to channels that play more important roles in tracking specific targets. This is realized by computing channel-wise weights based on the channel responses at the target object and in the surrounding context. This simplest form of target adaptation improves the discrimination power of the tracker.

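Some feature channels (note: channels, not individual features) are very useful for a particular tracking target, while others contribute almost nothing for that target; so we should give higher weights to channels that play more important roles in tracking specific targets.

Summary:

1. SiamFC has a weakness: when the target's appearance changes drastically, it tends to lose the target. The target's semantic features, by contrast, are invariant to appearance changes, so the two kinds of features complement each other.

2. Different feature channels differ in how well they discriminate a particular target. Some channels matter a great deal for tracking certain targets, while others contribute almost nothing for those targets.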

2. Related Work

2.1. Siamese Network Based Trackers

A notable advantage of this method is that it needs no or little online training. Thus, real-time tracking can be easily achieved.

The advantage of a fully-convolutional network is that, instead of a candidate patch of the same size as the target patch, one can provide as input to the network a much larger search image and it will compute the similarity at all translated sub-windows on a dense grid in a single evaluation.
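The dense similarity evaluation described above can be sketched as a plain cross-correlation between two feature maps. This is a minimal NumPy sketch, not the authors' implementation: the function name `cross_correlate` is hypothetical, the explicit loops stand in for what a real tracker would do with a single convolution call, and the random arrays only stand in for real embeddings. The 6×6×256 and 22×22×256 shapes are taken from the data dimensions listed in Section 4.1 of these notes.

```python
import numpy as np

def cross_correlate(target_feat, search_feat):
    """Slide target_feat (h, w, C) over search_feat (H, W, C).

    Returns a (H - h + 1, W - w + 1) response map whose entry (i, j) is the
    inner product between the target features and the sub-window at (i, j).
    """
    h, w, c = target_feat.shape
    H, W, C = search_feat.shape
    if c != C:
        raise ValueError("channel counts must match")
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = search_feat[i:i + h, j:j + w, :]
            response[i, j] = np.sum(window * target_feat)
    return response

# Random stand-ins for real embeddings (illustrative only).
rng = np.random.default_rng(0)
target_feat = rng.standard_normal((6, 6, 256))
search_feat = rng.standard_normal((22, 22, 256))
response = cross_correlate(target_feat, search_feat)
print(response.shape)  # (17, 17)
```

Every translated sub-window is scored in one pass, which is why a larger search image costs a single evaluation rather than one forward pass per candidate patch.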

Significantly better performance is achieved without much speed drop.

SA-Siam inherits network architecture from SiamFC. We intend to improve SiamFC with an innovative way to utilize heterogeneous features.

2.2. Ensemble Trackers

A common insight of these ensemble trackers is that it is possible to make a strong tracker by utilizing different layers of CNN features. Besides, the correlation across models should be weak. In SA-Siam design, the appearance branch and the semantic branch use features at very different abstraction levels. Besides, they are not jointly trained to avoid becoming homogeneous.

2.3. Adaptive Feature Selection

Different features affect different tracking targets differently, so using every feature to track a single object is neither efficient nor effective. Recently, SENet demonstrated the effectiveness of channel-wise attention on image recognition tasks.

In our SA-Siam network, we perform channel-wise attention based on the channel activations. It can be looked on as a type of target adaptation, which potentially improves the tracking performance.

3. Our Approach

The fundamental idea behind this design: the appearance features learned by similarity learning and the semantic features learned for classification are complementary. This is the authors' key observation.

3.1. SA-Siam Network Architecture

The two branches are separately trained and not combined until testing time.

The appearance branch:

Similar to SiamFC.

The semantic branch:

A pretrained CNN (AlexNet), conv4/conv5 features, a fusion module (1×1 ConvNet), a crop operation, and an attention module.

We only train the fusion module and the channel attention module.

During testing time:

The response maps produced by the two branches are combined by a weighted sum. Similar to SiamFC, multiple scales are searched; the authors find that using three scales strikes a good balance between performance and speed.
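A rough sketch of the test-time combination follows. It assumes the combination weight of 0.3 listed under the hyperparameters in Section 4.1 is applied to the appearance branch and its complement to the semantic branch; these notes do not say which branch receives which weight, so treat that assignment as an assumption. The function names are hypothetical, and the scale selection is simplified to "take the scale with the highest peak", with no scale penalty or smoothing.

```python
import numpy as np

def fuse_responses(resp_appearance, resp_semantic, lam=0.3):
    """Weighted sum of two same-sized response maps.

    Assumption: lam weights the appearance branch and (1 - lam) the
    semantic branch.
    """
    return lam * resp_appearance + (1.0 - lam) * resp_semantic

def best_scale_and_peak(fused_per_scale):
    """Return (scale_index, (row, col)) of the strongest fused response."""
    peaks = [float(np.max(r)) for r in fused_per_scale]
    s = int(np.argmax(peaks))
    row, col = np.unravel_index(np.argmax(fused_per_scale[s]),
                                fused_per_scale[s].shape)
    return s, (int(row), int(col))

# Toy 17x17 response maps: each branch fires at a different location.
resp_a = np.zeros((17, 17))
resp_a[3, 4] = 1.0
resp_s = np.zeros((17, 17))
resp_s[10, 12] = 1.0
fused = fuse_responses(resp_a, resp_s)

# Three scales, here faked by rescaling the same fused map.
scale_index, peak = best_scale_and_peak([0.5 * fused, fused, 0.8 * fused])
print(scale_index, peak)  # 1 (10, 12)
```

With this weighting the semantic branch dominates when the two branches disagree, which is consistent with the notes' emphasis on semantic features, but the real balance depends on how the response maps are normalized.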

3.2. Channel Attention in Semantic Branch

High-level semantic features are robust to changes in the target's appearance, which makes the tracker more generalized but less discriminative, so localization becomes less precise. To improve the discriminative power of the semantic branch, the authors design a channel attention mechanism.

Intuitively, different channels play different roles in tracking different targets. Some channels are extremely important for tracking certain targets but dispensable when tracking others. If we could adapt the channel importance to the tracking target, we achieve the minimum functionality of target adaptation. To do so, not only the target matters but also the background region around it. Therefore, the input to the proposed attention module is not the target alone but a larger region that also contains background context.

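Take the conv5 feature map as an example. Its spatial size is 22×22.

First, the feature map is divided into a 3×3 grid; the center cell is 6×6, the same size as the target region.

Then, max pooling is applied within each grid cell.

Next, a two-layer multilayer perceptron (MLP) produces a coefficient for this channel.

Finally, a sigmoid function with a bias produces the final channel weight.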
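The four steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the paper's code: the hidden width of 9 and the bias of 0.5 come from Section 4.1 of these notes; the bias is interpreted here as an offset added to the sigmoid output; the MLP parameters are random and untrained; one MLP is shared across all channels; and the final step of multiplying the weights into the cropped 6×6 target features is an assumption about how the weights are used. All function names are hypothetical.

```python
import numpy as np

def grid_max_pool(channel_map, center=6):
    """Split a square map into a 3x3 grid with a center x center middle cell
    and max-pool each cell, giving a 9-dimensional vector."""
    size = channel_map.shape[0]
    margin = (size - center) // 2            # 8 for a 22x22 map
    edges = [0, margin, margin + center, size]
    return np.array([
        channel_map[edges[r]:edges[r + 1], edges[c]:edges[c + 1]].max()
        for r in range(3) for c in range(3)
    ])

def channel_weight(pooled, w1, b1, w2, b2, bias=0.5):
    """Two-layer MLP followed by a sigmoid shifted by `bias`."""
    hidden = np.maximum(0.0, pooled @ w1 + b1)   # ReLU hidden layer (9 units)
    logit = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit)) + bias   # always above `bias`

def channel_attention(feature_map, params):
    """Compute one weight per channel of an (H, W, C) feature map."""
    return np.array([
        channel_weight(grid_max_pool(feature_map[:, :, k]), *params)
        for k in range(feature_map.shape[2])
    ])

rng = np.random.default_rng(0)
conv5 = rng.standard_normal((22, 22, 256))       # stand-in for real features
params = (rng.standard_normal((9, 9)) * 0.1, np.zeros(9),
          rng.standard_normal(9) * 0.1, 0.0)     # untrained, illustrative
weights = channel_attention(conv5, params)
target_feat = conv5[8:14, 8:14, :] * weights     # reweight the 6x6 center crop
print(weights.shape, target_feat.shape)  # (256,) (6, 6, 256)
```

Because the sigmoid output is shifted by 0.5, every weight lies strictly between 0.5 and 1.5 under this interpretation, so no channel is suppressed to zero.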

3.3. Discussions of Design Choices

We separately train the two branches.

We do not fine-tune S-Net.

We keep A-Net as it is in SiamFC.

4. Experiments

4.1. Implementation Details

Network structure: A-Net has exactly the same network structure as SiamFC. S-Net uses an AlexNet pretrained on ImageNet, with a small change to the stride so that the output of S-Net has the same size as that of A-Net.

In the attention module, the pooled features are stacked into a 9-dimensional vector. The following MLP has one hidden layer with nine neurons and uses the ReLU non-linearity. A sigmoid function with a bias of 0.5 is applied last; this is to ensure that no channel will be suppressed to zero.

Data dimensions:

input: 127×127×3, 255×255×3.

output: 6×6×256, 22×22×256.

conv4: 24×24×384.

conv5: 22×22×256.

response maps: 17×17.
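The sizes listed above can be checked with a few lines of arithmetic. The sketch below assumes the no-padding, SiamFC-style AlexNet layout (kernel sizes and strides in `LAYERS`), which these notes do not spell out, so the layer table is an assumption; `out_size` and `trace` are hypothetical helper names.

```python
def out_size(size, kernel, stride=1):
    """Spatial output size of a no-padding convolution or pooling layer."""
    return (size - kernel) // stride + 1

# (name, kernel, stride): assumed SiamFC-style AlexNet without padding.
LAYERS = [
    ("conv1", 11, 2), ("pool1", 3, 2),
    ("conv2", 5, 1), ("pool2", 3, 2),
    ("conv3", 3, 1), ("conv4", 3, 1), ("conv5", 3, 1),
]

def trace(size):
    """Return the spatial size after each layer for a square input."""
    sizes = {}
    for name, kernel, stride in LAYERS:
        size = out_size(size, kernel, stride)
        sizes[name] = size
    return sizes

target_sizes = trace(127)
search_sizes = trace(255)
print(target_sizes["conv5"])                         # 6
print(search_sizes["conv4"], search_sizes["conv5"])  # 24 22

# Sliding the 6x6 target map over the 22x22 search map gives the response size.
response_size = out_size(search_sizes["conv5"], target_sizes["conv5"])
print(response_size)                                 # 17
```

Under this assumed layout the 127-pixel input shrinks to 6×6 and the 255-pixel input to 22×22, and 22 − 6 + 1 = 17 matches the listed response map size.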

Training:

ILSVRC-2015, using only color images. Implemented in TensorFlow. The average test speed is 50 fps.

Hyperparameters:

Combination weight = 0.3. Three scales.

4.2. Datasets and Evaluation Metrics

OTB:

VOT:

4.3. Ablation Analysis

The semantic branch and the appearance branch complement each other.

Using multilevel features and channel attention brings gains.

Separate vs. joint training.

4.4. Comparison with State-of-the-Arts

OTB benchmarks.

VOT2015 benchmark.

VOT2016 benchmark.

VOT2017 benchmark.

5. Conclusion

In the future, we plan to continue exploring the effective fusion of deep features in object tracking tasks.

Last edited: 2019.05.12 12:41:13
