Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Paper: http://arxiv.org/pdf/1511.06434v2.pdf
ABSTRACT
In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.
1 INTRODUCTION
Learning reusable feature representations from large unlabeled datasets has been an area of active research. In the context of computer vision, one can leverage the practically unlimited amount of unlabeled images and videos to learn good intermediate representations, which can then be used on a variety of supervised learning tasks such as image classification. We propose that one way to build good image representations is by training Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), and later reusing parts of the generator and discriminator networks as feature extractors for supervised tasks. GANs provide an attractive alternative to maximum likelihood techniques. One can additionally argue that their learning process and the lack of a heuristic cost function (such as pixel-wise independent mean-square error) are attractive to representation learning. GANs have been known to be unstable to train, often resulting in generators that produce nonsensical outputs. There has been very limited published research in trying to understand and visualize what GANs learn, and the intermediate representations of multi-layer GANs.
In this paper, we make the following contributions:
• We propose and evaluate a set of constraints on the architectural topology of Convolutional GANs that make them stable to train in most settings. We name this class of architectures Deep Convolutional GANs (DCGAN).
• We use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms.
• We visualize the filters learnt by GANs and empirically show that specific filters have learned to draw specific objects.
• We show that the generators have interesting vector arithmetic properties allowing for easy manipulation of many semantic qualities of generated samples.
2 RELATED WORK
2.1 REPRESENTATION LEARNING FROM UNLABELED DATA
Unsupervised representation learning is a fairly well studied problem in general computer vision research, as well as in the context of images. A classic approach to unsupervised representation learning is to do clustering on the data (for example using K-means), and leverage the clusters for improved classification scores. In the context of images, one can do hierarchical clustering of image patches (Coates & Ng, 2012) to learn powerful image representations. Another popular method is to train auto-encoders (convolutionally, stacked (Vincent et al., 2010), separating the what and where components of the code (Zhao et al., 2015), ladder structures (Rasmus et al., 2015)) that encode an image into a compact code, and decode the code to reconstruct the image as accurately as possible. These methods have also been shown to learn good feature representations from image pixels. Deep belief networks (Lee et al., 2009) have also been shown to work well in learning hierarchical representations.
2.2 GENERATING NATURAL IMAGES
Generative image models are well studied and fall into two categories: parametric and nonparametric.
Non-parametric models typically match against a database of existing images, often matching patches of images, and have been used in texture synthesis (Efros et al., 1999), super-resolution (Freeman et al., 2002) and in-painting (Hays & Efros, 2007).
Parametric models for generating images have been explored extensively (for example on MNIST digits or for texture synthesis (Portilla & Simoncelli, 2000)). However, generating natural images of the real world had not seen much success until recently. A variational sampling approach to generating images (Kingma & Welling, 2013) has had some success, but the samples often suffer from being blurry. Another approach generates images using an iterative forward diffusion process (Sohl-Dickstein et al., 2015). Generative Adversarial Networks (Goodfellow et al., 2014) generated images suffering from being noisy and incomprehensible. A Laplacian pyramid extension to this approach (Denton et al., 2015) showed higher quality images, but they still suffered from the objects looking wobbly because of noise introduced in chaining multiple models. A recurrent network approach (Gregor et al., 2015) and a deconvolution network approach (Dosovitskiy et al., 2014) have also recently had some success with generating natural images. However, they have not leveraged the generators for supervised tasks.
2.3 VISUALIZING THE INTERNALS OF CNNS
One constant criticism of using neural networks has been that they are black-box methods, with little understanding of what the networks do in the form of a simple human-consumable algorithm. In the context of CNNs, Zeiler et al. (Zeiler & Fergus, 2014) showed that by using deconvolutions and filtering the maximal activations, one can find the approximate purpose of each convolution filter in the network. Similarly, using a gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters (Mordvintsev et al.).
3 APPROACH AND MODEL ARCHITECTURE
Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. This motivated the authors of LAPGAN (Denton et al., 2015) to develop an alternative approach to iteratively upscale low resolution generated images which can be modeled more reliably. We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature. However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models.
Core to our approach is adopting and modifying three recently demonstrated changes to CNN architectures.
The first is the all convolutional net (Springenberg et al., 2014), which replaces deterministic spatial pooling functions (such as max-pooling) with strided convolutions, allowing the network to learn its own spatial downsampling. We use this approach in our generator, allowing it to learn its own spatial upsampling, and in our discriminator.
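To make this concrete, here is a minimal PyTorch sketch (the paper ships no code, so the channel counts, the kernel size of 4, and the padding below are illustrative assumptions): a stride-2 convolution gives the discriminator a learned downsampling step, and a stride-2 transposed (fractionally-strided) convolution gives the generator a learned upsampling step.

```python
import torch
import torch.nn as nn

# Learned downsampling: a strided convolution halves the spatial
# resolution in place of a fixed pooling function such as max-pooling.
down = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=4, stride=2, padding=1)

# Learned upsampling: a fractionally-strided (transposed) convolution
# doubles the spatial resolution in the generator.
up = nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 64, 32, 32)
print(down(x).shape)      # torch.Size([1, 128, 16, 16])
print(up(down(x)).shape)  # torch.Size([1, 64, 32, 32])
```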
Second is the trend towards eliminating fully connected layers on top of convolutional features. The strongest example of this is global average pooling, which has been utilized in state of the art image classification models (Mordvintsev et al.). We found global average pooling increased model stability but hurt convergence speed. A middle ground of directly connecting the highest convolutional features to the input and output respectively of the generator and discriminator worked well. The first layer of the GAN, which takes a uniform noise distribution Z as input, could be called fully connected as it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output. See Fig. 1 for a visualization of an example model architecture.
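A hedged sketch of these two connection points in PyTorch; the 1024 × 4 × 4 projection size matches Fig. 1, while the 512 × 4 × 4 discriminator feature shape is an assumption for illustration:

```python
import torch
import torch.nn as nn

batch, z_dim = 128, 100
z = torch.rand(batch, z_dim) * 2 - 1  # uniform noise distribution Z

# Generator input: a single matrix multiplication ("fully connected" only
# in name), reshaped into a 4D tensor that starts the convolution stack.
project = nn.Linear(z_dim, 1024 * 4 * 4)
h = project(z).view(batch, 1024, 4, 4)

# Discriminator output: flatten the last convolution layer's features and
# feed them into a single sigmoid unit.
last_conv_features = torch.randn(batch, 512, 4, 4)  # assumed feature shape
classify = nn.Linear(512 * 4 * 4, 1)
p_real = torch.sigmoid(classify(last_conv_features.flatten(1)))  # (batch, 1)
```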
Third is Batch Normalization (Ioffe & Szegedy, 2015) which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise due to poor initialization and helps gradient flow in deeper models. This proved critical to get deep generators to begin learning, preventing the generator from collapsing all samples to a single point which is a common failure mode observed in GANs. Directly applying batchnorm to all layers, however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.
The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013).
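Combining the three modifications, a sketch of a generator/discriminator pair in PyTorch follows. Layer widths follow Fig. 1; writing the projection of Z as a 4 × 4 transposed convolution is an implementation choice equivalent to the matrix multiplication plus reshape described above:

```python
import torch.nn as nn

# Generator: project-and-reshape, then four fractionally-strided
# convolutions (4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64). ReLU throughout,
# Tanh on the output; batchnorm on every layer except the output layer.
# Maps (batch, 100, 1, 1) uniform noise to (batch, 3, 64, 64) images.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 1024, 4, 1, 0), nn.BatchNorm2d(1024), nn.ReLU(),
    nn.ConvTranspose2d(1024, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.ReLU(),
    nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),  # no batchnorm on output
)

# Discriminator: strided convolutions with leaky ReLU (slope 0.2),
# no batchnorm on the input layer, and a single sigmoid output.
discriminator = nn.Sequential(
    nn.Conv2d(3, 128, 4, 2, 1), nn.LeakyReLU(0.2),   # no batchnorm on input
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1024, 4, 2, 1), nn.BatchNorm2d(1024), nn.LeakyReLU(0.2),
    nn.Conv2d(1024, 1, 4, 1, 0), nn.Sigmoid(),
)
```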
4 DETAILS OF ADVERSARIAL TRAINING
We trained DCGANs on three datasets: Large-scale Scene Understanding (LSUN) (Yu et al., 2015), Imagenet-1k, and a newly assembled Faces dataset. Details on the usage of each of these datasets are given below.
No pre-processing was applied to training images besides scaling to the range of the tanh activation function [-1, 1]. All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128. All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02. In the LeakyReLU, the slope of the leak was set to 0.2 in all models. While previous GAN work has used momentum to accelerate training, we used the Adam optimizer (Kingma & Ba, 2014) with tuned hyperparameters. We found the suggested learning rate of 0.001 to be too high, using 0.0002 instead. Additionally, we found leaving the momentum term β1 at the suggested value of 0.9 resulted in training oscillation and instability, while reducing it to 0.5 helped stabilize training.
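These settings translate almost line for line into code. A sketch of the setup, reusing the generator and discriminator modules sketched in Section 3 (Adam's second-moment coefficient of 0.999 is the optimizer's default, an assumption the text does not pin down):

```python
import torch.nn as nn
import torch.optim as optim

def init_weights(m):
    # All weights initialized from a zero-centered Normal, std 0.02.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)

generator.apply(init_weights)
discriminator.apply(init_weights)

# Adam with the tuned learning rate 0.0002 and momentum term beta1 = 0.5.
opt_g = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_d = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

batch_size = 128  # mini-batch size used for all models

def preprocess(images_uint8):
    # The only pre-processing: scale pixels to the tanh range [-1, 1].
    return images_uint8.float() / 127.5 - 1.0
```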
*Figure 1: DCGAN generator used for LSUN scene modeling. A 100 dimensional uniform distribution Z is projected to a small spatial extent convolutional representation with many feature maps. A series of four fractionally-strided convolutions (in some recent papers, these are wrongly called deconvolutions) then convert this high level representation into a 64 × 64 pixel image. Notably, no fully connected or pooling layers are used.*
4.1 LSUN
As visual quality of samples from generative image models has improved, concerns of over-fitting and memorization of training samples have risen. To demonstrate how our model scales with more data and higher resolution generation, we train a model on the LSUN bedrooms dataset containing a little over 3 million training examples. Recent analysis has shown that there is a direct link between how fast models learn and their generalization performance (Hardt et al., 2015). We show samples from one epoch of training (Fig. 2), mimicking online learning, in addition to samples after convergence (Fig. 3), as an opportunity to demonstrate that our model is not producing high quality samples via simply overfitting/memorizing training examples. No data augmentation was applied to the images.
4.1.1 DEDUPLICATION
To further decrease the likelihood of the generator memorizing input examples (Fig. 2) we perform a simple image de-duplication process. We fit a 3072-128-3072 de-noising dropout regularized RELU autoencoder on 32x32 downsampled center-crops of training examples. The resulting code layer activations are then binarized via thresholding the ReLU activation which has been shown to be an effective information preserving technique (Srivastava et al., 2014) and provides a convenient form of semantic-hashing, allowing for linear time de-duplication. Visual inspection of hash collisions showed high precision with an estimated false positive rate of less than 1 in 100. Additionally, the technique detected and removed approximately 275,000 near duplicates, suggesting a high recall.
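A hedged sketch of this de-duplication autoencoder; the text fixes the 3072-128-3072 shape, the dropout regularization, and the ReLU code layer, while the dropout probability and the binarization threshold below are assumptions:

```python
import torch
import torch.nn as nn

# De-noising dropout-regularized ReLU autoencoder, 3072-128-3072, fit on
# 32x32 downsampled center-crops flattened to 3072-dim vectors (32*32*3).
encoder = nn.Sequential(nn.Dropout(0.5), nn.Linear(3072, 128), nn.ReLU())
decoder = nn.Linear(128, 3072)
# Training (loop omitted) minimizes the reconstruction error
# || decoder(encoder(x)) - x ||^2 over the corrupted inputs.

def semantic_hash(crops: torch.Tensor) -> torch.Tensor:
    # Binarize the code layer by thresholding the ReLU activations,
    # yielding a 128-bit hash that supports linear-time de-duplication:
    # training examples with colliding hashes are candidate duplicates.
    with torch.no_grad():
        return encoder.eval()(crops) > 0.0  # boolean (batch, 128)
```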
4.2 FACES
We scraped images containing human faces from random web image queries of peoples names. The people names were acquired from dbpedia, with a criterion that they were born in the modern era. This dataset has 3M images from 10K people. We run an OpenCV face detector on these images, keeping the detections that are sufficiently high resolution, which gives us approximately 350,000 face boxes. We use these face boxes for training. No data augmentation was applied to the images.
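A sketch of the face-box extraction step; the text says only "an OpenCV face detector", so the Haar cascade, its detection thresholds, and the minimum-size cutoff below are assumptions:

```python
import cv2

# Haar-cascade frontal face detector shipped with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_boxes(image_path, min_size=64):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Keep only detections that are sufficiently high resolution.
    return [img[y:y + h, x:x + w] for (x, y, w, h) in boxes
            if w >= min_size and h >= min_size]
```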
Figure 2: Generated bedrooms after one training pass through the dataset. Theoretically, the model could learn to memorize training examples, but this is experimentally unlikely as we train with a small learning rate and minibatch SGD. We are aware of no prior empirical evidence demonstrating memorization with SGD and a small learning rate.
Figure 3: Generated bedrooms after five epochs of training. There appears to be evidence of visual under-fitting via repeated noise textures across multiple samples such as the base boards of some of the beds.
4.3 IMAGENET-1K
We use Imagenet-1k (Deng et al., 2009) as a source of natural images for unsupervised training. We train on 32 × 32 min-resized center crops. No data augmentation was applied to the images.
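A min-resized center crop can be sketched with torchvision transforms (an illustrative assumption; the paper does not name a preprocessing library):

```python
from torchvision import transforms

# Resize the shorter image side to 32 pixels (preserving aspect ratio),
# then take the central 32x32 crop and scale pixels to the tanh range [-1, 1].
preprocess = transforms.Compose([
    transforms.Resize(32),
    transforms.CenterCrop(32),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```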