Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Paper: http://arxiv.org/pdf/1511.06434v2.pdf
ABSTRACT
In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.
1 INTRODUCTION
Learning reusable feature representations from large unlabeled datasets has been an area of active research. In the context of computer vision, one can leverage the practically unlimited amount of unlabeled images and videos to learn good intermediate representations, which can then be used on a variety of supervised learning tasks such as image classification. We propose that one way to build good image representations is by training Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), and later reusing parts of the generator and discriminator networks as feature extractors for supervised tasks. GANs provide an attractive alternative to maximum likelihood techniques. One can additionally argue that their learning process and the lack of a heuristic cost function (such as pixel-wise independent mean-square error) are attractive to representation learning. GANs have been known to be unstable to train, often resulting in generators that produce nonsensical outputs. There has been very limited published research in trying to understand and visualize what GANs learn, and the intermediate representations of multi-layer GANs.
In this paper, we make the following contributions:
• We propose and evaluate a set of constraints on the architectural topology of Convolutional GANs that make them stable to train in most settings. We name this class of architectures Deep Convolutional GANs (DCGAN).
• We use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms.
• We visualize the filters learnt by GANs and empirically show that specific filters have learned to draw specific objects.
• We show that the generators have interesting vector arithmetic properties allowing for easy manipulation of many semantic qualities of generated samples.
2 RELATED WORK
2.1 REPRESENTATION LEARNING FROM UNLABELED DATA
Unsupervised representation learning is a fairly well studied problem in general computer vision research, as well as in the context of images. A classic approach to unsupervised representation learning is to do clustering on the data (for example using K-means), and leverage the clusters for improved classification scores. In the context of images, one can do hierarchical clustering of image patches (Coates & Ng, 2012) to learn powerful image representations. Another popular method is to train auto-encoders (convolutionally, stacked (Vincent et al., 2010), separating the what and where components of the code (Zhao et al., 2015), ladder structures (Rasmus et al., 2015)) that encode an image into a compact code, and decode the code to reconstruct the image as accurately as possible. These methods have also been shown to learn good feature representations from image pixels. Deep belief networks (Lee et al., 2009) have also been shown to work well in learning hierarchical representations.
2.2 GENERATING NATURAL IMAGES
Generative image models are well studied and fall into two categories: parametric and nonparametric.
Non-parametric models typically match against a database of existing images, often matching patches of images, and have been used in texture synthesis (Efros et al., 1999), super-resolution (Freeman et al., 2002) and in-painting (Hays & Efros, 2007).
Parametric models for generating images have been explored extensively (for example on MNIST digits or for texture synthesis (Portilla & Simoncelli, 2000)). However, generating natural images of the real world had not seen much success until recently. A variational sampling approach to generating images (Kingma & Welling, 2013) has had some success, but the samples often suffer from being blurry. Another approach generates images using an iterative forward diffusion process (Sohl-Dickstein et al., 2015). Generative Adversarial Networks (Goodfellow et al., 2014) generated images suffering from being noisy and incomprehensible. A Laplacian pyramid extension to this approach (Denton et al., 2015) showed higher quality images, but they still suffered from the objects looking wobbly because of noise introduced in chaining multiple models. A recurrent network approach (Gregor et al., 2015) and a deconvolution network approach (Dosovitskiy et al., 2014) have also recently had some success with generating natural images. However, they have not leveraged the generators for supervised tasks.
2.3 VISUALIZING THE INTERNALS OF CNNS
One constant criticism of using neural networks has been that they are black-box methods, with little understanding of what the networks do in the form of a simple human-consumable algorithm. In the context of CNNs, Zeiler et al. (Zeiler & Fergus, 2014) showed that by using deconvolutions and filtering the maximal activations, one can find the approximate purpose of each convolution filter in the network. Similarly, using a gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters (Mordvintsev et al.).
3 APPROACH AND MODEL ARCHITECTURE
Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. This motivated the authors of LAPGAN (Denton et al., 2015) to develop an alternative approach to iteratively upscale low resolution generated images which can be modeled more reliably. We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature. However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models.
Core to our approach is adopting and modifying three recently demonstrated changes to CNN architectures.
The first is the all convolutional net (Springenberg et al., 2014), which replaces deterministic spatial pooling functions (such as max-pooling) with strided convolutions, allowing the network to learn its own spatial downsampling. We use this approach in our generator, allowing it to learn its own spatial upsampling, and in our discriminator.
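To make this concrete, here is a minimal PyTorch sketch (the paper ships no code, so the channel counts, the kernel size of 4, and the padding below are illustrative assumptions): a stride-2 convolution gives the discriminator a learned downsampling step, and a stride-2 transposed (fractionally-strided) convolution gives the generator a learned upsampling step.

```python
import torch
import torch.nn as nn

# Learned downsampling: a strided convolution halves the spatial
# resolution in place of a fixed pooling function such as max-pooling.
down = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=4, stride=2, padding=1)

# Learned upsampling: a fractionally-strided (transposed) convolution
# doubles the spatial resolution in the generator.
up = nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 64, 32, 32)
print(down(x).shape)      # torch.Size([1, 128, 16, 16])
print(up(down(x)).shape)  # torch.Size([1, 64, 32, 32])
```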
Second is the trend towards eliminating fully connected layers on top of convolutional features. The strongest example of this is global average pooling, which has been utilized in state of the art image classification models (Mordvintsev et al.). We found global average pooling increased model stability but hurt convergence speed. A middle ground of directly connecting the highest convolutional features to the input and output respectively of the generator and discriminator worked well. The first layer of the GAN, which takes a uniform noise distribution Z as input, could be called fully connected as it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output. See Fig. 1 for a visualization of an example model architecture.
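A hedged sketch of these two connection points in PyTorch; the 1024 × 4 × 4 projection size matches Fig. 1, while the 512 × 4 × 4 discriminator feature shape is an assumption for illustration:

```python
import torch
import torch.nn as nn

batch, z_dim = 128, 100
z = torch.rand(batch, z_dim) * 2 - 1  # uniform noise distribution Z

# Generator input: a single matrix multiplication ("fully connected" only
# in name), reshaped into a 4D tensor that starts the convolution stack.
project = nn.Linear(z_dim, 1024 * 4 * 4)
h = project(z).view(batch, 1024, 4, 4)

# Discriminator output: flatten the last convolution layer's features and
# feed them into a single sigmoid unit.
last_conv_features = torch.randn(batch, 512, 4, 4)  # assumed feature shape
classify = nn.Linear(512 * 4 * 4, 1)
p_real = torch.sigmoid(classify(last_conv_features.flatten(1)))  # (batch, 1)
```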
Third is Batch Normalization (Ioffe & Szegedy, 2015) which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise due to poor initialization and helps gradient flow in deeper models. This proved critical to get deep generators to begin learning, preventing the generator from collapsing all samples to a single point which is a common failure mode observed in GANs. Directly applying batchnorm to all layers, however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.
The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013).
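Combining the three modifications, a sketch of a generator/discriminator pair in PyTorch follows. Layer widths follow Fig. 1; writing the projection of Z as a 4 × 4 transposed convolution is an implementation choice equivalent to the matrix multiplication plus reshape described above:

```python
import torch.nn as nn

# Generator: project-and-reshape, then four fractionally-strided
# convolutions (4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64). ReLU throughout,
# Tanh on the output; batchnorm on every layer except the output layer.
# Maps (batch, 100, 1, 1) uniform noise to (batch, 3, 64, 64) images.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 1024, 4, 1, 0), nn.BatchNorm2d(1024), nn.ReLU(),
    nn.ConvTranspose2d(1024, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.ReLU(),
    nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),  # no batchnorm on output
)

# Discriminator: strided convolutions with leaky ReLU (slope 0.2),
# no batchnorm on the input layer, and a single sigmoid output.
discriminator = nn.Sequential(
    nn.Conv2d(3, 128, 4, 2, 1), nn.LeakyReLU(0.2),   # no batchnorm on input
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1024, 4, 2, 1), nn.BatchNorm2d(1024), nn.LeakyReLU(0.2),
    nn.Conv2d(1024, 1, 4, 1, 0), nn.Sigmoid(),
)
```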
4 DETAILS OF ADVERSARIAL TRAINING
We trained DCGANs on three datasets: Large-scale Scene Understanding (LSUN) (Yu et al., 2015), Imagenet-1k, and a newly assembled Faces dataset. Details on the usage of each of these datasets are given below.
No pre-processing was applied to training images besides scaling to the range of the tanh activation function [-1, 1]. All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128. All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02. In the LeakyReLU, the slope of the leak was set to 0.2 in all models. While previous GAN work has used momentum to accelerate training, we used the Adam optimizer (Kingma & Ba, 2014) with tuned hyperparameters. We found the suggested learning rate of 0.001 to be too high, using 0.0002 instead. Additionally, we found leaving the momentum term β1 at the suggested value of 0.9 resulted in training oscillation and instability, while reducing it to 0.5 helped stabilize training.
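These settings translate almost line for line into code. A sketch of the setup, reusing the generator and discriminator modules sketched in Section 3 (Adam's second-moment coefficient of 0.999 is the optimizer's default, an assumption the text does not pin down):

```python
import torch.nn as nn
import torch.optim as optim

def init_weights(m):
    # All weights initialized from a zero-centered Normal, std 0.02.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)

generator.apply(init_weights)
discriminator.apply(init_weights)

# Adam with the tuned learning rate 0.0002 and momentum term beta1 = 0.5.
opt_g = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_d = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

batch_size = 128  # mini-batch size used for all models

def preprocess(images_uint8):
    # The only pre-processing: scale pixels to the tanh range [-1, 1].
    return images_uint8.float() / 127.5 - 1.0
```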
*Figure 1: DCGAN generator used for LSUN scene modeling. A 100 dimensional uniform distribution Z is projected to a small spatial extent convolutional representation with many feature maps. A series of four fractionally-strided convolutions (in some recent papers, these are wrongly called deconvolutions) then convert this high level representation into a 64 × 64 pixel image. Notably, no fully connected or pooling layers are used.*
4.1 LSUN
As visual quality of samples from generative image models has improved, concerns of over-fitting and memorization of training samples have risen. To demonstrate how our model scales with more data and higher resolution generation, we train a model on the LSUN bedrooms dataset containing a little over 3 million training examples. Recent analysis has shown that there is a direct link between how fast models learn and their generalization performance (Hardt et al., 2015). We show samples from one epoch of training (Fig. 2), mimicking online learning, in addition to samples after convergence (Fig. 3), as an opportunity to demonstrate that our model is not producing high quality samples via simply overfitting/memorizing training examples. No data augmentation was applied to the images.
4.1.1 DEDUPLICATION
To further decrease the likelihood of the generator memorizing input examples (Fig. 2) we perform a simple image de-duplication process. We fit a 3072-128-3072 de-noising dropout regularized RELU autoencoder on 32x32 downsampled center-crops of training examples. The resulting code layer activations are then binarized via thresholding the ReLU activation which has been shown to be an effective information preserving technique (Srivastava et al., 2014) and provides a convenient form of semantic-hashing, allowing for linear time de-duplication. Visual inspection of hash collisions showed high precision with an estimated false positive rate of less than 1 in 100. Additionally, the technique detected and removed approximately 275,000 near duplicates, suggesting a high recall.
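A hedged sketch of this de-duplication autoencoder; the text fixes the 3072-128-3072 shape, the dropout regularization, and the ReLU code layer, while the dropout probability and the binarization threshold below are assumptions:

```python
import torch
import torch.nn as nn

# De-noising dropout-regularized ReLU autoencoder, 3072-128-3072, fit on
# 32x32 downsampled center-crops flattened to 3072-dim vectors (32*32*3).
encoder = nn.Sequential(nn.Dropout(0.5), nn.Linear(3072, 128), nn.ReLU())
decoder = nn.Linear(128, 3072)
# Training (loop omitted) minimizes the reconstruction error
# || decoder(encoder(x)) - x ||^2 over the corrupted inputs.

def semantic_hash(crops: torch.Tensor) -> torch.Tensor:
    # Binarize the code layer by thresholding the ReLU activations,
    # yielding a 128-bit hash that supports linear-time de-duplication:
    # training examples with colliding hashes are candidate duplicates.
    with torch.no_grad():
        return encoder.eval()(crops) > 0.0  # boolean (batch, 128)
```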
4.2 FACES
We scraped images containing human faces from random web image queries of peoples names. The people names were acquired from dbpedia, with a criterion that they were born in the modern era. This dataset has 3M images from 10K people. We run an OpenCV face detector on these images, keeping the detections that are sufficiently high resolution, which gives us approximately 350,000 face boxes. We use these face boxes for training. No data augmentation was applied to the images.
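A sketch of the face-box extraction step; the text says only "an OpenCV face detector", so the Haar cascade, its detection thresholds, and the minimum-size cutoff below are assumptions:

```python
import cv2

# Haar-cascade frontal face detector shipped with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_boxes(image_path, min_size=64):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Keep only detections that are sufficiently high resolution.
    return [img[y:y + h, x:x + w] for (x, y, w, h) in boxes
            if w >= min_size and h >= min_size]
```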
Figure 2: Generated bedrooms after one training pass through the dataset. Theoretically, the model could learn to memorize training examples, but this is experimentally unlikely as we train with a small learning rate and minibatch SGD. We are aware of no prior empirical evidence demonstrating memorization with SGD and a small learning rate.
Figure 3: Generated bedrooms after five epochs of training. There appears to be evidence of visual under-fitting via repeated noise textures across multiple samples such as the base boards of some of the beds.
4.3 IMAGENET-1K
We use Imagenet-1k (Deng et al., 2009) as a source of natural images for unsupervised training. We train on 32 × 32 min-resized center crops. No data augmentation was applied to the images.
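A min-resized center crop can be sketched with torchvision transforms (an illustrative assumption; the paper does not name a preprocessing library):

```python
from torchvision import transforms

# Resize the shorter image side to 32 pixels (preserving aspect ratio),
# then take the central 32x32 crop and scale pixels to the tanh range [-1, 1].
preprocess = transforms.Compose([
    transforms.Resize(32),
    transforms.CenterCrop(32),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```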