Improved Training of Wasserstein GANs (Part 2)

Improved Training of Wasserstein GANs (Part 1)

code

4 Gradient penalty


We now propose an alternative way to enforce the Lipschitz constraint. A differentiable function is 1-Lipschitz if and only if it has gradients with norm at most 1 everywhere, so we consider directly constraining the gradient norm of the critic’s output with respect to its input. To circumvent tractability issues, we enforce a soft version of the constraint with a penalty on the gradient norm for random samples \hat{x} \sim \mathbb{P}_{\hat{x}}. Our new objective is

L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[D(\tilde{x})] - \mathbb{E}_{x \sim \mathbb{P}_r}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2]
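As a minimal illustration of the penalty term (not the paper's implementation, which computes the input gradient by automatic differentiation inside a deep-learning framework), consider a toy linear critic whose gradient with respect to its input is available in closed form:

```python
import math

def critic(w, x):
    """Toy linear critic D(x) = sum_i w_i * x_i (a stand-in for a neural net)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def gradient_penalty(w, lam=10.0):
    """Two-sided penalty lam * (||grad_x D(x)||_2 - 1)^2.
    For a linear critic, grad_x D(x) = w everywhere, so no autodiff is needed."""
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    return lam * (grad_norm - 1.0) ** 2

# A unit-norm weight vector makes this critic exactly 1-Lipschitz: zero penalty.
assert gradient_penalty([0.6, 0.8]) < 1e-12
```

In a real model the penalty is evaluated at the sampled points \hat{x} and added to the critic loss; here the linearity makes the gradient independent of \hat{x}.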

Sampling distribution We implicitly define \mathbb{P}_{\hat{x}} as sampling uniformly along straight lines between pairs of points sampled from the data distribution \mathbb{P}_r and the generator distribution \mathbb{P}_g. This is motivated by the fact that the optimal critic contains straight lines with gradient norm 1 connecting coupled points from \mathbb{P}_r and \mathbb{P}_g (see Proposition 1). Given that enforcing the unit gradient norm constraint everywhere is intractable, enforcing it only along these straight lines seems sufficient and experimentally results in good performance.
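This sampling scheme can be sketched as follows, assuming one interpolation coefficient ε ~ U[0, 1] shared across the coordinates of each real/generated pair (in practice the same thing is done per sample over a whole minibatch of tensors):

```python
import random

def sample_interpolate(x_real, x_fake, rng=random):
    """Draw xhat = eps * x_real + (1 - eps) * x_fake with eps ~ Uniform[0, 1].
    A single eps is shared across all coordinates of the pair."""
    eps = rng.random()
    return [eps * a + (1.0 - eps) * b for a, b in zip(x_real, x_fake)]

rng = random.Random(0)
xhat = sample_interpolate([1.0, 0.0], [0.0, 1.0], rng)
# The interpolate lies on the segment between the two endpoints.
assert all(0.0 <= v <= 1.0 for v in xhat)
assert abs(sum(xhat) - 1.0) < 1e-12
```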

Penalty coefficient All experiments in this paper use \lambda = 10, which we found to work well across a variety of architectures and datasets ranging from toy tasks to large ImageNet CNNs.

No critic batch normalization Most prior GAN implementations [22, 23, 2] use batch normalization in both the generator and the discriminator to help stabilize training, but batch normalization changes the form of the discriminator’s problem from mapping a single input to a single output to mapping from an entire batch of inputs to a batch of outputs [23]. Our penalized training objective is no longer valid in this setting, since we penalize the norm of the critic’s gradient with respect to each input independently, and not the entire batch. To resolve this, we simply omit batch normalization in the critic in our models, finding that they perform well without it. Our method works with normalization schemes which don’t introduce correlations between examples. In particular, we recommend layer normalization [3] as a drop-in replacement for batch normalization.
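A minimal sketch of why layer normalization is batch-independent: its statistics are computed per example, never across examples. The hypothetical `layer_norm` below omits the learned gain and bias of [3]:

```python
import math

def layer_norm(features, eps=1e-5):
    """Normalize one example's feature vector to zero mean, unit variance.
    Only this example's own statistics are used, so examples stay uncorrelated."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / math.sqrt(var + eps) for f in features]

out = layer_norm([1.0, 2.0, 3.0])
assert abs(sum(out)) < 1e-6  # zero mean, computed per example, not per batch
```

Batch normalization, by contrast, would average over all examples in the batch, coupling the critic's output on one input to the other inputs in the batch.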

Two-sided penalty We encourage the norm of the gradient to go towards 1 (two-sided penalty) instead of just staying below 1 (one-sided penalty). Empirically this seems not to constrain the critic too much, likely because the optimal WGAN critic anyway has gradients with norm 1 almost everywhere under \mathbb{P}_r and \mathbb{P}_g and in large portions of the region in between (see subsection 2.3). In our early observations we found this to perform slightly better, but we don’t investigate this fully. We describe experiments on the one-sided penalty in the appendix.
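The two variants can be sketched as simple penalty functions of the gradient norm (an illustration; the function names are ours, not the paper's):

```python
def two_sided_penalty(grad_norm, lam=10.0):
    """Pushes the gradient norm toward 1 from both sides (the variant used here)."""
    return lam * (grad_norm - 1.0) ** 2

def one_sided_penalty(grad_norm, lam=10.0):
    """Only penalizes gradient norms that exceed 1."""
    return lam * max(0.0, grad_norm - 1.0) ** 2

# Below 1, only the two-sided variant applies pressure:
assert two_sided_penalty(0.5) == 2.5 and one_sided_penalty(0.5) == 0.0
# Above 1, the two agree:
assert two_sided_penalty(1.5) == one_sided_penalty(1.5) == 2.5
```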

5 Experiments


5.1 Training random architectures within a set


We experimentally demonstrate our model’s ability to train a large number of architectures which we think are useful to be able to train. Starting from the DCGAN architecture, we define a set of architecture variants by changing model settings to random corresponding values in Table 1. We believe that reliable training of many of the architectures in this set is a useful goal, but we do not claim that our set is an unbiased or representative sample of the whole space of useful architectures: it is designed to demonstrate a successful regime of our method, and readers should evaluate whether it contains architectures similar to their intended application.

Table 1: We evaluate WGAN-GP’s ability to train the architectures in this set.


From this set, we sample 200 architectures and train each on 32×32 ImageNet with both WGAN-GP and the standard GAN objectives. Table 2 lists the number of instances where either: only the standard GAN succeeded, only WGAN-GP succeeded, both succeeded, or both failed, where success is defined as an inception score above a chosen threshold. For most choices of score threshold, WGAN-GP successfully trains many architectures from this set which we were unable to train with the standard GAN objective. We give more experimental details in the appendix.

Table 2: Outcomes of training 200 random architectures, for different success thresholds. For comparison, our standard DCGAN scored 7.24.


5.2 Training varied architectures on LSUN bedrooms

To demonstrate our model’s ability to train many architectures with its default settings, we train six different GAN architectures on the LSUN bedrooms dataset [31]. In addition to the baseline DCGAN architecture from [22], we choose six architectures whose successful training we demonstrate: (1) no BN and a constant number of filters in the generator, as in [2]; (2) a 4-layer 512-dim ReLU MLP generator, as in [2]; (3) no normalization in either the discriminator or generator; (4) gated multiplicative nonlinearities, as in [24]; (5) tanh nonlinearities; and (6) a 101-layer ResNet generator and discriminator.


Figure 2: Different GAN architectures trained with different methods. We only succeeded in training every architecture with a shared set of hyperparameters using WGAN-GP.


Although we do not claim it is impossible without our method, to the best of our knowledge this is the first time very deep residual networks were successfully trained in a GAN setting. For each architecture, we train models using four different GAN methods: WGAN-GP, WGAN with weight clipping, DCGAN [22], and Least-Squares GAN [18]. For each objective, we used the default set of optimizer hyperparameters recommended in that work (except LSGAN, where we searched over learning rates).

For WGAN-GP, we replace any batch normalization in the discriminator with layer normalization (see section 4). We train each model for 200K iterations and present samples in Figure 2. We only succeeded in training every architecture with a shared set of hyperparameters using WGAN-GP. For every other training method, some of these architectures were unstable or suffered from mode collapse.


5.3 Improved performance over weight clipping


One advantage of our method over weight clipping is improved training speed and sample quality. To demonstrate this, we train WGANs with weight clipping and our gradient penalty on CIFAR-10 [13] and plot Inception scores [23] over the course of training in Figure 3. For WGAN-GP, we train one model with the same optimizer (RMSProp) and learning rate as WGAN with weight clipping, and another model with Adam and a higher learning rate. Even with the same optimizer, our method converges faster and to a better score than weight clipping. Using Adam further improves performance. We also plot the performance of DCGAN [22] and find that our method converges more slowly (in wall-clock time) than DCGAN, but its score is more stable at convergence.


Figure 3: CIFAR-10 Inception score over generator iterations (left) or wall-clock time (right) for four models: WGAN with weight clipping, WGAN-GP with RMSProp and Adam (to control for the optimizer), and DCGAN. WGAN-GP significantly outperforms weight clipping and performs comparably to DCGAN.

5.4 Sample quality on CIFAR-10 and LSUN bedrooms


For equivalent architectures, our method achieves comparable sample quality to the standard GAN objective. However, the increased stability allows us to improve sample quality by exploring a wider range of architectures. To demonstrate this, we find an architecture which establishes a new state-of-the-art Inception score on unsupervised CIFAR-10 (Table 3). When we add label information (using the method in [20]), the same architecture outperforms all other published models except for SGAN.

Table 3: Inception scores on CIFAR-10. Our unsupervised model achieves state-of-the-art performance, and our conditional model outperforms all others except SGAN.


We also train a deep ResNet on 128×128 LSUN bedrooms and show samples in Figure 4. We believe these samples are at least competitive with the best reported so far on any resolution for this dataset.

5.5 Modeling discrete data with a continuous generator


To demonstrate our method’s ability to model degenerate distributions, we consider the problem of modeling a complex discrete distribution with a GAN whose generator is defined over a continuous space. As an instance of this problem, we train a character-level GAN language model on the Google Billion Word dataset [6]. Our generator is a simple 1D CNN which deterministically transforms a latent vector into a sequence of 32 one-hot character vectors through 1D convolutions. We apply a softmax nonlinearity at the output, but use no sampling step: during training, the softmax output is passed directly into the critic (which, likewise, is a simple 1D CNN). When decoding samples, we just take the argmax of each output vector.

Figure 4: Samples of 128×128 LSUN bedrooms. We believe these samples are at least comparable to the best published results so far.
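The argmax decoding step can be sketched as follows (a toy illustration with a hypothetical three-character vocabulary; the model emits one distribution per character position):

```python
def decode(softmax_outputs, vocab):
    """Map each per-position distribution to its most likely character."""
    return "".join(
        vocab[max(range(len(row)), key=row.__getitem__)]
        for row in softmax_outputs
    )

outputs = [
    [0.1, 0.7, 0.2],  # position 0: 'b' is most likely
    [0.8, 0.1, 0.1],  # position 1: 'a' is most likely
]
assert decode(outputs, "abc") == "ba"
```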


We present samples from the model in Table 4. Our model makes frequent spelling errors (likely because it has to output each character independently) but nonetheless manages to learn quite a lot about the statistics of language. We were unable to produce comparable results with the standard GAN objective, though we do not claim that doing so is impossible.

我們?cè)诒?中提供了模型中的樣本多糠。我們的模型經(jīng)常出現(xiàn)拼寫(xiě)錯(cuò)誤(可能是因?yàn)樗仨毆?dú)立輸出每個(gè)字符),但仍然能夠?qū)W到很多關(guān)于語(yǔ)言統(tǒng)計(jì)的知識(shí)浩考。我們無(wú)法與標(biāo)準(zhǔn)GAN目標(biāo)產(chǎn)生可比較的結(jié)果夹孔,但我們并未聲稱(chēng)這樣做是不可能的。

Table 4: Samples from a WGAN-GP character-level language model trained on sentences from the Billion Word dataset, truncated to 32 characters. The model learns to directly output one-hot character embeddings from a latent vector without any discrete sampling step. We were unable to achieve comparable results with the standard GAN objective and a continuous generator.


The difference in performance between WGAN and other GANs can be explained as follows. Consider the simplex \Delta_{n-1} = \{p \in \mathbb{R}^n : p_i \ge 0, \sum_i p_i = 1\}, and the set of vertices on the simplex (or one-hot vectors) V = \{p \in \Delta_{n-1} : p_i \in \{0, 1\}\}. If we have a vocabulary of size n and we have a distribution \mathbb{P}_r over sequences of size T, we have that \mathbb{P}_r is a distribution on V^T \subseteq \Delta_{n-1}^T. Since V^T is a subset of \Delta_{n-1}^T, we can also treat \mathbb{P}_r as a distribution on \Delta_{n-1}^T (by assigning zero probability mass to all points not in V^T).
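A small sketch of these two sets, assuming simple tolerance-based membership tests (the names `on_simplex` and `is_vertex` are ours, introduced for illustration):

```python
def on_simplex(p, tol=1e-9):
    """p lies in Delta_{n-1}: nonnegative coordinates summing to 1."""
    return all(v >= -tol for v in p) and abs(sum(p) - 1.0) <= tol

def is_vertex(p, tol=1e-9):
    """p is a one-hot vector: a simplex point whose coordinates are all 0 or 1."""
    return on_simplex(p, tol) and all(min(v, abs(v - 1.0)) <= tol for v in p)

assert is_vertex([0.0, 1.0, 0.0])          # a one-hot character vector
assert on_simplex([0.2, 0.5, 0.3]) and not is_vertex([0.2, 0.5, 0.3])
```

A sequence of T softmax outputs is a point in \Delta_{n-1}^T, while real data occupies only the vertex set V^T.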

\mathbb{P}_r is discrete (or supported on a finite number of elements, namely V^T) on \Delta_{n-1}^T, but \mathbb{P}_g can easily be a continuous distribution over \Delta_{n-1}^T. The KL divergences between two such distributions are infinite, and so the JS divergence is saturated. Although GANs do not literally minimize these divergences [16], in practice this means a discriminator might quickly learn to reject all samples that don’t lie on V^T (sequences of one-hot vectors) and give meaningless gradients to the generator. However, it is easily seen that the conditions of Theorem 1 and Corollary 1 of [2] are satisfied even on this non-standard learning scenario with X = \Delta_{n-1}^T. This means that W(\mathbb{P}_r, \mathbb{P}_g) is still well defined, continuous everywhere and differentiable almost everywhere, and we can optimize it just like in any other continuous variable setting. The way this manifests is that in WGANs, the Lipschitz constraint forces the critic to provide a linear gradient from all of \Delta_{n-1}^T towards the real points in V^T.


Figure 5: (a) The negative critic loss of our model on LSUN bedrooms converges toward a minimum as the network trains. (b) WGAN training and validation losses on a random 1000-digit subset of MNIST show overfitting when using either our method (left) or weight clipping (right). In particular, with our method, the critic overfits faster than the generator, causing the training loss to increase gradually over time even as the validation loss drops.

Other attempts at language modeling with GANs [32, 14, 30, 5, 15, 10] typically use discrete models and gradient estimators [28, 12, 17]. Our approach is simpler to implement, though whether it scales beyond a toy language model is unclear.


5.6 Meaningful loss curves and detecting overfitting


An important benefit of weight-clipped WGANs is that their loss correlates with sample quality and converges toward a minimum. To show that our method preserves this property, we train a WGAN-GP on the LSUN bedrooms dataset [31] and plot the negative of the critic’s loss in Figure 5a. We see that the loss converges as the generator minimizes W(\mathbb{P}_r, \mathbb{P}_g).

Given enough capacity and too little training data, GANs will overfit. To explore the loss curve’s behavior when the network overfits, we train large unregularized WGANs on a random 1000-image subset of MNIST and plot the negative critic loss on both the training and validation sets in Figure 5b. In both WGAN and WGAN-GP, the two losses diverge, suggesting that the critic overfits and provides an inaccurate estimate of W(\mathbb{P}_r, \mathbb{P}_g), at which point all bets are off regarding correlation with sample quality. However, in WGAN-GP the training loss gradually increases even while the validation loss drops.

[29] also measure overfitting in GANs by estimating the generator’s log-likelihood. Compared to that work, our method detects overfitting in the critic (rather than the generator) and measures overfitting against the same loss that the network minimizes.

6 Conclusion


In this work, we demonstrated problems with weight clipping in WGAN and introduced an alternative in the form of a penalty term in the critic loss which does not exhibit the same problems. Using our method, we demonstrated strong modeling performance and stability across a variety of architectures. Now that we have a more stable algorithm for training GANs, we hope our work opens the path for stronger modeling performance on large-scale image datasets and language. Another interesting direction is adapting our penalty term to the standard GAN objective function, where it might stabilize training by encouraging the discriminator to learn smoother decision boundaries.


Acknowledgements


We would like to thank Mohamed Ishmael Belghazi, Léon Bottou, Zihang Dai, Stefan Doerr, Ian Goodfellow, Kyle Kastner, Kundan Kumar, Luke Metz, Alec Radford, Colin Raffel, Sai Rajeshwar, Aditya Ramesh, Tom Sercu, Zain Shah and Jake Zhao for insightful comments.

Source: http://tongtianta.site/paper/3418
Editor: Lornatang
Proofreader: Lornatang
