7 Exercises
We include three exercises to test the reader's understanding. Solutions are given in section 8.
7.1 The optimal discriminator strategy
As described in equation (8), the goal of the discriminator is to minimize the following cost with respect to θ(D):
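For reference, this is the standard GAN discriminator cross-entropy, written out here following the formulation used throughout the original tutorial:

$$
J^{(D)}\left(\theta^{(D)}, \theta^{(G)}\right) = -\frac{1}{2}\,\mathbb{E}_{x \sim p_\text{data}} \log D(x) \;-\; \frac{1}{2}\,\mathbb{E}_{z} \log\big(1 - D(G(z))\big).
$$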
Assume that the discriminator can be optimized in function space, so that the value of D(x) is specified independently for every value of x. What is the optimal strategy for D, and what assumptions are needed to obtain this result?
7.2 Gradient descent for games
Now consider a minimax game with two players, each of whom controls a single scalar value. The minimizing player controls scalar x and the maximizing player controls scalar y. The value function for this game is shown below.
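Following the original tutorial, the value function is the simple bilinear form (the same function that the solution in section 8.2 analyzes):

$$
V(x, y) = xy.
$$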
- Does this game have an equilibrium? If so, where is it?
- Consider the learning dynamics of simultaneous gradient descent. To simplify the problem, treat gradient descent as a continuous-time process. With an infinitesimal learning rate, gradient descent is described by a system of partial differential equations, sketched after this list.
Solve for the trajectory followed by these dynamics.
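A minimal sketch of that system, assuming the usual continuous-time gradient flow in which the minimizing player descends V and the maximizing player ascends it:

$$
\frac{\partial x}{\partial t} = -\frac{\partial}{\partial x} V\big(x(t), y(t)\big), \qquad
\frac{\partial y}{\partial t} = +\frac{\partial}{\partial y} V\big(x(t), y(t)\big).
$$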
7.3 Maximum likelihood in the GAN framework
In this exercise, we derive a cost that yields (approximately) maximum likelihood learning within the GAN framework. Our goal is to design J(G) so that, if we assume the discriminator is optimal, the expected gradient of J(G) matches the expected gradient of DKL(pdata||pmodel).
The solution takes the following form:
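That is, a generator cost that is the expectation of some function f under the generator's distribution (this is what the solution in section 8.3 refers to as equation 32):

$$
J^{(G)} = \mathbb{E}_{x \sim p_g} f(x).
$$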
The exercise consists of determining the form of f.
8 Solutions to the exercises
8.1 The optimal discriminator strategy
Our goal is to minimize the discriminator cost J(D) from equation (8) in function space.
We begin by assuming that both pdata and pmodel are nonzero everywhere. If we did not make this assumption, some points would never be visited during training and would have undefined behavior.
To minimize J(D) with respect to D, we can write down the functional derivative with respect to a single entry D(x) and set it equal to zero:
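Writing the two expectations in J(D) as integrals over x (a sketch of this step, following the standard derivation), the condition for each x is:

$$
\frac{\partial J^{(D)}}{\partial D(x)} = -\frac{1}{2}\left(\frac{p_\text{data}(x)}{D(x)} - \frac{p_\text{model}(x)}{1 - D(x)}\right) = 0.
$$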
Solving this equation, we obtain:
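namely the familiar optimal discriminator, which at every point estimates the ratio between the data density and the sum of the data and model densities:

$$
D^{*}(x) = \frac{p_\text{data}(x)}{p_\text{data}(x) + p_\text{model}(x)}.
$$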
Estimating this ratio is the key approximation mechanism used by GANs. See figure 35.
8.2 Gradient descent for games
The value function V(x, y) = xy is the simplest possible example of a continuous function with a saddle point. The easiest way to understand this game is to visualize the value function in three dimensions, as in figure 36.
The three-dimensional visualization shows clearly that there is a saddle point at x = y = 0. This is the equilibrium of the game. We could also have found this point by solving for where the derivatives are zero.
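Concretely, setting both partial derivatives of V(x, y) = xy to zero recovers the same point:

$$
\frac{\partial V}{\partial x} = y = 0, \qquad \frac{\partial V}{\partial y} = x = 0 \quad\Longrightarrow\quad (x, y) = (0, 0).
$$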
Of course, not every saddle point is an equilibrium; we also require that an infinitesimal perturbation of one player's parameters cannot reduce that player's cost. The saddle point of this game satisfies that requirement. It is something of a pathological equilibrium, because the value function is constant as a function of each player's parameter when the other player's parameter is held fixed.
To obtain the trajectory followed by gradient descent, we take the derivatives and find:
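For V(x, y) = xy, the gradient-flow system from the exercise becomes (the first of these is the equation 28 differentiated below):

$$
\frac{\partial x}{\partial t} = -y(t), \qquad \frac{\partial y}{\partial t} = x(t).
$$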
Differentiating equation 28, we obtain:
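Substituting the equation for the time derivative of y into the derivative of equation 28:

$$
\frac{\partial^2 x}{\partial t^2} = -\frac{\partial y}{\partial t} = -x(t).
$$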
Differential equations of this form have sinusoids as the basis functions of their solutions. Solving for the coefficients that respect the boundary conditions, we obtain:
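With initial conditions x(0) and y(0), the trajectory is (one can check by substitution that it satisfies both first-order equations):

$$
x(t) = x(0)\cos(t) - y(0)\sin(t), \qquad y(t) = x(0)\sin(t) + y(0)\cos(t).
$$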
These dynamics form a circular orbit, as shown in figure 37. In other words, simultaneous gradient descent with an infinitesimal learning rate orbits the equilibrium forever, at the radius where it was initialized. With a larger learning rate, simultaneous gradient descent can spiral outward and move away from the equilibrium forever. Either way, simultaneous gradient descent never reaches the equilibrium.
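A small numerical sketch of this behavior (not from the original tutorial; the learning rate eta, the starting point, and the number of steps are arbitrary illustrative choices). With a finite step size, each simultaneous update multiplies the distance from the equilibrium by sqrt(1 + eta^2) > 1, so the iterates spiral outward instead of converging:

```python
import math

def simultaneous_gd(x, y, eta, steps):
    """Simultaneous gradient descent/ascent on V(x, y) = x * y.

    The minimizing player updates x using -dV/dx = -y, and the maximizing
    player updates y using +dV/dy = +x. Both updates use the old values of
    x and y (simultaneous updates, not alternating ones).
    """
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return x, y

x0, y0 = 1.0, 0.0        # start at distance 1 from the equilibrium (0, 0)
eta, steps = 0.1, 200    # finite learning rate and horizon (illustrative values)

x, y = simultaneous_gd(x0, y0, eta, steps)

# Each step scales the radius by sqrt(1 + eta**2), so it grows geometrically.
print(f"final radius    : {math.hypot(x, y):.3f}")
print(f"predicted radius: {(1.0 + eta ** 2) ** (steps / 2):.3f}")
```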
For some games, simultaneous gradient descent does converge, while for others, such as the one in this exercise, it does not. For GANs, there is no theoretical prediction as to whether simultaneous gradient descent should converge or not. Settling this theoretical question, and developing algorithms guaranteed to converge, remain important open research problems.
8.3 Maximum likelihood in the GAN framework
We wish to find a function f such that the expected gradient of J(G) = E_{x~pg} f(x) (equation 32) equals the expected gradient of DKL(pdata||pmodel).
First, we take the derivative of the KL divergence with respect to a parameter θ:
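This is the standard score-function form of the KL gradient (the equation 33 referred to below):

$$
\frac{\partial}{\partial \theta} D_{\mathrm{KL}}\left(p_\text{data} \,\|\, p_g\right) = -\,\mathbb{E}_{x \sim p_\text{data}}\, \frac{\partial}{\partial \theta} \log p_g(x).
$$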
Now we wish to find the f that makes the derivative of equation 32 equal to equation 33. We begin by taking the derivative of equation 32:
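A sketch of the result, which relies on the two assumptions listed immediately below and on treating f as fixed with respect to θ:

$$
\frac{\partial}{\partial \theta} J^{(G)} = \mathbb{E}_{x \sim p_g}\, f(x)\, \frac{\partial}{\partial \theta} \log p_g(x).
$$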
To obtain this result, we made two assumptions:
- We assumed that pg(x) >= 0 everywhere, so that we could use the identity pg(x) = exp(log pg(x)).
- We assumed that we can use Leibniz's rule to exchange the order of differentiation and integration (specifically, that both the function and its derivative are continuous, and that the function vanishes for infinite values of x).
We see that the derivative of J(G) comes very close to what we want; the only problem is that the expectation is computed by drawing samples from pg, whereas we would like it to be computed by drawing samples from pdata. We can fix this with an importance sampling trick: by setting f(x) = -pdata(x)/pg(x), we reweight the contribution of each generator sample to the gradient, compensating for the fact that it was drawn from the generator rather than from the data.
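As a quick check of this choice (a short verification, not spelled out in the text above): substituting f(x) = -pdata(x)/pg(x) into the derivative of equation 32, while holding the copy of pg inside f fixed, gives

$$
\mathbb{E}_{x \sim p_g}\!\left[-\frac{p_\text{data}(x)}{p_g(x)}\, \frac{\partial}{\partial \theta} \log p_g(x)\right]
= -\,\mathbb{E}_{x \sim p_\text{data}}\, \frac{\partial}{\partial \theta} \log p_g(x)
= \frac{\partial}{\partial \theta} D_{\mathrm{KL}}\left(p_\text{data} \,\|\, p_g\right),
$$

which is exactly the gradient in equation 33.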
Note that when constructing J(G), we must copy pg into f(x), so that f(x) has zero derivative with respect to the parameters of pg. Fortunately, this happens naturally if we simply obtain the value of pdata(x)/pg(x).
From section 8.1, we already know that the discriminator estimates the desired ratio. Using some algebra, we can obtain a numerically stable implementation of f(x). If the discriminator is defined to apply a logistic sigmoid at its output layer, with D(x) = σ(a(x)), then f(x) = -exp(a(x)).
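The algebra in question is short. At the optimum found in section 8.1, D(x) = σ(a(x)) = pdata(x)/(pdata(x) + pg(x)), so the logit a(x) recovers the log of the density ratio:

$$
e^{a(x)} = \frac{\sigma(a(x))}{1 - \sigma(a(x))} = \frac{p_\text{data}(x)}{p_g(x)}
\quad\Longrightarrow\quad
f(x) = -\frac{p_\text{data}(x)}{p_g(x)} = -e^{a(x)}.
$$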
This exercise is taken from a result shown by Goodfellow (2014). From it, we see that the discriminator's estimate of a ratio of densities can be used to compute a variety of divergences.
9 Conclusion
GANs are generative models that use supervised learning to approximate an intractable cost function, just as Boltzmann machines use Markov chains to approximate their cost and VAEs use the variational lower bound to approximate theirs. GANs can use this supervised ratio estimation technique to approximate many cost functions, including the KL divergence used for maximum likelihood estimation.
GANs are relatively new and still require more research to reach their full potential. In particular, training GANs requires finding Nash equilibria in high-dimensional, continuous, non-convex games. Researchers should continue to develop better theoretical understanding of this problem and better training algorithms for it; breakthroughs here would benefit many applications beyond GANs.
GANs are crucial to many state-of-the-art systems for image generation and image manipulation, and they have the potential to enable many other applications as well.
Acknowledgments
The author would like to thank the NIPS organizers for inviting him to present this tutorial. Many thanks also to those who commented on his Twitter and Facebook posts asking which topics would be of interest to the tutorial audience. Thanks also to D. Kingma for helpful discussions regarding the description of VAEs. Thanks to Zhu Xiaohu, Alex Kurakin and Ilya Edrenkin for spotting typographical errors in the manuscript.
References
Abadi, M. and Andersen, D. G. (2016). Learning to protect communications with adversarial neural cryptography. arXiv preprint arXiv:1610.06918.
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.
Bengio, Y., Thibodeau-Laufer, E., Alain, G., and Yosinski, J. (2014). Deep generative stochastic networks trainable by backprop. In ICML’2014.
Brock, A., Lim, T., Ritchie, J. M., and Weston, N. (2016). Neural photo editing with introspective adversarial networks. CoRR, abs/1609.07093.
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., and Abbeel, P. (2016a). InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2172–2180.
Chen, X., Kingma, D. P., Salimans, T., Duan, Y., Dhariwal, P., Schulman, J., Sutskever, I., and Abbeel, P. (2016b). Variational lossy autoencoder. arXiv preprint arXiv:1611.02731.
Deco, G. and Brauer, W. (1995). Higher order statistical decorrelation without information loss. NIPS.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.
Deng, J., Berg, A. C., Li, K., and Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us? In Proceedings of the 11th European Conference on Computer Vision: Part V, ECCV’10, pages 71–84, Berlin, Heidelberg. Springer-Verlag.
Denton, E., Chintala, S., Szlam, A., and Fergus, R. (2015). Deep generative image models using a Laplacian pyramid of adversarial networks. NIPS.
Dinh, L., Krueger, D., and Bengio, Y. (2014). NICE: Non-linear independent components estimation. arXiv:1410.8516.
Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using real nvp. arXiv preprint arXiv:1605.08803.
Donahue, J., Krähenbühl, P., and Darrell, T. (2016). Adversarial feature learning. arXiv preprint arXiv:1605.09782.
Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., and Courville, A. (2016). Adversarially learned inference. arXiv preprint arXiv:1606.00704.
Dziugaite, G. K., Roy, D. M., and Ghahramani, Z. (2015). Training generative neural networks via maximum mean discrepancy optimization. arXiv preprint arXiv:1505.03906.
Edwards, H. and Storkey, A. (2015). Censoring representations with an adversary. arXiv preprint arXiv:1511.05897.
Fahlman, S. E., Hinton, G. E., and Sejnowski, T. J. (1983). Massively parallel architectures for AI: NETL, thistle, and Boltzmann machines. In Proceedings of the National Conference on Artificial Intelligence AAAI-83.
Finn, C. and Levine, S. (2016). Deep visual foresight for planning robot motion. arXiv preprint arXiv:1610.00696.
Finn, C., Christiano, P., Abbeel, P., and Levine, S. (2016a). A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv preprint arXiv:1611.03852.
Finn, C., Goodfellow, I., and Levine, S. (2016b). Unsupervised learning for physical interaction through video prediction. NIPS.
Frey, B. J. (1998). Graphical models for machine learning and digital communication. MIT Press.
Frey, B. J., Hinton, G. E., and Dayan, P. (1996). Does the wake-sleep algorithm learn good density estimators? In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8 (NIPS’95), pages 661–670. MIT Press, Cambridge, MA.
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. (2015). Domain-adversarial training of neural networks. arXiv preprint arXiv:1505.07818.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
Goodfellow, I. J. (2014). On distinguishability criteria for estimating generative models. In International Conference on Learning Representations, Workshops Track.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014a). Explaining and harnessing adversarial examples. CoRR, abs/1412.6572.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014b). Generative adversarial networks. In NIPS’2014.
Gutmann, M. and Hyvärinen, A. (2010). Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of The Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS’10).
Hinton, G. E. (2007). Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10), 428–434.
Hinton, G. E. and Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1, chapter 7, pages 282–317. MIT Press, Cambridge.
Hinton, G. E., Sejnowski, T. J., and Ackley, D. H. (1984). Boltzmann machines: Constraint satisfaction networks that learn. Technical Report TR-CMU-CS-84-119, Carnegie-Mellon University, Dept. of Computer Science.
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
Ho, J. and Ermon, S. (2016). Generative adversarial imitation learning. In Advances in Neural Information Processing Systems, pages 4565–4573.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2016). Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004.
Jang, E., Gu, S., and Poole, B. (2016). Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144.
Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kingma, D. P. (2013). Fast gradient-based inference with continuous latent variable models in auxiliary form. Technical report, arxiv:1306.0733.
Kingma, D. P., Salimans, T., and Welling, M. (2016). Improving variational inference with inverse autoregressive flow. NIPS.
Ledig, C., Theis, L., Huszar, F., Caballero, J., Aitken, A. P., Tejani, A., Totz, J., Wang, Z., and Shi, W. (2016). Photo-realistic single image super-resolution using a generative adversarial network. CoRR, abs/1609.04802.
Li, Y., Swersky, K., and Zemel, R. S. (2015). Generative moment matching networks. CoRR, abs/1502.02761.
Lotter, W., Kreiman, G., and Cox, D. (2015). Unsupervised learning of visual structure using predictive generative networks. arXiv preprint arXiv:1511.06380.
Maddison, C. J., Mnih, A., and Teh, Y. W. (2016). The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712.
Metz, L., Poole, B., Pfau, D., and Sohl-Dickstein, J. (2016). Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163.
Nguyen, A., Yosinski, J., Bengio, Y., Dosovitskiy, A., and Clune, J. (2016). Plug & play generative networks: Conditional iterative generation of images in latent space. arXiv preprint arXiv:1612.00005.
Nowozin, S., Cseke, B., and Tomioka, R. (2016). f-gan: Training generative neural samplers using variational divergence minimization. arXiv preprint arXiv:1606.00709.
Odena, A. (2016). Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv:1606.01583.
Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
Pfau, D. and Vinyals, O. (2016). Connecting generative adversarial networks and actor-critic methods. arXiv preprint arXiv:1610.01945.
Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
Ratliff, L. J., Burden, S. A., and Sastry, S. S. (2013). Characterization and computation of local Nash equilibria in continuous games. In Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on, pages 917–924. IEEE.
Reed, S., van den Oord, A., Kalchbrenner, N., Bapst, V., Botvinick, M., and de Freitas, N. (2016a). Generating interpretable images with controllable structure. Technical report.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., and Lee, H. (2016b). Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396.
Rezende, D. J. and Mohamed, S. (2015). Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770.
Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In ICML’2014. Preprint: arXiv:1401.4082.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2014). ImageNet Large Scale Visual Recognition Challenge.
Salakhutdinov, R. and Hinton, G. (2009). Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, pages 448–455.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. (2016). Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2226–2234.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Springenberg, J. T. (2015). Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv preprint arXiv:1511.06390.
Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2015). Striving for simplicity: The all convolutional net. In ICLR.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. (2014). Intriguing properties of neural networks. ICLR, abs/1312.6199.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2015). Rethinking the Inception Architecture for Computer Vision. ArXiv e-prints.
Theis, L., van den Oord, A., and Bethge, M. (2015). A note on the evaluation of generative models. arXiv:1511.01844.
Warde-Farley, D. and Goodfellow, I. (2016). Adversarial perturbations of deep neural networks. In T. Hazan, G. Papandreou, and D. Tarlow, editors, Perturbations, Optimization, and Statistics, chapter 11. MIT Press.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
Wu, Y., Burda, Y., Salakhutdinov, R., and Grosse, R. (2016). On the quantitative analysis of decoder-based generative models. arXiv preprint arXiv:1611.04273.
Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., and Metaxas, D. (2016). StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1612.03242.
Zhu, J.-Y., Krähenbühl, P., Shechtman, E., and Efros, A. A. (2016). Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597–613. Springer.