Are GANs Created Equal? A Large-Scale Study
Paper: http://arxiv.org/pdf/1711.10337v3.pdf
Abstract
Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the-art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the original one.
1. Introduction
Generative adversarial networks (GAN) are a powerful subclass of generative models and were successfully applied to image generation and editing, semi-supervised learning, and domain adaptation [20, 24]. In the GAN framework the model learns a deterministic transformation G of a simple prior distribution, with the goal of matching the data distribution.

* Indicates equal authorship. Correspondence to Mario Lucic (lucic@google.com) and Karol Kurach (kkurach@google.com).
This is partially due to the lack of a robust and consistent metric, as well as limited comparisons which put all algorithms on an equal footing, including the computational budget to search over all hyperparameters. Why is it important? Firstly, to help the practitioner choose a better algorithm from a very large set. Secondly, to make progress towards better algorithms and their understanding, it is useful to clearly assess which modifications are critical, and which ones are only good on paper, but do not make a significant difference in practice.
The main issue with evaluation stems from the fact that one cannot explicitly compute the probability the model assigns to the data. As a result, classic measures, such as log-likelihood on the test set, cannot be evaluated. Consequently, many researchers focused on qualitative comparison, such as comparing the visual quality of samples. Unfortunately, such approaches are subjective and possibly misleading [7].
As a remedy, two evaluation metrics were proposed to quantitatively assess the performance of GANs. Both assume access to a pre-trained classifier. Inception Score (IS) [21] is based on the fact that a good model should generate samples for which, when evaluated by the classifier, the class distribution has low entropy. At the same time, it should produce diverse samples covering all classes. In contrast, Fréchet Inception Distance (FID) is computed by considering the difference in embedding of true and fake data [10]. Assuming that the coding layer follows a multivariate Gaussian distribution, the distance between the distributions is reduced to the Fréchet distance between the corresponding Gaussians.
Our main contributions:
1. We provide a fair and comprehensive comparison of the state-of-the-art GANs, and empirically demonstrate that nearly all of them can reach similar values of FID, given a high enough computational budget.
2. We provide strong empirical evidence¹ that, to compare GANs, it is necessary to report a summary of the distribution of results, rather than the best result achieved, due to the randomness of the optimization process and model instability.
¹ As a note on the scale of the setup, the computational budget to reproduce those experiments is approximately 6.85 GPU years (NVIDIA P100).
3. We assess the robustness of FID to mode dropping and to the use of a different encoding network, and provide estimates of the best FID achievable on classic data sets.
4. We introduce a series of tasks of increasing difficulty for which undisputed measures, such as precision and recall, can be approximately computed.
5. We open-sourced our experimental setup and model implementations at goo.gl/G8kf5J.
2. Background and Related Work
There are several ongoing challenges in the study of GANs, including their convergence properties [2, 17], and optimization stability [21, 1]. Arguably, the most critical challenge is their quantitative evaluation.
The classic approach towards evaluating generative models is based on the model likelihood, which is often intractable. While the log-likelihood can be approximated for distributions on low-dimensional vectors, in the context of complex high-dimensional data the task becomes extremely challenging. Wu et al. [23] suggest an annealed importance sampling algorithm to estimate the hold-out log-likelihood. The key drawback of the proposed approach is the assumption of the Gaussian observation model, which carries over all issues of kernel density estimation in high-dimensional spaces. Theis et al. [22] provide an analysis of common failure modes and demonstrate that it is possible to achieve high likelihood, but low visual quality, and vice-versa. Furthermore, they argue against using Parzen window density estimates as the likelihood estimate is often incorrect. In addition, ranking models based on these estimates is discouraged [3]. For a discussion on other drawbacks of likelihood-based training and evaluation consult Huszár [11].
Inception Score (IS). Proposed by [21], IS offers a way to quantitatively evaluate the quality of generated samples. The score was motivated by the following considerations: (i) the conditional label distribution of samples containing meaningful objects should have low entropy, and (ii) the variability of the samples should be high, or equivalently, the marginal label distribution p(y) should have high entropy. Finally, these desiderata are combined into one score.
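In a standard formulation (with p(y | x) the label distribution predicted by the classifier for a sample x, and p(y) the marginal label distribution over generated samples; this follows the usual Inception Score definition rather than the paper's exact typesetting):

$$\mathrm{IS}(G) = \exp\Big(\mathbb{E}_{x \sim p_g}\big[\, d_{\mathrm{KL}}\big(p(y \mid x)\,\|\, p(y)\big) \big]\Big)$$

A high score therefore requires confident per-sample predictions together with a high-entropy marginal.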
The classifier is an Inception Net trained on ImageNet, which is publicly available. The authors found that this score is well-correlated with scores from human annotators [21]. Drawbacks include insensitivity to the prior distribution over labels and the fact that it is not a proper distance.
Fréchet Inception Distance (FID). Proposed by [10], FID provides an alternative approach. To quantify the quality of generated samples, they are first embedded into a feature space given by a specific layer of Inception Net. Then, viewing the embedding layer as a continuous multivariate Gaussian, the mean and covariance are estimated for both the generated data and the real data. The Fréchet distance between these two Gaussians is then used to quantify the quality of the samples.
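In the usual notation (with (μ_x, Σ_x) and (μ_g, Σ_g) the mean and covariance of the embedded real and generated samples), the distance reads

$$\mathrm{FID}(x, g) = \lVert \mu_x - \mu_g \rVert_2^2 + \mathrm{Tr}\big(\Sigma_x + \Sigma_g - 2(\Sigma_x \Sigma_g)^{1/2}\big)$$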
A significant drawback of both measures is the inability to detect overfitting. A “memory GAN” which stores all training samples would score perfectly.
A very recent study comparing several GANs using IS has been presented by Fedus et al. [6]. The authors focus on IS and consider a smaller subset of GANs. In contrast, our focus is on providing a fair assessment of the current state-of-the-art GANs using FID, as well as precision and recall, and also verifying the robustness of these models in a large-scale empirical evaluation.
3. Flavors of Generative Adversarial Networks
In this work we focus on unconditional generative adversarial networks. In this setting, only unlabeled data is available for learning. The optimization problems arising from existing approaches differ by (i) the constraint on the discriminator's output and the corresponding loss, and (ii) the presence and application of a gradient norm penalty.
In the original GAN formulation [8] two loss functions were proposed. In the minimax GAN the discriminator outputs a probability and the loss function is the negative log-likelihood of a binary classification task (MM GAN in Table 1). Here the generator learns to generate samples that have a low probability of being fake. To improve the gradient signal, the authors also propose the non-saturating loss (NS GAN in Table 1), where the generator instead aims to maximize the probability of generated samples being real. In Wasserstein GAN [1] the discriminator is allowed to output a real number and the objective function is equivalent to the MM GAN loss without the sigmoid (WGAN in Table 1). The authors prove that, under an optimal (Lipschitz smooth) discriminator, minimizing the value function with respect to the generator minimizes the Wasserstein distance between model and data distributions. Weights of the discriminator are clipped to a small absolute value to enforce smoothness. To improve on the stability of the training, Gulrajani et al. [9] instead add a soft constraint on the norm of the gradient which encourages the discriminator to be 1-Lipschitz. The gradient norm is evaluated on points obtained by linear interpolation between data points and generated samples, where the optimal discriminator should have unit gradient norm [9].
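For reference, the standard forms of these losses are given below (the exact parameterization in Table 1 may differ slightly), with D the discriminator, G the generator, p_d the data distribution and p_g the model distribution:

$$\begin{aligned}
\text{MM GAN:}\quad & \mathcal{L}_D = -\mathbb{E}_{x \sim p_d}[\log D(x)] - \mathbb{E}_{\hat{x} \sim p_g}[\log(1 - D(\hat{x}))], \qquad \mathcal{L}_G = \mathbb{E}_{\hat{x} \sim p_g}[\log(1 - D(\hat{x}))] \\
\text{NS GAN:}\quad & \mathcal{L}_G = -\mathbb{E}_{\hat{x} \sim p_g}[\log D(\hat{x})] \\
\text{WGAN:}\quad & \mathcal{L}_D = -\mathbb{E}_{x \sim p_d}[D(x)] + \mathbb{E}_{\hat{x} \sim p_g}[D(\hat{x})], \qquad \mathcal{L}_G = -\mathbb{E}_{\hat{x} \sim p_g}[D(\hat{x})] \\
\text{WGAN GP:}\quad & \mathcal{L}_D^{\mathrm{GP}} = \mathcal{L}_D^{\mathrm{WGAN}} + \lambda\, \mathbb{E}_{\tilde{x}}\big[(\lVert \nabla_{\tilde{x}} D(\tilde{x}) \rVert_2 - 1)^2\big], \quad \tilde{x} = \alpha x + (1-\alpha)\hat{x},\ \alpha \sim U[0,1]
\end{aligned}$$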
Table 1: Generator and discriminator loss functions. The main differences are whether the discriminator outputs a probability (MM GAN, NS GAN, DRAGAN) or its output is unbounded (WGAN, WGAN GP, LS GAN, BEGAN), whether a gradient penalty is present (WGAN GP, DRAGAN), and where it is evaluated. We chose these models based on their popularity.
A gradient norm penalty can also be added to both MM GAN and NS GAN and evaluated around the data manifold (DRAGAN [14] in Table 1, which is based on NS GAN). This encourages the discriminator to be piecewise linear around the data manifold. Note that the gradient norm can also be evaluated between fake and real points, similarly to WGAN GP, and added to either MM GAN or NS GAN [6].
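As an illustration of the interpolation-based penalty used by WGAN GP, here is a minimal PyTorch-style sketch; the function name and the coefficient value are assumptions for illustration, not taken from the paper's released code (which is TensorFlow-based).

```python
import torch

def gradient_penalty(discriminator, real, fake, coeff=10.0):
    """Evaluate the discriminator's gradient norm at random interpolations
    between real and fake samples and push it towards 1 (WGAN GP style);
    DRAGAN instead evaluates it at perturbed points around the real data."""
    real, fake = real.detach(), fake.detach()  # the penalty concerns the discriminator only
    batch_size = real.size(0)
    # One interpolation coefficient per sample, broadcast over the remaining dims.
    alpha = torch.rand(batch_size, *([1] * (real.dim() - 1)), device=real.device)
    interpolated = (alpha * real + (1.0 - alpha) * fake).requires_grad_(True)

    scores = discriminator(interpolated)
    grads = torch.autograd.grad(
        outputs=scores.sum(),  # summing yields per-sample input gradients in one call
        inputs=interpolated,
        create_graph=True,     # keep the graph so the penalty itself is differentiable
    )[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return coeff * ((grad_norm - 1.0) ** 2).mean()
```

This term is added to the discriminator loss; the soft constraint encourages, but does not enforce, the 1-Lipschitz property discussed above.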
Mao et al. [16] propose a least-squares loss for the discriminator and show that minimizing the corresponding objective (LS GAN in Table 1) implicitly minimizes the Pearson χ² divergence. The idea is to provide a smooth loss which saturates more slowly than the sigmoid cross-entropy loss of the original MM GAN.
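In one common parameterization with 0/1 target codings (the exact coding in Table 1 may differ), the least-squares losses are

$$\mathcal{L}_D = \tfrac{1}{2}\,\mathbb{E}_{x \sim p_d}\big[(D(x) - 1)^2\big] + \tfrac{1}{2}\,\mathbb{E}_{\hat{x} \sim p_g}\big[D(\hat{x})^2\big], \qquad \mathcal{L}_G = \tfrac{1}{2}\,\mathbb{E}_{\hat{x} \sim p_g}\big[(D(\hat{x}) - 1)^2\big]$$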
Finally, Berthelot et al. [4] propose to use an autoencoder as a discriminator and optimize a lower bound of the Wasserstein distance between auto-encoder loss distributions on real and fake data. They introduce an additional hyperparameter γ to control the equilibrium between the generator and discriminator.
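Roughly, following the original BEGAN formulation (the variant evaluated here may differ in details), with L(v) the pixel-wise autoencoder reconstruction loss of the discriminator:

$$\mathcal{L}_D = L(x) - k_t\, L(G(z)), \qquad \mathcal{L}_G = L(G(z)), \qquad k_{t+1} = k_t + \lambda_k\big(\gamma\, L(x) - L(G(z))\big)$$

so that γ sets the target ratio between the reconstruction errors on generated and real data.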
最后藤树,Berthelot等人浴滴。 [4]建議使用自動編碼器作為鑒別器,并優(yōu)化真實和偽數(shù)據(jù)上自動編碼器損耗分布之間的Wasserstein距離的下限岁钓。他們引入了額外的超參數(shù)γ來控制發(fā)生器和鑒別器之間的平衡升略。
4. Challenges of a Fair Comparison
There are several interesting dimensions to this problem, and there is no single right way to compare these models (i.e. the loss function used in each GAN). Unfortunately, due to the combinatorial explosion in the number of choices and their ordering, not all relevant options can be explored. While there is no definite answer on how to best compare two models, in this work we have made several pragmatic choices which were motivated by two practical concerns: providing a neutral and fair comparison, and a hard limit on the computational budget.
Which metric to use? Comparing models implies access to some metric. As discussed in Section 2, classic measures such as the model likelihood cannot be applied. We will argue for and study two sets of evaluation metrics in Section 5: FID, which can be computed on all data sets, and precision, recall, and F1, which we can compute for the proposed tasks.
How to compare models? Even when the metric is fixed, a given algorithm can achieve very different scores when varying the architecture, hyperparameters, random initialization (i.e. the random seed for initial network weights), or the data set. Sensible targets include the best score across all dimensions (e.g. to claim the best performance on a fixed data set), the average or median score (rewarding models which are good in expectation), or even the worst score (rewarding models with worst-case robustness). These choices can even be combined; for example, one might train the model multiple times using the best hyperparameters and average the score over random initializations.
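To make the difference between these aggregation choices concrete, here is a tiny sketch with purely hypothetical FID values (the numbers are invented for illustration and are not results from the study):

```python
import numpy as np

# Hypothetical FID scores over 5 random seeds for two models (illustration only).
fid = {
    "model_a": np.array([28.1, 29.5, 31.0, 45.2, 30.3]),  # one lucky seed, one unstable run
    "model_b": np.array([31.2, 31.8, 32.0, 32.5, 31.5]),  # consistently mediocre
}

for name, scores in fid.items():
    # "Best" rewards a lucky seed; mean/median reward consistency; worst exposes instability.
    print(name,
          "best:", scores.min(),
          "mean:", round(scores.mean(), 1),
          "median:", np.median(scores),
          "worst:", scores.max())
```

Under the "best score" criterion model_a wins, while under the mean or worst-case criterion model_b does, which is exactly why reporting only the best achieved result can be misleading.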
For each of these dimensions, we took several pragmatic choices to reduce the number of possible configurations, while still exploring the most relevant options.
1. Architecture: We use the same architecture for all models. The architecture is rich enough to achieve good performance.
2. Hyperparameters: For both training hyperparameters (e.g. the learning rate), as well as model-specific ones (e.g. the gradient penalty multiplier), there are two valid approaches: (i) perform the hyperparameter optimization for each data set, or (ii) perform the hyperparameter optimization on one data set and infer a good range of hyperparameters to use on other data sets. We explore both avenues in Section 6.
3. Random seed: Even with everything else being fixed, varying the random seed may have a non-trivial influence on the results. We study this particular effect and report the corresponding confidence intervals.
4. Data set: We chose four popular data sets from GAN literature and report results separately for each data set.
5. Computational budget: Depending on the budget to optimize the parameters, different algorithms can achieve the best results. We explore how the results vary depending on the budget.
In practice, one can either use hyperparameter values suggested by the respective authors, or try to optimize them. Figure 5, and in particular Figure 15, show that optimization is necessary. Hence, we optimize the hyperparameters for each model and data set by performing a random search.² We concur that the models with fewer hyperparameters have an advantage over models with many hyperparameters, but consider this fair as it reflects the experience of practitioners searching for good hyperparameters for their setting.
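A minimal sketch of such a random search over hyperparameters is shown below; the search space is an assumption for illustration and does not reproduce the exact ranges used in the study.

```python
import random

def sample_hyperparameters():
    """Draw one random configuration; the ranges below are illustrative assumptions."""
    return {
        "learning_rate": 10 ** random.uniform(-5, -2),  # log-uniform
        "beta1": random.choice([0.0, 0.5, 0.9]),         # Adam momentum term
        "batch_size": random.choice([16, 32, 64]),
        "disc_steps_per_gen_step": random.choice([1, 5]),
    }

budget = 100  # number of trials per model and data set
trials = [sample_hyperparameters() for _ in range(budget)]
# Each configuration would then be trained and scored (e.g. with FID);
# the distribution of the resulting scores, not only the best one, is reported.
```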
Source: http://tongtianta.site/paper/3092
Edited by Lornatang
Proofread by Lornatang