Author: Tyan
Blog: noahsnail.com | CSDN | 簡書
Disclaimer: The author translates papers for learning purposes only; if there is any infringement, please contact the author to take the post down. Thank you!
Collected paper translations: https://github.com/SnailTyan/deep-learning-papers-translation
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
Abstract
The Super-Resolution Generative Adversarial Network (SRGAN) [1] is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied by unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN – network architecture, adversarial loss and perceptual loss – and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN [2] to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge [3]. The code is available at https://github.com/xinntao/ESRGAN.
1 Introduction
Single image super-resolution (SISR), as a fundamental low-level vision problem, has attracted increasing attention in the research community and AI companies. SISR aims at recovering a high-resolution (HR) image from a single low-resolution (LR) one. Since the pioneering work of SRCNN proposed by Dong et al. [4], deep convolutional neural network (CNN) approaches have brought prosperous development. Various network architecture designs and training strategies have continuously improved the SR performance, especially the Peak Signal-to-Noise Ratio (PSNR) value [5,6,7,1,8,9,10,11,12]. However, these PSNR-oriented approaches tend to output over-smoothed results without sufficient high-frequency details, since the PSNR metric fundamentally disagrees with the subjective evaluation of human observers [1].
Several perceptual-driven methods have been proposed to improve the visual quality of SR results. For instance, perceptual loss [13,14] is proposed to optimize a super-resolution model in a feature space instead of pixel space. The generative adversarial network [15] is introduced to SR by [1,16] to encourage the network to favor solutions that look more like natural images. The semantic image prior is further incorporated to improve recovered texture details [17]. One of the milestones on the way to pursuing visually pleasing results is SRGAN [1]. The basic model is built with residual blocks [18] and optimized using perceptual loss in a GAN framework. With all these techniques, SRGAN significantly improves the overall visual quality of reconstruction over PSNR-oriented methods.
已經(jīng)提出了一些感知驅(qū)動(dòng)的方法來改進(jìn)SR結(jié)果的視覺質(zhì)量。例如,提出感知損失[13,14]來優(yōu)化在特征空間而不是像素空間中的超分辨率模型甸各。[1,16]引入生成對(duì)抗網(wǎng)絡(luò)[15]到SR中以鼓勵(lì)網(wǎng)絡(luò)支持看起來更像自然圖像的解仰剿。語義圖像先驗(yàn)被進(jìn)一步合并以改善恢復(fù)的紋理細(xì)節(jié)[17]。追尋視覺愉悅效果的方法中的里程碑之一是SRGAN[1]痴晦。基本模型是用殘差塊構(gòu)建的[18]琳彩,并在GAN框架中使用感知損失來進(jìn)行優(yōu)化誊酌。通過所有這些技術(shù),與面向PSNR的方法相比露乏,SRGAN顯著改善了重建的整體視覺質(zhì)量碧浊。
However, there still exists a clear gap between SRGAN results and the ground-truth (GT) images, as shown in Fig. 1. In this study, we revisit the key components of SRGAN and improve the model in three aspects. First, we improve the network structure by introducing the Residual-in-Residual Dense Block (RRDB), which is of higher capacity and easier to train. We also remove Batch Normalization (BN) [19] layers as in [20] and use residual scaling [21,20] and smaller initialization to facilitate training a very deep network. Second, we improve the discriminator using Relativistic average GAN (RaGAN) [2], which learns to judge “whether one image is more realistic than the other” rather than “whether one image is real or fake”. Our experiments show that this improvement helps the generator recover more realistic texture details. Third, we propose an improved perceptual loss by using the VGG features before activation instead of after activation as in SRGAN. We empirically find that the adjusted perceptual loss provides sharper edges and more visually pleasing results, as will be shown in Sec. 4.4. Extensive experiments show that the enhanced SRGAN, termed ESRGAN, consistently outperforms state-of-the-art methods in both sharpness and details (see Fig. 1 and Fig. 7).
Fig.1: The super-resolution results of ×4 for SRGAN, the proposed ESRGAN and the ground-truth. ESRGAN outperforms SRGAN in sharpness and details.
Fig.7: Qualitative results of ESRGAN. ESRGAN produces more natural textures, e.g., animal fur, building structure and grass texture, and also less unpleasant artifacts, e.g., artifacts in the face by SRGAN.
We take a variant of ESRGAN to participate in the PIRM-SR Challenge [3]. This challenge is the first SR competition that evaluates the performance in a perceptual-quality aware manner based on [22], where the authors claim that distortion and perceptual quality are at odds with each other. The perceptual quality is judged by the non-reference measures of Ma’s score [23] and NIQE [24], i.e., perceptual index $= \frac{1}{2}\left((10 - \text{Ma}) + \text{NIQE}\right)$. A lower perceptual index represents a better perceptual quality.
As shown in Fig. 2, the perception-distortion plane is divided into three regions defined by thresholds on the Root-Mean-Square Error (RMSE), and the algorithm that achieves the lowest perceptual index in each region becomes the regional champion. We mainly focus on region 3 as we aim to bring the perceptual quality to a new high. Thanks to the aforementioned improvements and some other adjustments as discussed in Sec. 4.6, our proposed ESRGAN won the first place in the PIRM-SR Challenge (region 3) with the best perceptual index.
Fig.2: Perception-distortion plane on PIRM self validation dataset. We show the baselines of EDSR [20], RCAN [12] and EnhanceNet [16], and the submitted ESRGAN model. The blue dots are produced by image interpolation.
In order to balance the visual quality and RMSE/PSNR, we further propose the network interpolation strategy, which could continuously adjust the reconstruction style and smoothness. Another alternative is image interpolation, which directly interpolates images pixel by pixel. We employ this strategy to participate in region 1 and region 2. The network interpolation and image interpolation strategies and their differences are discussed in Sec. 3.4.
2 Related Work
We focus on deep neural network approaches to solve the SR problem. As a pioneering work, Dong et al. [4,25] propose SRCNN to learn the mapping from LR to HR images in an end-to-end manner, achieving superior performance against previous works. Later on, the field has witnessed a variety of network architectures, such as a deeper network with residual learning [5], Laplacian pyramid structure [6], residual blocks [1], recursive learning [7,8], densely connected network [9], deep back projection [10] and residual dense network [11]. Specifically, Lim et al. [20] propose the EDSR model by removing unnecessary BN layers in the residual block and expanding the model size, which achieves significant improvement. Zhang et al. [11] propose to use effective residual dense blocks in SR, and they further explore a deeper network with channel attention [12], achieving state-of-the-art PSNR performance. Besides supervised learning, other methods like reinforcement learning [26] and unsupervised learning [27] are also introduced to solve general image restoration problems.
2 相關(guān)工作
我們專注于解決SR問題的深度神經(jīng)網(wǎng)絡(luò)方法多艇。作為開創(chuàng)性工作,Dong等[4,25]提出了SRCNN以端到端的方式來學(xué)習(xí)從LR到SR圖像的映射像吻,取得了優(yōu)于之前工作的性能峻黍。后來,這個(gè)領(lǐng)域見證了各種網(wǎng)絡(luò)架構(gòu)拨匆,例如具有殘差學(xué)習(xí)的神經(jīng)網(wǎng)絡(luò)[5]姆涩,拉普拉斯金字塔結(jié)構(gòu)[6],殘差塊[1]惭每,遞歸學(xué)習(xí)[7,8]骨饿,密集連接網(wǎng)絡(luò)[9],深度反向投影[10]和殘差密集網(wǎng)絡(luò)[11]。具體來說宏赘,Lim等[20]通過移除殘差塊中不必要的BN層以及擴(kuò)展模型尺寸提出了EDSR模型绒北,取得了顯著的改善。Zhang等[11]在SR中提出了使用有效的殘差密集塊察署,并且他們進(jìn)一步開發(fā)了一個(gè)使用通道注意力[12]的更深網(wǎng)絡(luò)闷游,取得了最佳的PSNR性能。除了監(jiān)督學(xué)習(xí)之外贴汪,也引入了其它的方法像強(qiáng)化學(xué)習(xí)[26]以及無監(jiān)督學(xué)習(xí)[27]來解決一般的圖像復(fù)原問題储藐。
Several methods have been proposed to stabilize training a very deep model. For instance, residual path is developed to stabilize the training and improve the performance [18,5,12]. Residual scaling is first employed by Szegedy et al. [21] and also used in EDSR. For general deep networks, He et al. [28] propose a robust initialization method for VGG-style networks without BN. To facilitate training a deeper network, we develop a compact and effective residual-in-residual dense block, which also helps to improve the perceptual quality.
已經(jīng)提出了一些方法來穩(wěn)定訓(xùn)練非常深的模型。例如嘶是,開發(fā)殘差路徑來穩(wěn)定訓(xùn)練并改善性能[18,5,12]。Szegedy等[21]首次采用殘差縮放蛛碌,也在EDSR中使用聂喇。對(duì)于一般的深度網(wǎng)絡(luò),He等[28]為沒有BN的VGG風(fēng)格的網(wǎng)絡(luò)提出了一個(gè)魯棒的初始化方法蔚携。為了便于訓(xùn)練更深的網(wǎng)絡(luò)希太,我們也開發(fā)了一個(gè)簡(jiǎn)潔有效的殘差套殘差密集塊,這有助于改善感知質(zhì)量酝蜒。
Perceptual-driven approaches have also been proposed to improve the visual quality of SR results. Based on the idea of being closer to perceptual similarity [29,14], perceptual loss [13] is proposed to enhance the visual quality by minimizing the error in a feature space instead of pixel space. Contextual loss [30] is developed to generate images with natural image statistics by using an objective that focuses on the feature distribution rather than merely comparing the appearance. Ledig et al. [1] propose the SRGAN model that uses perceptual loss and adversarial loss to favor outputs residing on the manifold of natural images. Sajjadi et al. [16] develop a similar approach and further explore the local texture matching loss. Based on these works, Wang et al. [17] propose spatial feature transform to effectively incorporate semantic priors in an image and improve the recovered textures.
感知驅(qū)動(dòng)的方法已經(jīng)被提出用來改善SR結(jié)果的視覺質(zhì)量誊辉。基于更接近于感知相似度[29,14]的想法提出感知損失[13]亡脑,通過最小化特征空間而不是像素空間的誤差來增強(qiáng)視覺質(zhì)量堕澄。通過使用專注于特征分布而不是只比較外觀的目標(biāo)函數(shù),開發(fā)上下文損失[30]來生成具有自然圖像統(tǒng)計(jì)的圖像霉咨。Ledig等[1]提出SRGAN模型蛙紫,使用感知損失和對(duì)抗損失來支持位于自然圖像流形的輸出。Sajjadi等[16]開發(fā)了類似的方法并進(jìn)一步探索了局部紋理匹配損失途戒】痈担基于這些工作,Wang等[17]提出空間特征變換來有效地將語義先驗(yàn)合并到圖像中并改進(jìn)恢復(fù)的紋理喷斋。
Throughout the literature, photo-realism is usually attained by adversarial training with GAN [15]. Recently there are a number of works that focus on developing more effective GAN frameworks. WGAN [31] proposes to minimize a reasonable and efficient approximation of the Wasserstein distance and regularizes the discriminator by weight clipping. Other improved regularization for the discriminator includes gradient penalty [32] and spectral normalization [33]. The relativistic discriminator [2] is developed not only to increase the probability that generated data are real, but also to simultaneously decrease the probability that real data are real. In this work, we enhance SRGAN by employing the more effective relativistic average GAN.
在整個(gè)文獻(xiàn)中唁毒,通常通過與GAN[15]的對(duì)抗訓(xùn)練來獲得寫實(shí)主義照片。最近有很多工作致力于開發(fā)更有效的GAN框架星爪。WGAN[31]提出最小化Wasserstein距離的合理和有效近似浆西,并通過權(quán)重修剪來正則化判別器。其它對(duì)判別器的正則化包括梯度修剪[32]和譜歸一化[33]顽腾。開發(fā)的相對(duì)判別器[2]不僅提高了生成數(shù)據(jù)真實(shí)性的概率室谚,而且同時(shí)降低了真實(shí)數(shù)據(jù)真實(shí)性的概率。在這項(xiàng)工作中,我們通過采用更有效的相對(duì)平均GAN來增強(qiáng)SRGAN秒赤。
SR algorithms are typically evaluated by several widely used distortion measures, e.g., PSNR and SSIM. However, these metrics fundamentally disagree with the subjective evaluation of human observers [1]. Non-reference measures are used for perceptual quality evaluation, including Ma’s score [23] and NIQE [24], both of which are used to calculate the perceptual index in the PIRM-SR Challenge [3]. In a recent study, Blau et al. [22] find that the distortion and perceptual quality are at odds with each other.
3 Proposed Methods
Our main aim is to improve the overall perceptual quality for SR. In this section, we first describe our proposed network architecture and then discuss the improvements from the discriminator and perceptual loss. At last, we describe the network interpolation strategy for balancing perceptual quality and PSNR.
3.1 Network Architecture
In order to further improve the recovered image quality of SRGAN, we mainly make two modifications to the structure of generator G: 1) remove all BN layers; 2) replace the original basic block with the proposed Residual-in-Residual Dense Block (RRDB), which combines multi-level residual network and dense connections as depicted in Fig. 4.
Fig.4: Left: We remove the BN layers in the residual block in SRGAN. Right: The RRDB block is used in our deeper model, and $\beta$ is the residual scaling parameter.
3.1 網(wǎng)絡(luò)架構(gòu)
為了進(jìn)一步改進(jìn)SRGAN復(fù)原的圖像質(zhì)量托修,我們主要對(duì)生成器G的架構(gòu)進(jìn)行了兩個(gè)修改:1)移除所有的BN層;2)用提出的殘差套殘差密集塊(RRDB)替換原始的基本塊恒界,它結(jié)合了多層殘差網(wǎng)絡(luò)和密集連接睦刃,如圖4所示。
圖4:左:我們移除了SRGAN殘差塊中的BN層十酣。右:RRDB塊用在我們的更深模型中涩拙,是殘差尺度參數(shù)。
Removing BN layers has proven to increase performance and reduce computational complexity in different PSNR-oriented tasks including SR [20] and deblurring [35]. BN layers normalize the features using mean and variance in a batch during training and use estimated mean and variance of the whole training dataset during testing. When the statistics of training and testing datasets differ a lot, BN layers tend to introduce unpleasant artifacts and limit the generalization ability. We empirically observe that BN layers are more likely to bring artifacts when the network is deeper and trained under a GAN framework. These artifacts occasionally appear among iterations and different settings, violating the needs for a stable performance over training. We therefore remove BN layers for stable training and consistent performance. Furthermore, removing BN layers helps to improve generalization ability and to reduce computational complexity and memory usage.
We keep the high-level architecture design of SRGAN (see Fig. 3), and use a novel basic block namely RRDB as depicted in Fig. 4. Based on the observation that more layers and connections could always boost performance [20,11,12], the proposed RRDB employs a deeper and more complex structure than the original residual block in SRGAN. Specifically, as shown in Fig. 4, the proposed RRDB has a residual-in-residual structure, where residual learning is used in different levels. A similar network structure is proposed in [36] that also applies a multilevel residual network. However, our RRDB differs from [36] in that we use dense block [34] in the main path as [11], where the network capacity becomes higher benefiting from the dense connections.
Fig. 3: We employ the basic architecture of SRResNet [1], where most computation is done in the LR feature space. We could select or design “basic blocks” (e.g., residual block [18], dense block [34], RRDB) for better performance.
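To make the architecture concrete, below is a minimal PyTorch sketch of the RRDB structure described above. The widths (64 base feature maps, 32 growth channels), the three dense blocks per RRDB, and the 0.2 residual scaling follow the released ESRGAN code; treat them as illustrative defaults rather than the only valid configuration.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Dense block without BN: every 3x3 conv sees all preceding feature maps."""
    def __init__(self, nf=64, gc=32):
        super().__init__()
        # five convs; the first four grow the feature pool, the last fuses back to nf channels
        self.convs = nn.ModuleList(
            [nn.Conv2d(nf + i * gc, gc if i < 4 else nf, 3, 1, 1) for i in range(5)]
        )
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(feats, dim=1))
            feats.append(self.lrelu(out) if i < 4 else out)
        return x + 0.2 * feats[-1]  # residual scaling before joining the main path

class RRDB(nn.Module):
    """Residual-in-Residual Dense Block: three dense blocks wrapped in an outer residual."""
    def __init__(self, nf=64, gc=32):
        super().__init__()
        self.blocks = nn.Sequential(*(ResidualDenseBlock(nf, gc) for _ in range(3)))

    def forward(self, x):
        return x + 0.2 * self.blocks(x)
```

In the full generator, a trunk of such RRDBs (23 in the deeper model) sits between the first convolution and the upsampling layers, in the same position where SRResNet places its residual blocks.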
In addition to the improved architecture, we also exploit several techniques to facilitate training a very deep network: 1) residual scaling [21,20], i.e., scaling down the residuals by multiplying a constant between 0 and 1 before adding them to the main path to prevent instability; 2) smaller initialization, as we empirically find residual architecture is easier to train when the initial parameter variance becomes smaller. More discussion can be found in the supplementary material.
除了改進(jìn)架構(gòu)之外粹胯,我們也利用幾種技術(shù)來促進(jìn)訓(xùn)練非常深的網(wǎng)絡(luò):1)殘差縮放[21,20],例如在將殘差加到主路徑上之前辰企,通過將其乘以一個(gè)0-1之間的常量來縮小殘差以防止不穩(wěn)定性风纠;2)更小的初始化,因?yàn)槲覀儜{經(jīng)驗(yàn)發(fā)現(xiàn)當(dāng)初始參數(shù)方差變得更小時(shí)牢贸,殘差結(jié)構(gòu)更容易訓(xùn)練竹观。更多討論可在補(bǔ)充材料中找到。
The training details and the effectiveness of the proposed network will be presented in Sec. 4.
訓(xùn)練細(xì)節(jié)和提出網(wǎng)絡(luò)的有效性將在第4節(jié)中介紹潜索。
3.2 Relativistic Discriminator
Besides the improved structure of the generator, we also enhance the discriminator based on the Relativistic GAN [2]. Different from the standard discriminator $D$ in SRGAN, which estimates the probability that one input image $x$ is real and natural, a relativistic discriminator tries to predict the probability that a real image $x_r$ is relatively more realistic than a fake one $x_f$, as shown in Fig. 5.
Fig. 5: Difference between standard discriminator and relativistic discriminator.
3.2 相對(duì)判別器
除了改進(jìn)生成器架構(gòu)之外臭增,我們還在相對(duì)GAN[2]的基礎(chǔ)上增強(qiáng)了判斷器。不同于SRGAN中的標(biāo)注判別器竹习,估算輸入圖像是真實(shí)自然的概率誊抛,相對(duì)判別器嘗試預(yù)測(cè)真實(shí)圖像比假圖像相對(duì)更真實(shí)的概率,如圖5所示整陌。
圖5:標(biāo)準(zhǔn)判別器和相對(duì)判別器的差異拗窃。
Specifically, we replace the standard discriminator with the Relativistic average Discriminator RaD [2], denoted as $D_{Ra}$. The standard discriminator in SRGAN can be expressed as $D(x) = \sigma(C(x))$, where $\sigma$ is the sigmoid function and $C(x)$ is the non-transformed discriminator output. Then the RaD is formulated as $D_{Ra}(x_r, x_f) = \sigma(C(x_r) - \mathbb{E}_{x_f}[C(x_f)])$, where $\mathbb{E}_{x_f}[\cdot]$ represents the operation of taking the average over all fake data in the mini-batch. The discriminator loss is then defined as:

$$L_D^{Ra} = -\mathbb{E}_{x_r}\left[\log\left(D_{Ra}(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log\left(1 - D_{Ra}(x_f, x_r)\right)\right]. \tag{1}$$

The adversarial loss for the generator is in a symmetrical form:

$$L_G^{Ra} = -\mathbb{E}_{x_r}\left[\log\left(1 - D_{Ra}(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log\left(D_{Ra}(x_f, x_r)\right)\right], \tag{2}$$

where $x_f = G(x_i)$ and $x_i$ stands for the input LR image. It is observed that the adversarial loss for the generator contains both $x_r$ and $x_f$. Therefore, our generator benefits from the gradients of both generated data and real data in adversarial training, while in SRGAN only the generated part takes effect. In Sec. 4.4, we will show that this modification of the discriminator helps to learn sharper edges and more detailed textures.
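The two losses above translate almost directly into code. A minimal sketch, assuming `c_real` and `c_fake` are the non-transformed outputs $C(x_r)$ and $C(x_f)$ for a mini-batch; when training the discriminator, `c_fake` should be computed from detached generator outputs.

```python
import torch
import torch.nn.functional as F

def rad_discriminator_loss(c_real, c_fake):
    """L_D^Ra: reals should look more realistic than the average fake, and vice versa."""
    real_vs_fake = c_real - c_fake.mean()  # C(x_r) - E[C(x_f)]
    fake_vs_real = c_fake - c_real.mean()  # C(x_f) - E[C(x_r)]
    # BCE-with-logits gives -log(sigmoid(z)) for target 1 and -log(1 - sigmoid(z)) for target 0
    return (F.binary_cross_entropy_with_logits(real_vs_fake, torch.ones_like(real_vs_fake))
            + F.binary_cross_entropy_with_logits(fake_vs_real, torch.zeros_like(fake_vs_real)))

def rad_generator_loss(c_real, c_fake):
    """L_G^Ra: the symmetric form, so gradients flow from both real and generated data."""
    real_vs_fake = c_real - c_fake.mean()
    fake_vs_real = c_fake - c_real.mean()
    return (F.binary_cross_entropy_with_logits(real_vs_fake, torch.zeros_like(real_vs_fake))
            + F.binary_cross_entropy_with_logits(fake_vs_real, torch.ones_like(fake_vs_real)))
```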
3.3 Perceptual Loss
We also develop a more effective perceptual loss by constraining features before activation rather than after activation, as practiced in SRGAN.
Based on the idea of being closer to perceptual similarity [29,14], Johnson et al. [13] propose perceptual loss and it is extended in SRGAN [1]. Perceptual loss is previously defined on the activation layers of a pre-trained deep network, where the distance between two activated features is minimized. Contrary to the convention, we propose to use features before the activation layers, which will overcome two drawbacks of the original design. First, the activated features are very sparse, especially after a very deep network, as depicted in Fig. 6. For example, the average percentage of activated neurons for image ‘baboon’ after the VGG19-54 layer is merely 11.17%. The sparse activation provides weak supervision and thus leads to inferior performance. Second, using features after activation also causes inconsistent reconstructed brightness compared with the ground-truth image, which we will show in Sec. 4.4.
Fig.6: Representative feature maps before and after activation for image ‘baboon’. With the network going deeper, most of the features after activation become inactive, while features before activation contain more information.
Therefore, the total loss for the generator is:

$$L_G = L_{percep} + \lambda L_G^{Ra} + \eta L_1, \tag{3}$$

where $L_1 = \mathbb{E}_{x_i}\lVert G(x_i) - y \rVert_1$ is the content loss that evaluates the 1-norm distance between the recovered image $G(x_i)$ and the ground-truth $y$, and $\lambda$, $\eta$ are the coefficients to balance the different loss terms.
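To make the “before activation” choice concrete, here is one way to express the VGG19-54 perceptual loss and the total loss of Eq. (3) in PyTorch. In torchvision’s VGG19, `features[34]` is the conv5_4 layer and `features[35]` its ReLU, so truncating at index 35 yields pre-activation features; the input normalization expected by the pretrained VGG is omitted for brevity, and the $\lambda$, $\eta$ defaults are the values given later in Sec. 4.1.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """L1 distance between VGG19 conv5_4 features taken *before* the ReLU."""
    def __init__(self):
        super().__init__()
        self.vgg = vgg19(pretrained=True).features[:35].eval()  # ends at conv5_4, pre-activation
        for p in self.vgg.parameters():
            p.requires_grad = False

    def forward(self, sr, hr):
        return F.l1_loss(self.vgg(sr), self.vgg(hr))

def generator_total_loss(sr, hr, adv_loss, percep_loss_fn, lam=5e-3, eta=1e-2):
    """L_G = L_percep + lambda * L_G^Ra + eta * L_1 (Eq. 3)."""
    return percep_loss_fn(sr, hr) + lam * adv_loss + eta * F.l1_loss(sr, hr)
```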
We also explore a variant of perceptual loss in the PIRM-SR Challenge. In contrast to the commonly used perceptual loss that adopts a VGG network trained for image classification, we develop a perceptual loss that is more suitable for SR – the MINC loss. It is based on a VGG network fine-tuned for material recognition [38], which focuses on textures rather than objects. Although the gain in perceptual index brought by the MINC loss is marginal, we still believe that exploring perceptual loss that focuses on texture is critical for SR.
我們?cè)赑IRM-SR挑戰(zhàn)賽中也探索了感知損失的變種。與采用圖像分類訓(xùn)練的VGG網(wǎng)絡(luò)的常用感知損失相比横缔,我們?yōu)镾R–MINC損失開發(fā)了一種更合適的感知損失铺遂。它是基于材料識(shí)別[38]的微調(diào)VGG網(wǎng)絡(luò),該網(wǎng)絡(luò)注重于紋理而不是目標(biāo)茎刚。盡管MINC損失帶來的感知指數(shù)收益是微不足道的襟锐,但我們?nèi)匀徽J(rèn)為,采用注重紋理的感知損失對(duì)于SR至關(guān)重要膛锭。
3.4 Network Interpolation
To remove unpleasant noise in GAN-based methods while maintaining good perceptual quality, we propose a flexible and effective strategy – network interpolation. Specifically, we first train a PSNR-oriented network $G_{PSNR}$ and then obtain a GAN-based network $G_{GAN}$ by fine-tuning. We interpolate all the corresponding parameters of these two networks to derive an interpolated model $G_{INTERP}$, whose parameters are:

$$\theta_G^{INTERP} = (1 - \alpha)\,\theta_G^{PSNR} + \alpha\,\theta_G^{GAN}, \tag{4}$$

where $\theta_G^{INTERP}$, $\theta_G^{PSNR}$ and $\theta_G^{GAN}$ are the parameters of $G_{INTERP}$, $G_{PSNR}$ and $G_{GAN}$, respectively, and $\alpha \in [0, 1]$ is the interpolation parameter.
3.4 網(wǎng)絡(luò)插值
為了去除基于GAN方法中討厭的噪聲同時(shí)保持好的感知質(zhì)量粮坞,我們提出了一種彈性有效的策略——網(wǎng)絡(luò)插值。具體來說初狰,我們首先訓(xùn)練一個(gè)面向PSNR的網(wǎng)絡(luò)莫杈,然后通過微調(diào)獲得一個(gè)基于GAN的網(wǎng)絡(luò)。我們插值這兩個(gè)網(wǎng)絡(luò)的所有對(duì)應(yīng)參數(shù)來取得插值模型奢入,其參數(shù)為: 其中, 和分別是, 和的參數(shù)筝闹,為插值參數(shù)。
The proposed network interpolation enjoys two merits. First, the interpolated model is able to produce meaningful results for any feasible $\alpha$ without introducing artifacts. Second, we can continuously balance perceptual quality and fidelity without re-training the model.
提出的網(wǎng)絡(luò)插值有兩個(gè)優(yōu)點(diǎn)俊马。首先丁存,插值模型對(duì)于任何合理的都能產(chǎn)生有意義的結(jié)果而不會(huì)產(chǎn)生偽影肩杈。其次柴我,我們可以持續(xù)平衡感知質(zhì)量和保真度都不必重新訓(xùn)練模型。
We also explore alternative methods to balance the effects of PSNR-oriented and GAN-based methods. For instance, one can directly interpolate their output images (pixel by pixel) rather than the network parameters. However, such an approach fails to achieve a good trade-off between noise and blur, i.e., the interpolated image is either too blurry or noisy with artifacts (see Sec. 4.5). Another method is to tune the weights of the content loss and adversarial loss, i.e., the parameters $\eta$ and $\lambda$ in Eq. (3). But this approach requires tuning loss weights and fine-tuning the network, and thus it is too costly to achieve continuous control of the image style.
4 Experiments
4.1 Training Details
Following SRGAN [1], all experiments are performed with a scaling factor of ×4 between LR and HR images. We obtain LR images by down-sampling HR images using the MATLAB bicubic kernel function. The mini-batch size is set to 16. The spatial size of cropped HR patch is 128 × 128. We observe that training a deeper network benefits from a larger patch size, since an enlarged receptive field helps to capture more semantic information. However, it costs more training time and consumes more computing resources. This phenomenon is also observed in PSNR-oriented methods (see supplementary material).
4 實(shí)驗(yàn)
4.1 訓(xùn)練細(xì)節(jié)
按照SRGAN[1]访惜,所有實(shí)驗(yàn)在LR和HR圖像間均以4倍的尺度系數(shù)進(jìn)行嘹履。我們通過使用MATLAB雙三次核函數(shù)對(duì)HR圖像進(jìn)行下采樣來獲得LR圖像。最小批次大小設(shè)置為16债热。裁剪的HR圖像塊的空間大小為128×128砾嫉。我們觀察到,訓(xùn)練更深的網(wǎng)絡(luò)可以從更大的批次大小中獲益窒篱,因?yàn)閿U(kuò)大的感受野有助于捕獲更多的語義信息焕刮。但是,這會(huì)花費(fèi)更多的訓(xùn)練時(shí)間并消耗更多的計(jì)算資源墙杯。這種現(xiàn)象也可以在面向PSNR的方法中觀察到(見補(bǔ)充材料)配并。
The training process is divided into two stages. First, we train a PSNR-oriented model with the L1 loss. The learning rate is initialized as $2 \times 10^{-4}$ and decayed by a factor of 2 every $2 \times 10^5$ mini-batch updates. We then employ the trained PSNR-oriented model as an initialization for the generator. The generator is trained using the loss function in Eq. (3) with $\lambda = 5 \times 10^{-3}$ and $\eta = 1 \times 10^{-2}$. The learning rate is set to $1 \times 10^{-4}$ and halved at [50k, 100k, 200k, 300k] iterations. Pre-training with pixel-wise loss helps GAN-based methods to obtain more visually pleasing results. The reasons are that 1) it can avoid undesired local optima for the generator; 2) after pre-training, the discriminator receives relatively good super-resolved images instead of extreme fake ones (black or noisy images) at the very beginning, which helps it to focus more on texture discrimination.
訓(xùn)練過程分為兩個(gè)階段。首先霍转,我們訓(xùn)練一個(gè)具有L1損失的面向PSNR的模型荐绝。學(xué)習(xí)率初始化為,每個(gè)小批次更新的衰減因子為2避消。然后低滩,我們采用訓(xùn)練的面向PSNR的模型作為生成器的初始化。生成器訓(xùn)練使用等式3中的損失函數(shù)岩喷,恕沫,。學(xué)習(xí)率設(shè)置為纱意,在[50k, 100k, 200k, 300k]次迭代之后減半婶溯。使用逐像素?fù)p失進(jìn)行預(yù)訓(xùn)練有助于基于GAN的方法獲得視覺上更好的結(jié)果。原因是:1)它可以避免生成器不希望的局部最優(yōu)偷霉;2)在預(yù)訓(xùn)練之后迄委,最初判別器可以收到相對(duì)好的超分辨率圖像而不是極端假的圖像(黑色或噪聲圖像),這有助于其更關(guān)注紋理判別类少。
For optimization, we use Adam [39] with $\beta_1 = 0.9$, $\beta_2 = 0.999$. We alternately update the generator and discriminator network until the model converges. We use two settings for our generator – one of them contains 16 residual blocks, with a capacity similar to that of SRGAN, and the other is a deeper model with 23 RRDB blocks. We implement our models with the PyTorch framework and train them using NVIDIA Titan Xp GPUs.
為了優(yōu)化叙身,我們使用Adam[39],其中信轿。我們交替更新生成器和判別器網(wǎng)絡(luò),直到模型收斂残吩。我們?yōu)樯善魇褂昧藘煞N設(shè)置——其中一種包含16個(gè)殘差塊财忽,能力類似于SRGAN,另一種是具有23個(gè)RRDB塊的更深的模型泣侮。我們使用PyTorch框架實(shí)現(xiàn)我們的模型即彪,并使用NVIDIA Titan Xp GPU對(duì)其進(jìn)行訓(xùn)練。
4.2 Data
For training, we mainly use the DIV2K dataset [40], which is a high-quality (2K resolution) dataset for image restoration tasks. Beyond the training set of DIV2K that contains 800 images, we also seek out other datasets with rich and diverse textures for our training. To this end, we further use the Flickr2K dataset [41], consisting of 2650 2K high-resolution images collected on the Flickr website, and the OutdoorSceneTraining (OST) dataset [17] to enrich our training set. We empirically find that using this large dataset with richer textures helps the generator to produce more natural results, as shown in Fig. 8.
Fig. 8: Overall visual comparisons for showing the effects of each component in ESRGAN. Each column represents a model with its configurations in the top. The red sign indicates the main improvement compared with the previous model.
4.2 數(shù)據(jù)
對(duì)于訓(xùn)練活尊,我們主要使用DIV2K數(shù)據(jù)集[40]隶校,它是用于圖像復(fù)原任務(wù)的高質(zhì)量(2K分辨率)數(shù)據(jù)集琼蚯。除了包含800張圖像的DIV2K訓(xùn)練集外,我們也搜尋了其它具有豐富多樣紋理的數(shù)據(jù)集進(jìn)行訓(xùn)練惠况。為此遭庶,我們進(jìn)一步使用Flickr2K數(shù)據(jù)集[41],包含F(xiàn)lickr網(wǎng)站上收集的2650張2K高分辨率圖像稠屠,OutdoorSceneTraining(OST)[17]數(shù)據(jù)集來豐富我們的訓(xùn)練集峦睡。我們憑經(jīng)驗(yàn)發(fā)現(xiàn),使用具有豐富紋理的大型數(shù)據(jù)集有助于生成器產(chǎn)生更自然的結(jié)果权埠,如圖8所示榨了。
圖8:展示ESRGAN中每個(gè)組件效果的整體視覺比較。每一列表示一個(gè)模型攘蔽,其配置在頂部龙屉。紅色符號(hào)表示與前面模型相比的主要改進(jìn)。
We train our models in RGB channels and augment the training dataset with random horizontal flips and 90 degree rotations. We evaluate our models on widely used benchmark datasets – Set5 [42], Set14 [43], BSD100 [44], Urban100 [45], and the PIRM self-validation dataset that is provided in the PIRM-SR Challenge.
我們?cè)赗GB通道訓(xùn)練模型满俗,并通過隨機(jī)水平翻轉(zhuǎn)和90度旋轉(zhuǎn)來增強(qiáng)訓(xùn)練集转捕。我們?cè)趶V泛使用的基準(zhǔn)數(shù)據(jù)集——Set5[42],Set14[43]唆垃,BSD100[44]五芝,Urban100[45]以及PIRM-SR挑戰(zhàn)賽提供的PIRM自驗(yàn)證數(shù)據(jù)上評(píng)估我們的模型。
4.3 Qualitative Results
We compare our final models on several public benchmark datasets with state-of-the-art PSNR-oriented methods including SRCNN [4], EDSR [20] and RCAN [12], and also with perceptual-driven approaches including SRGAN [1] and EnhanceNet [16]. Since there is no effective and standard metric for perceptual quality, we present some representative qualitative results in Fig. 7. PSNR (evaluated on the luminance channel in YCbCr color space) and the perceptual index used in the PIRM-SR Challenge are also provided for reference.
4.3 定性結(jié)果
我們將最終的模型與最新的面向PSNR的方法包括SRCNN[4]辕万,EDSR[20]和RCAN[12]枢步,以及感知驅(qū)動(dòng)的方法包括在SRGAN[1]和EnhanceNet[16]在一些公開基準(zhǔn)數(shù)據(jù)集上進(jìn)行了比較。由于對(duì)于感知質(zhì)量沒有有效標(biāo)準(zhǔn)的度量標(biāo)準(zhǔn)渐尿,我們?cè)趫D7中展示了一些具有代表性的結(jié)果醉途,也提供了PSNR(在YCbCr顏色空間的亮度通道上評(píng)估)和PIRM-SR挑戰(zhàn)賽中的感知指數(shù)供參考。
It can be observed from Fig. 7 that our proposed ESRGAN outperforms previous approaches in both sharpness and details. For instance, ESRGAN can produce sharper and more natural baboon’s whiskers and grass textures (see image 43074) than PSNR-oriented methods, which tend to generate blurry results, and than previous GAN-based methods, whose textures are unnatural and contain unpleasing noise. ESRGAN is capable of generating more detailed structures in buildings (see image 102061) while other methods either fail to produce enough details (SRGAN) or add undesired textures (EnhanceNet). Moreover, previous GAN-based methods sometimes introduce unpleasant artifacts, e.g., SRGAN adds wrinkles to the face. Our ESRGAN gets rid of these artifacts and produces natural results.
4.4 Ablation Study
In order to study the effects of each component in the proposed ESRGAN, we gradually modify the baseline SRGAN model and compare their differences. The overall visual comparison is illustrated in Fig. 8. Each column represents a model with its configurations shown in the top. The red sign indicates the main improvement compared with the previous model. A detailed discussion is provided as follows.
BN removal. We first remove all BN layers for stable and consistent performance without artifacts. It does not decrease the performance but saves the computational resources and memory usage. For some cases, a slight improvement can be observed from the 2nd and 3rd columns in Fig. 8 (e.g., image 39). Furthermore, we observe that when a network is deeper and more complicated, the model with BN layers is more likely to introduce unpleasant artifacts. The examples can be found in the supplementary material.
Before activation in perceptual loss. We first demonstrate that using features before activation can result in more accurate brightness of reconstructed images. To eliminate the influences of textures and color, we filter the image with a Gaussian kernel and plot the histogram of its gray-scale counterpart. Fig. 9a shows the distribution of each brightness value. Using activated features skews the distribution to the left, resulting in a dimmer output while using features before activation leads to a more accurate brightness distribution closer to that of the ground-truth.
Fig. 9: Comparison between before activation and after activation.
We can further observe that using features before activation helps to produce sharper edges and richer textures, as shown in Fig. 9b (see bird feather) and Fig. 8 (see the 3rd and 4th columns), since the dense features before activation offer stronger supervision than a sparse activation could provide.
我們可以進(jìn)一步觀察到,使用激活之前的特征有助于產(chǎn)生更清晰的邊緣和更豐富的紋理捍歪,如圖9b(見鳥羽)和圖8(見第三列和第四列)所示户辱,因?yàn)榕c稀疏激活提供的特征相比,激活之前的密集特征能提供更強(qiáng)的監(jiān)督糙臼。
RaGAN. RaGAN uses an improved relativistic discriminator, which is shown to benefit learning sharper edges and more detailed textures. For example, in the 5th column of Fig. 8, the generated images are sharper with richer textures than those on their left (see the baboon, image 39 and image 43074).
Deeper network with RRDB. Deeper model with the proposed RRDB can further improve the recovered textures, especially for the regular structures like the roof of image 6 in Fig. 8, since the deep model has a strong representation capacity to capture semantic information. Also, we find that a deeper model can reduce unpleasing noises like image 20 in Fig. 8.
In contrast to SRGAN, which claimed that deeper models are increasingly difficult to train, our deeper model shows superior performance and trains easily, thanks to the improvements mentioned above, especially the proposed RRDB without BN layers.
4.5 Network Interpolation
We compare the effects of the network interpolation and image interpolation strategies in balancing the results of a PSNR-oriented model and a GAN-based method. We apply simple linear interpolation on both schemes. The interpolation parameter $\alpha$ is chosen from 0 to 1 with an interval of 0.2.
4.5 網(wǎng)絡(luò)插值
我們比較了網(wǎng)絡(luò)插值和圖像插值策略在平衡面向PSNR模型與基于GAN方法的結(jié)果方面的作用哗讥。我們?cè)谶@個(gè)兩個(gè)方案中應(yīng)用了簡(jiǎn)單的線性插值。插值參數(shù)從間隔為0.2的0-1之間選取胞枕。
As depicted in Fig. 10, the pure GAN-based method produces sharp edges and richer textures but with some unpleasant artifacts, while the pure PSNR-oriented method outputs cartoon-style blurry images. By employing network interpolation, unpleasing artifacts are reduced while the textures are maintained. By contrast, image interpolation fails to remove these artifacts effectively.
Fig. 10: The comparison between network interpolation and image interpolation.
Interestingly, it is observed that the network interpolation strategy provides a smooth control of balancing perceptual quality and fidelity in Fig. 10.
4.6 The PIRM-SR Challenge
We take a variant of ESRGAN to participate in the PIRM-SR Challenge [3]. Specifically, we use the proposed ESRGAN with 16 residual blocks and also empirically make some modifications to cater to the perceptual index. 1) The MINC loss is used as a variant of perceptual loss, as discussed in Sec. 3.3. Despite the marginal gain on the perceptual index, we still believe that exploring perceptual loss that focuses on texture is crucial for SR. 2) The pristine dataset [24], which is used for learning the perceptual index, is also employed in our training. 3) A high weight of the loss $L_1$, up to $\eta = 10$, is used due to the PSNR constraints. 4) We also use back projection [46] as post-processing, which can improve PSNR and sometimes lower the perceptual index (a sketch of one common formulation follows below).
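The paper does not spell out its back-projection variant; a common formulation is iterative back projection, which repeatedly corrects the SR estimate with the LR-space residual. A sketch, noting that the resize kernel should match the degradation (the experiments use the MATLAB bicubic kernel, which differs slightly from PyTorch’s bicubic):

```python
import torch.nn.functional as F

def back_projection(sr, lr, scale=4, iters=10):
    """Refine sr (N,C,H,W) so that downsampling it reproduces lr."""
    for _ in range(iters):
        down = F.interpolate(sr, scale_factor=1 / scale, mode='bicubic', align_corners=False)
        err = lr - down  # residual in LR space
        sr = sr + F.interpolate(err, scale_factor=scale, mode='bicubic', align_corners=False)
    return sr
```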
For other regions 1 and 2 that require a higher PSNR, we use image interpolation between the results of our ESRGAN and those of a PSNR-oriented method RCAN [12]. The image interpolation scheme achieves a lower perceptual index (lower is better) although we observed more visually pleasing results by using the network interpolation scheme. Our proposed ESRGAN model won the first place in the PIRM-SR Challenge (region 3) with the best perceptual index.
對(duì)于其它需要較高PSNR的區(qū)域1和2岩梳,我們?cè)贓SRGAN的結(jié)果和面向PSNR方法RCAN[12]的結(jié)果之間使用圖像插值。盡管通過使用網(wǎng)絡(luò)插值方案我們觀察到了視覺上更令人滿意的效果晃择,但圖像插值方案取得了較低的感知指數(shù)(越低越好)冀值。我們提出的ESRGAN模型以最好的感知指數(shù)贏得了PIRM-SR挑戰(zhàn)賽(區(qū)域3)的第一名。
5 Conclusion
We have presented an ESRGAN model that achieves consistently better perceptual quality than previous SR methods. The method won the first place in the PIRM-SR Challenge in terms of the perceptual index. We have formulated a novel architecture containing several RRDB blocks without BN layers. In addition, useful techniques including residual scaling and smaller initialization are employed to facilitate the training of the proposed deep model. We have also introduced the use of relativistic GAN as the discriminator, which learns to judge whether one image is more realistic than another, guiding the generator to recover more detailed textures. Moreover, we have enhanced the perceptual loss by using the features before activation, which offer stronger supervision and thus restore more accurate brightness and realistic textures.
5 結(jié)論
我們提出了一種ESRGAN模型宫屠,它比以前的SR方法始終取得更好的感知質(zhì)量列疗。就感知指數(shù)而言,該方法在PIRM-SR挑戰(zhàn)賽中獲得了第一名浪蹂。我們構(gòu)建了一種包含一些沒有BN層的RDDB塊的新穎架構(gòu)抵栈。此外,采用了包括殘差縮放和較小初始化的有用技術(shù)坤次,以促進(jìn)提出的深度模型的訓(xùn)練古劲。我們還介紹了使用相對(duì)GAN作為判別器,其學(xué)習(xí)判斷一張圖像是否比另一張更真實(shí)缰猴,引導(dǎo)生成器恢復(fù)更詳細(xì)的紋理产艾。此外,我們通過使用激活之前的特征增強(qiáng)了感知損失滑绒,它提供了更強(qiáng)的監(jiān)督闷堡,從而恢復(fù)了更精確的亮度和真實(shí)紋理。
Acknowledgement. This work is supported by SenseTime Group Limited, the General Research Fund sponsored by the Research Grants Council of the Hong Kong SAR (CUHK 14241716, 14224316, 14209217), the National Natural Science Foundation of China (U1613211) and the Shenzhen Research Program (JCYJ20170818164704758, JCYJ20150925163005055).
References
1. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: CVPR. (2017)
2. Jolicoeur-Martineau, A.: The relativistic discriminator: a key element missing from standard GAN. arXiv preprint arXiv:1807.00734 (2018)
3. Blau, Y., Mechrez, R., Timofte, R., Michaeli, T., Zelnik-Manor, L.: The PIRM challenge on perceptual super resolution. https://www.pirm2018.org/PIRM-SR.html (2018)
4. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: ECCV. (2014)
5. Kim, J., Kwon Lee, J., Mu Lee, K.: Accurate image super-resolution using very deep convolutional networks. In: CVPR. (2016)
6. Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Deep Laplacian pyramid networks for fast and accurate super-resolution. In: CVPR. (2017)
7. Kim, J., Kwon Lee, J., Mu Lee, K.: Deeply-recursive convolutional network for image super-resolution. In: CVPR. (2016)
8. Tai, Y., Yang, J., Liu, X.: Image super-resolution via deep recursive residual network. In: CVPR. (2017)
9. Tai, Y., Yang, J., Liu, X., Xu, C.: MemNet: A persistent memory network for image restoration. In: ICCV. (2017)
10. Haris, M., Shakhnarovich, G., Ukita, N.: Deep back-projection networks for super-resolution. In: CVPR. (2018)
11. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image super-resolution. In: CVPR. (2018)
12. Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: ECCV. (2018)
13. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: ECCV. (2016)
14. Bruna, J., Sprechmann, P., LeCun, Y.: Super-resolution with deep convolutional sufficient statistics. In: ICLR. (2015)
15. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS. (2014)
16. Sajjadi, M.S., Schölkopf, B., Hirsch, M.: EnhanceNet: Single image super-resolution through automated texture synthesis. In: ICCV. (2017)
17. Wang, X., Yu, K., Dong, C., Loy, C.C.: Recovering realistic texture in image super-resolution by deep spatial feature transform. In: CVPR. (2018)
18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR. (2016)
19. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: ICML. (2015)
20. Lim, B., Son, S., Kim, H., Nah, S., Lee, K.M.: Enhanced deep residual networks for single image super-resolution. In: CVPRW. (2017)
21. Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261 (2016)
22. Blau, Y., Michaeli, T.: The perception-distortion tradeoff. In: CVPR. (2018)
23. Ma, C., Yang, C.Y., Yang, X., Yang, M.H.: Learning a no-reference quality metric for single-image super-resolution. CVIU 158 (2017) 1–16
24. Mittal, A., Soundararajan, R., Bovik, A.C.: Making a completely blind image quality analyzer. IEEE Signal Processing Letters 20(3) (2013) 209–212
25. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. TPAMI 38(2) (2016) 295–307
26. Yu, K., Dong, C., Lin, L., Loy, C.C.: Crafting a toolchain for image restoration by deep reinforcement learning. In: CVPR. (2018)
27. Yuan, Y., Liu, S., Zhang, J., Zhang, Y., Dong, C., Lin, L.: Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In: CVPRW. (2018)
28. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: ICCV. (2015)
29. Gatys, L., Ecker, A.S., Bethge, M.: Texture synthesis using convolutional neural networks. In: NIPS. (2015)
30. Mechrez, R., Talmi, I., Shama, F., Zelnik-Manor, L.: Maintaining natural image statistics with the contextual loss. arXiv preprint arXiv:1803.04626 (2018)
31. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv preprint arXiv:1701.07875 (2017)
32. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: NIPS. (2017)
33. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018)
34. Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. In: CVPR. (2017)
35. Nah, S., Kim, T.H., Lee, K.M.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: CVPR. (2017)
36. Zhang, K., Sun, M., Han, X., Yuan, X., Guo, L., Liu, T.: Residual networks of residual networks: Multilevel residual networks. IEEE Transactions on Circuits and Systems for Video Technology (2017)
37. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
38. Bell, S., Upchurch, P., Snavely, N., Bala, K.: Material recognition in the wild with the Materials in Context Database. In: CVPR. (2015)
39. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: ICLR. (2015)
40. Agustsson, E., Timofte, R.: NTIRE 2017 challenge on single image super-resolution: Dataset and study. In: CVPRW. (2017)
41. Timofte, R., Agustsson, E., Van Gool, L., Yang, M.H., Zhang, L., Lim, B., Son, S., Kim, H., Nah, S., Lee, K.M., et al.: NTIRE 2017 challenge on single image super-resolution: Methods and results. In: CVPRW. (2017)
42. Bevilacqua, M., Roumy, A., Guillemot, C., Alberi-Morel, M.L.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: BMVC, BMVA Press (2012)
43. Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-representations. In: International Conference on Curves and Surfaces, Springer (2010)
44. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: ICCV. (2001)
45. Huang, J.B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: CVPR. (2015)
46. Timofte, R., Rothe, R., Van Gool, L.: Seven ways to improve example-based single image super resolution. In: CVPR. (2016)
47. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics. (2010)