Problem
- Traditional GAN methods directly operate on the whole image, and inevitably change the attribute-irrelevant regions
- The performance of traditional regression methods depends heavily on paired training data, which is, however, quite difficult to acquire
Related work
ResGAN: learning the residual image avoids changing the attribute-irrelevant region by constraining most of the residual image to be zero.
Improvement: learning a residual image is insightful in that it encourages the manipulation to concentrate on local areas, especially for local attributes.
Drawback: the location and the appearance of the target attribute are modeled in a single sparse residual image, which is harder to optimize than modeling them separately
Method
- SaGAN: only alters the attribute-specific region and keeps the rest unchanged
- The generator contains an attribute manipulation network (AMN) to edit the face image, and a spatial attention network (SAN) to localize the attribute-specific region, which restricts the alterations of AMN to this region.
Contribution
- The spatial attention is introduced into the GAN framework, forming an end-to-end generative model for face attribute editing (referred to as SaGAN), which alters only the attribute-specific region and keeps the rest of the image unchanged.
- The proposed SaGAN adopts a single generator with the attribute as a conditional signal, rather than two dual generators for the two inverse attribute-editing directions.
- The proposed SaGAN achieves quite promising results, especially for local attributes, with the attribute-irrelevant details well preserved. Besides, the approach also benefits face recognition via data augmentation.
Generative Adversarial Network with Spatial Attention
notation | meaning |
---|---|
$I$ | input image |
$\hat{I}$ | output image |
$I_a$ | an edited face image output by AMN |
$c$ | attribute value |
$c^g$ | ground-truth attribute label of the real image |
$D_{src}(I)$ | probability of an image being a real one |
$D_{cls}(c\|I)$ | probability of an image owning the attribute $c$ |
$F_m$ | the attribute manipulation network (AMN) |
$F_a$ | the spatial attention network (SAN) |
$b$ | a spatial attention mask, used to restrict the alterations of AMN within this region |
$\lambda_1$ | balance parameter |
$\lambda_2$ | balance parameter |
$\lambda_{gp}$ | hyper-parameter controlling the gradient penalty, default 10 |
- the goal of face attribute editing is to translate $I$ into a new image $\hat{I}$, which should be realistic, own the attribute $c$, and look the same as the input image outside the attribute-specific region
Discriminator
- Two objectives, one to distinguish the generated images from the real ones, and another to classify the attributes of the generated and real images
- The two classifiers are both designed as a CNN with a softmax function, denoted as $D_{src}$ and $D_{cls}$ respectively.
- The two networks can share the first few convolutional layers followed by distinct fully-connected layers for different classifications
discriminator D: $\min_{D} \mathcal{L}_{D} = \mathcal{L}_{src}^{D} + \mathcal{L}_{cls}^{D}$, where $\mathcal{L}_{src}^{D} = -\mathbb{E}_{I}[\log D_{src}(I)] - \mathbb{E}_{\hat{I}}[\log(1-D_{src}(\hat{I}))]$ and $\mathcal{L}_{cls}^{D} = \mathbb{E}_{I,c^{g}}[-\log D_{cls}(c^{g}|I)]$
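The two-head discriminator above (shared convolutional trunk, distinct real/fake and attribute heads) can be sketched in NumPy. The layer sizes, the random linear "trunk", and the sigmoid heads are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the discriminator: a shared trunk followed by two
# distinct heads, D_src (real/fake) and D_cls (attribute classification).
# Shapes are illustrative: 64x64 RGB input, 128-d features, 5 attributes.
IMG_DIM, FEAT_DIM, N_ATTRS = 64 * 64 * 3, 128, 5

W_trunk = rng.standard_normal((FEAT_DIM, IMG_DIM)) * 0.01
W_src = rng.standard_normal((1, FEAT_DIM)) * 0.1        # real/fake head
W_cls = rng.standard_normal((N_ATTRS, FEAT_DIM)) * 0.1  # attribute head

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator(img):
    # Shared features (stand-in for the shared conv layers), ReLU activation.
    h = np.maximum(W_trunk @ img.reshape(-1), 0.0)
    p_src = sigmoid(W_src @ h)[0]  # probability the image is real
    p_cls = sigmoid(W_cls @ h)     # per-attribute probabilities
    return p_src, p_cls

img = rng.random((64, 64, 3))
p_src, p_cls = discriminator(img)
```

Sharing the trunk lets both objectives reuse the same low-level features while keeping the two predictions separate.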
Generator
- G contains two modules, an attribute manipulation network(AMN) and a spatial attention network(SAN)
- AMN focuses on how to manipulate and SAN focuses on where to manipulate.
- The attribute manipulation network takes a face image $I$ and an attribute value $c$ as input, and outputs an edited face image $I_{a} = F_{m}(I, c)$
- The spatial attention network takes the face image as input and predicts a spatial attention mask $b = F_{a}(I)$, which is used to restrict the alterations of AMN within this region
- Ideally, the attribute-specific region of $b$ should be 1, and the rest should be 0.
- Regions with non-zero attention values are all regarded as the attribute-specific region, and the rest, with zero attention values, as the attribute-irrelevant region
- the attribute-specific regions are manipulated towards the target attribute while the rest remain the same; the final output blends the two: $\hat{I} = G(I,c) = I_{a} \cdot b + I \cdot (1-b)$
- To make the edited face image $\hat{I}$ photo-realistic, an adversarial loss is designed to confuse the real/fake classifier: $\mathcal{L}_{src}^{G} = -\mathbb{E}_{\hat{I}}[\log D_{src}(\hat{I})]$
- To make $\hat{I}$ correctly own the target attribute $c$, an attribute classification loss is designed to enforce the attribute prediction of $\hat{I}$ from the attribute classifier to approximate the target value $c$: $\mathcal{L}_{cls}^{G} = \mathbb{E}_{\hat{I},c}[-\log D_{cls}(c|\hat{I})]$
- To keep the attribute-irrelevant region unchanged, a reconstruction loss is employed, similar to CycleGAN and StarGAN: $\mathcal{L}_{rec}^{G} = \mathbb{E}_{I,c,c^{g}}[\lVert I - G(G(I,c), c^{g})\rVert_{1}]$
- generator G: $\min_{G} \mathcal{L}_{G} = \mathcal{L}_{src}^{G} + \lambda_{1}\mathcal{L}_{cls}^{G} + \lambda_{2}\mathcal{L}_{rec}^{G}$, with balance parameters $\lambda_{1}, \lambda_{2}$
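The generator's forward pass (AMN edit, SAN mask, blend) can be sketched in NumPy. Both networks are replaced by toy stand-ins, so only the blending logic reflects the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def amn(img, c):
    """Toy stand-in for the attribute manipulation network F_m:
    here it simply shifts pixel intensities by the attribute value."""
    return np.clip(img + 0.2 * c, 0.0, 1.0)

def san(img):
    """Toy stand-in for the spatial attention network F_a: here a fixed
    square region gets attention 1, everything else 0."""
    b = np.zeros(img.shape[:2])
    b[16:48, 16:48] = 1.0
    return b

def generator(img, c):
    ia = amn(img, c)               # edited image I_a = F_m(I, c)
    b = san(img)[..., None]        # attention mask b = F_a(I)
    return ia * b + img * (1 - b)  # blend: edit inside the mask, copy outside

img = rng.random((64, 64, 3))
edited = generator(img, c=1)
```

Because the mask gates the blend, pixels with zero attention are copied from the input unchanged, which is exactly how SaGAN preserves the attribute-irrelevant region.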
Implementation
Optimization
To optimize the adversarial real/fake classification more stably, in all experiments the objectives in Eq.(1) and Eq.(2) are optimized using WGAN-GP
$\hat{x}$ is sampled uniformly along straight lines between pairs of edited and real images
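The interpolation step can be sketched as follows (NumPy; the gradient-penalty term itself needs an autodiff framework, so it is only indicated in a comment):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_interpolates(real, fake):
    """Sample x_hat uniformly along straight lines between paired real
    and edited (fake) images: one epsilon per sample in the batch."""
    eps = rng.random((real.shape[0], 1, 1, 1))  # broadcasts over H, W, C
    return eps * real + (1.0 - eps) * fake

real = rng.random((8, 64, 64, 3))
fake = rng.random((8, 64, 64, 3))
x_hat = sample_interpolates(real, fake)

# WGAN-GP then adds lambda_gp * (||grad_{x_hat} D_src(x_hat)||_2 - 1)^2
# to the discriminator loss (lambda_gp = 10 by default), which requires
# differentiating D_src with respect to x_hat via autograd.
```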
Network Architecture
- For the generator, the two networks of AMN and SAN share the same network architecture except slight difference in the input and output:
Network | Input | Output | Activation function |
---|---|---|---|
AMN | 4-channel input: an input image and an attribute | 3-channel RGB image | Tanh |
SAN | 3-channel input, an input image | 1-channel attention mask image | Sigmoid |
- For the discriminator, the same architecture as PatchGAN is used, considering its promising performance.
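The 4-channel AMN input from the table above can be built by tiling the scalar attribute into an extra constant plane and concatenating it with the RGB channels, a common conditioning trick; the channel ordering here is an assumption:

```python
import numpy as np

def amn_input(img, c):
    """Concatenate an HxWx3 image with the attribute value c tiled into
    a constant HxWx1 plane, giving the 4-channel AMN input."""
    h, w, _ = img.shape
    attr_plane = np.full((h, w, 1), float(c))
    return np.concatenate([img, attr_plane], axis=-1)

img = np.zeros((64, 64, 3))
x = amn_input(img, c=1)  # 4-channel conditioned input
```

SAN, by contrast, takes only the 3-channel image, since the attended region is assumed to depend on the face itself rather than on the editing direction.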