Generative adversarial network[
Generative adversarial network (GAN), since proposed in 2014 by Ian Goodfellow has drawn a lot of attentions. It is consisted of a generator and a discriminator, where the generator tries to generate sample and the discrimiantor tries to discriminate the sample generated by generator from the real ones.
In the ideal case, the generator will generate data as real as the real data; and the discriminator will not be able to discrimnator the sample generated by generator from the real ones.
The quality of GAN can be measured via inception score: where $y$ is the label and $x$ is the generated dat. Or Fréchet Inception Distance (FID)
The cost function of vanilla GAN can be written as:
where the first part is the cross entropy on the discriminator output over the data distribution; and the second part is the cross entropy of negative discriminator output over the sampler distribution.
The training of GAN will iterate with the following steps:
- generate the data from generator and sample the real data
- use those two data to update the discriminator
- update the generator.
UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS
DC-GAN use deep neural network for both the generator and discriminator. It emphasizes several guidelines for stable Deep Convolutional GANs
- Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
- Use batchnorm in both the generator and the discriminator.
- Remove fully connected hidden layers for deeper architectures.
- Use ReLU activation in generator for all layers except for the output, which uses Tanh.
- Use LeakyReLU activation in the discriminator for all layers.
Image-to-Image Translation with Conditional Adversarial Networks
Pix2pix is able to solve a wide set of image to image transfer problem. It uses U-Net for generator and convolutional
PatchGAN for patch-level discriminator, to address the problem that output GAN tends to be blur.
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Cycle-GAN tackles the problem that, the source image and target image pair may not be available; however, images from source domain and images from target domain are available. To resolve this, cycle consistent loss is proposed, which ensures the image of source domain transferred to target domain can be transfer back:
Wasserstein Generative Adversarial Networks
WGAN claims that in GAN (other many others), the density distribution for the real data doesn’t exists, that means the cross-entropy loss function doesn’t hold. As a result, wasserstein is proposed, which is a approximation of earth move distance function.
PROGRESSIVE GROWING OF GANS FOR IMPROVED QUALITY, STABILITY, AND VARIATION
In Progressive GAN, the generator and discriminator is trained progressively, namely, a simpler one is trainied for low resolution image, then more layer are added for higer resolution image. This progressive training dramatically improves the stability and speed of the training process.
LARGE SCALE GAN TRAINING FOR HIGH FIDELITY NATURAL IMAGE SYNTHESIS
BigGan considers the problem of training GAN on large scale dataset, e.g., ImageNet. It proposes that increasing batch size, number of output channels of convolution layers and depth of network (combined with residual blocks) could dramatically improve the performance.
Truncation Trick is also proposed to generate the input for generator to improve stability, where noise is sampled from a normal distribution but any value above the threshold is resampled.
StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
StarGan address problem of learning GAN simultaneously for multiple domains. To do that, the input to the generator is a label vector of the target domain of the image to be generated. It also adopts the idea of Cycle-gan.
A Style-Based Generator Architecture for Generative Adversarial Networks
StyleGan has a totall diferent architecture compared with other GANs (as shown below). It has three inputs:
- constant input: a $512x4x4$ tensor encodes the image prior;
- latent vector after mapping network: encodes the style information;
- noise: added to input to multiple hidden layer to further improve the result.