Generative adversarial network

[ deep-learning  gan  generative-adversarial-network  dc-gan  pix2pix  cycle-gan  wgan  progressive-gan  biggan  stargan  stylegan  ]

The generative adversarial network (GAN), proposed in 2014 by Ian Goodfellow, has drawn a lot of attention since then. It consists of a generator and a discriminator: the generator tries to generate samples, and the discriminator tries to distinguish the samples produced by the generator from the real ones.

In the ideal case, the generator produces data indistinguishable from the real data, and the discriminator is no longer able to tell the generated samples apart from the real ones.

GAN network architecture

The quality of a GAN can be measured via the inception score $\exp\big(\mathbb{E}_{x}\, D_{KL}(p(y|x) \,\|\, p(y))\big)$, where $y$ is the label and $x$ is the generated data, or via the Fréchet Inception Distance (FID).
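
As a minimal sketch, assuming the class probabilities $p(y|x)$ from a pretrained Inception classifier have already been collected in a `probs` array, the inception score can be computed as:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (num_samples, num_classes) softmax outputs p(y|x) of a
    pretrained classifier on generated images."""
    p_y = probs.mean(axis=0, keepdims=True)                   # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))                           # exp(E_x[KL(p(y|x) || p(y))])
```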

Vanilla GAN

The cost function of the vanilla GAN can be written as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

where the first term is the cross entropy on the discriminator output over the data distribution, and the second term is the cross entropy on one minus the discriminator output over the sampler (noise) distribution.

GAN training iterates over the following steps (a minimal sketch follows the list):

  • generate data from the generator and sample real data
  • use these two batches to update the discriminator
  • update the generator.
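
A minimal PyTorch-style sketch of one such iteration; the `generator`, a `discriminator` with sigmoid output, and their optimizers are assumed to be defined elsewhere:

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, opt_g, opt_d, real_batch, z_dim=100):
    """One iteration of the vanilla GAN training loop (hypothetical helpers)."""
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # 1) generate data from the generator and sample the real data
    z = torch.randn(batch_size, z_dim)
    fake_batch = generator(z)

    # 2) use the two batches to update the discriminator
    opt_d.zero_grad()
    d_loss = F.binary_cross_entropy(discriminator(real_batch), real_labels) + \
             F.binary_cross_entropy(discriminator(fake_batch.detach()), fake_labels)
    d_loss.backward()
    opt_d.step()

    # 3) update the generator (try to make the discriminator output "real")
    opt_g.zero_grad()
    g_loss = F.binary_cross_entropy(discriminator(fake_batch), real_labels)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```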

DC-GAN

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

DC-GAN uses deep convolutional networks for both the generator and the discriminator. It emphasizes several guidelines for stable Deep Convolutional GANs (a generator sketch following these guidelines appears after the list):

  • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
  • Use batchnorm in both the generator and the discriminator.
  • Remove fully connected hidden layers for deeper architectures.
  • Use ReLU activation in generator for all layers except for the output, which uses Tanh.
  • Use LeakyReLU activation in the discriminator for all layers.
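
A sketch of a generator that follows these guidelines, assuming a 100-dimensional latent vector and 64x64 RGB output (the layer sizes are illustrative, not the exact paper configuration):

```python
import torch.nn as nn

# DCGAN-style generator: no pooling or fully connected hidden layers,
# fractional-strided (transposed) convolutions, batchnorm, ReLU, Tanh output.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(256), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
    nn.BatchNorm2d(128), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 32x32 -> 64x64
    nn.Tanh(),
)
```

Note that up- and downsampling are done by the strided convolutions themselves instead of pooling, so the network learns its own resampling.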

pix2pix

Image-to-Image Translation with Conditional Adversarial Networks

Pix2pix solves a wide range of image-to-image translation problems. It uses a U-Net as the generator and a convolutional PatchGAN as a patch-level discriminator, to address the problem that GAN outputs tend to be blurry.
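
For reference, the objective in the paper combines the conditional GAN loss with an L1 term that keeps the output close to the ground truth:

$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathbb{E}_{x, y, z}\big[\|y - G(x, z)\|_1\big]$$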

Cycle-GAN

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Cycle-GAN tackles the setting where paired source and target images are not available, but unpaired images from the source domain and the target domain are. To handle this, a cycle-consistency loss is proposed, which ensures that an image translated from the source domain to the target domain can be translated back:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\big[\|F(G(x)) - x\|_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\|G(F(y)) - y\|_1\big]$$

where $G$ maps the source domain to the target domain and $F$ maps it back.

WGAN

Wasserstein Generative Adversarial Networks

WGAN claims that in GANs (among many other models), a density for the real data may not exist, which means the cross-entropy loss no longer holds. As a remedy, the Wasserstein loss is proposed, which approximates the earth mover's distance.
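
A minimal sketch of one critic update with weight clipping; `critic`, `generator`, and the optimizer are assumed to be defined elsewhere, and the critic outputs an unbounded score rather than a probability:

```python
import torch

def critic_step(critic, generator, opt_c, real_batch, z_dim=100, clip=0.01):
    """One WGAN critic update: maximize E[f(x_real)] - E[f(G(z))], then clip weights."""
    z = torch.randn(real_batch.size(0), z_dim)
    fake_batch = generator(z).detach()

    opt_c.zero_grad()
    # Negate because optimizers minimize; the difference of means estimates
    # the Wasserstein distance between real and generated distributions.
    loss = -(critic(real_batch).mean() - critic(fake_batch).mean())
    loss.backward()
    opt_c.step()

    # Weight clipping to (crudely) enforce the 1-Lipschitz constraint.
    for p in critic.parameters():
        p.data.clamp_(-clip, clip)
    return -loss.item()
```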

Progressive GAN

PROGRESSIVE GROWING OF GANS FOR IMPROVED QUALITY, STABILITY, AND VARIATION

In Progressive GAN, the generator and discriminator are trained progressively: simpler networks are first trained on low-resolution images, then more layers are added for higher-resolution images. This progressive training dramatically improves the stability and speed of the training process.
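
When a new, higher-resolution block is added, the paper fades it in smoothly rather than switching at once; a minimal sketch of that blending, with `alpha` ramping from 0 to 1 during training (names here are assumptions):

```python
import torch.nn.functional as F

def fade_in(old_rgb, new_rgb, alpha):
    """Blend the upsampled output of the previous (lower-resolution) stage
    with the output of the newly added block while it is being faded in."""
    old_up = F.interpolate(old_rgb, scale_factor=2, mode="nearest")
    return (1.0 - alpha) * old_up + alpha * new_rgb
```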

BigGAN

LARGE SCALE GAN TRAINING FOR HIGH FIDELITY NATURAL IMAGE SYNTHESIS

BigGAN considers the problem of training GANs on large-scale datasets, e.g., ImageNet. It shows that increasing the batch size, the number of output channels of the convolution layers, and the depth of the network (combined with residual blocks) can dramatically improve performance. The truncation trick is also proposed for generating the generator's input to improve stability: noise is sampled from a normal distribution, but any value above the threshold (in magnitude) is resampled.
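
A minimal sketch of the truncation trick as described above (function name and threshold are illustrative):

```python
import torch

def truncated_noise(batch_size, z_dim, threshold=0.5):
    """Sample z ~ N(0, I) and resample every component whose magnitude
    exceeds the threshold."""
    z = torch.randn(batch_size, z_dim)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()))
        mask = z.abs() > threshold
    return z
```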

StarGAN

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

StarGAN addresses the problem of learning a single GAN for multiple domains simultaneously. To do that, the generator additionally takes a label vector specifying the target domain of the image to be generated. It also adopts the cycle-consistency idea of Cycle-GAN.
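
A minimal sketch of how the target-domain label can be fed to the generator: the label vector is replicated spatially and concatenated with the input image along the channel dimension (names are assumptions):

```python
import torch

def concat_domain_label(image, label):
    """image: (N, C, H, W); label: (N, num_domains) target-domain vector.
    Returns a (N, C + num_domains, H, W) tensor to feed the generator."""
    n, _, h, w = image.shape
    label_map = label.view(n, -1, 1, 1).expand(n, label.size(1), h, w)
    return torch.cat([image, label_map], dim=1)
```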

StyleGAN

A Style-Based Generator Architecture for Generative Adversarial Networks

StyleGAN has a totally different architecture compared with other GANs. It has three inputs (a sketch of how the style and noise enter a layer follows the list):

  • constant input: a $512 \times 4 \times 4$ tensor that encodes the image prior;
  • latent vector after the mapping network: encodes the style information;
  • noise: added to the input of multiple hidden layers to further improve the result.
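
A simplified sketch of how the style and noise inputs enter one synthesis layer: per-pixel noise is added with a learned per-channel scale, and the mapped latent $w$ modulates the activations through adaptive instance normalization (AdaIN). All names here are assumptions, not the official implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyledLayer(nn.Module):
    def __init__(self, channels, w_dim=512):
        super().__init__()
        self.to_style = nn.Linear(w_dim, 2 * channels)     # per-channel scale and bias
        self.noise_scale = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x, w):
        # noise input: added to the activations with a learned per-channel scale
        x = x + self.noise_scale * torch.randn(x.size(0), 1, x.size(2), x.size(3))
        # style input: AdaIN scales and shifts the normalized activations
        scale, bias = self.to_style(w).chunk(2, dim=1)
        x = F.instance_norm(x)
        return scale[:, :, None, None] * x + bias[:, :, None, None]
```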

Written on April 5, 2019