Generative Adversarial Networks (GANs)

What Are GANs?

Generative adversarial networks (GANs) are a type of deep neural network architecture used to generate synthetic data, most often images. The architecture comprises two deep neural networks, a generator and a discriminator, which work against each other (thus, “adversarial”). The generator creates new data instances, while the discriminator evaluates each instance for authenticity and decides whether it is “real” (from the training dataset) or “fake” (from the generator).

The generator and discriminator are trained against each other until the generator can create synthetic data realistic enough that the discriminator can no longer tell it is fake. After successful training, the generator can be used to create new synthetic data, for potential use as input to other deep neural networks.
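The adversarial setup described above is usually formalized as a two-player minimax game. The value function below is the standard formulation from the original GAN paper (Goodfellow et al., 2014); it is not stated in this article, and the symbols D (discriminator), G (generator), p_data (distribution of the training data), and p_z (distribution of the input noise) are notation introduced here for illustration.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

The discriminator tries to make V large by assigning high probability to real samples and low probability to generated ones, while the generator tries to make it small. In this formulation, once the generator's output distribution matches the data distribution, the best possible discriminator outputs D(x) = 1/2 for every input, which corresponds to the point where it can no longer distinguish real from fake.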

GANs are versatile in that they can learn to generate new instances of any datatype, such as synthetic images of faces, new songs in a certain style, or text of a specific genre.

Training a GAN

Using an example of creating synthetic images of money, let’s walk through the specific parts and functions of a GAN architecture.

  1. Noise is fed into the generator. Since the generator hasn’t been trained yet, its output will look like noise in the beginning.
[Figure: GAN architecture, and input and output of untrained GAN.]
  2. Training data and the output of the generator are sent to the discriminator, which is being trained in parallel to distinguish real images from fake ones. The discriminator's output will not be very accurate at first, because this part of the network is also still being trained; its accuracy improves over time.
[Figure: GAN architecture, and input and output of GAN during training.]
  3. Feedback: The output of the discriminator is fed back to both the generator and the discriminator, which use this information to update their parameters and improve their accuracy.
[Figure: GAN architecture without feedback, and input and output of GAN during training.]
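The three steps above can be sketched in a few lines of Python. This is a deliberately tiny illustration under stated assumptions, not the MATLAB workflow this article refers to: the "images" are single numbers, the generator is the linear map a*z + b, the discriminator is a logistic unit sigmoid(w*x + c), and the gradients are written out by hand for the standard cross-entropy losses. All names (`train_step`, `gen`, `disc`, the learning rate) are hypothetical.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def generate(gen, z):
    """Generator: maps one noise sample z to one synthetic sample."""
    return gen["a"] * z + gen["b"]


def discriminate(disc, x):
    """Discriminator: estimated probability that x is a real sample."""
    return sigmoid(disc["w"] * x + disc["c"])


def train_step(gen, disc, real_batch, noise_batch, lr=0.1):
    """One update of both networks; the two batches must have equal length."""
    n = len(real_batch)

    # Step 1: noise is fed into the generator.
    fake_batch = [generate(gen, z) for z in noise_batch]

    # Step 2: real and generated samples are sent to the discriminator.
    # Gradient of -[log D(x) + log(1 - D(G(z)))] with respect to w and c.
    grad_w = grad_c = 0.0
    for x, g in zip(real_batch, fake_batch):
        d_real = discriminate(disc, x)
        d_fake = discriminate(disc, g)
        grad_w += ((d_real - 1.0) * x + d_fake * g) / n
        grad_c += ((d_real - 1.0) + d_fake) / n
    disc = {"w": disc["w"] - lr * grad_w, "c": disc["c"] - lr * grad_c}

    # Step 3: the discriminator's verdict is fed back to the generator.
    # Gradient of -log D(G(z)) with respect to a and b.
    grad_a = grad_b = 0.0
    for z in noise_batch:
        d_fake = discriminate(disc, generate(gen, z))
        upstream = (d_fake - 1.0) * disc["w"]
        grad_a += upstream * z / n
        grad_b += upstream / n
    gen = {"a": gen["a"] - lr * grad_a, "b": gen["b"] - lr * grad_b}
    return gen, disc


if __name__ == "__main__":
    random.seed(0)
    gen = {"a": 1.0, "b": 0.0}   # untrained generator
    disc = {"w": 0.0, "c": 0.0}  # untrained discriminator
    for _ in range(500):
        real = [random.gauss(4.0, 0.5) for _ in range(16)]  # stand-in "real" data
        noise = [random.gauss(0.0, 1.0) for _ in range(16)]
        gen, disc = train_step(gen, disc, real, noise)
    print("generator:", gen, "discriminator:", disc)
```

A linear discriminator like this one can only separate samples by their position on the number line, so the sketch shows the mechanics of the alternating updates rather than a model that would produce realistic data. Practical GANs replace both maps with deep networks and rely on automatic differentiation instead of hand-written gradients.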

When shown an instance from the true dataset, the discriminator's goal is to recognize it as authentic. Meanwhile, the generator creates new, synthetic images and passes them to the discriminator in the hope that they, too, will be deemed authentic, even though they are fake. The goal of the generator is to generate passable images: to lie without being caught. The goal of the discriminator is to identify images coming from the generator as fake.
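These two opposing goals are commonly turned into numbers with binary cross-entropy. The sketch below assumes the discriminator outputs a probability that its input is real, and it uses the widely used "non-saturating" generator loss; the function names are hypothetical and the article itself does not prescribe a particular loss.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples near 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Low when the discriminator scores generated samples near 1."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)
```

For example, `generator_loss([0.9])` is smaller than `generator_loss([0.1])`: the generator is rewarded when its fakes are judged authentic, while the same verdicts raise the discriminator's loss.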

MATLAB® and Deep Learning Toolbox™ let you build GAN network architectures using automatic differentiation, custom training loops, and shared weights.

Applications of Generative Adversarial Networks

Handwriting generation: As with the image example, GANs are used to create synthetic data. This data can supplement smaller datasets that need more examples in order to train accurate deep learning models. One example is handwriting detection: training a deep neural network on handwriting requires thousands of training samples, and collecting this data manually can be time-intensive.

[Figure: Synthetic handwriting generation using GANs. Handwritten digits from 0 to 9, generated using a GAN.]

Scene generation: Conditional GANs are a specific type of GAN that takes advantage of labels, whereas the original GAN does not assume labels will be present. Conditional GANs can be used in applications such as scene generation, where the information must follow a certain organization. Take the example of scene generation for automated driving: the road and sidewalk must be located below the buildings and sky. A synthetic image that does not place the road correctly will immediately be identified as fake and will be unusable in an automated driving application.
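One common way to let a generator take advantage of labels is to append an encoded label to the noise vector, so the same noise can be steered toward a chosen class. The sketch below assumes integer class labels and one-hot encoding; the helper names are hypothetical. Image-to-image models such as pix2pix condition on an entire input image rather than a class label, which this sketch does not cover.

```python
def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list of floats."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in range(num_classes)")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def conditional_generator_input(noise, label, num_classes):
    """Concatenate the noise vector with the one-hot label."""
    return list(noise) + one_hot(label, num_classes)


# Example: a 2-element noise vector conditioned on class 2 of 4.
example = conditional_generator_input([0.5, -0.5], 2, 4)
```

In a typical conditional setup the discriminator receives the same label alongside each sample, so it can reject outputs that look realistic but do not match the requested class.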

[Figure: Image-to-image translation (pix2pix) of road and sidewalk for automated driving using a conditional GAN.]

Audio and speech applications: GANs are also used for applications such as text-to-speech synthesis, voice conversion, and speech enhancement. GANs provide a significant advantage over traditional audio and speech implementations because they can generate new samples rather than simply augment existing signals. One example of sound synthesis with GANs is creating synthetic versions of drum sounds; see the example "Train Generative Adversarial Network (GAN) for Sound Synthesis."

Note: GANs can be powerful tools for generating new synthetic data in many applications, yet it is often challenging to arrive at accurate results because of the many failure modes that can occur during training. MATLAB lets you monitor GAN training progress and identify common failure modes.
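As a rough illustration of what monitoring can look like, the sketch below summarizes one batch of discriminator outputs as two scores between 0 and 1 and flags an obvious imbalance between the networks. It is not the MATLAB monitoring tool: the score definitions are one common convention, and the function names and the `margin` threshold are assumptions chosen for this example. It also does not detect mode collapse, where the generator produces only a few distinct outputs.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def training_scores(d_real, d_fake):
    """Summarize discriminator probabilities for one batch.

    d_real: probabilities assigned to real samples.
    d_fake: probabilities assigned to generated samples.
    """
    generator_score = mean(d_fake)  # high when fakes are judged real
    discriminator_score = (mean(d_real) + mean(1.0 - p for p in d_fake)) / 2.0
    return generator_score, discriminator_score


def diagnose(generator_score, discriminator_score, margin=0.4):
    """Flag a one-sided game; margin is an arbitrary illustrative threshold."""
    if discriminator_score > 0.5 + margin and generator_score < 0.5 - margin:
        return "discriminator dominates"
    if generator_score > 0.5 + margin:
        return "generator dominates"
    return "no obvious imbalance"
```

With these definitions, both scores sit near 0.5 when neither network has the upper hand. Scores that stay pinned near 0 or 1 over many iterations suggest that one network is overpowering the other and that training is unlikely to recover without changes.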