Improved Attention GAN Project – Abstract

Automatically generating images according to natural language descriptions is a fundamental problem in many applications such as computer-aided design, photo editing, art generation, and video games.

AttnGAN is a neural architecture that combines the attention mechanism from the field of natural language processing with generative adversarial networks (GANs) from the field of image processing. In this project, we implemented the architecture proposed by the original researchers, trained the network on a set of bird images with their corresponding natural language captions, and reproduced their results. The trained network can then generate completely new images of birds from a given natural language sentence.

The AttnGAN model was designed to transform descriptive text into fine-grained, high-quality images. The motivation was to treat image generation as a multi-stage process, adapting the encoder-decoder and sequence-to-sequence ideas to a text-to-image setting.

The architecture uses the attention mechanism in a novel way during generation, and it keeps the text and image consistent with each other through a dedicated loss function that measures how well the generated image matches the description.
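To make the attention idea concrete, the sketch below shows the core computation in word-level attention: each image sub-region attends over the caption's words, producing a word-context vector per region. This is a minimal illustrative example with assumed shapes and function names, not the authors' implementation.

```python
import numpy as np

def word_attention(region_feats, word_embs):
    """Illustrative word-level attention over image regions.

    region_feats: (N, D) features of N image sub-regions
    word_embs:    (T, D) embeddings of the T caption words
    Returns (N, D) word-context vectors, one per region.
    """
    # Similarity score between every region and every word.
    scores = region_feats @ word_embs.T             # (N, T)
    # Softmax over words, so each region attends to the caption.
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # (N, T), rows sum to 1
    # Each region's context is an attention-weighted mixture of words.
    return attn @ word_embs                         # (N, D)

rng = np.random.default_rng(0)
ctx = word_attention(rng.normal(size=(16, 8)), rng.normal(size=(5, 8)))
print(ctx.shape)
```

These per-region context vectors are what let later generation stages refine image details conditioned on the most relevant words of the description.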