Ian Goodfellow et al. on better text generation via filling in the blanks using MaskGAN

  • 5 min read
  • 19 Feb 2018

In the paper “MaskGAN: Better Text Generation via Filling in the ______”, Ian Goodfellow, along with William Fedus and Andrew M. Dai, proposes a way to improve sample quality using Generative Adversarial Networks (GANs), which explicitly train the generator to produce high-quality samples and have already shown a lot of success in image generation.

Ian Goodfellow is a research scientist at Google Brain. His research interests lie in deep learning, machine learning security and privacy, and particularly in generative models. He is known as the father of Generative Adversarial Networks, and he runs the Self-Organizing Conference on Machine Learning, which was founded at OpenAI in 2016.

Generative Adversarial Networks (GANs) are an architecture for training generative models in an adversarial setup: a generator produces samples that try to fool a discriminator trained to distinguish real data from synthetic data. GANs have had a lot of success in producing more realistic images than other approaches, but they have seen only limited use for text sequences. Because they were originally designed to output differentiable values, discrete language generation is challenging for them. The researchers introduce an actor-critic conditional GAN that fills in missing text conditioned on the surrounding context, and they show that it produces more realistic text samples than a maximum likelihood trained model.
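To make the adversarial setup concrete, here is a minimal, illustrative sketch of a GAN training loop in PyTorch on toy continuous data. It is not the paper's MaskGAN model; the network sizes, data, and hyperparameters are arbitrary placeholders, and the point is only to show the generator and discriminator being trained against each other.

```python
# Minimal GAN training loop on toy data (illustrative only, not MaskGAN).
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 2

# Generator maps noise to synthetic samples; discriminator scores real vs. fake.
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, data_dim) + 3.0       # stand-in "real" data
    fake = G(torch.randn(64, latent_dim))        # synthetic samples

    # Discriminator: push real scores toward 1 and fake scores toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: try to make the discriminator label fakes as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```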

MaskGAN: Better Text Generation via Filling in the _______

What problem is the paper attempting to solve?

The paper highlights how text generation has traditionally been done with Recurrent Neural Network (RNN) models, which sample each word from a distribution conditioned on the previous word and a hidden state that summarizes the words generated so far. These models are typically trained with maximum likelihood in an approach known as teacher forcing. This causes problems at sample-generation time, because the model is forced to condition on sequences it never saw during training, which leads to unpredictable dynamics in the hidden state of the RNN. Methods such as Professor Forcing and Scheduled Sampling have been proposed to address this issue, but they work only indirectly, either by encouraging the hidden state dynamics to become predictable (Professor Forcing) or by randomly conditioning on sampled words at training time (Scheduled Sampling); neither directly specifies a cost function on the output of the RNN that encourages high sample quality. The method proposed in the paper tackles text generation with GANs through a sensible combination of novel approaches.
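For readers unfamiliar with teacher forcing, the sketch below (PyTorch assumed; the vocabulary size and dimensions are arbitrary, not from the paper) shows an RNN language model trained with maximum likelihood on ground-truth previous tokens, followed by free-running sampling, which is where the train/test mismatch described above appears.

```python
# Teacher forcing vs. free-running sampling for an RNN language model (sketch).
import torch
import torch.nn as nn

vocab, emb_dim, hidden = 1000, 64, 128
embed = nn.Embedding(vocab, emb_dim)
rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
proj = nn.Linear(hidden, vocab)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab, (32, 20))       # a batch of token ids

# Teacher forcing: condition on the *ground-truth* previous tokens.
inputs, targets = tokens[:, :-1], tokens[:, 1:]
hidden_states, _ = rnn(embed(inputs))
loss = loss_fn(proj(hidden_states).reshape(-1, vocab), targets.reshape(-1))

# At sampling time the model must condition on its *own* previous samples,
# which it never saw during training (the train/test mismatch above).
tok, state = tokens[:, :1], None
for _ in range(20):
    out, state = rnn(embed(tok[:, -1:]), state)
    nxt = torch.distributions.Categorical(logits=proj(out[:, -1])).sample()
    tok = torch.cat([tok, nxt.unsqueeze(1)], dim=1)
```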

MaskGAN paper summary

The paper proposes to improve sample quality using Generative Adversarial Networks (GANs), which explicitly train the generator to produce high-quality samples. The model is trained on a text fill-in-the-blank, or in-filling, task: portions of a body of text are deleted or redacted, and the goal of the model is to fill in the missing portions so that they are indistinguishable from the original data. While in-filling text, the model operates autoregressively over the tokens it has filled in so far, as in standard language modeling, while conditioning on the true known context. If the entire body of text is redacted, the task reduces to language modeling.
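The snippet below gives a rough illustration of the in-filling setup; the masking scheme shown is illustrative and may differ from the paper's exact procedure. A contiguous span of tokens is redacted, and the model's job is to regenerate it conditioned on the unmasked context.

```python
# Illustrative masking for the in-filling task (not the paper's exact scheme).
import random

MASK = "<m>"
tokens = "the film was surprisingly good and the acting was strong".split()

# Redact a contiguous span of three tokens (contiguous in-filling).
start = random.randrange(len(tokens) - 3)
masked = tokens[:start] + [MASK] * 3 + tokens[start + 3:]

print("context :", " ".join(masked))
print("targets :", " ".join(tokens[start:start + 3]))

# A MaskGAN-style generator would fill the masked positions autoregressively,
# conditioning on the unmasked context and on the tokens it has produced so far,
# while a discriminator scores each filled-in token as real or fake.
```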

The paper also presents qualitative and quantitative evidence that the proposed method produces more realistic text samples than a maximum likelihood trained model.

Key Takeaways

  • The paper introduces MaskGAN, a text generation model trained on an in-filling task, giving readers a clear picture of what MaskGANs are.
  • The paper considers the actor-critic architecture in extremely large action spaces, new evaluation metrics, and the generation of synthetic training data.
  • The proposed contiguous in-filling task (MaskGAN) is a good approach to reducing mode collapse and helping with training stability for textual GANs.
  • The paper shows that MaskGAN samples on a larger dataset (IMDB reviews) are significantly better than those of the corresponding tuned MaskMLE model, as shown by human evaluation.
  • One can produce high-quality samples despite the MaskGAN model having a much higher perplexity on the ground-truth test set.

Reviewer feedback summary/takeaways

Overall Score: 21/30


Average Score: 7/10.

Reviewers liked the overall idea behind the paper. They appreciated the benefit of conditioning on both the left and right context by solving a "fill-in-the-blank" task at training time and translating this into text generation at test time. One reviewer stated that the experiments were well carried out and very thorough. Another commented that the importance of the MaskGAN mechanism has been highlighted and that the description of the reinforcement learning training part has been clarified.

However, alongside the pros, the reviewers also raised some cons:

  • There is a lot of pre-training required for the proposed architecture
  • Generated texts are generally locally valid but not always globally valid
  • It was not made very clear whether the discriminator also conditions on the unmasked sequence.

A reviewer also raised some unanswered questions, such as:

  • Was pre-training done for the baseline as well?
  • How was the masking done? How did you decide on the words to mask? Was this at random?
  • Is it actually usable in place of ordinary LSTM (or RNN)-based generation?