The roadmap – beyond this chapter
This section briefly outlines the rest of the book and what each chapter covers. In this book, we will look at numerous exciting fields of NLP, from algorithms that find word similarities without any annotated data to algorithms that can write a story by themselves.
Starting from the next chapter, we will dive into the details of several popular and interesting NLP tasks. To help you gain in-depth knowledge and to make the learning interactive, various exercises are also provided. We will use Python and TensorFlow, an open-source library for distributed numerical computation, for all the implementations. TensorFlow encapsulates advanced technicalities, such as optimizing your code for GPUs using Compute Unified Device Architecture (CUDA), which can be challenging to handle yourself. Furthermore, TensorFlow provides built-in functions for implementing deep learning algorithms, for example, activations, stochastic optimization methods, and convolutions, making everyone's life easier.
We will embark on a journey that covers many hot topics in NLP, using TensorFlow to see state-of-the-art algorithms in action and how well they perform. This is what we will look at in this book:
- Chapter 2, Understanding TensorFlow, provides you with a sound guide to writing client programs and running them in TensorFlow. This is especially important if you are new to TensorFlow, because TensorFlow behaves differently from a traditional programming language such as Python. This chapter will first offer an in-depth explanation of how TensorFlow executes a client. This will help you to understand the TensorFlow execution workflow and feel comfortable with TensorFlow terminology. Next, the chapter will walk you through various elements of a TensorFlow client, such as defining variables, defining operations/functions, feeding inputs to an algorithm, and obtaining the results. Finally, we will discuss how all this knowledge of TensorFlow can be used to implement a moderately complex neural network to classify images of hand-written digits.
- Chapter 3, Word2vec – Learning Word Embeddings. The objective of this chapter is to introduce Word2vec, a method for learning numerical representations of words that reflect the semantics of the words. Before diving straight into the Word2vec techniques, we will first discuss some classical approaches used to represent word semantics. One early approach was to rely on WordNet, a large lexical database. WordNet can be used to measure the semantic similarity between different words. However, maintaining such a large lexical database is costly. Therefore, there exist other, simpler representation techniques, such as one-hot-encoded representations and the term frequency-inverse document frequency (TF-IDF) method, that do not rely on external resources. Following this, we will move on to the modern way of learning word vectors, known as Word2vec, where we use a neural network to learn word representations. We will discuss two popular Word2vec techniques: the skip-gram and continuous bag-of-words (CBOW) models.
- Chapter 4, Advanced Word2vec. We will start this chapter with several comparisons, including a comparison between the skip-gram and CBOW algorithms to see if there is a clear-cut winner. Then we will discuss several extensions that have been introduced to the original Word2vec techniques over the past few years. For example, ignoring common words in the text that occur with high probability, such as "the" and "a", improves the performance of Word2vec models. On the other hand, the Word2vec model only considers the local context of a word and ignores the global statistics of the entire corpus. Consequently, we will discuss GloVe, a word embedding learning technique that incorporates both global and local statistics when finding word vectors.
- Chapter 5, Sentence Classification with Convolutional Neural Networks, introduces you to convolutional neural networks (CNNs). Convolutional networks are a powerful family of deep models that can leverage the spatial structure of an input to learn from data. In other words, a CNN can process images in their two-dimensional form, whereas a multilayer perceptron needs the image to be unwrapped into a one-dimensional vector. We will first discuss in detail the various operations that take place in CNNs, such as the convolution and pooling operations. Then we will see an example where we learn to classify hand-written digit images with a CNN. After that, we will transition to an application of CNNs in NLP. Specifically, we will investigate how to apply a CNN to classify sentences, where the task is to decide whether a sentence is about a person, location, object, and so on.
- Chapter 6, Recurrent Neural Networks, focuses on introducing recurrent neural networks (RNNs) and using RNNs for language generation. RNNs are different from feed-forward neural networks (for example, CNNs) as RNNs have memory. The memory is stored as a continuously updated system state. We will start with a representation of a feed-forward neural network and modify that representation to learn from sequences of data instead of individual data points. This process will transform the feed-forward network into an RNN. This will be followed by a technical description of the exact equations used for computations within the RNN. Next, we will discuss the optimization process that is used to update the RNN's weights. Thereafter, we will go through different types of RNNs, such as one-to-one RNNs and one-to-many RNNs. We will then walk through an exciting application of RNNs, where the RNN learns to tell new stories by learning from a corpus of existing stories. We achieve this by training the RNN to predict the next word given the preceding sequence of words in the story. Finally, we will discuss a variant of standard RNNs, which we call RNN-CF (RNN with contextual features), and compare it with the standard RNN to see which one performs better.
- Chapter 7, Long Short-Term Memory Networks, discusses LSTMs by initially providing a solid intuition of how these models work and then progressively diving into the technical details needed to implement them on your own. Standard RNNs suffer from a crucial limitation: the inability to persist long-term memory. However, advanced RNN models, such as long short-term memory cells (LSTMs) and gated recurrent units (GRUs), have been proposed, which can remember sequences for a large number of time steps. We will also examine exactly how LSTMs alleviate the difficulty of persisting long-term memory, a difficulty that arises from what is known as the vanishing gradient problem. We will then discuss several improvements that can enhance LSTM models further, such as predicting several time steps ahead at once and reading sequences both forward and backward. Finally, we will discuss several variants of LSTM models, such as GRUs and LSTMs with peephole connections.
- Chapter 8, Applications of LSTM – Generating Text, explains how to implement the LSTMs, GRUs, and LSTMs with peephole connections discussed in Chapter 7, Long Short-Term Memory Networks. Furthermore, we will compare the performance of these extensions both qualitatively and quantitatively. We will also discuss how to implement some of the extensions examined in Chapter 7, Long Short-Term Memory Networks, such as predicting several time steps ahead (known as beam search) and using word vectors as inputs instead of one-hot-encoded representations. Finally, we will discuss how we can use the RNN API, a sublibrary of TensorFlow that simplifies the implementation of recurrent models.
- Chapter 9, Applications of LSTM – Image Caption Generation, looks at another exciting application, where the model learns how to generate captions (that is, descriptions) for images using an LSTM and a CNN. This application is interesting because it shows us how to combine two different types of models as well as how to learn with multimodal data (for example, images and text). The specific way to achieve this is to first learn image representations (similar to word vectors) with the CNN and then train the LSTM by feeding it that image vector, followed by the words of the image's description, as a sequence. We will first discuss how we can use a pretrained CNN to obtain the image representations. Then we will discuss how to learn the word embeddings. Next, we will discuss how to feed the image vectors along with the word embeddings to train the LSTM. This is followed by a description of the different metrics that exist for evaluating image captioning systems. Afterwards, we will evaluate the captions generated by our model, both qualitatively and quantitatively. We will conclude the chapter with a guide on how to implement the same system using the TensorFlow RNN API.
- Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation. Machine translation has gained a lot of attention, due to both the necessity of automating translation and the inherent difficulty of the task. We will start the chapter with a brief historical flashback to how machine translation was implemented in the early days. This discussion ends with an introduction to neural machine translation (NMT) systems. We will see how well current NMT systems are doing compared to older systems (such as statistical machine translation systems), which will motivate us to learn about NMT systems. Afterwards, we will discuss the intuition behind the design of NMT systems and continue with the technical details. Then we will discuss the evaluation metric we use to evaluate our system. Following this, we will investigate how we can implement a German-to-English translator from scratch. Next, we will learn about ways to improve NMT systems. We will look at one of those extensions, called the attention mechanism, in detail. The attention mechanism has become essential in sequence-to-sequence learning problems. Finally, we will compare the performance improvement obtained with the attention mechanism and analyze the reasons behind the performance gain. This chapter concludes with a section on how the same concept of NMT systems can be extended to implement chatbots. Chatbots are systems that can communicate with humans and are used to fulfill various customer requests.
- Chapter 11, Current Trends and the Future of Natural Language Processing. Natural language processing has branched out into a vast spectrum of different tasks. Here we will discuss some of the current trends in NLP and the developments we can expect in the future. We will first discuss various word embedding extensions that have emerged recently. We will also look at the implementation of one such embedding learning technique, known as tv-embeddings. Next, we will examine various trends growing in the field of neural machine translation. Then we will look at how NLP is combined with other fields, such as computer vision and reinforcement learning, to solve some interesting problems, such as teaching computer agents to communicate by devising their own language. Another booming area these days is artificial general intelligence, which is about developing a single system that can perform multiple tasks (classify images, translate text, caption images, and so on). We will investigate several such systems. Afterwards, we will talk about the introduction of NLP into mining social media. We will conclude this chapter with some of the new tasks that are emerging (for example, language grounding, that is, developing common-sense NLP systems) and new models (for example, phased LSTMs).
- Appendix, Mathematical Foundations and Advanced TensorFlow, will introduce the reader to various mathematical data structures (for example, matrices) and operations (for example, matrix inverse). We will also discuss several important concepts in probability. We will then introduce Keras, a high-level library that uses TensorFlow underneath. Keras makes implementing neural networks simpler by hiding some of the details of TensorFlow, which some might find challenging. Concretely, we will see how we can implement a CNN with Keras, to get a feel for how to use it. Next, we will discuss how we can use the seq2seq library in TensorFlow to implement a neural machine translation system with much less code than we used in Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation. Finally, we will walk you through a guide to using TensorBoard to visualize word embeddings. TensorBoard is a handy visualization tool that is shipped with TensorFlow. It can be used to visualize and monitor various variables in your TensorFlow client.
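
To make two of the ideas from the Chapter 3 summary above a little more concrete, the following is a minimal sketch of one-hot-encoded representations and of the (target, context) word pairs that a skip-gram model is typically trained on. It is plain Python and deliberately uses no TensorFlow. The toy sentence, the window size of 1, and the function names `build_vocabulary`, `one_hot`, and `skip_gram_pairs` are assumptions made for this illustration; they are not code from the book.

```python
def build_vocabulary(tokens):
    """Map each distinct word to an integer ID, in order of first appearance."""
    vocabulary = {}
    for word in tokens:
        if word not in vocabulary:
            vocabulary[word] = len(vocabulary)
    return vocabulary


def one_hot(word, vocabulary):
    """Return a vector of zeros with a single 1 at the position of the word's ID."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[word]] = 1
    return vector


def skip_gram_pairs(tokens, window_size=1):
    """Pair every word with each neighbour at most window_size positions away."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical toy sentence, chosen only for illustration.
tokens = "the dog barked at the mailman".split()
vocabulary = build_vocabulary(tokens)

print(one_hot("dog", vocabulary))
print(skip_gram_pairs(tokens)[:4])
```

Note that the length of a one-hot vector equals the vocabulary size, and any two distinct one-hot vectors are equally dissimilar; this is part of the motivation for learning dense word vectors instead.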