Preface
2017 was a watershed moment for Natural Language Processing (NLP), with Transformer- and attention-based networks coming to the fore. The past few years have been as transformational for NLP as AlexNet's 2012 breakthrough was for computer vision. Tremendous advances have been made, and NLP is now moving from research labs into applications.
These advances span the domains of Natural Language Understanding (NLU), Natural Language Generation (NLG), and Natural Language Interaction (NLI). With so much research happening across these domains, keeping up with the exciting developments in NLP can be a daunting task.
This book is focused on cutting-edge applications in the fields of NLP, language generation, and dialog systems. It covers the concepts of pre-processing text using techniques such as tokenization, part-of-speech (POS) tagging, and lemmatization with popular libraries such as Stanford NLP and spaCy. Named Entity Recognition (NER) models are built from scratch using Bi-directional Long Short-Term Memory networks (BiLSTMs), Conditional Random Fields (CRFs), and Viterbi decoding. Taking a very practical, application-focused perspective, the book covers key emerging areas such as generating text for use in sentence completion and text summarization, multi-modal networks that bridge images and text by generating captions for images, and managing the dialog aspects of chatbots.

The book also covers one of the most important reasons behind recent advances in NLP: transfer learning and fine-tuning. Unlabeled textual data is easy to come by, but labeling it is costly; this book covers practical techniques that can simplify the labeling of textual data.
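As a taste of the pre-processing steps mentioned above, here is a minimal sketch using spaCy; the example sentence and the choice of the small `en_core_web_sm` model are illustrative, not taken from the book:

```python
# A minimal sketch of tokenization, POS tagging, and lemmatization with spaCy.
# Assumes the small English model has been installed:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cities were quickly rebuilt after the storms.")

for token in doc:
    # token.text   - the raw token
    # token.pos_   - the coarse part-of-speech tag
    # token.lemma_ - the base (lemmatized) form
    print(f"{token.text:10} {token.pos_:6} {token.lemma_}")
```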
By the end of the book, I hope you will have advanced knowledge of the tools, techniques, and deep learning architectures used to solve complex NLP problems. The book covers encoder-decoder networks, Long Short-Term Memory (LSTM) networks and BiLSTMs, CRFs, Transformers, BERT, GPT-2, GPT-3, and other key technologies, all implemented using TensorFlow.
The advanced TensorFlow techniques required for building such models are also covered:
- Building custom models and layers
- Building custom loss functions
- Implementing learning rate annealing
- Using `tf.data` for loading data efficiently
- Checkpointing models to enable long training runs (usually several days); a minimal sketch of these last two items follows the list
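To make those last two items concrete, here is a minimal sketch, not taken from the book, of an efficient `tf.data` input pipeline combined with per-epoch checkpointing and simple learning rate annealing. The toy data, model, checkpoint file name, and decay factor are all illustrative assumptions:

```python
# A minimal sketch: tf.data input pipeline + checkpointing + LR annealing.
import tensorflow as tf

# Efficient input pipeline: shuffle, batch, and prefetch toy in-memory data.
xs = tf.random.normal((1000, 20))
ys = tf.random.uniform((1000,), maxval=2, dtype=tf.int32)
dataset = (
    tf.data.Dataset.from_tensor_slices((xs, ys))
    .shuffle(buffer_size=1000)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Simple exponential learning rate annealing: decay by 10% each epoch.
def anneal(epoch, lr):
    return lr * 0.9 if epoch > 0 else lr

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(anneal),
    # Save weights every epoch so a multi-day run can resume after interruption.
    tf.keras.callbacks.ModelCheckpoint("ckpt.weights.h5",
                                       save_weights_only=True),
]
model.fit(dataset, epochs=3, callbacks=callbacks)
```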
This book contains working code that can be adapted to your own use cases. I hope that you will even be able to carry out novel, state-of-the-art research using the skills you'll gain as you progress through the book.