Before the advent of deep learning, traditional natural language processing (NLP) approaches were widely used in tasks such as spam filtering, sentiment classification, and part-of-speech (POS) tagging. These classic approaches relied on statistical characteristics of sequences, such as word counts and co-occurrence, as well as simple linguistic features. Their main disadvantage, however, was that they could not capture complex linguistic characteristics, such as context and dependencies between words.
Recent developments in neural networks and deep learning have given us powerful new tools that approach human-level performance on a number of NLP tasks and make it possible to build products that deal with natural language. Deep learning for NLP is centered around the concept of word embeddings, or word vectors, which encode the meanings of words and phrases as dense vector representations and can be learned with algorithms such as Word2vec. Word vectors capture semantic information about words better than traditional one-hot representations, and when combined with a class of neural networks known as recurrent neural networks (RNNs), they let us handle the sequential nature of language in an intuitive way. While RNNs in practice struggle to capture long-range word dependencies, recently proposed vector-based operations for attention and alignment over sequences of word vectors allow neural networks to model global dependencies between words, including context. Due to their ability to model the syntax and semantics of language, their strong empirical performance, and their ability to generalize to new data, neural networks have become the go-to model for building highly sophisticated commercial products, such as search engines, translation services, and dialog systems.
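To make the contrast between one-hot and dense representations concrete, here is a minimal sketch in plain Python. It assumes a toy three-word vocabulary and uses hand-picked, hypothetical dense vectors (not learned by Word2vec or any real model). With one-hot encoding, every pair of distinct words is orthogonal, so their cosine similarity is always zero; dense vectors, by contrast, can place related words closer together.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vocabulary (an illustrative assumption).
vocab = ["king", "queen", "apple"]

# One-hot: each word gets its own dimension, so distinct words never overlap.
one_hot = {
    w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, w in enumerate(vocab)
}

# Hypothetical dense vectors: toy values chosen by hand, not trained embeddings.
dense = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

# One-hot vectors carry no notion of similarity between different words.
print(cosine(one_hot["king"], one_hot["queen"]))
# Dense vectors can encode that "king" is closer to "queen" than to "apple".
print(cosine(dense["king"], dense["queen"]) > cosine(dense["king"], dense["apple"]))
```

In a real system, the dense vectors would be learned from a large corpus rather than written by hand; the point here is only that a dense representation has room to express similarity, whereas one-hot vectors do not.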
This book introduces the basic building blocks of deep learning models for NLP and explores cutting-edge techniques from the recent literature. We take a problem-based approach, introducing new models as solutions to various NLP tasks. Our focus is on providing practical code implementations in Python that you can apply to your own use cases to bring human-like language capabilities into your applications.