What this book covers
Chapter 1, Learning NLP Basics, is an introductory chapter with basic preprocessing steps for working with text. It includes recipes such as dividing text into sentences, stemming and lemmatization, removing stopwords, and part-of-speech tagging. You will find out about different approaches to part-of-speech tagging, as well as two options for removing stopwords.
Chapter 2, Playing with Grammar, will show how to get and use grammatical information about text. We will create a dependency parse and then use it to split a sentence into clauses. We will also use the dependency parse and noun chunks to extract entities and relations in the text. Certain recipes will show how to extract grammatical information in both English and Spanish.
Chapter 3, Representing Text – Capturing Semantics, starts from the observation that working with words and their meanings is easy for people but difficult for computers, so text must be converted into a form that computers can process. This chapter presents different ways of representing text, from a simple bag of words to BERT. It also discusses a basic implementation of semantic search that uses these semantic representations.
Chapter 4, Classifying Texts, covers text classification, which is one of the most important techniques in NLP. It is used in many different industries for different types of texts, such as tweets, long documents, and sentences. In this chapter, you will learn how to do both supervised and unsupervised text classification with a variety of techniques and tools, including K-Means, SVMs, and LSTMs.
Chapter 5, Getting Started with Information Extraction, covers one of the main goals of NLP: extracting information from text in order to use it later. This chapter shows different ways of pulling information from text, from simple regular expressions that find emails and URLs to neural network tools that extract sentiment.
Chapter 6, Topic Modeling, covers determining the topics of texts, an important NLP technique that can help with text classification and with discovering new topics. This chapter introduces different techniques for topic modeling, including unsupervised and supervised techniques, and topic modeling of short texts, such as tweets.
Chapter 7, Building Chatbots, covers chatbots, which have emerged as an important marketing tool in the last few years. In this chapter, you will learn how to build a chatbot using two different frameworks: NLTK for keyword-matching chatbots and Rasa for sophisticated chatbots with a deep learning model under the hood.
Chapter 8, Visualizing Text Data, discusses how visualizing the results of different NLP analyses can be very useful for presentation and evaluation. This chapter introduces you to visualization techniques for the output of different NLP tools, including NER and topic modeling, as well as word clouds.