Chapter 1, Preparing Text for Analysis and Tokenization, demonstrates numerous approaches for performing tokenization. This is the process of extracting the individual words and elements of a document, and it forms the basis for most NLP tasks. Tokenization can be difficult to perform correctly, and many specialized tokenizers are available to handle particular kinds of text.
Chapter 2, Isolating Sentences within a Document, covers sentence isolation, another key NLP task. The process involves more than finding a period, exclamation mark, or question mark and treating it as a sentence delimiter, because these characters also appear in abbreviations, numbers, and similar constructs. It often requires the use of trained neural network models to work correctly.
Chapter 3, Performing Name Entity Recognition, explains how to isolate the key elements of a text in terms of entities such as names, dates, and places. It is not feasible to create an exhaustive list of entities, so neural networks are frequently used to perform this task.
Chapter 4, Detecting POS Using Neural Networks, covers parts of speech (POS), which correspond to sentence elements such as nouns, verbs, and adjectives. Identifying POS is critical to extracting meaning from a text. This chapter will illustrate various POS detection techniques and show how these elements can be depicted.
Chapter 5, Performing Text Classification, outlines a common NLP activity: classifying text into one or more categories. This chapter will demonstrate how this is accomplished, including the process of performing sentiment analysis, which is often used to assess a customer's opinion of a product or service.
Chapter 6, Finding Relationships within Text, explains how identifying the relationships between text elements can be used to extract meaning from a document. While this is not a simple task, it is becoming increasingly important to many applications. We will examine various approaches to accomplish this goal.
Chapter 7, Language Identification and Translation, covers how language translation is critical to many problem domains, and takes on increased importance as the world becomes more and more interconnected. In this chapter, we will demonstrate several cloud-based approaches to performing natural language translation.
Chapter 8, Identifying Semantic Similarities within Text, explains how texts can be similar to each other at various levels. Similar words may be used, or there may be similarities in text structure. Detecting these similarities is useful for a variety of tasks, ranging from spell checking to helping determine the meaning of a text. We will demonstrate various approaches in this chapter.
Chapter 9, Common Text Processing and Generation Tasks, outlines how the NLP techniques illustrated in this book are all based on a set of common text-processing activities. These include using data structures such as inverted dictionaries and generating random numbers for training sets. In this chapter, we will demonstrate many of these tasks.
Chapter 10, Extracting Data for Use in NLP Analysis, emphasizes how important it is to be able to obtain data from a variety of sources. As more and more data is created, we need mechanisms for extracting and then processing the data. We will illustrate some of these techniques, including extracting data from Word/PDF documents, websites, and spreadsheets.
Chapter 11, Creating a Chatbot, discusses an increasingly common and important NLP application: chatbots. In this chapter, we will demonstrate how to create a chatbot, and how a Java application interface can be used to enhance the functionality of the chatbot.
Appendix, Installation and Configuration, covers the different installations and configurations for Google Cloud Platform (GCP) and Amazon Web Services (AWS).