You're reading from Hands-On Natural Language Processing with PyTorch 1.x Build smart, AI-driven linguistic applications using deep learning and NLP techniques

Product type Paperback

Published in Jul 2020

Publisher Packt

ISBN-13 9781789802740

Length 276 pages

Edition 1st Edition


Author (1):

Thomas Dop


Table of Contents (14) Chapters

Preface

1. Section 1: Essentials of PyTorch 1.x for NLP

2. Chapter 1: Fundamentals of Machine Learning and Deep Learning

3. Chapter 2: Getting Started with PyTorch 1.x for NLP

4. Section 2: Fundamentals of Natural Language Processing

5. Chapter 3: NLP and Text Embeddings

6. Chapter 4: Text Preprocessing, Stemming, and Lemmatization

7. Section 3: Real-World NLP Applications Using PyTorch 1.x

8. Chapter 5: Recurrent Neural Networks and Sentiment Analysis

9. Chapter 6: Convolutional Neural Networks for Text Classification

10. Chapter 7: Text Translation Using Sequence-to-Sequence Neural Networks

11. Chapter 8: Building a Chatbot Using Attention-Based Neural Networks

12. Chapter 9: The Road Ahead

13. Other Books You May Enjoy


Chapter 4: Text Preprocessing, Stemming, and Lemmatization

Textual data can be gathered from many different sources and takes many different forms. Text can be tidy and readable or raw and messy, and it can come in many different styles and formats. In this chapter, we'll look at how to preprocess this data so that it is converted into a standard format before it reaches our NLP models.
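To make "a standard format" concrete, here is a minimal sketch of a text normalizer. The specific steps (Unicode normalization, lowercasing, dropping HTML-like tags, stripping punctuation, collapsing whitespace) and the function name `normalize_text` are illustrative assumptions, not the pipeline this chapter goes on to build; a real project would choose its steps to suit the task.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Toy normalizer; the steps chosen here are assumptions for illustration."""
    text = unicodedata.normalize("NFKC", raw)   # unify equivalent Unicode forms
    text = text.lower()                         # fold case
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML-like tags
    text = re.sub(r"[^a-z0-9\s']", " ", text)   # replace remaining punctuation with spaces
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
    return text


if __name__ == "__main__":
    print(normalize_text("  <b>Hello</b>,   WORLD!! "))
    print(normalize_text("It's  RAW\tand MESSY."))
```

Note that this sketch keeps apostrophes so that contractions such as "it's" survive, and that it discards any character outside `a-z`, `0-9`, and whitespace, which would be too aggressive for non-English text.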

Stemming and lemmatization, like tokenization, are forms of NLP preprocessing. However, whereas tokenization splits a document into individual words, stemming and lemmatization attempt to reduce those words further to their lexical roots. For example, almost every verb in English takes several different forms, depending on tense:

He jumped

He is jumping

He jumps

While the verb takes a different form in each sentence, every form relates to the same root word: jump. Stemming and lemmatization are both techniques we can use to reduce word variations...
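The jump example can be sketched in a few lines of plain Python. This is a deliberately tiny suffix-stripping stemmer written only for illustration: the names `tokenize`, `toy_stem`, and `SUFFIXES` are hypothetical, and the three-suffix rule set is an assumption, not the Porter algorithm or any library's implementation.

```python
import re

# Hypothetical, deliberately tiny rule set; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "s")


def tokenize(sentence):
    """Split a sentence into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


sentences = ["He jumped", "He is jumping", "He jumps"]
stems = [[toy_stem(token) for token in tokenize(s)] for s in sentences]

if __name__ == "__main__":
    for sentence, stemmed in zip(sentences, stems):
        print(sentence, "->", stemmed)
```

All three verb forms reduce to `jump`, while short words such as "he" and "is" are left alone by the length check. The sketch also shows why crude suffix stripping is limited: `toy_stem("running")` yields `runn`, which is not a real word. Lemmatization addresses this kind of case by mapping each word to a dictionary form rather than just cutting off endings.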
