As we saw in Chapter 5, Word Embeddings and Distance Measurements for Text, Word2Vec helps us fetch semantic embeddings at the word level. However, most of the NLP tasks we deal with operate on combinations of words, or what we essentially call a paragraph:
How do we fetch paragraph-level embeddings?
One simple mechanism would be to take the word embeddings of the words occurring in the paragraph and average them to obtain a single representation of the paragraph.
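A minimal sketch of this averaging approach follows. The toy `word_vectors` dictionary here is a hypothetical stand-in for embeddings learned by a Word2Vec model; in practice, these would come from a trained model such as gensim's `KeyedVectors`:

```python
import numpy as np

# Toy stand-in for Word2Vec embeddings; in practice these would be
# looked up from a trained model (for example, gensim's KeyedVectors).
word_vectors = {
    "the":   np.array([0.1, 0.3, -0.2]),
    "movie": np.array([0.7, -0.1, 0.4]),
    "was":   np.array([0.0, 0.2, 0.1]),
    "great": np.array([0.9, 0.5, -0.3]),
}

def average_embedding(paragraph, vectors, dim=3):
    """Represent a paragraph as the mean of its word embeddings.

    Words missing from the vocabulary are skipped; an empty
    paragraph falls back to a zero vector.
    """
    tokens = paragraph.lower().split()
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return np.zeros(dim)
    return np.mean(found, axis=0)

print(average_embedding("The movie was great", word_vectors))
```

Note that this representation discards word order entirely, which is one reason simple averaging is often only a weak baseline. This brings us to the next question.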
Can we do better than averaging word embeddings?
Le and Mikolov extended the idea of Word2Vec to paragraph-level embeddings so that paragraphs of differing lengths can be represented by fixed-length vectors. They presented this work in the paper Distributed Representations of Sentences and Documents (https://arxiv.org/abs/1405.4053). As in Word2Vec, the idea here is again to predict certain words. However, in addition to using word...