Hands-On Natural Language Processing with Python

You're reading from Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications

Product type Paperback
Published in Jul 2018
Publisher Packt
ISBN-13 9781789139495
Length 312 pages
Edition 1st Edition
Authors (5):
  • Rajalingappaa Shanmugamani
  • Chaitanya Joshi
  • Auguste Byiringiro
  • Rajesh Arumugam
  • Karthik Muthuswamy
Table of Contents

  • Preface
  • 1. Getting Started
  • 2. Text Classification and POS Tagging Using NLTK
  • 3. Deep Learning and TensorFlow
  • 4. Semantic Embedding Using Shallow Models
  • 5. Text Classification Using LSTM
  • 6. Searching and DeDuplicating Using CNNs
  • 7. Named Entity Recognition Using Character LSTM
  • 8. Text Generation and Summarization Using GRUs
  • 9. Question-Answering and Chatbots Using Memory Networks
  • 10. Machine Translation Using the Attention-Based Model
  • 11. Speech Recognition Using DeepSpeech
  • 12. Text-to-Speech Using Tacotron
  • 13. Deploying Trained Models
  • 14. Other Books You May Enjoy

Basic concepts and terminologies in NLP

The following are some important terms and concepts in NLP, mostly related to language data. Getting familiar with them will help the reader get up to speed with the contents of the later chapters of the book:

  • Text corpus or corpora
  • Paragraph
  • Sentences
  • Phrases and words
  • N-grams
  • Bag-of-words

We will explain these in the following sections.

Text corpus or corpora

The language data that all NLP tasks depend upon is called the text corpus, or simply the corpus. A corpus is a large set of text data in a language such as English or French. The corpus can consist of a single document or a collection of documents. The text corpus can come from social networking sites like Twitter, blogs, open discussion forums like Stack Overflow, books, and several other sources. Some tasks, such as machine translation, require a multilingual corpus. For example, we might need both the English and French translations of the same document to develop a machine translation model. For speech tasks, we would also need human voice recordings and the corresponding transcribed corpus.

In most of the later chapters, we will use text corpora and speech recordings available from the internet or from open source data repositories. For many NLP tasks, the corpus is split into chunks for further analysis. These chunks could be at the paragraph, sentence, or word level. We will touch upon these in the following sections.

Paragraph

A paragraph is the largest of the chunks into which a corpus is split for NLP tasks. Paragraph-level boundaries by themselves may not be of much use unless the paragraph is broken down into sentences, although a paragraph is sometimes treated as a context boundary. Tokenizers that can split a document into paragraphs are available in some Python libraries. We will look at such tokenizers in later chapters.
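As a rough illustration, paragraph splitting can be sketched with the standard library alone. This is not one of the library tokenizers mentioned above; it is a minimal sketch that assumes paragraphs are separated by one or more blank lines, and `split_paragraphs` is a hypothetical helper name.

```python
import re


def split_paragraphs(document: str) -> list[str]:
    """Split a document into paragraphs.

    Assumption: paragraphs are separated by at least one blank line
    (a line containing only whitespace). Real documents may differ.
    """
    # A newline, optional whitespace-only lines, then another newline
    # marks a paragraph boundary
    chunks = re.split(r"\n\s*\n", document)
    # Trim surrounding whitespace and drop empty chunks
    return [chunk.strip() for chunk in chunks if chunk.strip()]


doc = ("NLP works on text.\nIt has many uses.\n\n"
       "Paragraphs are separated by blank lines.\n\n\n"
       "This is the third one.")
print(split_paragraphs(doc))
# ['NLP works on text.\nIt has many uses.', 'Paragraphs are separated by blank lines.', 'This is the third one.']
```

Note that a single newline inside a paragraph is kept, so a hard-wrapped paragraph stays in one piece.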

Sentences

Sentences are the next level down in the units of language data. A sentence encapsulates a complete meaning or thought, along with its context. It is usually extracted from a paragraph based on boundaries marked by punctuation such as periods. A sentence may also convey an opinion or sentiment. In general, sentences consist of words belonging to parts of speech (POS) such as nouns, verbs, and adjectives. Tokenizers are available that split paragraphs into sentences based on punctuation.
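A punctuation-based sentence splitter can be approximated with a single regular expression. The sketch below is a simplification, not the tokenizer used later in the book: it assumes a sentence ends with `.`, `!`, or `?` followed by whitespace, and `split_sentences` is a hypothetical helper name.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Naively split a paragraph into sentences on end punctuation."""
    # Split on whitespace that follows '.', '!' or '?'. The lookbehind
    # keeps the punctuation attached to its sentence. Abbreviations such
    # as "Dr." will be split incorrectly by this rule.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


text = "Tomorrow is going to be a rainy day. Take an umbrella! Will it snow?"
print(split_sentences(text))
# ['Tomorrow is going to be a rainy day.', 'Take an umbrella!', 'Will it snow?']
```

Trained sentence tokenizers exist precisely because rules like this break on abbreviations, decimals, and quotations.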

Phrases and words

A phrase is a group of consecutive words within a sentence that conveys a specific meaning. For example, in the sentence "Tomorrow is going to be a rainy day", the part "going to be a rainy day" expresses a specific thought. Some NLP tasks extract key phrases from sentences for search and retrieval applications. The next smallest unit of text is the word. Common tokenizers split sentences into words based on delimiters such as spaces and commas. One of the challenges in NLP is ambiguity: the same word can carry different meanings in different contexts. We will see later how this is handled when we discuss word embeddings.
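Word tokenization on spaces and commas, as described above, can be sketched as follows. This is an assumption-laden simplification (library tokenizers also handle contractions, hyphens, and other punctuation), and `tokenize_words` is a hypothetical helper name.

```python
import re


def tokenize_words(sentence: str) -> list[str]:
    """Split a sentence into words on whitespace and commas."""
    # Treat any run of whitespace and/or commas as one delimiter
    tokens = re.split(r"[\s,]+", sentence)
    # Remove sentence-level punctuation stuck to words, then drop empties
    cleaned = [token.strip(".!?;:") for token in tokens]
    return [token for token in cleaned if token]


print(tokenize_words("Tomorrow, it is going to be a rainy day."))
# ['Tomorrow', 'it', 'is', 'going', 'to', 'be', 'a', 'rainy', 'day']
```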

N-grams

An N-gram is a contiguous sequence of N characters or words. For example, a character unigram consists of a single character, a character bigram consists of a sequence of two characters, and so on. Similarly, a word N-gram consists of a sequence of N words. In NLP, N-grams are used as features for tasks like text classification.
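Because an N-gram is just a window of N consecutive items, the same function can produce both character and word N-grams. The following is a minimal sketch; `ngrams` is a hypothetical helper name, and returning tuples is a design choice so that the N-grams can be used directly as dictionary keys or feature names.

```python
from collections.abc import Sequence


def ngrams(items: Sequence, n: int) -> list[tuple]:
    """Return all contiguous n-grams of a sequence as tuples."""
    # Slide a window of size n across the sequence; if n is larger
    # than the sequence, the range is empty and no n-grams are returned
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word bigrams: pass a list of words
words = "tomorrow is going to be a rainy day".split()
print(ngrams(words, 2))
# [('tomorrow', 'is'), ('is', 'going'), ('going', 'to'), ('to', 'be'), ('be', 'a'), ('a', 'rainy'), ('rainy', 'day')]

# Character bigrams: pass a string, which is a sequence of characters
print(ngrams("rain", 2))
# [('r', 'a'), ('a', 'i'), ('i', 'n')]
```

A sentence of k words has k - n + 1 word N-grams, which is why the eight-word sentence above yields seven bigrams.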

Bag-of-words

Bag-of-words, in contrast to N-grams, does not consider word order or sequence. It captures only how often each word occurs in the text. Bag-of-words representations are also used as features in tasks like sentiment analysis and topic identification.
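A bag-of-words can be sketched with `collections.Counter`, assuming lowercase whitespace tokenization (a simplification). The two example sentences are made up for illustration: they have opposite meanings but produce identical bags, which shows concretely what discarding word order costs. `bag_of_words` is a hypothetical helper name.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count word occurrences, ignoring their order."""
    # Lowercase and split on whitespace; only the counts survive
    return Counter(text.lower().split())


a = bag_of_words("The movie was not good it was bad")
b = bag_of_words("The movie was not bad it was good")
print(a == b)  # True: same words with the same counts

# Turn the bag into a fixed-length count vector over a shared vocabulary
vocabulary = sorted(set(a) | set(b))
vector = [a[word] for word in vocabulary]
print(vocabulary)  # ['bad', 'good', 'it', 'movie', 'not', 'the', 'was']
print(vector)      # [1, 1, 1, 1, 1, 1, 2]
```

The count vector is what a classifier would actually consume; word N-grams can be added to the vocabulary to recover some of the lost ordering.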

In the following sections, we will give an overview of the following applications of NLP:

  • Analyzing sentiment
  • Recognizing named entities
  • Linking entities
  • Translating text
  • Natural language interfaces
  • Semantic role labeling
  • Relation extraction
  • SQL query generation, or semantic parsing
  • Machine comprehension
  • Textual entailment
  • Coreference resolution
  • Searching
  • Question answering and chatbots
  • Converting text to voice
  • Converting voice to text
  • Speaker identification
  • Spoken dialog systems
  • Other applications