Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Python Natural Language Processing Cookbook

You're reading from   Python Natural Language Processing Cookbook Over 50 recipes to understand, analyze, and generate text for implementing language processing tasks

Arrow left icon
Product type Paperback
Published in Mar 2021
Publisher Packt
ISBN-13 9781838987312
Length 284 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Zhenya Antić Zhenya Antić
Author Profile Icon Zhenya Antić
Zhenya Antić
Arrow right icon
View More author details
Toc

Table of Contents (10) Chapters Close

Preface 1. Chapter 1: Learning NLP Basics 2. Chapter 2: Playing with Grammar FREE CHAPTER 3. Chapter 3: Representing Text – Capturing Semantics 4. Chapter 4: Classifying Texts 5. Chapter 5: Getting Started with Information Extraction 6. Chapter 6: Topic Modeling 7. Chapter 7: Building Chatbots 8. Chapter 8: Visualizing Text Data 9. Other Books You May Enjoy

Getting the dependency parse

A dependency parse is a tool that shows dependencies in a sentence. For example, in the sentence The cat wore a hat, the root of the sentence in the verb, wore, and both the subject, the cat, and the object, a hat, are dependents. The dependency parse can be very useful in many NLP tasks since it shows the grammatical structure of the sentence, along with the subject, the main verb, the object, and so on. It can be then used in downstream processing.

Getting ready

We will use spacy to create the dependency parse. If you already downloaded it while working on the previous chapter, you do not need to do anything more. Otherwise, please follow the instructions at the beginning of Chapter 1, Learning NLP Basics, to install the necessary packages.

How to do it…

We will take a few sentences from the sherlock_holmes1.txt file to illustrate the dependency parse. The steps are as follows:

  1. Import spacy:
    import spacy
  2. Load the sentence to be parsed:
    sentence = 'I have seldom heard him mention her under any other name.'
  3. Load the spacy engine:
    nlp = spacy.load('en_core_web_sm')
  4. Process the sentence using the spacy engine:
    doc = nlp(sentence)
  5. The dependency information will be contained in the doc object. We can see the dependency tags by looping through the tokens in doc:
    for token in doc:
        print(token.text, "\t", token.dep_, "\t",
        spacy.explain(token.dep_))
  6. The result will be as follows. To learn what each of the tags means, use spaCy's explain function, which shows the meanings of the tags:
    I        nsubj   nominal subject
    have     aux     auxiliary
    seldom   advmod          adverbial modifier
    heard    ROOT    None
    him      nsubj   nominal subject
    mention          ccomp   clausal complement
    her      dobj    direct object
    under    prep    prepositional modifier
    any      det     determiner
    other    amod    adjectival modifier
    name     pobj    object of preposition
    .        punct   punctuation
  7. To explore the dependency parse structure, we can use the attributes of the Token class. Using its ancestors and children attributes, we can get the tokens that this token depends on and the tokens that depend on it, respectively. The code to get these ancestors is as follows:
    for token in doc:
        print(token.text)
        ancestors = [t.text for t in token.ancestors]
        print(ancestors)

    The output will be as follows:

    I
    ['heard']
    have
    ['heard']
    seldom
    ['heard']
    heard
    []
    him
    ['mention', 'heard']
    mention
    ['heard']
    her
    ['mention', 'heard']
    under
    ['mention', 'heard']
    any
    ['name', 'under', 'mention', 'heard']
    other
    ['name', 'under', 'mention', 'heard']
    name
    ['under', 'mention', 'heard']
    .
    ['heard']
  8. To see all the children token, use the following code:
    for token in doc:
        print(token.text)
        children = [t.text for t in token.children]
        print(children)
  9. The output will be as follows:
    I
    []
    have
    []
    seldom
    []
    heard
    ['I', 'have', 'seldom', 'mention', '.']
    him
    []
    mention
    ['him', 'her', 'under']
    her
    []
    under
    ['name']
    any
    []
    other
    []
    name
    ['any', 'other']
    .
    []
  10. We can also see the subtree that the token is in:
    for token in doc:
        print(token.text)
        subtree = [t.text for t in token.subtree]
        print(subtree)

    This will produce the following output:

    I
    ['I']
    have
    ['have']
    seldom
    ['seldom']
    heard
    ['I', 'have', 'seldom', 'heard', 'him', 'mention', 'her', 'under', 'any', 'other', 'name', '.']
    him
    ['him']
    mention
    ['him', 'mention', 'her', 'under', 'any', 'other', 'name']
    her
    ['her']
    under
    ['under', 'any', 'other', 'name']
    any
    ['any']
    other
    ['other']
    name
    ['any', 'other', 'name']
    .
    ['.']

How it works…

The spacy NLP engine does the dependency parse as part of its overall analysis. The dependency parse tags explain the role of each word in the sentence. ROOT is the main word that all the other words depend on, usually the verb.

From the subtrees that each word is part of, we can see the grammatical phrases that appear in the sentence, such as the noun phrase (NP) any other name and prepositional phrase (PP) under any other name.

The dependency chain can be seen by following the ancestor links for each word. For example, if we look at the word name, we will see that its ancestors are under, mention, and heard. The immediate parent of name is under, under's parent is mention, and mention's parent is heard. A dependency chain will always lead to the root, or the main word, of the sentence.

In step 1, we import the spacy package. In step 2, we initialize the variable sentence that contains the sentence to be parsed. In step 3, we load the spacy engine and in step 4, we use the engine to process the sentence.

In step 5, we print out each token's dependency tag and use the spacy.explain function to see what those tags mean.

In step 6, we print out the ancestors of each token. The ancestors will start at the parent and go up until they reach the root. For example, the parent of him is mention, and the parent of mention is heard, so both mention and heard are listed as ancestors of him.

In step 7, we print children of each token. Some tokens, such as have, do not have any children, while others have several. The token that will always have children, unless the sentence consists of one word, is the root of the sentence; in this case, heard.

In step 8, we print the subtree for each token. For example, the word under is in the subtree under any other name.

See also

The dependency parse can be visualized graphically using the displacy package, which is part of spacy. Please see Chapter 8, Visualizing Text Data, for a detailed recipe on how to perform visualization.

You have been reading a chapter from
Python Natural Language Processing Cookbook
Published in: Mar 2021
Publisher: Packt
ISBN-13: 9781838987312
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime