Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Python Natural Language Processing Cookbook

You're reading from   Python Natural Language Processing Cookbook Over 60 recipes for building powerful NLP solutions using Python and LLM libraries

Arrow left icon
Product type Paperback
Published in Sep 2024
Publisher Packt
ISBN-13 9781803245744
Length 312 pages
Edition 2nd Edition
Languages
Concepts
Arrow right icon
Authors (2):
Arrow left icon
Saurabh Chakravarty Saurabh Chakravarty
Author Profile Icon Saurabh Chakravarty
Saurabh Chakravarty
Zhenya Antić Zhenya Antić
Author Profile Icon Zhenya Antić
Zhenya Antić
Arrow right icon
View More author details
Toc

Table of Contents (13) Chapters Close

Preface 1. Chapter 1: Learning NLP Basics 2. Chapter 2: Playing with Grammar FREE CHAPTER 3. Chapter 3: Representing Text – Capturing Semantics 4. Chapter 4: Classifying Texts 5. Chapter 5: Getting Started with Information Extraction 6. Chapter 6: Topic Modeling 7. Chapter 7: Visualizing Text Data 8. Chapter 8: Transformers and Their Applications 9. Chapter 9: Natural Language Understanding 10. Chapter 10: Generative AI and Large Language Models 11. Index 12. Other Books You May Enjoy

Extracting subjects and objects of the sentence

Sometimes, we might need to find the subject and direct objects of the sentence, and that is easily accomplished with the spaCy package.

Getting ready

We will be using the dependency tags from spaCy to find subjects and objects. The code uses the spaCy engine to parse the sentence. Then, the subject function loops through the tokens, and if the dependency tag contains subj, it returns that token’s subtree, a Span object. There are different subject tags, including nsubj for regular subjects and nsubjpass for subjects of passive sentences, thus we want to look for both.

How to do it…

We will use the subtree attribute of tokens to find the complete noun chunk that is the subject or direct object of the verb (see the Getting the dependency parse recipe). We will define functions to find the subject, direct object, dative phrase, and prepositional phrases:

  1. Run the file and language utility notebooks:
    %run -i "../util/file_utils.ipynb"
    %run -i "../util/lang_utils.ipynb"
  2. We will use two functions to find the subject and the direct object of the sentence. These functions will loop through the tokens and return the subtree that contains the token with subj or dobj in the dependency tag, respectively. Here is the subject function. It looks for the token that has a dependency tag that contains subj and then returns the subtree that contains that token. There are several subject dependency tags, including nsubj and nsubjpass (for the subject of a passive sentence), so we look for the most general pattern:
    def get_subject_phrase(doc):
        for token in doc:
            if ("subj" in token.dep_):
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                return doc[start:end]
  3. Here is the direct object function. It works similarly to get_subject_phrase but looks for the dobj dependency tag instead of a tag that contains subj. If the sentence does not have a direct object, it will return None:
    def get_object_phrase(doc):
        for token in doc:
            if ("dobj" in token.dep_):
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                return doc[start:end]
  4. Assign a list of sentences to a variable, loop through them, and use the preceding functions to print out their subjects and objects:
    sentences = [
        "The big black cat stared at the small dog.",
        "Jane watched her brother in the evenings.",
        "Laura gave Sam a very interesting book."
    ]
    for sentence in sentences:
        doc = small_model(sentence)
        subject_phrase = get_subject_phrase(doc)
        object_phrase = get_object_phrase(doc)
        print(sentence)
        print("\tSubject:", subject_phrase)
        print("\tDirect object:", object_phrase)

    The result will be as follows. Since the first sentence does not have a direct object, None is printed out. For the sentence The big black cat stared at the small dog, the subject is the big black cat and there is no direct object (the small dog is the object of the preposition at). For the sentence Jane watched her brother in the evenings, the subject is Jane and the direct object is her brother. In the sentence Laura gave Sam a very interesting book, the subject is Laura and the direct object is a very interesting book:

    The big black cat stared at the small dog.
      Subject: The big black cat
      Direct object: None
    Jane watched her brother in the evenings.
      Subject: Jane
      Direct object: her brother
    Laura gave Sam a very interesting book.
      Subject: Laura
      Direct object: a very interesting book

There’s more…

We can look for other objects, for example, the dative objects of verbs such as give and objects of prepositional phrases. The functions will look very similar, with the main difference being the dependency tags: dative for the dative object function, and pobj for the prepositional object function. The prepositional object function will return a list since there can be more than one prepositional phrase in a sentence:

  1. The dative object function checks the tokens for the dative tag. It returns None if there are no dative objects:
    def get_dative_phrase(doc):
        for token in doc:
            if ("dative" in token.dep_):
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                return doc[start:end]
  2. We can also combine the subject, object, and dative functions into one with an argument that specifies which object to look for:
    def get_phrase(doc, phrase):
        # phrase is one of "subj", "obj", "dative"
        for token in doc:
            if (phrase in token.dep_):
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                return doc[start:end]
  3. Let us now define a sentence with a dative object and run the function for all three types of phrases:
    sentence = "Laura gave Sam a very interesting book."
    doc = small_model(sentence)
    subject_phrase = get_phrase(doc, "subj")
    object_phrase = get_phrase(doc, "obj")
    dative_phrase = get_phrase(doc, "dative")
    print(sentence)
    print("\tSubject:", subject_phrase)
    print("\tDirect object:", object_phrase)
    print("\tDative object:", dative_phrase)

    The result will be as follows. The dative object is Sam:

    Laura gave Sam a very interesting book.
      Subject: Laura
      Direct object: a very interesting book
      Dative object: Sam
  4. Here is the prepositional object function. It returns a list of objects of prepositions, which will be empty if there are none:
    def get_prepositional_phrase_objs(doc):
        prep_spans = []
        for token in doc:
            if ("pobj" in token.dep_):
                subtree = list(token.subtree)
                start = subtree[0].i
                end = subtree[-1].i + 1
                prep_spans.append(doc[start:end])
        return prep_spans
  5. Let’s define a list of sentences and run the two functions on them:
    sentences = [
        "The big black cat stared at the small dog.",
        "Jane watched her brother in the evenings."
    ]
    for sentence in sentences:
        doc = small_model(sentence)
        subject_phrase = get_phrase(doc, "subj")
        object_phrase = get_phrase(doc, "obj")
        dative_phrase = get_phrase(doc, "dative")
        prepositional_phrase_objs = \
            get_prepositional_phrase_objs(doc)
        print(sentence)
        print("\tSubject:", subject_phrase)
        print("\tDirect object:", object_phrase)
        print("\tPrepositional phrases:", prepositional_phrase_objs)

    The result will be as follows:

    The big black cat stared at the small dog.
      Subject: The big black cat
      Direct object: the small dog
      Prepositional phrases: [the small dog]
    Jane watched her brother in the evenings.
      Subject: Jane
      Direct object: her brother
      Prepositional phrases: [the evenings]

    There is one prepositional phrase in each sentence. In the sentence The big black cat stared at the small dog, it is at the small dog, and in the sentence Jane watched her brother in the evenings, it is in the evenings.

It is left as an exercise for you to find the actual prepositional phrases with prepositions intact instead of just the noun phrases that are dependent on these prepositions.

You have been reading a chapter from
Python Natural Language Processing Cookbook - Second Edition
Published in: Sep 2024
Publisher: Packt
ISBN-13: 9781803245744
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image