You're reading from Python Natural Language Processing Cookbook Over 60 recipes for building powerful NLP solutions using Python and LLM libraries

Product type Paperback

Published in Sep 2024

Publisher Packt

ISBN-13 9781803245744

Length 312 pages

Edition 2nd Edition


Authors (2):

Saurabh Chakravarty

Zhenya Antić



Extracting noun chunks

Noun chunks are known in linguistics as noun phrases. They consist of a noun and the words that depend on and accompany it. For example, in the sentence "The big red apple fell on the scared cat", the noun chunks are "the big red apple" and "the scared cat". Extracting noun chunks is useful for many downstream NLP tasks, such as named entity recognition and processing entities and the relations between them. In this recipe, we will explore how to extract noun chunks from a text.

Getting ready

We will use the spaCy package, which has a function for extracting noun chunks, and the text from the sherlock_holmes_1.txt file as an example.

How to do it…

Use the following steps to get the noun chunks from a text:

Run the file and language utility notebooks:

```
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
```

Define the function that will print out the noun chunks. The noun chunks are available through the noun_chunks attribute of the Doc object (doc.noun_chunks), which yields one Span per chunk:

```
def print_noun_chunks(text, model):
    doc = model(text)
    for noun_chunk in doc.noun_chunks:
        print(noun_chunk.text)
```

Read the text from the sherlock_holmes_1.txt file and use the function on the resulting text:
```
sherlock_holmes_part_of_text = read_text_file("../data/sherlock_holmes_1.txt")
print_noun_chunks(sherlock_holmes_part_of_text, small_model)
```
This is a partial result. See the output of the notebook at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/Chapter02/noun_chunks_2.3.ipynb for the full printout. The function extracts the pronouns, nouns, and noun phrases that appear in the text:
```
Sherlock Holmes
she
the_ woman
I
him
her
any other name
his eyes
she
the whole
…
```
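Because the output mixes bare pronouns (she, I, him) with fuller noun phrases, a downstream step may want to keep only the latter. The following is a minimal sketch of one way to do that, by checking the part-of-speech tag of each chunk's root token. The function name non_pronoun_chunks and the fake_chunk helper are hypothetical, and the part-of-speech tags in the sample data are assigned by hand for illustration rather than produced by a model:

```python
from types import SimpleNamespace


def non_pronoun_chunks(noun_chunks):
    """Keep chunks whose root token is not tagged as a pronoun."""
    return [chunk for chunk in noun_chunks if chunk.root.pos_ != "PRON"]


def fake_chunk(text, root_pos):
    """Hypothetical stand-in for a spaCy Span: only .text and .root.pos_."""
    return SimpleNamespace(text=text, root=SimpleNamespace(pos_=root_pos))


# A few chunks from the printout above; the POS tags are hand-assigned
# assumptions, not model output.
chunks = [
    fake_chunk("Sherlock Holmes", "PROPN"),
    fake_chunk("she", "PRON"),
    fake_chunk("I", "PRON"),
    fake_chunk("any other name", "NOUN"),
    fake_chunk("his eyes", "NOUN"),
]

kept = [chunk.text for chunk in non_pronoun_chunks(chunks)]
print(kept)
```

With a real pipeline, the same function should accept doc.noun_chunks directly, since it only relies on the root and pos_ attributes that spaCy spans and tokens provide.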

There’s more…

Noun chunks are spaCy Span objects and have all the properties of a Span. See the official documentation at https://spacy.io/api/span.

Let’s explore some properties of noun chunks:

We will define a function that will print out the different properties of noun chunks. It will print the text of the noun chunk, its start and end indices within the Doc object, the sentence it belongs to (useful when there is more than one sentence), the root of the noun chunk (its main word), and the chunk’s similarity to the word emotions. Finally, it will print out the similarity of the whole input sentence to emotions:

```
def explore_properties(sentence, model):
    doc = model(sentence)
    other_span = "emotions"
    other_doc = model(other_span)
    for noun_chunk in doc.noun_chunks:
        print(noun_chunk.text)
        print("Noun chunk start and end", "\t",
            noun_chunk.start, "\t", noun_chunk.end)
        print("Noun chunk sentence:", noun_chunk.sent)
        print("Noun chunk root:", noun_chunk.root.text)
        print(f"Noun chunk similarity to '{other_span}'",
            noun_chunk.similarity(other_doc))
    # The f-string is split into two adjacent literals so that it
    # does not break across lines inside the quotes.
    print(f"Similarity of the sentence '{sentence}' to "
        f"'{other_span}':",
        doc.similarity(other_doc))
```

Set the sentence to All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind:
```
sentence = "All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind."
```

Use the explore_properties function on the sentence using the small model:

```
explore_properties(sentence, small_model)
```

This is the result:

```
All emotions
Noun chunk start and end    0    2
Noun chunk sentence: All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
Noun chunk root: emotions
Noun chunk similarity to 'emotions' 0.4026421588260174
his cold, precise but admirably balanced mind
Noun chunk start and end    11    19
Noun chunk sentence: All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
Noun chunk root: mind
Noun chunk similarity to 'emotions' -0.036891259527462
Similarity of the sentence 'All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.' to 'emotions': 0.03174900767577446
```
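The start and end values in this printout (0 and 2, then 11 and 19) are token indices into the Doc, not character offsets, and the end index is exclusive. The sketch below illustrates this with a simple regular expression standing in for spaCy's tokenizer. That stand-in is an assumption made for illustration: it happens to split this particular sentence the same way, but it is not how spaCy tokenizes text in general. The span_text helper is likewise hypothetical:

```python
import re

sentence = ("All emotions, and that one particularly, were abhorrent "
            "to his cold, precise but admirably balanced mind.")

# Rough stand-in for spaCy's tokenizer (an assumption for illustration):
# every word and every punctuation mark becomes its own token.
tokens = re.findall(r"\w+|[^\w\s]", sentence)


def span_text(tokens, start, end):
    """Join tokens[start:end]; end is exclusive, like a Span's end index."""
    return " ".join(tokens[start:end])


print(span_text(tokens, 0, 2))
print(span_text(tokens, 11, 19))
print(len(tokens))
```

Punctuation marks count as tokens, which is why the second chunk starts at index 11 rather than 9.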

You will also see a warning message similar to the following one, because the small model does not ship with word vectors of its own:

```
/tmp/ipykernel_1807/2430050149.py:10: UserWarning: [W007] The model you're using has no word vectors loaded, so the result of the Span.similarity method will be based on the tagger, parser and NER, which may not give useful similarity judgements. This may happen if you're using one of the small models, e.g. `en_core_web_sm`, which don't ship with word vectors and only use context-sensitive tensors. You can always add your own word vectors, or use one of the larger models instead if available.
  print(f"Noun chunk similarity to '{other_span}'", noun_chunk.similarity(other_doc))
```

Now, let’s apply the same function to the same sentence with the large model:

```
sentence = "All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind."
explore_properties(sentence, large_model)
```

The large model does come with its own word vectors and does not result in a warning:

```
All emotions
Noun chunk start and end    0    2
Noun chunk sentence: All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
Noun chunk root: emotions
Noun chunk similarity to 'emotions' 0.6302678068015664
his cold, precise but admirably balanced mind
Noun chunk start and end    11    19
Noun chunk sentence: All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
Noun chunk root: mind
Noun chunk similarity to 'emotions' 0.5744456705692561
Similarity of the sentence 'All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.' to 'emotions': 0.640366414527618
```

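We see that the All emotions noun chunk is more similar to the word emotions (about 0.63) than the his cold, precise but admirably balanced mind noun chunk is (about 0.57). With the small model, the corresponding values (about 0.40 and -0.04) are not based on word vectors and should not be relied on.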
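For intuition about where such numbers come from: by default, spaCy estimates similarity as the cosine similarity between vectors, and the vector of a span or document is the average of its token vectors. The following pure-Python sketch reproduces that calculation on tiny made-up vectors. The three-dimensional values in toy_vectors are invented for illustration and have nothing to do with the vectors in en_core_web_lg. The helper names are hypothetical too:

```python
import math


def average_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "word vectors", invented for this sketch only.
toy_vectors = {
    "all": [0.2, 0.1, 0.0],
    "emotions": [0.9, 0.8, 0.1],
    "mind": [0.1, 0.7, 0.6],
}

# A chunk's vector is the average of the vectors of its tokens.
chunk_vector = average_vector([toy_vectors["all"], toy_vectors["emotions"]])
sim_chunk = cosine_similarity(chunk_vector, toy_vectors["emotions"])
sim_mind = cosine_similarity(toy_vectors["mind"], toy_vectors["emotions"])

print(f"'All emotions' vs 'emotions': {sim_chunk:.3f}")
print(f"'mind' vs 'emotions': {sim_mind:.3f}")
```

Because a chunk's vector is an average, a long chunk such as his cold, precise but admirably balanced mind is pulled toward the mean of all its words, which is one reason its similarity to a single word can be lower.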

Important note

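A larger spaCy model, such as en_core_web_lg, takes up more disk space and memory but is generally more accurate, and it includes the word vectors needed for meaningful similarity scores.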
