You're reading from Python Natural Language Processing Cookbook Over 60 recipes for building powerful NLP solutions using Python and LLM libraries

Product type Paperback

Published in Sep 2024

Publisher Packt

ISBN-13 9781803245744

Length 312 pages

Edition 2nd Edition

Authors (2):

Saurabh Chakravarty

Zhenya Antić

Finding patterns in text using grammatical information

In this section, we will use the spaCy Matcher object to find patterns in text, building those patterns from the grammatical properties of the words. For example, to look for verb phrases rather than noun phrases, we can specify part-of-speech patterns that match them.

Getting ready

We will be using the spaCy Matcher object to specify and find patterns. It can match on many token properties, not just grammatical ones. You can find out more in the documentation at https://spacy.io/usage/rule-based-matching/.
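As a minimal sketch of matching on a non-grammatical property, the following pattern uses the `LOWER` token attribute, so it needs only a tokenizer and no trained model. The pattern name `DetectiveName` and the sample text are made up for illustration and are not part of the recipe:

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline only tokenizes; that is enough here because
# the pattern below relies on lexical attributes, not on POS tags.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical pattern: the word "sherlock" followed by the word "holmes",
# compared on the lowercased token text so that casing does not matter.
pattern = [{"LOWER": "sherlock"}, {"LOWER": "holmes"}]
matcher.add("DetectiveName", [pattern])

# Illustrative sample text (an assumption, not the recipe's data file)
doc = nlp("To SHERLOCK HOLMES she is always the woman. I met Sherlock Holmes.")
found = [doc[start:end].text for _, start, end in matcher(doc)]
print(found)
```

Both spellings of the name should be reported, because `LOWER` compares the lowercased form of each token.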

How to do it…

Follow these steps:

Run the file and language utility notebooks:

```
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
```
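The contents of the two utility notebooks are not shown in this section. If you do not have them, the following hypothetical stand-ins are a rough sketch of what the later steps need: a `read_text_file` helper and a loaded spaCy pipeline named `small_model`. Both the function bodies and the model name `en_core_web_sm` are assumptions, not the notebooks' actual code:

```python
from pathlib import Path


def read_text_file(filename):
    """Return the whole file as one string (hypothetical stand-in)."""
    return Path(filename).read_text(encoding="utf-8")


def load_small_model(name="en_core_web_sm"):
    """Load a small English pipeline; the model name is an assumption."""
    # Imported here so the file helper above works even without spaCy
    import spacy

    return spacy.load(name)


# Usage, assuming the model has been downloaded beforehand:
# small_model = load_small_model()
```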

Import the Matcher object and initialize it. We need to put in the vocabulary object, which is the same as the vocabulary of the model we will be using to process the text:
```
from spacy.matcher import Matcher
matcher = Matcher(small_model.vocab)
```
Create a list of patterns and add them to the matcher. Each pattern is a list of dictionaries, where each dictionary describes one token; here, we specify only the part of speech of each token. The patterns are a verb by itself (for example, paints), an auxiliary followed by a verb (for example, was observing), an auxiliary followed by an adjective (for example, were late), and an auxiliary followed by a verb and a preposition (for example, were staring at; ADP is the tag for adpositions, which include prepositions). This is not an exhaustive list; feel free to come up with other patterns:
```
patterns = [
    [{"POS": "VERB"}],
    [{"POS": "AUX"}, {"POS": "VERB"}],
    [{"POS": "AUX"}, {"POS": "ADJ"}],
    [{"POS": "AUX"}, {"POS": "VERB"}, {"POS": "ADP"}]
]
matcher.add("Verb", patterns)
```

Read in the small part of the Sherlock Holmes text and process it using the small model:

```
sherlock_holmes_part_of_text = read_text_file("../data/sherlock_holmes_1.txt")
doc = small_model(sherlock_holmes_part_of_text)
```

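Now, we find the matches by calling the Matcher object on the processed text. Each match is a tuple of a match ID (the hash of the pattern name), a start token index, and an end token index, where the end index is exclusive. We loop through the matches and print the match ID, the string ID (the pattern name, looked up from the hash in the vocabulary), the start and end of the match, and the text of the match: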

```
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = small_model.vocab.strings[match_id]
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)
```

The result will be as follows:

```
14677086776663181681 Verb 14 15 heard
14677086776663181681 Verb 17 18 mention
14677086776663181681 Verb 28 29 eclipses
14677086776663181681 Verb 31 32 predominates
14677086776663181681 Verb 43 44 felt
14677086776663181681 Verb 49 50 love
14677086776663181681 Verb 63 65 were abhorrent
14677086776663181681 Verb 80 81 take
14677086776663181681 Verb 88 89 observing
14677086776663181681 Verb 94 96 has seen
14677086776663181681 Verb 95 96 seen
14677086776663181681 Verb 103 105 have placed
14677086776663181681 Verb 104 105 placed
14677086776663181681 Verb 114 115 spoke
14677086776663181681 Verb 120 121 save
14677086776663181681 Verb 130 132 were admirable
14677086776663181681 Verb 140 141 drawing
14677086776663181681 Verb 153 154 trained
14677086776663181681 Verb 157 158 admit
14677086776663181681 Verb 167 168 adjusted
14677086776663181681 Verb 171 172 introduce
14677086776663181681 Verb 173 174 distracting
14677086776663181681 Verb 178 179 throw
14677086776663181681 Verb 228 229 was
```
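To see how the printed fields relate to each other without loading a model or the data file, here is a small sketch on a hand-tagged toy sentence. The sentence and its POS tags are assumptions supplied by hand through the `Doc` constructor; in the recipe itself, the trained pipeline assigns the tags:

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# Two of the recipe's patterns, registered under the same name
matcher.add("Verb", [[{"POS": "VERB"}], [{"POS": "AUX"}, {"POS": "VERB"}]])

# Hand-tagged toy sentence (illustrative only)
words = ["He", "has", "seen", "it"]
pos = ["PRON", "AUX", "VERB", "PRON"]
doc = Doc(nlp.vocab, words=words, pos=pos)

results = []
for match_id, start, end in matcher(doc):
    # The match ID is the hash of the pattern name in the shared StringStore
    name = nlp.vocab.strings[match_id]
    # The end index is exclusive, so doc[start:end] is exactly the match
    results.append((name, start, end, doc[start:end].text))
    print(match_id, name, start, end, doc[start:end].text)

# Looking the name up in the other direction gives the hash back
verb_hash = nlp.vocab.strings["Verb"]
```

On this toy input, the matcher should report both has seen (tokens 1 to 3) and seen (tokens 2 to 3), which is the same kind of overlap that appears in the output above.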

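The code finds some of the verb phrases in the text. Because every pattern is matched independently, it sometimes reports a partial match that is contained in a longer one: for example, seen (tokens 95 to 96) is also part of has seen (tokens 94 to 96). Weeding out these partial matches is left as an exercise.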
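One possible approach to that exercise, offered as a minimal sketch rather than the book's solution, is to keep only the matches whose token range is not contained in a longer match. The helper name `drop_contained_matches` is hypothetical; the sample tuples are copied from the output above:

```python
def drop_contained_matches(matches):
    """Keep only matches whose token range is not inside a longer match."""
    # Visit longer matches first so that a containing span is always
    # kept before the shorter spans it contains are examined.
    by_length = sorted(matches, key=lambda m: m[2] - m[1], reverse=True)
    kept = []
    for match_id, start, end in by_length:
        contained = any(
            k_start <= start and end <= k_end for _, k_start, k_end in kept
        )
        if not contained:
            kept.append((match_id, start, end))
    # Restore document order by start index
    return sorted(kept, key=lambda m: m[1])


VERB_ID = 14677086776663181681  # match ID shown in the output above
sample = [
    (VERB_ID, 88, 89),    # observing
    (VERB_ID, 94, 96),    # has seen
    (VERB_ID, 95, 96),    # seen (inside "has seen")
    (VERB_ID, 103, 105),  # have placed
    (VERB_ID, 104, 105),  # placed (inside "have placed")
]
filtered = drop_contained_matches(sample)
print(filtered)
```

The function accepts the list returned by `matcher(doc)` directly. In spaCy v3, there are also built-in alternatives worth checking against the documentation: `spacy.util.filter_spans` filters a list of `Span` objects in favor of the longest ones, and `Matcher.add` accepts a `greedy="LONGEST"` argument.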
