Python Natural Language Processing Cookbook

Playing with Grammar

Grammar is one of the main building blocks of language. Every human language, and every programming language for that matter, has a set of rules that its users must follow or risk not being understood. These grammatical rules can be uncovered using NLP and are useful for extracting data from sentences. For example, using information about the grammatical structure of text, we can parse out subjects, objects, and relations between different entities.

In this chapter, you will learn how to use different packages to reveal the grammatical structure of words and sentences, as well as extract certain parts of sentences. These are the topics covered in this chapter:

Counting nouns – plural and singular nouns
Getting the dependency parse
Extracting noun chunks
Extracting the subjects and objects of the sentence
Finding patterns in text using grammatical information

Counting nouns – plural and singular nouns

In this recipe, we will do two things: determine whether a noun is plural or singular and turn plural nouns into singular, and vice versa.

You might need these two things for a variety of tasks. For example, you might want to compute word statistics, and for that, you most likely need to count the singular and plural forms of a noun together. To do so, you need a way to recognize whether a word is plural or singular.

Getting ready

To determine whether a noun is singular or plural, we will use spaCy via two different methods: by looking at the difference between the lemma and the actual word and by looking at the morph attribute. To inflect these nouns, that is, to turn singular nouns into plural or vice versa, we will use the textblob package. We will also see how to determine the noun’s number using GPT-3.5 through the OpenAI API. The code for this section is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/tree/main/Chapter02.

How to do it…

We will first use spaCy’s lemma information to infer whether a noun is singular or plural. Then, we will use the morph attribute of Token objects. We will then create a function that uses one of those methods. Finally, we will use GPT-3.5 to determine the number of each noun:

Run the code in the file and language utility notebooks. If you run into an error saying that the small or large models do not exist, you need to open the lang_utils.ipynb file, uncomment, and run the statement that downloads the model:
```
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
```
Initialize the text variable and process it using the spaCy small model to get the resulting Doc object:
```
text = "I have five birds"
doc = small_model(text)
```
In this step, we loop through the Doc object. For each token in the object, we check whether it’s a noun and whether the lemma is the same as the word itself. Since the lemma is the basic form of the word, if the lemma is different from the word, that token is plural:
```
for token in doc:
    if (token.pos_ == "NOUN" and token.lemma_ != token.text):
        print(token.text, "plural")
```
The result should be as follows:
```
birds plural
```
Now, we will check the number of a noun using a different method: the morph features of a Token object. The morph features are the morphological features of a word, such as number, case, and so on. Since we know that token 3 is a noun, we directly access the morph features and get the Number to get the same result as previously:
```
doc = small_model("I have five birds.")
print(doc[3].morph.get("Number"))
```
Here is the result:
```
['Plur']
```
In this step, we prepare to define a function that returns a tuple, (noun, number). To encode the noun number, we use an Enum class that maps names to numeric values: 1 for singular and 2 for plural. Once we create the class, we can refer to the noun number values as Noun_number.SINGULAR and Noun_number.PLURAL:
```
from enum import Enum

class Noun_number(Enum):
    SINGULAR = 1
    PLURAL = 2
```

In this step, we define the function. It takes as input the text, the spaCy model, and the method of determining the noun number. The two methods are lemma and morph, the same two approaches shown earlier. The function outputs a list of tuples, each of the format (<noun text>, <noun number>), where the noun number is expressed using the Noun_number class defined in the previous step. Note that token.morph.get("Number") returns a list, such as ['Plur'], so the morph branch tests for membership instead of comparing the list to a string:
```
def get_nouns_number(text, model, method="lemma"):
    nouns = []
    doc = model(text)
    for token in doc:
        if (token.pos_ == "NOUN"):
            if method == "lemma":
                # A lemma that differs from the text suggests a plural
                if token.lemma_ != token.text:
                    nouns.append((token.text,
                        Noun_number.PLURAL))
                else:
                    nouns.append((token.text,
                        Noun_number.SINGULAR))
            elif method == "morph":
                # morph.get returns a list, for example ['Plur']
                if "Plur" in token.morph.get("Number"):
                    nouns.append((token.text,
                        Noun_number.PLURAL))
                else:
                    nouns.append((token.text,
                        Noun_number.SINGULAR))
    return nouns
```
We can use the preceding function and compare its performance with different spaCy models. In this step, we use the small spaCy model with the default lemma method. The small model leaves the lemma of the irregular noun geese unchanged, so the function incorrectly labels it as singular:
```
text = "Three geese crossed the road"
nouns = get_nouns_number(text, small_model)
print(nouns)
```
The result should be as follows:
```
[('geese', <Noun_number.SINGULAR: 1>), ('road', <Noun_number.SINGULAR: 1>)]
```
Now, let’s do the same using the large model. If you have not yet downloaded the large model, do so by running the first line. Otherwise, you can comment it out. Here, the lemma method provides the correct answer. To try the morph method, pass "morph" as the third argument; its output depends on the morphological features that the model assigns:
```
!python -m spacy download en_core_web_lg
large_model = spacy.load("en_core_web_lg")
nouns = get_nouns_number(text, large_model)
print(nouns)
```
The result should be as follows:
```
[('geese', <Noun_number.PLURAL: 2>), ('road', <Noun_number.SINGULAR: 1>)]
```

Let’s now use GPT-3.5 to get the noun number. In the results, we see that GPT-3.5 correctly identifies both the number for geese and the number for road, matching the result of the lemma method with the large model:
```
from openai import OpenAI
# OPEN_AI_KEY must hold your OpenAI API key
client = OpenAI(api_key=OPEN_AI_KEY)
prompt="""Decide whether each noun in the following text is singular or plural.
Return the list in the format of a python tuple: (word, number). Do not provide any additional explanations.
Sentence: Three geese crossed the road."""
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0,
    max_tokens=256,
    top_p=1.0,
    frequency_penalty=0,
    presence_penalty=0,
    messages=[
        {"role": "system",
            "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ],
)
print(response.choices[0].message.content)
```
The result should be as follows:
```
('geese', 'plural')
('road', 'singular')
```
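
The model returns plain text, so the reply has to be parsed before it can be used like the output of get_nouns_number. The following is a minimal sketch of one way to do that, assuming the reply contains one tuple per line as in the output above. The helper name parse_noun_numbers is hypothetical and not part of the recipe’s code, and a real reply may be formatted differently:

```python
import ast

def parse_noun_numbers(reply):
    """Parse lines such as "('geese', 'plural')" into (word, number) tuples."""
    pairs = []
    for line in reply.splitlines():
        line = line.strip().rstrip(",")
        if not line:
            continue
        try:
            # literal_eval only accepts Python literals, never arbitrary code
            value = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # skip lines that are not Python literals
        if (isinstance(value, tuple) and len(value) == 2
                and all(isinstance(item, str) for item in value)):
            pairs.append(value)
    return pairs

# The reply text is copied from the output shown above
reply = "('geese', 'plural')\n('road', 'singular')"
print(parse_noun_numbers(reply))
```

Using ast.literal_eval rather than eval keeps the parsing safe if the model returns something unexpected; lines that do not parse are simply ignored.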

There’s more…

We can also change the nouns from plural to singular, and vice versa. We will use the textblob package for that. The package should be installed automatically via the Poetry environment:

Import the TextBlob class from the package:
```
from textblob import TextBlob
```
Initialize a list of text variables and process them using the TextBlob class via a list comprehension:
```
texts = ["book", "goose", "pen", "point", "deer"]
blob_objs = [TextBlob(text) for text in texts]
```
Use the pluralize function of the object to get the plural. This function returns a list and we access its first element. Print the result:
```
plurals = [blob_obj.words.pluralize()[0] 
    for blob_obj in blob_objs]
print(plurals)
```
The result should be as follows:
```
['books', 'geese', 'pens', 'points', 'deer']
```
Now, we will do the reverse. We use the preceding plurals list to turn the plural nouns into TextBlob objects:
```
blob_objs = [TextBlob(text) for text in plurals]
```

Turn the nouns into singular using the singularize function and print:
```
singulars = [blob_obj.words.singularize()[0] 
    for blob_obj in blob_objs]
print(singulars)
```
The result should be the same as the texts list we started with:
```
['book', 'goose', 'pen', 'point', 'deer']
```
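
Returning to the motivation for this recipe, once each noun has a base form, counting singular and plural occurrences together is a matter of counting by that base form. The sketch below assumes hypothetical (text, base form) pairs; in practice they could come from token.text and token.lemma_ in spaCy or from singularize in textblob. The function name count_by_base_form is an illustration only:

```python
from collections import Counter

def count_by_base_form(tagged_nouns):
    """Count nouns by base form so singular and plural forms share one entry."""
    return Counter(base for _, base in tagged_nouns)

# Hypothetical (text, base form) pairs standing in for real model output
tagged_nouns = [
    ("birds", "bird"),
    ("bird", "bird"),
    ("geese", "goose"),
    ("road", "road"),
]
counts = count_by_base_form(tagged_nouns)
print(counts.most_common())
```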

Getting the dependency parse

A dependency parse is a representation of the dependencies between the words in a sentence. For example, in the sentence The cat wore a hat, the root of the sentence is the verb, wore, and both the subject, the cat, and the object, a hat, are its dependents. The dependency parse can be very useful in many NLP tasks since it shows the grammatical structure of the sentence, with the subject, the main verb, the object, and so on. It can then be used in downstream processing.

The spaCy NLP engine does the dependency parse as part of its overall analysis. The dependency parse tags explain the role of each word in the sentence. ROOT is the main word that all other words depend on, usually the verb.

Getting ready

We will use spaCy to create the dependency parse. The required packages are part of the Poetry environment.

How to do it…

We will take a sentence from the sherlock_holmes_1.txt file to illustrate the dependency parse. The steps are as follows:

Run the file and language utility notebooks:
```
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
```
Define the sentence we will be parsing:
```
sentence = 'I have seldom heard him mention her under any other name.'
```

Define a function that will print the word, its grammatical function embedded in the dep_ attribute, and the explanation of that attribute. The dep_ attribute of the Token object shows the grammatical function of the word in the sentence:
```
def print_dependencies(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, "\t", token.dep_, "\t", 
            spacy.explain(token.dep_))
```

Now, let’s use this function on our sentence. We can see that the verb heard is the ROOT word of the sentence, with all other words depending on it directly or indirectly:
```
print_dependencies(sentence, small_model)
```
The result should be as follows:
```
I    nsubj    nominal subject
have    aux    auxiliary
seldom    advmod    adverbial modifier
heard    ROOT    root
him    nsubj    nominal subject
mention    ccomp    clausal complement
her    dobj    direct object
under    prep    prepositional modifier
any    det    determiner
other    amod    adjectival modifier
name    pobj    object of preposition
.    punct    punctuation
```

To explore the dependency parse structure, we can use the attributes of the Token class. Using the ancestors and children attributes, we can get the tokens that this token depends on and the tokens that depend on it, respectively. The function to print the ancestors is as follows:
```
def print_ancestors(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.ancestors])
```
Now, let’s use this function on our sentence:
```
print_ancestors(sentence, small_model)
```
The output will be as follows. In the result, we see that heard has no ancestors since it is the main word in the sentence. All other words depend on it, and in fact, contain heard in their ancestor lists.
The dependency chain can be seen by following the ancestor links for each word. For example, if we look at the word name, we see that its ancestors are under, mention, and heard. The immediate parent of name is under, the parent of under is mention, and the parent of mention is heard. A dependency chain will always lead to the root, or the main word, of the sentence:
```
I ['heard']
have ['heard']
seldom ['heard']
heard []
him ['mention', 'heard']
mention ['heard']
her ['mention', 'heard']
under ['mention', 'heard']
any ['name', 'under', 'mention', 'heard']
other ['name', 'under', 'mention', 'heard']
name ['under', 'mention', 'heard']
. ['heard']
```

To see all the children, use the following function. It prints out each word and its children, that is, the words that depend on it directly:
```
def print_children(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text,[t.text for t in token.children])
```
Now, let’s use this function on our sentence:
```
print_children(sentence, small_model)
```
The result should be as follows. Now, the word heard has a list of words that depend on it since it is the main word in the sentence:
```
I []
have []
seldom []
heard ['I', 'have', 'seldom', 'mention', '.']
him []
mention ['him', 'her', 'under']
her []
under ['name']
any []
other []
name ['any', 'other']
. []
```

We can also see left and right children in separate lists. In the following function, we print the children as two separate lists, left and right. This can be useful when doing grammatical transformations in the sentence:
```
def print_lefts_and_rights(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text,
            [t.text for t in token.lefts],
            [t.text for t in token.rights])
```
Let’s use this function on our sentence:
```
print_lefts_and_rights(sentence, small_model)
```
The result should be as follows:
```
I [] []
have [] []
seldom [] []
heard ['I', 'have', 'seldom'] ['mention', '.']
him [] []
mention ['him'] ['her', 'under']
her [] []
under [] ['name']
any [] []
other [] []
name ['any', 'other'] []
. [] []
```

We can also see the subtree rooted at each token, that is, the token together with all of its descendants, by using this function:
```
def print_subtree(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.subtree])
```
Let’s use this function on our sentence:
```
print_subtree(sentence, small_model)
```
The result should be as follows. From the subtrees headed by each word, we can see the grammatical phrases that appear in the sentence, such as the noun phrase, any other name, and the prepositional phrase, under any other name:
```
I ['I']
have ['have']
seldom ['seldom']
heard ['I', 'have', 'seldom', 'heard', 'him', 'mention', 'her', 'under', 'any', 'other', 'name', '.']
him ['him']
mention ['him', 'mention', 'her', 'under', 'any', 'other', 'name']
her ['her']
under ['under', 'any', 'other', 'name']
any ['any']
other ['other']
name ['any', 'other', 'name']
. ['.']
```
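
To make the relationship between ancestors, children, and subtrees concrete, here is a small sketch that rebuilds all three from a single table of heads, without spaCy. The words and heads lists are transcribed by hand from the output in this recipe, and the function names are illustrative rather than part of the spaCy API. As in spaCy, the root token is its own head:

```python
# Tokens of the example sentence, in order
words = ["I", "have", "seldom", "heard", "him", "mention", "her",
         "under", "any", "other", "name", "."]
# heads[i] is the index of the token that token i depends on;
# the root ("heard", index 3) points to itself
heads = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]

def ancestor_indices(i):
    """Follow head links from token i up to the root."""
    chain = []
    while heads[i] != i:
        i = heads[i]
        chain.append(i)
    return chain

def ancestors(i):
    return [words[j] for j in ancestor_indices(i)]

def children(i):
    """Tokens whose immediate head is token i."""
    return [words[j] for j, head in enumerate(heads)
            if head == i and j != i]

def subtree(i):
    """Token i plus all of its descendants, in sentence order."""
    return [words[j] for j in range(len(words))
            if j == i or i in ancestor_indices(j)]

print(ancestors(words.index("name")))   # ['under', 'mention', 'heard']
print(children(words.index("heard")))   # ['I', 'have', 'seldom', 'mention', '.']
print(subtree(words.index("under")))    # ['under', 'any', 'other', 'name']
```

The three printed lists match the corresponding lines in the outputs above, which shows that a dependency parse is fully described by one head per token.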

Extracting noun chunks

Noun chunks are known in linguistics as noun phrases. They represent nouns and any words that depend on and accompany nouns. For example, in the sentence The big red apple fell on the scared cat, the noun chunks are the big red apple and the scared cat. Extracting these noun chunks is instrumental to many other downstream NLP tasks, such as named entity recognition and processing entities and relations between them. In this recipe, we will explore how to extract noun chunks from a text.

Getting ready

We will use the spaCy package, which has a function for extracting noun chunks, and the text from the sherlock_holmes_1.txt file as an example.

How to do it…

Use the following steps to get the noun chunks from a text:

Run the file and language utility notebooks:
```
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
```
Define the function that will print out the noun chunks. The noun chunks are available through the noun_chunks property of the Doc object:
```
def print_noun_chunks(text, model):
    doc = model(text)
    for noun_chunk in doc.noun_chunks:
        print(noun_chunk.text)
```

Read the text from the sherlock_holmes_1.txt file and use the function on the resulting text:
```
sherlock_holmes_part_of_text = read_text_file("../data/sherlock_holmes_1.txt")
print_noun_chunks(sherlock_holmes_part_of_text, small_model)
```
This is the partial result. See the output of the notebook at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/Chapter02/noun_chunks_2.3.ipynb for the full printout. The function gets the pronouns, nouns, and noun phrases that are in the text correctly:
```
Sherlock Holmes
she
the_ woman
I
him
her
any other name
his eyes
she
the whole
…
```

There’s more…

Noun chunks are spaCy Span objects and have all their properties. See the official documentation at https://spacy.io/api/span.

Let’s explore some properties of noun chunks:

We will define a function that will print out the different properties of noun chunks. It will print the text of the noun chunk, its start and end indices within the Doc object, the sentence it belongs to (useful when there is more than one sentence), the root of the noun chunk (its main word), and the chunk’s similarity to the word emotions. Finally, it will print out the similarity of the whole input sentence to emotions:

```
def explore_properties(sentence, model):
    doc = model(sentence)
    other_span = "emotions"
    other_doc = model(other_span)
    for noun_chunk in doc.noun_chunks:
        print(noun_chunk.text)
        print("Noun chunk start and end", "\t",
            noun_chunk.start, "\t", noun_chunk.end)
        print("Noun chunk sentence:", noun_chunk.sent)
        print("Noun chunk root:", noun_chunk.root.text)
        print(f"Noun chunk similarity to '{other_span}'",
            noun_chunk.similarity(other_doc))
    # Adjacent f-strings are joined into a single string
    print(f"Similarity of the sentence '{sentence}' to "
        f"'{other_span}':",
        doc.similarity(other_doc))
```

Set the sentence to All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind:
```
sentence = "All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind."
```

Use the explore_properties function on the sentence using the small model:
```
explore_properties(sentence, small_model)
```
This is the result:
```
All emotions
Noun chunk start and end    0    2
Noun chunk sentence: All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
Noun chunk root: emotions
Noun chunk similarity to 'emotions' 0.4026421588260174
his cold, precise but admirably balanced mind
Noun chunk start and end    11    19
Noun chunk sentence: All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
Noun chunk root: mind
Noun chunk similarity to 'emotions' -0.036891259527462
Similarity of the sentence 'All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.' to 'emotions': 0.03174900767577446
```

You will also see a warning message similar to this one due to the fact that the small model does not ship with word vectors of its own:
```
/tmp/ipykernel_1807/2430050149.py:10: UserWarning: [W007] The model you're using has no word vectors loaded, so the result of the Span.similarity method will be based on the tagger, parser and NER, which may not give useful similarity judgements. This may happen if you're using one of the small models, e.g. `en_core_web_sm`, which don't ship with word vectors and only use context-sensitive tensors. You can always add your own word vectors, or use one of the larger models instead if available.
  print(f"Noun chunk similarity to '{other_span}'", noun_chunk.similarity(other_doc))
```
Now, let’s apply the same function to the same sentence with the large model:
```
sentence = "All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind."
explore_properties(sentence, large_model)
```
The large model does come with its own word vectors and does not result in a warning:
```
All emotions
Noun chunk start and end    0    2
Noun chunk sentence: All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
Noun chunk root: emotions
Noun chunk similarity to 'emotions' 0.6302678068015664
his cold, precise but admirably balanced mind
Noun chunk start and end    11    19
Noun chunk sentence: All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
Noun chunk root: mind
Noun chunk similarity to 'emotions' 0.5744456705692561
Similarity of the sentence 'All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.' to 'emotions': 0.640366414527618
```

We see that, with the large model, the All emotions noun chunk is more similar to the word emotions (about 0.63) than the his cold, precise but admirably balanced mind noun chunk is (about 0.57).
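
When word vectors are available, spaCy’s default similarity is the cosine similarity between vectors, with a span or document represented by the average of its token vectors. The sketch below illustrates that computation in plain Python. The three-dimensional vectors in toy_vectors are made up for illustration and are far smaller than real word vectors, and the helper names are hypothetical:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no direction to compare for a zero vector
    return dot / (norm_a * norm_b)

def average(vectors):
    """Element-wise mean, used here as the vector of a multi-word span."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Made-up toy vectors, not taken from any spaCy model
toy_vectors = {
    "all": [0.2, 0.1, 0.0],
    "emotions": [0.9, 0.1, 0.3],
    "mind": [0.1, 0.8, 0.4],
}
chunk = average([toy_vectors["all"], toy_vectors["emotions"]])
print(cosine_similarity(chunk, toy_vectors["emotions"]))
print(cosine_similarity(toy_vectors["mind"], toy_vectors["emotions"]))
```

Because the chunk vector is an average that includes the vector for emotions, it points in nearly the same direction as that vector, which is why a chunk containing the target word tends to score higher.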

Important note

A larger spaCy model, such as en_core_web_lg, takes up more space but is generally more accurate.

Extracting the subjects and objects of the sentence

Sometimes, we might need to find the subject and direct object of a sentence, which is easily accomplished with the spaCy package.

Getting ready

We will be using the dependency tags from spaCy to find subjects and objects. The code uses the spaCy engine to parse the sentence. Then, the subject function loops through the tokens, and if the dependency tag contains subj, it returns that token’s subtree, a Span object. There are different subject tags, including nsubj for regular subjects and nsubjpass for subjects of passive sentences, thus we want to look for both.

How to do it…

We will use the subtree attribute of tokens to find the complete noun chunk that is the subject or direct object of the verb (see the Getting the dependency parse recipe). We will define functions to find the subject, direct object, dative phrase, and prepositional phrases:

Run the file and language utility notebooks:
```
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
```

We will use two functions to find the subject and the direct object of the sentence. These functions will loop through the tokens and return the subtree that contains the token with subj or dobj in the dependency tag, respectively. Here is the subject function. It looks for the token that has a dependency tag that contains subj and then returns the subtree that contains that token. There are several subject dependency tags, including nsubj and nsubjpass (for the subject of a passive sentence), so we look for the most general pattern:
```
def get_subject_phrase(doc):
    for token in doc:
        if ("subj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return doc[start:end]
```

Here is the direct object function. It works similarly to get_subject_phrase but looks for the dobj dependency tag instead of a tag that contains subj. If the sentence does not have a direct object, it will return None:

```
def get_object_phrase(doc):
    for token in doc:
        if ("dobj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return doc[start:end]
```

Assign a list of sentences to a variable, loop through them, and use the preceding functions to print out their subjects and objects:
```
sentences = [
    "The big black cat stared at the small dog.",
    "Jane watched her brother in the evenings.",
    "Laura gave Sam a very interesting book."
]
for sentence in sentences:
    doc = small_model(sentence)
    subject_phrase = get_subject_phrase(doc)
    object_phrase = get_object_phrase(doc)
    print(sentence)
    print("\tSubject:", subject_phrase)
    print("\tDirect object:", object_phrase)
```
The result will be as follows. Since the first sentence does not have a direct object, None is printed out. For the sentence The big black cat stared at the small dog, the subject is the big black cat and there is no direct object (the small dog is the object of the preposition at). For the sentence Jane watched her brother in the evenings, the subject is Jane and the direct object is her brother. In the sentence Laura gave Sam a very interesting book, the subject is Laura and the direct object is a very interesting book:
```
The big black cat stared at the small dog.
  Subject: The big black cat
  Direct object: None
Jane watched her brother in the evenings.
  Subject: Jane
  Direct object: her brother
Laura gave Sam a very interesting book.
  Subject: Laura
  Direct object: a very interesting book
```

There’s more…

We can look for other objects, for example, the dative objects of verbs such as give and objects of prepositional phrases. The functions will look very similar, with the main difference being the dependency tags: dative for the dative object function, and pobj for the prepositional object function. The prepositional object function will return a list since there can be more than one prepositional phrase in a sentence:

The dative object function checks the tokens for the dative tag. It returns None if there are no dative objects:

```
def get_dative_phrase(doc):
    for token in doc:
        if ("dative" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return doc[start:end]
```

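We can also combine the subject, object, and dative functions into one with an argument that specifies which phrase to look for. Because the function tests whether the argument is a substring of the dependency tag, pass "dobj" for direct objects; the shorter "obj" would also match the pobj tag:
```
def get_phrase(doc, phrase):
    # phrase is one of "subj", "dobj", "dative"
    for token in doc:
        if (phrase in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return doc[start:end]
```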

Let us now define a sentence with a dative object and run the function for all three types of phrases:
```
sentence = "Laura gave Sam a very interesting book."
doc = small_model(sentence)
subject_phrase = get_phrase(doc, "subj")
object_phrase = get_phrase(doc, "dobj")
dative_phrase = get_phrase(doc, "dative")
print(sentence)
print("\tSubject:", subject_phrase)
print("\tDirect object:", object_phrase)
print("\tDative object:", dative_phrase)
```
The result will be as follows. The dative object is Sam:
```
Laura gave Sam a very interesting book.
  Subject: Laura
  Direct object: a very interesting book
  Dative object: Sam
```

Here is the prepositional object function. It returns a list of objects of prepositions, which will be empty if there are none:

```
def get_prepositional_phrase_objs(doc):
    prep_spans = []
    for token in doc:
        if ("pobj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            prep_spans.append(doc[start:end])
    return prep_spans
```

Let’s define a list of sentences and run the two functions on them. We pass "dobj" rather than "obj" to get_phrase, because the substring "obj" would also match the pobj tag and report the small dog as a direct object:
```
sentences = [
    "The big black cat stared at the small dog.",
    "Jane watched her brother in the evenings."
]
for sentence in sentences:
    doc = small_model(sentence)
    subject_phrase = get_phrase(doc, "subj")
    object_phrase = get_phrase(doc, "dobj")
    dative_phrase = get_phrase(doc, "dative")
    prepositional_phrase_objs = \
        get_prepositional_phrase_objs(doc)
    print(sentence)
    print("\tSubject:", subject_phrase)
    print("\tDirect object:", object_phrase)
    print("\tPrepositional phrases:", prepositional_phrase_objs)
```
The result will be as follows:
```
The big black cat stared at the small dog.
  Subject: The big black cat
  Direct object: None
  Prepositional phrases: [the small dog]
Jane watched her brother in the evenings.
  Subject: Jane
  Direct object: her brother
  Prepositional phrases: [the evenings]
```

There is one prepositional phrase in each sentence. In the sentence The big black cat stared at the small dog, it is at the small dog, and in the sentence Jane watched her brother in the evenings, it is in the evenings.

It is left as an exercise for you to find the actual prepositional phrases with prepositions intact instead of just the noun phrases that are dependent on these prepositions.
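
As one possible starting point for this exercise, the sketch below finds the tokens tagged prep and returns the full span of each one’s subtree, so the preposition stays attached. It runs without spaCy: the words, heads, and deps lists are a hand-written parse of the second example sentence and are an assumption, since the attachment a model actually produces may differ. With spaCy, the same idea would use token.dep_ and token.subtree:

```python
# Hand-written, illustrative parse of the example sentence
words = ["Jane", "watched", "her", "brother", "in", "the", "evenings", "."]
# heads[i] is the index of the head of token i; the root is its own head
heads = [1, 1, 3, 1, 1, 6, 4, 1]
deps = ["nsubj", "ROOT", "poss", "dobj", "prep", "det", "pobj", "punct"]

def subtree_indices(i):
    """Indices of token i and every token whose head chain reaches i."""
    members = []
    for j in range(len(words)):
        k = j
        # Walk up from token j until we reach token i or the root
        while k != i and heads[k] != k:
            k = heads[k]
        if k == i:
            members.append(j)
    return members

def prepositional_phrases():
    """Return each prepositional phrase, preposition included."""
    phrases = []
    for i, dep in enumerate(deps):
        if dep == "prep":
            members = subtree_indices(i)
            phrases.append(" ".join(words[members[0]:members[-1] + 1]))
    return phrases

print(prepositional_phrases())  # ['in the evenings']
```

The only change from the pobj-based function is the tag being matched: starting from the preposition instead of its object makes the subtree include the preposition itself.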

Finding patterns in text using grammatical information

In this recipe, we will use the spaCy Matcher object to find patterns in the text. We will use the grammatical properties of the words to create these patterns. For example, we might be looking for verb phrases instead of noun phrases, and we can specify grammatical patterns that match them.

Getting ready

We will be using the spaCy Matcher object to specify and find patterns. It can match different properties, not just grammatical. You can find out more in the documentation at https://spacy.io/usage/rule-based-matching/.

How to do it…

The steps are as follows:

Run the file and language utility notebooks:
```
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
```

Import the Matcher object and initialize it. We need to put in the vocabulary object, which is the same as the vocabulary of the model we will be using to process the text:
```
from spacy.matcher import Matcher
matcher = Matcher(small_model.vocab)
```
Create a list of patterns and add them to the matcher. Each pattern is a list of dictionaries, where each dictionary describes a token. In our patterns, we only specify the part of speech for each token. We then add these patterns to the Matcher object. The patterns we will be using are a verb by itself (for example, paints), an auxiliary followed by a verb (for example, was observing), an auxiliary followed by an adjective (for example, were late), and an auxiliary followed by a verb and a preposition (for example, were staring at). This is not an exhaustive list; feel free to come up with other examples:
```
patterns = [
    [{"POS": "VERB"}],
    [{"POS": "AUX"}, {"POS": "VERB"}],
    [{"POS": "AUX"}, {"POS": "ADJ"}],
    [{"POS": "AUX"}, {"POS": "VERB"}, {"POS": "ADP"}]
]
matcher.add("Verb", patterns)
```

Read in the small part of the Sherlock Holmes text and process it using the small model:

```
sherlock_holmes_part_of_text = read_text_file("../data/sherlock_holmes_1.txt")
doc = small_model(sherlock_holmes_part_of_text)
```

Now, we find the matches using the Matcher object and the processed text. We then loop through the matches and print out the match ID, the string ID (the identifier of the pattern), the start and end of the match, and the text of the match:

```
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = small_model.vocab.strings[match_id]
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)
```
The result will be as follows:
```
14677086776663181681 Verb 14 15 heard
14677086776663181681 Verb 17 18 mention
14677086776663181681 Verb 28 29 eclipses
14677086776663181681 Verb 31 32 predominates
14677086776663181681 Verb 43 44 felt
14677086776663181681 Verb 49 50 love
14677086776663181681 Verb 63 65 were abhorrent
14677086776663181681 Verb 80 81 take
14677086776663181681 Verb 88 89 observing
14677086776663181681 Verb 94 96 has seen
14677086776663181681 Verb 95 96 seen
14677086776663181681 Verb 103 105 have placed
14677086776663181681 Verb 104 105 placed
14677086776663181681 Verb 114 115 spoke
14677086776663181681 Verb 120 121 save
14677086776663181681 Verb 130 132 were admirable
14677086776663181681 Verb 140 141 drawing
14677086776663181681 Verb 153 154 trained
14677086776663181681 Verb 157 158 admit
14677086776663181681 Verb 167 168 adjusted
14677086776663181681 Verb 171 172 introduce
14677086776663181681 Verb 173 174 distracting
14677086776663181681 Verb 178 179 throw
14677086776663181681 Verb 228 229 was
```

The code finds some of the verb phrases in the text. Sometimes, it finds a partial match that is part of another match. Weeding out these partial matches is left as an exercise.
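
As a hint for that exercise, one simple approach is to drop every match whose token range lies entirely inside a longer match. The sketch below works on plain (start, end) pairs such as those printed above, so it runs without spaCy; the function name drop_partial_matches is hypothetical. spaCy also offers spacy.util.filter_spans for Span objects and a greedy argument to Matcher.add, which may be worth exploring as alternatives:

```python
def drop_partial_matches(matches):
    """Keep only the matches that are not contained in another match."""
    kept = []
    # Visit longer matches first so a containing match is seen before its parts
    for start, end in sorted(matches, key=lambda m: (m[0] - m[1], m[0])):
        if not any(s <= start and end <= e for s, e in kept):
            kept.append((start, end))
    return sorted(kept)

# (start, end) pairs taken from the matcher output above
matches = [(88, 89), (94, 96), (95, 96), (103, 105), (104, 105)]
print(drop_partial_matches(matches))  # [(88, 89), (94, 96), (103, 105)]
```

Here, seen (95, 96) is dropped because it falls inside has seen (94, 96), and placed (104, 105) is dropped because it falls inside have placed (103, 105).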