You're reading from Python Natural Language Processing Cookbook Over 60 recipes for building powerful NLP solutions using Python and LLM libraries

Product type Paperback

Published in Sep 2024

Publisher Packt

ISBN-13 9781803245744

Length 312 pages

Edition 2nd Edition

Authors (2):

Saurabh Chakravarty

Zhenya Antić

Table of Contents (13) Chapters

Preface

1. Chapter 1: Learning NLP Basics

2. Chapter 2: Playing with Grammar

3. Chapter 3: Representing Text – Capturing Semantics

4. Chapter 4: Classifying Texts

5. Chapter 5: Getting Started with Information Extraction

6. Chapter 6: Topic Modeling

7. Chapter 7: Visualizing Text Data

8. Chapter 8: Transformers and Their Applications

9. Chapter 9: Natural Language Understanding

10. Chapter 10: Generative AI and Large Language Models

11. Index

Why subscribe?

12. Other Books You May Enjoy

Counting nouns – plural and singular nouns

In this recipe, we will do two things: determine whether a noun is plural or singular and turn plural nouns into singular, and vice versa.

You might need these two things for a variety of tasks. For example, you might want to compute word statistics, and for that, you will most likely want to count the singular and plural forms of a noun together. To do so, you need a way to recognize whether a word is plural or singular.
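
As a minimal sketch of that counting use case, the snippet below merges singular and plural forms into a single tally. The `LEMMAS` dictionary and the `count_nouns` helper are hypothetical stand-ins written for illustration; in the recipe itself, spaCy's lemma information would play the role of the hand-written mapping.

```python
from collections import Counter

# Hypothetical hand-written mapping from plural forms to singular forms.
# In practice, a lemmatizer (for example, spaCy's token.lemma_) would supply this.
LEMMAS = {"birds": "bird", "geese": "goose"}

def count_nouns(words):
    """Count nouns so that singular and plural forms share one entry."""
    return Counter(LEMMAS.get(word, word) for word in words)

counts = count_nouns(["bird", "birds", "goose", "geese", "birds"])
print(counts)
```

Words that are missing from the mapping are counted under their own spelling, so unknown plurals would still be split from their singular forms.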

Getting ready

To determine whether a noun is singular or plural, we will use spaCy via two different methods: by looking at the difference between the lemma and the actual word and by looking at the morph attribute. To inflect these nouns, that is, to turn singular nouns into plural or vice versa, we will use the textblob package. We will also see how to determine the noun’s number using GPT-3.5 through the OpenAI API. The code for this section is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/tree/main/Chapter02.

How to do it…

We will first use spaCy’s lemma information to infer whether a noun is singular or plural. Then, we will use the morph attribute of Token objects. We will then create a function that uses one of those methods. Finally, we will use GPT-3.5 to determine the grammatical number of each noun:

Run the code in the file and language utility notebooks. If you run into an error saying that the small or large models do not exist, you need to open the lang_utils.ipynb file, uncomment, and run the statement that downloads the model:
```
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
```
Initialize the text variable and process it using the spaCy small model to get the resulting Doc object:
```
text = "I have five birds"
doc = small_model(text)
```
In this step, we loop through the Doc object. For each token in the object, we check whether it’s a noun and whether its lemma differs from the word itself. Since the lemma is the base form of the word, a noun whose lemma differs from its text is treated as plural:
```
for token in doc:
    if (token.pos_ == "NOUN" and token.lemma_ != token.text):
        print(token.text, "plural")
```
The result should be as follows:
```
birds plural
```
Now, we will check the number of a noun using a different method: the morph features of a Token object. The morph features are the morphological features of a word, such as number, case, and so on. Since we know that the token at index 3 (birds) is a noun, we directly access its morph features and get the Number feature, which gives the same result as previously:
```
doc = small_model("I have five birds.")
print(doc[3].morph.get("Number"))
```
Here is the result:
```
['Plur']
```
In this step, we prepare to define a function that returns a tuple, (noun, number). In order to better encode the noun number, we use an Enum class that assigns numbers to different values. We assign 1 to singular and 2 to plural. Once we create the class, we can directly refer to the noun number variables as Noun_number.SINGULAR and Noun_number.PLURAL:
```
from enum import Enum

class Noun_number(Enum):
    SINGULAR = 1
    PLURAL = 2
```
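
For readers less familiar with Enum, here is a short, self-contained sketch of how the members of such a class behave. It repeats the class definition so that it runs on its own; the variable name `number` is only illustrative.

```python
from enum import Enum

class Noun_number(Enum):
    SINGULAR = 1
    PLURAL = 2

number = Noun_number.PLURAL
print(number.name, number.value)        # member name and its assigned value
print(Noun_number(1))                   # look a member up by its value
print(number is Noun_number.PLURAL)     # members are singletons, so `is` works
```

Comparing against named members instead of bare integers or strings keeps the intent readable and avoids typos in string literals.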

In this step, we define the function. It takes as input the text, the spaCy model, and the method of determining the noun number. The two methods are lemma and morph, the same two methods we used in steps 3 and 4, respectively. The function outputs a list of tuples, each of the format (<noun text>, <noun number>), where the noun number is expressed using the Noun_number class defined in step 5. Because morph.get("Number") returns a list such as ['Plur'], the morph branch tests for membership in that list rather than comparing it to a string:
```
def get_nouns_number(text, model, method="lemma"):
    nouns = []
    doc = model(text)
    for token in doc:
        if (token.pos_ == "NOUN"):
            if method == "lemma":
                # A lemma that differs from the text signals a plural form
                if token.lemma_ != token.text:
                    nouns.append((token.text, 
                        Noun_number.PLURAL))
                else:
                    nouns.append((token.text,
                        Noun_number.SINGULAR))
            elif method == "morph":
                # morph.get() returns a list of values, e.g. ['Plur']
                if "Plur" in token.morph.get("Number"):
                    nouns.append((token.text,
                        Noun_number.PLURAL))
                else:
                    nouns.append((token.text,
                        Noun_number.SINGULAR))
    return nouns
```
We can use the preceding function and see its performance with different spaCy models. In this step, we use the small spaCy model with the function we just defined. Using both methods, we see that the small model gets the number of the irregular noun geese wrong:
```
text = "Three geese crossed the road"
nouns = get_nouns_number(text, small_model, "morph")
print(nouns)
nouns = get_nouns_number(text, small_model)
print(nouns)
```
The result should be as follows:
```
[('geese', <Noun_number.SINGULAR: 1>), ('road', <Noun_number.SINGULAR: 1>)]
[('geese', <Noun_number.SINGULAR: 1>), ('road', <Noun_number.SINGULAR: 1>)]
```
Now, let’s do the same using the large model. If you have not yet downloaded the large model, do so by running the first line. Otherwise, you can comment it out. Here, the large model recognizes geese as a plural form, so both methods should give the correct answer:
```
!python -m spacy download en_core_web_lg
large_model = spacy.load("en_core_web_lg")
nouns = get_nouns_number(text, large_model, "morph")
print(nouns)
nouns = get_nouns_number(text, large_model)
print(nouns)
```
The result should be as follows:
```
[('geese', <Noun_number.PLURAL: 2>), ('road', <Noun_number.SINGULAR: 1>)]
[('geese', <Noun_number.PLURAL: 2>), ('road', <Noun_number.SINGULAR: 1>)]
```
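
The list of tuples returned by the function lends itself to simple aggregation. The following sketch tallies nouns by number and filters out the plural ones; the `nouns` list is a hand-written stand-in in the same shape as the output above, not the result of a live model call.

```python
from collections import Counter
from enum import Enum

class Noun_number(Enum):
    SINGULAR = 1
    PLURAL = 2

# Stand-in for a get_nouns_number() return value (same shape as the output above).
nouns = [("geese", Noun_number.PLURAL), ("road", Noun_number.SINGULAR)]

# Count how many nouns fall under each number.
tally = Counter(number for _, number in nouns)
# Keep only the nouns marked as plural.
plural_nouns = [noun for noun, number in nouns if number is Noun_number.PLURAL]

print(tally[Noun_number.PLURAL], tally[Noun_number.SINGULAR])
print(plural_nouns)
```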

Let’s now use GPT-3.5 to get the noun number. In the results, we see that GPT-3.5 correctly identifies both the number for geese and the number for road:
```
from openai import OpenAI

# OPEN_AI_KEY must already hold your OpenAI API key
client = OpenAI(api_key=OPEN_AI_KEY)
prompt="""Decide whether each noun in the following text is singular or plural.
Return the list in the format of a python tuple: (word, number). Do not provide any additional explanations.
Sentence: Three geese crossed the road."""
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0,
    max_tokens=256,
    top_p=1.0,
    frequency_penalty=0,
    presence_penalty=0,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ],
)
print(response.choices[0].message.content)
```
The result should be as follows:
```
('geese', 'plural')
('road', 'singular')
```
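
The model's reply is plain text, so it needs parsing before it can be used in code. One possible approach, sketched below, is to read each line with `ast.literal_eval`; the `parse_noun_numbers` helper is hypothetical, and the `sample` string simply copies the output shown above rather than calling the API. It assumes the model keeps to the one-tuple-per-line format, which is not guaranteed.

```python
import ast

def parse_noun_numbers(response_text):
    """Parse lines such as ('geese', 'plural') into (word, number) tuples."""
    pairs = []
    for line in response_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            value = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # skip any line that is not a valid Python literal
        if isinstance(value, tuple) and len(value) == 2:
            pairs.append(value)
    return pairs

# Copy of the response text shown above, used here instead of a live API call.
sample = "('geese', 'plural')\n('road', 'singular')"
print(parse_noun_numbers(sample))
```

Using `ast.literal_eval` rather than `eval` means only Python literals are accepted, which is the safer choice for text produced by an external service.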

There’s more…

We can also change the nouns from plural to singular, and vice versa. We will use the textblob package for that. The package should be installed automatically via the Poetry environment:

Import the TextBlob class from the package:
```
from textblob import TextBlob
```
Initialize a list of text variables and process them using the TextBlob class via a list comprehension:
```
texts = ["book", "goose", "pen", "point", "deer"]
blob_objs = [TextBlob(text) for text in texts]
```
Use the pluralize function of each object’s words list to get the plural. This function returns a list, and we access its first element. Print the result:
```
plurals = [blob_obj.words.pluralize()[0] 
    for blob_obj in blob_objs]
print(plurals)
```
The result should be as follows:
```
['books', 'geese', 'pens', 'points', 'deer']
```
Now, we will do the reverse. We use the preceding plurals list to turn the plural nouns into TextBlob objects:
```
blob_objs = [TextBlob(text) for text in plurals]
```

Turn the nouns into singular using the singularize function and print:
```
singulars = [blob_obj.words.singularize()[0] 
    for blob_obj in blob_objs]
print(singulars)
```
The result should be the same as the list we started with in step 2:
```
['book', 'goose', 'pen', 'point', 'deer']
```
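
The round trip above (singular to plural and back) can be turned into a quick sanity check for any pair of inflection functions. The sketch below is a minimal version of that idea; `round_trip_failures` is a hypothetical helper, and the dictionary-backed stand-ins replace textblob so that the snippet runs without any third-party package.

```python
def round_trip_failures(words, pluralize, singularize):
    """Return (word, plural, result) for words that do not survive the round trip."""
    failures = []
    for word in words:
        plural = pluralize(word)
        result = singularize(plural)
        if result != word:
            failures.append((word, plural, result))
    return failures

# Hypothetical dictionary-backed stand-ins for the two inflection functions.
PLURALS = {"book": "books", "goose": "geese", "deer": "deer"}
SINGULARS = {plural: singular for singular, plural in PLURALS.items()}

failures = round_trip_failures(list(PLURALS), PLURALS.get, SINGULARS.get)
print(failures)
```

To run the same check against textblob, the two callables could wrap the calls used in this recipe, for example `lambda w: TextBlob(w).words.pluralize()[0]` and `lambda w: TextBlob(w).words.singularize()[0]`.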
