Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Python Natural Language Processing Cookbook

You're reading from   Python Natural Language Processing Cookbook Over 60 recipes for building powerful NLP solutions using Python and LLM libraries

Arrow left icon
Product type Paperback
Published in Sep 2024
Publisher Packt
ISBN-13 9781803245744
Length 312 pages
Edition 2nd Edition
Languages
Concepts
Arrow right icon
Authors (2):
Arrow left icon
Saurabh Chakravarty Saurabh Chakravarty
Author Profile Icon Saurabh Chakravarty
Saurabh Chakravarty
Zhenya Antić Zhenya Antić
Author Profile Icon Zhenya Antić
Zhenya Antić
Arrow right icon
View More author details
Toc

Table of Contents (13) Chapters Close

Preface 1. Chapter 1: Learning NLP Basics 2. Chapter 2: Playing with Grammar FREE CHAPTER 3. Chapter 3: Representing Text – Capturing Semantics 4. Chapter 4: Classifying Texts 5. Chapter 5: Getting Started with Information Extraction 6. Chapter 6: Topic Modeling 7. Chapter 7: Visualizing Text Data 8. Chapter 8: Transformers and Their Applications 9. Chapter 9: Natural Language Understanding 10. Chapter 10: Generative AI and Large Language Models 11. Index 12. Other Books You May Enjoy

Counting nouns – plural and singular nouns

In this recipe, we will do two things: determine whether a noun is plural or singular and turn plural nouns into singular, and vice versa.

You might need these two things for a variety of tasks. For example, you might want to count the word statistics, and for that, you most likely need to count the singular and plural nouns together. In order to count the plural nouns together with singular ones, you need a way to recognize that a word is plural or singular.

Getting ready

To determine whether a noun is singular or plural, we will use spaCy via two different methods: by looking at the difference between the lemma and the actual word and by looking at the morph attribute. To inflect these nouns, or turn singular nouns into plural or vice versa we will use the textblob package. We will also see how to determine the noun’s number using GPT-3 through the OpenAI API. The code for this section is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/tree/main/Chapter02.

How to do it…

We will first use spaCy’s lemma information to infer whether a noun is singular or plural. Then, we will use the morph attribute of Token objects. We will then create a function that uses one of those methods. Finally, we will use GPT-3.5 to find out the number of nouns:

  1. Run the code in the file and language utility notebooks. If you run into an error saying that the small or large models do not exist, you need to open the lang_utils.ipynb file, uncomment, and run the statement that downloads the model:
    %run -i "../util/file_utils.ipynb"
    %run -i "../util/lang_utils.ipynb"
  2. Initialize the text variable and process it using the spaCy small model to get the resulting Doc object:
    text = "I have five birds"
    doc = small_model(text)
  3. In this step, we loop through the Doc object. For each token in the object, we check whether it’s a noun and whether the lemma is the same as the word itself. Since the lemma is the basic form of the word, if the lemma is different from the word, that token is plural:
    for token in doc:
        if (token.pos_ == "NOUN" and token.lemma_ != token.text):
            print(token.text, "plural")

    The result should be as follows:

    birds plural
  4. Now, we will check the number of a noun using a different method: the morph features of a Token object. The morph features are the morphological features of a word, such as number, case, and so on. Since we know that token 3 is a noun, we directly access the morph features and get the Number to get the same result as previously:
    doc = small_model("I have five birds.")
    print(doc[3].morph.get("Number"))

    Here is the result:

    ['Plur']
  5. In this step, we prepare to define a function that returns a tuple, (noun, number). In order to better encode the noun number, we use an Enum class that assigns numbers to different values. We assign 1 to singular and 2 to plural. Once we create the class, we can directly refer to the noun number variables as Noun_number.SINGULAR and Noun_number.PLURAL:
    class Noun_number(Enum):
        SINGULAR = 1
        PLURAL = 2
  6. In this step, we define the function. It takes as input the text, the spaCy model, and the method of determining the noun number. The two methods are lemma and morph, the same two methods we used in steps 3 and 4, respectively. The function outputs a list of tuples, each of the format (<noun text>, <noun number>), where the noun number is expressed using the Noun_number class defined in step 5:
    def get_nouns_number(text, model, method="lemma"):
        nouns = []
        doc = model(text)
        for token in doc:
            if (token.pos_ == "NOUN"):
                if method == "lemma":
                    if token.lemma_ != token.text:
                        nouns.append((token.text, 
                            Noun_number.PLURAL))
                    else:
                        nouns.append((token.text,
                            Noun_number.SINGULAR))
                elif method == "morph":
                    if token.morph.get("Number") == "Sing":
                        nouns.append((token.text,
                            Noun_number.PLURAL))
                    else:
                        nouns.append((token.text,
                            Noun_number.SINGULAR))
        return nouns
  7. We can use the preceding function and see its performance with different spaCy models. In this step, we use the small spaCy model with the function we just defined. Using both methods, we see that the spaCy model gets the number of the irregular noun geese incorrectly:
    text = "Three geese crossed the road"
    nouns = get_nouns_number(text, small_model, "morph")
    print(nouns)
    nouns = get_nouns_number(text, small_model)
    print(nouns)

    The result should be as follows:

    [('geese', <Noun_number.SINGULAR: 1>), ('road', <Noun_number.SINGULAR: 1>)]
    [('geese', <Noun_number.SINGULAR: 1>), ('road', <Noun_number.SINGULAR: 1>)]
  8. Now, let’s do the same using the large model. If you have not yet downloaded the large model, do so by running the first line. Otherwise, you can comment it out. Here, we see that although the morph method still incorrectly assigns singular to geese, the lemma method provides the correct answer:
    !python -m spacy download en_core_web_lg
    large_model = spacy.load("en_core_web_lg")
    nouns = get_nouns_number(text, large_model, "morph")
    print(nouns)
    nouns = get_nouns_number(text, large_model)
    print(nouns)

    The result should be as follows:

    [('geese', <Noun_number.SINGULAR: 1>), ('road', <Noun_number.SINGULAR: 1>)]
    [('geese', <Noun_number.PLURAL: 2>), ('road', <Noun_number.SINGULAR: 1>)]
  9. Let’s now use GPT-3.5 to get the noun number. In the results, we see that GPT-3.5 gives us an identical result and correctly identifies both the number for geese and the number for road:
    from openai import OpenAI
    client = OpenAI(api_key=OPEN_AI_KEY)
    prompt="""Decide whether each noun in the following text is singular or plural.
    Return the list in the format of a python tuple: (word, number). Do not provide any additional explanations.
    Sentence: Three geese crossed the road."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        max_tokens=256,
        top_p=1.0,
        frequency_penalty=0,
        presence_penalty=0,
        messages=[
            {"role": "system", "content": "You are a helpful 
                assistant."},
            {"role": "user", "content": prompt}
        ],
    )
    print(response.choices[0].message.content)

    The result should be as follows:

    ('geese', 'plural')
    ('road', 'singular')

There’s more…

We can also change the nouns from plural to singular, and vice versa. We will use the textblob package for that. The package should be installed automatically via the Poetry environment:

  1. Import the TextBlob class from the package:
    from textblob import TextBlob
  2. Initialize a list of text variables and process them using the TextBlob class via a list comprehension:
    texts = ["book", "goose", "pen", "point", "deer"]
    blob_objs = [TextBlob(text) for text in texts]
  3. Use the pluralize function of the object to get the plural. This function returns a list and we access its first element. Print the result:
    plurals = [blob_obj.words.pluralize()[0] 
        for blob_obj in blob_objs]
    print(plurals)

    The result should be as follows:

    ['books', 'geese', 'pens', 'points', 'deer']
  4. Now, we will do the reverse. We use the preceding plurals list to turn the plural nouns into TextBlob objects:
    blob_objs = [TextBlob(text) for text in plurals]
  5. Turn the nouns into singular using the singularize function and print:
    singulars = [blob_obj.words.singularize()[0] 
        for blob_obj in blob_objs]
    print(singulars)

    The result should be the same as the list we started with in step 2:

    ['book', 'goose', 'pen', 'point', 'deer']
You have been reading a chapter from
Python Natural Language Processing Cookbook - Second Edition
Published in: Sep 2024
Publisher: Packt
ISBN-13: 9781803245744
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image