Mastering NLP from Foundations to LLMs

Apply advanced rule-based techniques to LLMs and solve real-world business problems using Python

Product type: Paperback
Published: Apr 2024
Publisher: Packt
ISBN-13: 9781804619186
Length: 340 pages
Edition: 1st Edition
Authors: Meysam Ghaffari, Lior Gazit
Table of Contents

Preface
Chapter 1: Navigating the NLP Landscape: A Comprehensive Introduction
Chapter 2: Mastering Linear Algebra, Probability, and Statistics for Machine Learning and NLP
Chapter 3: Unleashing Machine Learning Potentials in Natural Language Processing
Chapter 4: Streamlining Text Preprocessing Techniques for Optimal NLP Performance
Chapter 5: Empowering Text Classification: Leveraging Traditional Machine Learning Techniques
Chapter 6: Text Classification Reimagined: Delving Deep into Deep Learning Language Models
Chapter 7: Demystifying Large Language Models: Theory, Design, and Langchain Implementation
Chapter 8: Accessing the Power of Large Language Models: Advanced Setup and Integration with RAG
Chapter 9: Exploring the Frontiers: Advanced Applications and Innovations Driven by LLMs
Chapter 10: Riding the Wave: Analyzing Past, Present, and Future Trends Shaped by LLMs and AI
Chapter 11: Exclusive Industry Insights: Perspectives and Predictions from World Class Experts
Index
Other Books You May Enjoy

Explaining the preprocessing pipeline

In this section, we walk through a complete preprocessing pipeline that the authors provide for readers.

As shown in the following code, the input is formatted text with encoded tags, similar to what we might extract from HTML web pages:

"<SUBJECT LINE> Employees details<END><BODY TEXT>Attached are 2 files,\n1st one is pairoll, 2nd is healtcare!<END>"

Let’s take a look at the effect of applying each step to the text:

  1. Decode/remove encoding:

    Employees details. Attached are 2 files, 1st one is pairoll, 2nd is healtcare!

  2. Lowercasing:

    employees details. attached are 2 files, 1st one is pairoll, 2nd is healtcare!

  3. Digits to words:

    employees details. attached are two files, first one is pairoll, second is healtcare!

  4. Remove punctuation and other special characters:

    employees details attached are two files first one is pairoll second is healtcare

  5. Spelling corrections:

    employees details...
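
The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation. The function names, the small number lookup tables, and the two-word vocabulary are all hypothetical and sized for this one example. In particular, `difflib` stands in for whatever spelling corrector the actual pipeline uses.

```python
import difflib
import re
import string

RAW = ("<SUBJECT LINE> Employees details<END><BODY TEXT>Attached are 2 files,\n"
       "1st one is pairoll, 2nd is healtcare!<END>")

# Hypothetical lookup tables, just large enough for this example.
CARDINALS = {"1": "one", "2": "two", "3": "three"}
ORDINALS = {"1": "first", "2": "second", "3": "third"}
# Hypothetical vocabulary for the spelling step.
VOCABULARY = ["payroll", "healthcare"]


def decode_tags(text):
    """Step 1: drop the tags and join the tagged segments into sentences."""
    segments = []
    for segment in text.split("<END>"):
        segment = re.sub(r"<[^>]+>", "", segment)       # remove opening tags
        segment = re.sub(r"\s+", " ", segment).strip()  # collapse newlines and spaces
        if segment:
            segments.append(segment)
    return ". ".join(segments)


def digits_to_words(text):
    """Step 3: replace known cardinals (2) and ordinals (1st) with words."""
    def replace(match):
        number, suffix = match.group(1), match.group(2)
        table = ORDINALS if suffix else CARDINALS
        return table.get(number, match.group(0))  # leave unknown numbers as they are
    return re.sub(r"\b(\d+)(st|nd|rd|th)?\b", replace, text)


def remove_punctuation(text):
    """Step 4: strip ASCII punctuation, then normalize whitespace."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def correct_spelling(text, vocabulary=VOCABULARY, cutoff=0.8):
    """Step 5 (stand-in): swap each word for its closest vocabulary entry, if any."""
    corrected = []
    for word in text.split():
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


def run_pipeline(text):
    """Apply the steps in order and return every intermediate result."""
    results = [decode_tags(text)]
    for step in (str.lower, digits_to_words, remove_punctuation, correct_spelling):
        results.append(step(results[-1]))
    return results


for number, output in enumerate(run_pipeline(RAW), start=1):
    print(f"{number}. {output}")
```

The first four printed lines should match the outputs listed for steps 1 to 4. The result of the spelling step depends entirely on the vocabulary and similarity cutoff chosen, so a real pipeline would likely rely on a dedicated spell-checking library with a full dictionary.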
