You're reading from Mastering NLP from Foundations to LLMs: Apply advanced rule-based techniques to LLMs and solve real-world business problems using Python

Product type: Paperback
Published: Apr 2024
Publisher: Packt
ISBN-13: 9781804619186
Length: 340 pages
Edition: 1st Edition
Languages: Python
Concepts: Deep Learning
Authors (2): Meysam Ghaffari, Lior Gazit

Explaining the preprocessing pipeline

In this section, we walk through the complete preprocessing pipeline that we provide with this book.

The input, shown in the following code, is formatted text with encoded tags, similar to what we might extract from HTML web pages:

"<SUBJECT LINE> Employees details<END><BODY TEXT>Attached are 2 files,\n1st one is pairoll, 2nd is healtcare!<END>"

Let’s take a look at the effect of applying each step to the text:

Decode/remove encoding:
Employees details. Attached are 2 files, 1st one is pairoll, 2nd is healtcare!
Lowercasing:
employees details. attached are 2 files, 1st one is pairoll, 2nd is healtcare!
Digits to words:
employees details. attached are two files, first one is pairoll, second is healtcare!
Remove punctuation and other special characters:
employees details attached are two files first one is pairoll second is healtcare
Spelling corrections:
employees details...
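
The first four steps can be sketched with Python's standard `re` module. This is a minimal illustration, not the pipeline shipped with the book: the function names, the tag-handling rules, and the small number lookup tables are all assumptions, chosen only to reproduce the intermediate outputs listed above. Spelling correction is left out because it depends on a dictionary or an external library.

```python
import re

# Hypothetical lookup tables: they cover only small numbers, enough for this example.
ORDINALS = {"1st": "first", "2nd": "second", "3rd": "third"}
CARDINALS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def remove_tags(text):
    """Strip <...> tags; an <END> followed by more content becomes a sentence break."""
    text = re.sub(r"<END>\s*(?=\S)", ". ", text)  # assumed rule for section boundaries
    text = re.sub(r"<[^>]+>", " ", text)          # drop every remaining tag
    return re.sub(r"\s+", " ", text).strip()      # collapse newlines and repeated spaces


def digits_to_words(text):
    """Spell out numbers found in the lookup tables; leave any other number unchanged."""
    def spell(match):
        token = match.group(0)
        return ORDINALS.get(token) or CARDINALS.get(token) or token
    return re.sub(r"\b\d+(?:st|nd|rd|th)?\b", spell, text)


def remove_punctuation(text):
    """Delete everything that is neither a word character nor whitespace."""
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


raw = ("<SUBJECT LINE> Employees details<END><BODY TEXT>Attached are 2 files,\n"
       "1st one is pairoll, 2nd is healtcare!<END>")

decoded = remove_tags(raw)
lowered = decoded.lower()
spelled = digits_to_words(lowered)
cleaned = remove_punctuation(spelled)

for stage in (decoded, lowered, spelled, cleaned):
    print(stage)
```

The order of the steps matters in this sketch: lowercasing runs before the number lookup so that the ordinal keys match, and punctuation is removed last so that the sentence break inserted during tag removal does not leave stray characters behind.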
