You're reading from Mastering NLP from Foundations to LLMs Apply advanced rule-based techniques to LLMs and solve real-world business problems using Python

Product type Paperback

Published in Apr 2024

Publisher Packt

ISBN-13 9781804619186

Length 340 pages

Edition 1st Edition

Languages

Python

Concepts

Deep Learning

Authors (2):

Meysam Ghaffari

Lior Gazit

View More author details

Table of Contents (14) Chapters

Preface

1. Chapter 1: Navigating the NLP Landscape: A Comprehensive Introduction

2. Chapter 2: Mastering Linear Algebra, Probability, and Statistics for Machine Learning and NLP FREE CHAPTER

3. Chapter 3: Unleashing Machine Learning Potentials in Natural Language Processing

4. Chapter 4: Streamlining Text Preprocessing Techniques for Optimal NLP Performance

5. Chapter 5: Empowering Text Classification: Leveraging Traditional Machine Learning Techniques

6. Chapter 6: Text Classification Reimagined: Delving Deep into Deep Learning Language Models

7. Chapter 7: Demystifying Large Language Models: Theory, Design, and Langchain Implementation

8. Chapter 8: Accessing the Power of Large Language Models: Advanced Setup and Integration with RAG

9. Chapter 9: Exploring the Frontiers: Advanced Applications and Innovations Driven by LLMs

10. Chapter 10: Riding the Wave: Analyzing Past, Present, and Future Trends Shaped by LLMs and AI

11. Chapter 11: Exclusive Industry Insights: Perspectives and Predictions from World Class Experts

12. Index

Why subscribe?

13. Other Books You May Enjoy

Splitting data

When developing a machine learning model, it’s important to split the data into training, validation, and test sets; this is called data splitting. This is done to evaluate the performance of the model on new, unseen data and to prevent overfitting.

The most common method for splitting the data is the train-test split, which splits the data into two sets: the training set, which is used to train the model, and the test set, which is used to evaluate the performance of the model. The data is randomly divided into two sets, with a typical split being 80% of the data for training and 20% for testing. Using this approach the model will be trained using the majority of the data (training data) and then tested on the remaining data (test set). Using this approach, we can ensure that the model’s performance is based on new, unseen data.

Most of the time in machine learning model development, we have a set of hyperparameters for our model that we like to...