You're reading from Data Labeling in Machine Learning with Python Explore modern ways to prepare labeled data for training and fine-tuning ML and generative AI models

Product type Paperback

Published in Jan 2024

Publisher Packt

ISBN-13 9781804610541

Length 398 pages

Edition 1st Edition

Languages

Python

Tools

Excel

Concepts

Machine Learning

Author (1):

Vijaya Kumar Suda

View More author details

Table of Contents (18) Chapters

Preface

1. Part 1: Labeling Tabular Data

2. Chapter 1: Exploring Data for Machine Learning FREE CHAPTER

3. Chapter 2: Labeling Data for Classification

4. Chapter 3: Labeling Data for Regression

5. Part 2: Labeling Image Data

6. Chapter 4: Exploring Image Data

7. Chapter 5: Labeling Image Data Using Rules

8. Chapter 6: Labeling Image Data Using Data Augmentation

9. Part 3: Labeling Text, Audio, and Video Data

10. Chapter 7: Labeling Text Data

11. Chapter 8: Exploring Video Data

12. Chapter 9: Labeling Video Data

13. Chapter 10: Exploring Audio Data

14. Chapter 11: Labeling Audio Data

15. Chapter 12: Hands-On Exploring Data Labeling Tools

16. Index

Why subscribe?

17. Other Books You May Enjoy

Summary

In this chapter, we have learned how to use Pandas and matplotlib to analyze a dataset and understand the data and correlations between various features. This understanding of data and patterns in the data is required to build the rules for labeling raw data before using it for training ML models and fine-tuning LLMs.

We also went through various examples for aggregating columns and categorical values using groupby and mean. Then, we created reusable functions so that those functions can be reused simply by calling and passing column names to get aggregates of one or more columns.

Finally, we saw a fast and easy exploration of data using the ydata-profiling library with simple one-line Python code. Using this library, we need not remember many Pandas functions. We can simply call one line of code to perform a detailed analysis of data. We can create detailed reports of statistics for each variable with missing values, correlations, interactions, and duplicate rows.

Once we get a good sense of our data using EDA, we will be able to build the rules for creating labels for the dataset.

In the next chapter, we will see how to build these rules using Python libraries such as snorkel and compose to label an unlabeled dataset. We will also explore other methods, such as pseudo-labeling and K-means clustering, for data labeling.

You're reading from Data Labeling in Machine Learning with Python Explore modern ways to prepare labeled data for training and fine-tuning ML and generative AI models

Table of Contents (18) Chapters

Summary

Authors (1)

Personalised recommendations for you