Data Labeling in Machine Learning with Python

Explore modern ways to prepare labeled data for training and fine-tuning ML and generative AI models

Product type: Paperback
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781804610541
Length: 398 pages
Edition: 1st Edition
Author: Vijaya Kumar Suda
Table of Contents

Preface
Part 1: Labeling Tabular Data
Chapter 1: Exploring Data for Machine Learning
Chapter 2: Labeling Data for Classification
Chapter 3: Labeling Data for Regression
Part 2: Labeling Image Data
Chapter 4: Exploring Image Data
Chapter 5: Labeling Image Data Using Rules
Chapter 6: Labeling Image Data Using Data Augmentation
Part 3: Labeling Text, Audio, and Video Data
Chapter 7: Labeling Text Data
Chapter 8: Exploring Video Data
Chapter 9: Labeling Video Data
Chapter 10: Exploring Audio Data
Chapter 11: Labeling Audio Data
Chapter 12: Hands-On Exploring Data Labeling Tools
Index
Other Books You May Enjoy

Summary

In this chapter, we learned how to use pandas and Matplotlib to analyze a dataset and to understand the data and the correlations between its features. This understanding of the data and its patterns is required to build rules for labeling raw data before it is used to train ML models and fine-tune LLMs.
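As a reminder of what this kind of correlation analysis can look like, here is a minimal sketch. The DataFrame, its column names, and the `plot_correlations` helper are hypothetical and are not taken from the chapter's dataset; only the pandas and Matplotlib calls themselves are standard.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "education_years": [12, 16, 14, 18, 16, 12],
    "hours_per_week": [40, 45, 50, 60, 40, 35],
    "income": ["<=50K", ">50K", ">50K", ">50K", "<=50K", "<=50K"],
})

# Pairwise Pearson correlations between the numeric columns only.
corr = df.select_dtypes("number").corr()


def plot_correlations(corr, path="correlations.png"):
    """Save a simple heatmap of a correlation matrix to an image file."""
    # Imported here so the numeric part above works without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, writes files only
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Printing `corr` shows the correlation matrix, and calling `plot_correlations(corr)` writes the heatmap to disk. Features that correlate strongly with the target are natural candidates for labeling rules.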

We also worked through several examples of aggregating numeric columns by categorical values using groupby and mean. We then wrapped that logic in reusable functions, which return the aggregates of one or more columns when called with the relevant column names.
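A reusable aggregation function of this kind might be sketched as follows. The function name `mean_by_group` and the sample columns are assumptions made for illustration, not the chapter's own code.

```python
import pandas as pd


def mean_by_group(df, group_col, value_cols):
    """Return the mean of one or more columns for each value of group_col."""
    # Accept a single column name as well as a list of names.
    if isinstance(value_cols, str):
        value_cols = [value_cols]
    return df.groupby(group_col)[value_cols].mean()


# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "occupation": ["sales", "sales", "tech", "tech", "tech"],
    "age": [30, 40, 25, 35, 45],
    "hours_per_week": [40, 50, 45, 45, 60],
})

# Aggregate two columns at once, then a single column.
result = mean_by_group(df, "occupation", ["age", "hours_per_week"])
age_only = mean_by_group(df, "occupation", "age")
```

Because the grouping column and the value columns are parameters, the same function can be reused for any categorical feature in the dataset.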

Finally, we saw how to explore data quickly with the ydata-profiling library. Instead of remembering many pandas functions, we can generate a detailed analysis with a single line of Python code. The resulting report contains statistics for each variable, along with missing values, correlations, interactions, and duplicate rows.
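For reference, a minimal sketch of generating such a report is shown below. It assumes the `ydata-profiling` package is installed; the `write_profile` wrapper, the sample data, and the output file name are hypothetical. The import is placed inside the function so that the rest of the script still loads when the package is missing.

```python
import pandas as pd


def write_profile(df, path="eda_report.html", title="EDA report"):
    """Generate a ydata-profiling report for df and save it as HTML."""
    # Requires: pip install ydata-profiling
    from ydata_profiling import ProfileReport

    report = ProfileReport(df, title=title)  # the one-line analysis
    report.to_file(path)
    return path


# Hypothetical sample data with one missing value and one duplicate row,
# both of which the report is expected to flag.
df = pd.DataFrame({
    "age": [25, 32, None, 32],
    "occupation": ["sales", "tech", "tech", "tech"],
    "hours_per_week": [40, 45, 50, 45],
})
```

Calling `write_profile(df)` writes the report to `eda_report.html`, which can then be opened in a browser.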

Once we have a good sense of our data from exploratory data analysis (EDA), we will be able to build the rules for creating labels for the dataset.

In the next chapter, we will see how to build these rules using Python libraries such as Snorkel and Compose to label an unlabeled dataset. We will also explore other data labeling methods, such as pseudo-labeling and K-means clustering.
