Python Data Cleaning and Preparation Best Practices

A practical guide to organizing and handling data from various sources and formats using Python

Product type: Paperback
Published: Sep 2024
Publisher: Packt
ISBN-13: 9781837634743
Length: 456 pages
Edition: 1st Edition
Author: Maria Zervou
Table of Contents

Preface

Part 1: Upstream Data Ingestion and Cleaning
Chapter 1: Data Ingestion Techniques
Chapter 2: Importance of Data Quality
Chapter 3: Data Profiling – Understanding Data Structure, Quality, and Distribution
Chapter 4: Cleaning Messy Data and Data Manipulation
Chapter 5: Data Transformation – Merging and Concatenating
Chapter 6: Data Grouping, Aggregation, Filtering, and Applying Functions
Chapter 7: Data Sinks

Part 2: Downstream Data Cleaning – Consuming Structured Data
Chapter 8: Detecting and Handling Missing Values and Outliers
Chapter 9: Normalization and Standardization
Chapter 10: Handling Categorical Features
Chapter 11: Consuming Time Series Data

Part 3: Downstream Data Cleaning – Consuming Unstructured Data
Chapter 12: Text Preprocessing in the Era of LLMs
Chapter 13: Image and Audio Preprocessing with LLMs

Index
Other Books You May Enjoy

Preface

In today’s fast-paced, data-driven world, it’s easy to be dazzled by headlines about artificial intelligence (AI) breakthroughs and advanced machine learning (ML) models. But ask any seasoned data scientist or engineer, and they’ll tell you the same thing: the true foundation of any successful data project is not flashy algorithms or sophisticated models. It’s the data itself and, more importantly, how that data is prepared.

Throughout my career, I have learned that data preprocessing is the unsung hero of data science. It’s the meticulous, often complex process that turns raw data into a reliable asset, ready for analysis, modeling, and ultimately, decision-making. I’ve seen firsthand how the right preprocessing techniques can transform an organization’s approach to data, turning potential challenges into powerful opportunities.

Yet, despite its importance, data preprocessing is often overlooked or undervalued. Many see it as a tedious step, a bottleneck that slows down the exciting work of building models and delivering insights. But I’ve always believed that this phase is where the most critical work happens. After all, even the most sophisticated algorithms can’t make up for poor-quality data. That’s why I’ve dedicated much of my professional journey to mastering this art—exploring the best tools, techniques, and strategies to make preprocessing more efficient, scalable, and aligned with the ever-evolving landscape of AI.

This book aims to demystify data preprocessing, offering both a solid grounding in traditional methods and a forward-looking perspective on emerging techniques. We’ll explore how Python can be used to clean, transform, and organize data more effectively. We’ll also look at how the advent of large language models (LLMs) is redefining what’s possible in this space. These models are already proving to be game changers, automating tasks that were once manual and time-consuming and providing new ways to improve data quality and usability.

Throughout these pages, I’ll share insights from my own experience, including the challenges I faced and the lessons I learned along the way. My hope is to provide you with not just a technical roadmap but also a deeper understanding of the strategic importance of data preprocessing in today’s data ecosystem. I strongly believe in the philosophy of “learning by doing,” so this book includes a wealth of code examples for you to follow along with. I encourage you to try these examples, experiment with the code, and challenge yourself to apply the techniques to your own datasets.

By the end of this book, you’ll be equipped with the knowledge and skills to approach data preprocessing not just as a necessary step but also as a critical component of your overall data strategy.

So, whether you’re a data scientist, engineer, analyst, or simply someone looking to enhance their understanding of data processes, I invite you to join me on this journey. Together, we will explore how to harness the power of data preprocessing to unlock the full potential of your data.
