Chapter 9
Project 3.1: Data Cleaning Base Application
Data validation, cleaning, converting, and standardizing are the steps required to transform raw data acquired from source applications into something that can be used for analytical purposes. Since we started with a small data set of very clean data, we may need to improvise a bit to create some "dirty" raw data. A good alternative is to search for more complicated raw data.
This chapter will guide you through the design of a data cleaning application, separate from the raw data acquisition application. Many details of cleaning, converting, and standardizing will be left for subsequent projects. This initial project creates a foundation that will be extended by adding features. The idea is to work toward a complete data pipeline that starts with acquisition and passes the data through a separate cleaning stage. We want to exploit the Linux principle of connecting applications through a shared buffer, often referred to as a shell pipeline.
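To make the shell-pipeline idea concrete, here is a minimal sketch of a cleaning stage that can sit downstream of an acquisition program, for example in a command like python acquire.py | python clean.py. The module names, the newline-delimited JSON format, and the single cleaning rule are assumptions for illustration only, not the design we will build in this project.

```python
import json
import sys


def clean(record: dict) -> dict:
    """Apply a placeholder cleaning rule to one raw record."""
    # Hypothetical rule: strip surrounding whitespace from string values.
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def main() -> None:
    # Read newline-delimited JSON from stdin; write cleaned NDJSON to stdout.
    for line in sys.stdin:
        if not line.strip():
            continue  # skip blank lines
        raw = json.loads(line)
        print(json.dumps(clean(raw)))


if __name__ == "__main__":
    main()
```

Because the stage reads from standard input and writes to standard output, the shell's pipe operator provides the shared buffer that connects it to the acquisition stage.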
This chapter will cover a number of skills related to the design of data validation and cleaning applications:
CLI architecture and how to design a pipeline of processes
The core concepts of validating, cleaning, converting, and standardizing raw data (sketched briefly after this list)
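The distinctions among these four operations will be developed throughout the chapter. As a rough, hypothetical sketch, they can be thought of as separate functions applied in sequence to each raw record; the class name, field names, and specific rules below are illustrative assumptions, not the project's final design.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical target structure with native Python types."""
    name: str
    reading: float


def validate(raw: dict[str, str]) -> dict[str, str]:
    """Reject records that are missing required fields."""
    if not raw.get("name") or not raw.get("reading"):
        raise ValueError(f"invalid record: {raw!r}")
    return raw


def clean(raw: dict[str, str]) -> dict[str, str]:
    """Remove noise, such as surrounding whitespace, from each field."""
    return {key: value.strip() for key, value in raw.items()}


def convert(raw: dict[str, str]) -> Sample:
    """Convert the cleaned text into native Python types."""
    return Sample(name=raw["name"], reading=float(raw["reading"]))


def standardize(sample: Sample) -> Sample:
    """Apply a standard form, for example title-casing names."""
    return Sample(name=sample.name.title(), reading=sample.reading)


def process(raw: dict[str, str]) -> Sample:
    """Apply the four operations in sequence to one raw record."""
    return standardize(convert(clean(validate(raw))))
```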
We won’t address all the aspects of converting and standardizing data in this chapter. The projects in Chapter 10, Data Cleaning Features, will expand on many conversion topics. The project in Chapter 12, Project 3.8: Integrated Data Acquisition Web Service, will address the integrated pipeline idea. For now, we want to build an adaptable base application that can be extended to add features.
We’ll start with a description of an idealized data cleaning application.