Tutorial – Building an End-to-End ETL Pipeline in Python
Python's rich ecosystem of libraries and tools makes it an excellent platform for building robust, reliable, and flexible ETL pipelines. So far in this book, we have taken a granular, piecewise look at each step of the ETL process in pure Python.
In this chapter, we’ll walk through a practical approach to creating a full end-to-end ETL pipeline using Python and its supporting tools. By the end, you will be able to extract data, perform the necessary cleansing and transformation activities, and load the processed data into a PostgreSQL database table.
Along the way, you will accomplish the following tasks:
- Data extraction: Read the source CSV files and store the data in separate DataFrames
- Data cleansing and transformation: Perform key data cleaning and transformation activities on each DataFrame to prepare the data so that it...