Designing the solution
To design the solution for the current problem statement, let’s first review the facts available to us:
- The current problem is a batch-based data engineering problem
- The problem at hand is a data ingestion problem
- Our source is CSV files containing structured data
- Our target is a PostgreSQL data warehouse
- Our data warehouse follows a star schema, with one fact table, two dynamic dimension tables, and three static dimension tables
- We should choose a technology that is independent of the deployment platform, since our solution may be migrated to the cloud in the future
- Within the scope of this book, we will explore optimal solutions built on Java-based technologies
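As a minimal sketch, the table inventory listed above can be modeled in plain Java to make explicit which tables would need a recurring ingestion pipeline. The class name, method names, and table names (`fact_table`, `dynamic_dim_1`, and so on) are hypothetical placeholders, and the sketch assumes that static dimension tables are loaded once and therefore need no recurring batch pipeline.

```java
import java.util.ArrayList;
import java.util.List;

public class WarehouseInventory {

    // Kinds of tables in the star schema described above.
    enum TableKind { FACT, DYNAMIC_DIMENSION, STATIC_DIMENSION }

    // A warehouse table; the names used in this sketch are hypothetical.
    static final class Table {
        final String name;
        final TableKind kind;

        Table(String name, TableKind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    // One fact table, two dynamic dimensions, three static dimensions.
    static List<Table> starSchema() {
        List<Table> tables = new ArrayList<>();
        tables.add(new Table("fact_table", TableKind.FACT));
        tables.add(new Table("dynamic_dim_1", TableKind.DYNAMIC_DIMENSION));
        tables.add(new Table("dynamic_dim_2", TableKind.DYNAMIC_DIMENSION));
        tables.add(new Table("static_dim_1", TableKind.STATIC_DIMENSION));
        tables.add(new Table("static_dim_2", TableKind.STATIC_DIMENSION));
        tables.add(new Table("static_dim_3", TableKind.STATIC_DIMENSION));
        return tables;
    }

    // Assumption: only tables whose data changes need a recurring pipeline.
    static boolean needsPipeline(Table table) {
        return table.kind != TableKind.STATIC_DIMENSION;
    }

    // Names of the tables that would each get an ingestion pipeline.
    static List<String> pipelineTargets(List<Table> tables) {
        List<String> targets = new ArrayList<>();
        for (Table table : tables) {
            if (needsPipeline(table)) {
                targets.add(table.name);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> targets = pipelineTargets(starSchema());
        System.out.println(targets.size() + " pipeline targets: " + targets);
    }
}
```

Under these assumptions, the filter keeps the fact table and the two dynamic dimension tables and skips the three static ones.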
Based on the preceding facts, we can conclude that we have to build three similar data ingestion pipelines – one for the fact table and one for each of the two dynamic dimension tables. At this point, we...