Introducing the project
Imagine that your engineering firm is hired by SafeRoad, a cutting-edge data science start-up, to create a custom data pipeline that connects to the city of Chicago’s open data portal. SafeRoad intends to analyze Chicago’s vehicle crash data and is particularly interested in uncovering the factors that contribute to these incidents.
The approach
To fulfill SafeRoad’s data request, your supervisor suggests modeling the data with a PostgreSQL database. As you may recall, PostgreSQL is a robust, open source relational database management system (RDBMS). It is a strong choice here because it is cost-effective and because an ETL pipeline written in pure Python can load its tables directly. Your supervisor also mentions that a well-designed PostgreSQL schema can be used to optimize queries, which helps safeguard against pipeline crashes. You take...