Preparing the pedestrian traffic dataset for analysis
Before we can start analyzing the dataset, we need to load it into DuckDB in a shape that supports the analytical queries we'll use to explore it. Our plan of attack is to first work out the steps required to parse and transform the CSV-backed dataset into a useful schema of columns and data types. We'll then load this data into a table in a persistent on-disk DuckDB database, which will let us continue our analysis across working sessions.
As we discussed in Chapter 8, when working with DuckDB in Python, we can choose between the Relational API and the DB-API. The DB-API tends to be better suited to building applications, as it promotes interoperability across data processing tools, whereas the Relational API offers a richer interface with features designed for effective data analysis; so, we will be working with the Relational API for...