Hands-On Exploratory Data Analysis with DuckDB
DuckDB is particularly well suited to data analysis workflows thanks to its versatility and highly optimized performance, which let practitioners scale their analyses beyond what they could otherwise achieve on a local machine. So far, we have focused on core DuckDB concepts and features, with a bit of data analysis thrown in as examples. In this chapter, we’ll put the data analysis workflow first, building on what we’ve learned about DuckDB to perform a hands-on exploratory analysis of a dataset. This will allow us to explore different approaches for performing effective data analysis with DuckDB.
More specifically, in this chapter, we’ll cover the following topics:
- Loading our dataset from a CSV file and applying some data cleaning steps, before writing it to a DuckDB database for our analysis
- Using the JupySQL...