Aggregate functions and common table expressions
We have used data wrangling techniques to take our raw web server activity data and load it into DuckDB, parse it into meaningful fields, transform it into correct data types, and enrich it with added metadata. With these data processing steps complete, we’ll now look at a type of operation we need for a core data analysis technique: the summarizing of large datasets by generating individual summary statistics of different fields that help us understand the shape of the underlying data. A common example is finding the average of a numerical field, which has the effect of converting a many-valued column into a single numerical value. This type of operation is commonly referred to as an aggregate function.
To ensure that we’re ready for this section, we’ll recreate all the necessary tables and views by executing the SQL from the web_log_script.sql
script:
.read "web_log_script.sql"