Data wrangling with DuckDB – cleaning and reshaping data
Data wrangling is the process of transforming raw data into a more usable shape, making it more appropriate and valuable for a variety of downstream applications. When wrangling data, practitioners must continually work to ensure its quality and usefulness. In the wild, this is often much easier said than done. The real-world properties of the datasets you’ll encounter make this a time-consuming process. Practitioners must frequently contend with heterogeneous data formats and schemas, missing values, and poorly documented data sources, problems that are compounded as the size of the data grows. It is normal for practitioners such as data scientists and data analysts to spend more of their time preparing data than analyzing it. It is a common complaint of data practitioners that their tooling gets in the...