Summary
In this chapter, we unpacked DuckDB and situated it within the landscape of databases and data processing tools. We found it to be a fully featured DBMS that is optimized for high performance on analytical workloads, while also being simple to install and work with by virtue of its in-process mode of operation.
We identified two broad areas of application where DuckDB is seeing much excitement and adoption. The first is scaling and supercharging data science, data analytics, and ad hoc data-wrangling workflows. The second is serving as a building block for operational data engineering infrastructure and interactive analytical data applications. We also outlined the properties of DuckDB that make it excel at these use cases: its performance, ease of use, versatility, powerful analytics capabilities, and engaged community. Understanding DuckDB’s strengths and capabilities is important for spotting opportunities to adopt it in your own workflows, and for recognizing when an alternative data processing approach would be more appropriate.
We then looked at DuckDB deployment options and the wide range of DuckDB clients available, before getting DuckDB up and running on your own machine. We finished with a short primer on some of the fundamentals of SQL. With these preparatory steps complete, you are now ready for the hands-on DuckDB SQL examples we’ll be covering throughout the book.
In the next chapter, we’re going to turn to the topic of loading data into DuckDB, exploring its versatile data ingestion patterns across a variety of data sources and data formats. This will set us up to explore DuckDB’s powerful analytical querying and data-wrangling capabilities.