An Introduction to DuckDB
Data is everywhere. It is stored in a huge variety of systems and formats, and data practitioners have an ever-growing number of tools with which to practice their craft. DuckDB is a relatively new and explosively popular database management system (DBMS) that data scientists, data analysts, data engineers, and software engineers are increasingly adopting for analytical data workloads. DuckDB is open source software released under the permissive MIT license, which makes it friendly to both commercial and non-commercial applications. The non-profit DuckDB Foundation stewards the long-term health of the DuckDB project, and its development is supported by DuckDB Labs, which employs the project’s core contributors.
In this chapter, we’ll unpack what type of database DuckDB is and identify the use cases it is well suited to and for which data practitioners are increasingly adopting it. We’ll also outline the different deployment options DuckDB offers and take you through installing it on your own system so that you’re ready to dive into the hands-on examples in this book. Finally, we’ll go through a quick primer on Structured Query Language (SQL), the query language of DuckDB’s primary interface and the one we’ll use for many of the exercises in this book. If you’ve wrangled your fair share of SQL before, you may want to just skim through the primer. If you’re newer to SQL, or it’s been a while between queries, you’ll want to dive into its hands-on exercises.
By the end of this chapter, you’ll be able to orient DuckDB within the landscape of data tooling and understand what kinds of use cases you may want to consider leveraging it for, as well as be able to recognize when other data processing tooling may be more appropriate.
Across the rest of the book, we’ll show you how to put DuckDB through its paces and, in doing so, hopefully impart a sense of why there is so much enthusiasm around it. Right now, let’s set the scene for our DuckDB explorations by covering the following topics:
- What is DuckDB?
- Why use DuckDB?
- DuckDB deployment options and installation
- A short SQL primer