Databases
Database knowledge is an important part of any data practitioner's toolkit. While pandas excels at single-machine, in-memory computation, databases offer a complementary set of capabilities that help with storing data and distributing analytical processes.
Back in Chapter 4, The pandas I/O System, we walked through how to transfer data between pandas and, in theory, any database. However, a more recent database called DuckDB deserves extra consideration, as it bridges the worlds of dataframes and databases even more seamlessly.
DuckDB
DuckDB is a lightweight database system that offers zero-copy integration with Apache Arrow, a technology that also underpins efficient data sharing and usage with pandas. Unlike most database systems, DuckDB runs in-process, so it can be easily embedded into other tools or processes without setting up a separate server. Most importantly, DuckDB is optimized for analytical workloads.
DuckDB...