Leveraging positional joins
Relational tables in databases are traditionally considered unordered. SQL uses the ORDER BY
clause to specify the ordering when data is retrieved. However, much of the data we process from other systems has an implicit order. There may be a natural ordering with a log file (with newer records appended to the end of the file) and files on disk, such as Parquet or CSV, can have a natural ordering of the file rows. It can be frustrating (and somewhat perplexing) to users of dataframes who are used to structures that preserve their rows to discover that relational tables are unordered.
DuckDB has the very useful POSITIONAL
join, which acknowledges the implicit row numbers in each external table. This positional matching characteristic is frequently used by pandas dataframes when joining on the ordinal positioning of rows.
In this section, we will load implicitly ordered data from files on disk and introduce DuckDB’s capability to join on the relative...