Data import and manipulation
Our first data-wrangling task is to load the web server activity log into DuckDB. We begin by creating an empty web_log_text table to store the raw text of the web server log:
CREATE OR REPLACE TABLE web_log_text (raw_text VARCHAR);
This creates an empty table with a single column to store the raw lines of the web server log. The CREATE OR REPLACE form of the CREATE TABLE statement instructs DuckDB to create a new table, overwriting any existing table with the same name.
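To see the difference between the two forms, the following minimal sketch uses a scratch table named demo_tbl (a hypothetical name chosen for illustration, not part of the chapter's dataset). It assumes an otherwise empty DuckDB session.

```sql
-- demo_tbl is a hypothetical scratch table used only for this illustration.
CREATE OR REPLACE TABLE demo_tbl (raw_text VARCHAR);

-- Running the same statement again succeeds: the existing table is
-- dropped and recreated, so any rows it held would be lost.
CREATE OR REPLACE TABLE demo_tbl (raw_text VARCHAR);

-- A plain CREATE TABLE, by contrast, fails with an error because
-- demo_tbl already exists. Uncomment to try it:
-- CREATE TABLE demo_tbl (raw_text VARCHAR);

-- Inspect the table definition: a single VARCHAR column.
DESCRIBE demo_tbl;

-- Clean up the scratch table.
DROP TABLE demo_tbl;
```

Because the replacement silently discards the old table and its contents, this form is convenient for scripts that are rerun from the top, but it should be used with care on tables that hold data you want to keep.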
To load our web server access.log file into the web_log_text table, we can use the COPY statement:
COPY web_log_text FROM 'access.log' (DELIM '');
This might be a little counter-intuitive at first, as we are using DuckDB’s CSV reader, which we saw in Chapter 2. While this isn’t strictly a CSV file, it is a text file with each record occurring on a new line. We can leverage the CSV reader by treating it as though it has no field delimiters...
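After the load, a quick sanity check can confirm that each line of the file arrived as one row. The queries below are a sketch that assumes the COPY statement above completed successfully; they use only the web_log_text table and raw_text column defined earlier.

```sql
-- Count the rows loaded; this should match the number of lines in access.log.
SELECT count(*) AS line_count
FROM web_log_text;

-- Peek at a few raw lines to confirm that each one was kept whole
-- in the single raw_text column rather than being split into fields.
SELECT raw_text
FROM web_log_text
LIMIT 5;
```

If the row count is lower than expected, or lines appear truncated, the reader has probably interpreted some character in the log as a delimiter or quote, and the COPY options are worth revisiting.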