Getting hands-on with semi-structured data
Although we will query semi-structured JSON data as part of this exercise, its storage still conforms to modeling best practices such as naming and standard columns. In this example, we will use semi-structured data containing information about pirates – such as details about the crew, weapons, and their ship – all stored in a single VARIANT
data type. With relational data, a row represents a single entity; in semi-structured data, a row is an entire file (although the file itself can contain single or countless entities). For this reason, metadata columns to mark individual loads and source filenames are stored alongside VARIANT
.
Figure 15.1 – A table with ELT meta columns and VARIANT for storing semi-structured data
This example uses AUTOINCREMENT
(a.k.a. IDENTITY
) as the default to generate a sequential unique ID for each load/record inserted.
In a real-world scenario, semi-structured...