Loading data from Amazon S3 using COPY
Amazon Redshift is a relational database management system (RDBMS) that supports several data model structures, including dimensional, denormalized, and aggregate (rollup) structures. This flexibility makes it well suited to analytics workloads.
In this recipe, we will set up two publicly available sample datasets in Amazon Redshift:
- A dimensional model, using the Star Schema Benchmark (SSB) (https://www.cs.umb.edu/~poneil/StarSchemaB.pdf), a dataset based on a retail system.
- A denormalized model, using the Amazon.com customer product reviews dataset.
To load the datasets, we will use the COPY command, which copies data from Amazon S3 into Amazon Redshift. This is the recommended approach for loading large amounts of data.
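As a rough illustration of what a COPY statement looks like, the following minimal sketch assembles one as a string; it does not connect to a cluster or run anything. The helper name build_copy_statement, the bucket example-bucket, the IAM role ARN, and the choice of a pipe-delimited, gzip-compressed file are all assumptions made for illustration, not values from this recipe.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, region,
                         delimiter="|", gzip=False):
    """Return a Redshift COPY statement as a string (nothing is executed).

    Hypothetical helper: values are interpolated directly, so it should
    only be used with trusted input.
    """
    lines = [
        f"COPY {table}",                  # target table in Amazon Redshift
        f"FROM '{s3_uri}'",               # S3 object or key prefix to load
        f"IAM_ROLE '{iam_role_arn}'",     # role the cluster assumes to read S3
        f"DELIMITER '{delimiter}'",       # field separator in the source files
    ]
    if gzip:
        lines.append("GZIP")              # source files are gzip-compressed
    lines.append(f"REGION '{region}'")    # AWS Region of the S3 bucket
    return "\n".join(lines) + ";"


# Placeholder values for illustration only; replace with your own.
statement = build_copy_statement(
    table="lineorder",
    s3_uri="s3://example-bucket/ssb/lineorder/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    region="eu-west-1",
    gzip=True,
)
print(statement)
```

The printed statement could then be run from any SQL client connected to the cluster, provided the table already exists and the IAM role is attached to the cluster with read access to the bucket.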
Getting ready
To complete this recipe, you will need to do the following:
- Deploy an Amazon Redshift cluster in AWS region eu-west-1.
- Create Amazon Redshift cluster master user...