Extending a data warehouse using Amazon Redshift Spectrum
Amazon Redshift Spectrum allows Amazon Redshift customers to query data directly in an Amazon S3 data lake. This lets us combine data in the data warehouse with data in the data lake, which is typically stored in open file formats such as Parquet, comma-separated values (CSV), SequenceFile, and Avro. Amazon Redshift Spectrum is serverless, so customers have no infrastructure to provision or manage. It lets customers run unified analytics across data in an Amazon Redshift cluster and data in an Amazon S3 data lake, and derive insights from disparate datasets without first loading the data lake data into the cluster.
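To make the idea concrete before the setup steps, the sketch below assembles the two kinds of statements Redshift Spectrum relies on: an external schema that maps to a database in the AWS Glue Data Catalog, and an external table defined over Parquet files under an S3 prefix. It is a minimal illustration that only builds SQL text. The helper names (external_schema_ddl, external_table_ddl), the schema, database, and table names, the role ARN, the S3 path, and the public.customer table in the comment are all hypothetical placeholders, not values from this recipe.

```python
# Hypothetical helpers that assemble Amazon Redshift Spectrum DDL as plain text.
# All schema, database, table, role ARN, and S3 values are placeholders.


def external_schema_ddl(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build CREATE EXTERNAL SCHEMA, mapping a Redshift schema to a Data Catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(schema: str, table: str,
                       columns: list[tuple[str, str]], s3_location: str) -> str:
    """Build CREATE EXTERNAL TABLE over Parquet files stored under an S3 prefix."""
    # Render each (name, type) pair as one column definition per line.
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


if __name__ == "__main__":
    print(external_schema_ddl(
        "spectrum_schema", "retail_lake",
        "arn:aws:iam::123456789012:role/HypotheticalSpectrumRole"))
    print(external_table_ddl(
        "spectrum_schema", "web_clicks",
        [("customer_id", "BIGINT"), ("click_ts", "TIMESTAMP")],
        "s3://example-bucket/web_clicks/"))
    # Once created, the external table can be joined with a local cluster table
    # (public.customer is a hypothetical name), for example:
    #   SELECT c.customer_id, COUNT(*)
    #   FROM public.customer c
    #   JOIN spectrum_schema.web_clicks w ON w.customer_id = c.customer_id
    #   GROUP BY c.customer_id;
```

Generating the statements as text keeps the example runnable without an AWS account; in practice the printed statements would be run against the cluster with a SQL client. Other file formats such as CSV need additional clauses (for example, a ROW FORMAT definition), which this sketch deliberately omits.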
Getting ready
To complete this recipe, you will need the following to be set up:
- An IAM user with access to Amazon Redshift
- An Amazon Redshift cluster deployed in the eu-west-1 AWS Region with the retail dataset created from Chapter 3, Loading and Unloading Data, using the Loading data from Amazon S3 using COPY recipe
- Amazon Redshift cluster...