Summary
In this chapter, we learned how a cloud data warehouse can be used to store hot data to optimize performance and manage costs. We reviewed some common "anti-patterns" for data warehouse usage before diving deep into the Redshift architecture to learn more about how Redshift optimizes data storage across nodes.
We then reviewed some of the important design decisions that need to be made when creating an optimized schema in Redshift, before looking at how data can be ingested into, and unloaded from, Redshift.
Then, we performed a hands-on exercise where we created a new Redshift cluster, configured Redshift Spectrum to query data from Amazon S3, and then loaded a subset of data from S3 into Redshift. We then ran some complex queries to calculate the distance between two geographic points, before creating a materialized view based on the results of that query.
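To illustrate the general shape of that last step, the following is a minimal sketch of a Redshift materialized view that precomputes a distance between two coordinate pairs using the haversine formula. The table and column names (trips, pickup_lat, and so on) are purely illustrative assumptions, not the schema used in the exercise.

```sql
-- Hypothetical example: precompute the great-circle (haversine) distance
-- between a pickup and a dropoff point, assuming a trips table with
-- latitude/longitude columns. Names are illustrative only.
CREATE MATERIALIZED VIEW trip_distances AS
SELECT
    trip_id,
    2 * 6371 * ASIN(
        SQRT(
            POWER(SIN(RADIANS(dropoff_lat - pickup_lat) / 2), 2)
            + COS(RADIANS(pickup_lat)) * COS(RADIANS(dropoff_lat))
              * POWER(SIN(RADIANS(dropoff_lon - pickup_lon) / 2), 2)
        )
    ) AS distance_km
FROM trips;

-- The view can later be brought up to date as new rows arrive:
-- REFRESH MATERIALIZED VIEW trip_distances;
```

Because the expensive trigonometric calculation is materialized once, downstream queries can read distance_km directly instead of recomputing it on every run.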
In the next chapter, we will discuss how to orchestrate various components of our data engineering pipelines.