Introduction to Amazon Redshift Serverless
“Hey, what’s a data warehouse?” John Doe, CEO and co-founder of Red.wines, a fictional specialty wine e-commerce company, asked Tathya Vishleshak*, the company’s CTO. John, who owned a boutique winery, had teamed up with Tathya for the project. The company’s success surged during the pandemic, driven by social media and the stay-at-home trend. John wanted detailed data analysis to align inventory and customer outreach. However, there was a problem – producing this analysis was slowing down their online transaction processing (OLTP) database.
“A data warehouse is like a big database where we store different data for a long time to find insights and make decisions,” Tathya explained.
John had a concern, “Sounds expensive; we’re already paying for unused warehouse space. Can we afford it?”
Tathya reassured him, “You’re right, but there are cloud data warehouses such as Amazon Redshift Serverless that let you pay as you use.”
Expanding on this, this chapter introduces data warehousing and Amazon Redshift. We’ll cover Amazon Redshift Serverless basics, such as namespaces and workgroups, and guide you in creating a data warehouse. Amazon Redshift can gather data from various sources, mainly Amazon Simple Storage Service (S3).
As we go through this chapter, you’ll learn about a crucial aspect of this, the AWS Identity and Access Management (IAM) role, needed for loading data from S3. This role connects to your Serverless namespace for smooth data transfer. You’ll also learn how to load sample data and run queries using Amazon Redshift query editor. Our goal is to make it simple and actionable, so you’re confident in navigating this journey.
Tathya Vishleshak
The phrase 'Tathya Vishleshak' can be loosely interpreted to reflect the concept of a data analyst in Sanskrit/Hindi. However, it's important to note that this is not a precise or established translation, but rather an attempt to convey a similar meaning based on the individual meanings of the words 'Tathya' and 'Vishleshak' in Sanskrit.
Additionally, Amazon Redshift is used to analyze structured and unstructured data in data warehouses, operational databases, and data lakes. It’s employed for traditional data warehousing, business intelligence, real-time analytics, and machine learning/predictive analytics. Data analysts and developers use Redshift data with machine learning (ML) models for tasks such as predicting customer behavior. Amazon Redshift ML streamlines this process using familiar SQL commands.
The book delves into ML, explaining supervised and unsupervised training. You’ll learn about problem-solving with binary classification, multi-class classification, and regression using real-world examples. You’ll also discover how to create deep learning models and custom models with XGBoost, as well as use time series forecasting. The book also covers in-database and remote inferences using existing models, applying ML for predictive analytics, and operationalizing machine learning models.
The following topics will be covered in this chapter:
- What is Amazon Redshift?
- Getting started with Amazon Redshift Serverless
- Connecting to your data warehouse
This chapter requires a web browser and access to an AWS account.