What this book covers
Chapter 1, Getting Started with Amazon Redshift, discusses how Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases. This chapter walks you through creating a sample Amazon Redshift cluster and setting up the necessary access and security controls, so that you can easily get started with a data warehouse on AWS. Most operations take only a click of a button; you should be able to launch a cluster in under 15 minutes.
Chapter 2, Data Management, discusses how the design goals of a data warehouse system differ greatly from those of a typical relational database system built for online transaction processing (OLTP). Amazon Redshift is optimized for the very fast execution of complex analytic queries against very large datasets. Because of the massive amounts of data involved in data warehousing, designing your database for analytical processing lets you take full advantage of the columnar architecture and managed service. This chapter delves into the different data structure options for setting up an analytical schema that your end users can query easily.
Chapter 3, Loading and Unloading Data, looks at how Amazon Redshift has built-in integrations with data lakes and other analytical services, and how easy it is to move and analyze data across different services. This chapter discusses scalable options for moving large datasets from a data lake built on Amazon S3 storage, as well as from other AWS services such as Amazon EMR and Amazon DynamoDB.
Chapter 4, Data Pipelines, discusses how modern data warehouses depend on extract, transform, load (ETL) operations to convert bulk information into usable data. An ETL process refreshes your data warehouse from source systems, organizing the raw data into a format you can more readily use. Most organizations run ETL as a batch or as part of a real-time ingest process to keep the data warehouse current and provide timely analytics. A fully automated and highly scalable ETL process helps minimize the operational effort that you must invest in managing regular ETL pipelines. It also ensures the timely and accurate refresh of your data warehouse. Here we will discuss recipes that use real-time and batch-based AWS native options to implement data pipelines for orchestrating data workflows.
Chapter 5, Scalable Data Orchestration for Automation, looks at how large-scale production pipelines commonly need to read complex data originating from a variety of sources. This data must be transformed to make it useful to downstream applications such as machine learning pipelines, analytics dashboards, and business reports. This chapter discusses building scalable data orchestration for automation using native AWS services.
Chapter 6, Data Authorization and Security, discusses how security, for data at rest as well as in transit, is one of the key pillars of a modern data warehouse. In this chapter, we will discuss the industry-leading security controls that Amazon Redshift provides to protect your data: built-in AWS IAM integration, identity federation for single sign-on (SSO), multi-factor authentication, column-level access control, Amazon Virtual Private Cloud (VPC), and AWS KMS integration. Amazon Redshift encrypts your data in transit and at rest using industry-standard encryption techniques. We will also elaborate on how you can authorize data access through fine-grained access controls for the underlying data structures in Amazon Redshift.
Chapter 7, Performance Optimization, examines how Amazon Redshift, being a fully managed service, provides great performance out of the box for most workloads. Amazon Redshift also provides you with levers that help you maximize throughput when data access patterns are already established. Performance tuning on Amazon Redshift helps you manage critical SLAs for workloads and easily scale up your data warehouse to meet or exceed business needs.
Chapter 8, Cost Optimization, discusses how Amazon Redshift offers some of the best price performance among data warehouse platforms in the cloud. Amazon Redshift also provides you with scalability and different options to optimize pricing, such as elastic resizing, pause and resume, reserved instances, and cost controls. These options allow you to create the most price-performant data warehouse solution for your needs.
Chapter 9, Lake House Architecture, looks at how AWS provides purpose-built solutions to meet the scalability and agility needs of the data architecture. With the built-in integration and governance that AWS provides, it is possible to easily move data across the data stores. You might have all the data centralized in a data lake, but use Amazon Redshift to get quick results for complex business intelligence queries on structured data. The curated data can then be exported into an Amazon S3 data lake and classified for use in building a machine learning model. In this chapter, we will discuss built-in integrations that allow easy data movement to integrate a data lake, data warehouse, and purpose-built data stores and enable unified governance.
Chapter 10, Extending Redshift Capabilities, looks at how Amazon Redshift allows you to analyze all your data using standard SQL with your existing business intelligence tools. Organizations are looking for more ways to extract valuable insights from data, such as big data analytics, machine learning applications, and a range of analytical tools to drive new use cases and business processes. An entire solution, spanning data sourcing, data transformation, reporting, and machine learning, can be built easily by taking advantage of the capabilities provided by AWS's analytical services. Amazon Redshift natively integrates with other AWS services, such as Amazon QuickSight, AWS Glue DataBrew, Amazon AppFlow, Amazon ElastiCache, AWS Data Exchange, and Amazon SageMaker, to meet your varying business needs.