AWS Data Lake architecture
Let's look at the data lake architecture in the AWS Data Lake solution. The services that a data lake provides can be grouped into the following four categories:
Managed ingestion to onboard data from various sources and in any format
Centralized storage that scales with business needs
Processing and analyzing at big data scale in various programming languages
Governing and securing your data packages
The following diagram shows the overall architecture of a data lake in AWS. Let's review each feature category in detail:
Figure 7.1: AWS data lake architecture
Managed data ingestion to AWS Data Lake
AWS has several tools that can ingest data into S3 and Redshift. I will discuss the most common options here: Direct Connect, Snowball, and Kinesis. Let's review each of these options at a high level:
Direct Connect: With Direct Connect, you can establish private connectivity between AWS and your enterprise data center and provide an easy way to move...