Big data architecture best practices
In previous sections, you learned about various big data technologies and architecture patterns. Let's look at the following reference architecture diagram, which shows the different layers of a data lake architecture, to learn best practices.
Figure 13.12: Data lake reference architecture
The preceding diagram depicts an end-to-end data pipeline in a data lake architecture using the AWS cloud platform with the following components:
- AWS Direct Connect to set up a dedicated, high-speed network connection between the on-premises data center and AWS for data migration. If you have large volumes of archive data, it's better to use the AWS Snow Family to move it offline.
- A data ingestion layer with components for each kind of source: Amazon Kinesis to ingest streaming data, AWS Database Migration Service (DMS) to ingest relational data, AWS Transfer for SFTP for secure file transfer, and AWS DataSync to update data files between cloud and on-premises systems...
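The choices described in the components above can be sketched in a few lines of Python. This is only an illustration, not part of the reference architecture: the names SourceType, pick_ingestion_service, online_transfer_days, and prefer_offline_transfer are hypothetical, and the 80% link utilization and 7-day cutoff are assumed values that you would replace with your own. The sketch makes no AWS API calls; it maps each source type to the ingestion service named above and estimates how long an online transfer would take, to help judge when moving data offline is the better option.

```python
from enum import Enum


class SourceType(Enum):
    """Hypothetical labels for the kinds of data sources listed above."""
    STREAMING = "streaming"
    RELATIONAL = "relational"
    SECURE_FILE = "secure_file"
    FILE_SYNC = "file_sync"


# Source type -> ingestion service, following the ingestion layer description.
INGESTION_SERVICE = {
    SourceType.STREAMING: "Amazon Kinesis",
    SourceType.RELATIONAL: "AWS Database Migration Service (DMS)",
    SourceType.SECURE_FILE: "AWS Transfer for SFTP",
    SourceType.FILE_SYNC: "AWS DataSync",
}


def pick_ingestion_service(source_type: SourceType) -> str:
    """Return the ingestion service name for a given source type."""
    return INGESTION_SERVICE[source_type]


def online_transfer_days(data_tb: float, link_gbps: float,
                         utilization: float = 0.8) -> float:
    """Estimate the days needed to send data_tb terabytes over the network.

    utilization is the assumed fraction of the link that the transfer
    can actually use (0.8 is an illustrative default, not a measured value).
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid data size, link speed, or utilization")
    total_bits = data_tb * 1e12 * 8                # decimal terabytes to bits
    effective_bps = link_gbps * 1e9 * utilization  # usable bits per second
    return total_bits / effective_bps / 86_400     # seconds to days


def prefer_offline_transfer(data_tb: float, link_gbps: float,
                            max_days: float = 7.0,
                            utilization: float = 0.8) -> bool:
    """True if the online transfer would exceed max_days (assumed cutoff)."""
    return online_transfer_days(data_tb, link_gbps, utilization) > max_days


if __name__ == "__main__":
    print(pick_ingestion_service(SourceType.RELATIONAL))
    print(round(online_transfer_days(100, 1), 1), "days for 100 TB at 1 Gbps")
    print("Move offline?", prefer_offline_transfer(100, 1))
```

With these assumed values, 100 TB over a 1 Gbps link takes well over a week, which is the kind of case where offline transfer with the AWS Snow Family is worth considering; the same volume over a 10 Gbps link finishes in under two days.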