The books Mastering Hadoop 3 and Big Data Architect's Handbook are recommended. For a deep dive into extract, transform, and load (ETL) workflows, read about AWS Glue (https://aws.amazon.com/glue).
Amazon EMR integrates closely with Amazon S3 for building data lakes; for more information, see https://aws.amazon.com/big-data/datalakes-and-analytics/. For a deeper understanding of Hadoop architecture and HDFS, use the following links:
- HDFS Architecture: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
- Hadoop Architecture Overview: http://ercoppa.github.io/HadoopInternals/HadoopArchitectureOverview.html