Working with Amazon S3
In previous chapters, we repeatedly discussed big data and data lakes, and how organizations use them to store data and extract valuable insights through the data wrangling processes outlined in Chapter 1, using Amazon Web Services (AWS) offerings such as AWS Glue DataBrew, the AWS SDK for pandas, and Amazon SageMaker Data Wrangler. This chapter delves deeper into the specifics of big data and data lakes.
Specifically, we will cover the following topics:
- The definition and concept of big data
- The characteristics of big data
- The definition and concept of a data lake
- Best practices for building a data lake on Amazon Simple Storage Service (Amazon S3)
- The layout and organization of data on Amazon S3
We will begin by exploring the definition and characteristics of big data.