Building an End-to-End Data-Wrangling Pipeline with AWS SDK for Pandas
In the previous chapters, we learned about the data-wrangling process and how to use different services in the AWS ecosystem for data-wrangling activities:
- We explored AWS Glue DataBrew, which lets every type of user build a data-wrangling pipeline through a GUI.
- We went through SageMaker Data Wrangler, which also offers a GUI for building data-wrangling pipelines but is geared toward machine learning workloads and integrates more tightly with the SageMaker service.
- We explored AWS SDK for Pandas (awswrangler), a code-first approach to data wrangling that integrates the Pandas library with the AWS ecosystem. It suits users who are comfortable programming in Python and prefer working with the Pandas library and its capabilities.
- We also went through different AWS services such as Amazon S3...