Data Wrangling on AWS

You're reading from Data Wrangling on AWS: Clean and organize complex data for analysis

Product type Paperback
Published in Jul 2023
Publisher Packt
ISBN-13 9781801810906
Length 420 pages
Edition 1st Edition
Authors (3): Sankar M, Navnit Shukla, Sam Palani

Options available for data wrangling on AWS

Depending on customer needs, data sources, and team expertise, AWS provides multiple options for data wrangling. In this section, we will cover the most common options that are available with AWS.

AWS Glue DataBrew

Released in 2020, AWS Glue DataBrew is a visual data preparation tool that makes it easy to clean and normalize data so that you can prepare it for analytics and machine learning. The visual UI provided by this service allows data analysts with no coding or scripting experience to accomplish all aspects of data wrangling. It comes with a rich set of common pre-built data transformation actions that simplify these data wrangling activities. As with any Software as a Service (SaaS) offering (https://en.wikipedia.org/wiki/Software_as_a_service), customers can start using the web UI without provisioning any servers and pay only for the resources they use.

SageMaker Data Wrangler

Similar to AWS Glue DataBrew, AWS also provides SageMaker Data Wrangler, a web UI-based data wrangling service geared more toward data scientists. If the primary use case is building a machine learning pipeline, SageMaker Data Wrangler should be the preferred option. It integrates directly with SageMaker Studio, where data that’s been prepared using SageMaker Data Wrangler can be fed into a data pipeline to build, train, and deploy machine learning models. It comes with pre-configured data transformations that are required for preparing data for machine learning, such as imputing missing data with means or medians, one-hot encoding, and time series-specific transformers.
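
To make two of the transformations named above concrete, here is a minimal sketch in plain Python of mean/median imputation and one-hot encoding. It is not how SageMaker Data Wrangler implements them; the function names (impute, one_hot) and the sample values are assumptions made up for illustration.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Turn a list of category labels into one 0/1 indicator dict per row."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


# Hypothetical sample column with one missing entry.
amounts = [10, None, 20, 60]
mean_filled = impute(amounts, "mean")      # gap filled with the mean of 10, 20, 60
median_filled = impute(amounts, "median")  # gap filled with the median of 10, 20, 60

# Hypothetical categorical column.
colors = ["red", "blue", "red"]
encoded = one_hot(colors)
```

The mean is pulled toward the outlier (60) while the median is not, which is why a tool typically offers both strategies.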

AWS SDK for pandas

For customers with a strong data integration team with coding and scripting experience, AWS SDK for pandas (https://github.com/aws/aws-sdk-pandas) is a great option. Built on top of other open source projects, it offers abstracted functions for executing typical data wrangling tasks such as loading/unloading data from various databases, data warehouses, and object data stores such as Amazon S3. AWS SDK for pandas simplifies integration with common AWS services such as Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, DynamoDB, and S3. It also supports common databases such as MySQL and SQL Server.
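
As a rough sketch of what loading and unloading data with this library can look like, the example below reads CSV files from Amazon S3 into a pandas DataFrame, drops incomplete rows, and writes the result back as a Parquet dataset. It assumes the awswrangler package is installed and AWS credentials are configured; the bucket name, prefixes, and the helper names (build_s3_path, copy_clean_dataset) are hypothetical.

```python
def build_s3_path(bucket: str, prefix: str) -> str:
    """Build an S3 prefix URI such as s3://my-bucket/raw/sales/."""
    return f"s3://{bucket}/{prefix.strip('/')}/"


def copy_clean_dataset(source_path: str, target_path: str) -> int:
    """Read CSV data from S3, drop rows with missing values, write Parquet.

    Requires the awswrangler package (pip install awswrangler) and valid
    AWS credentials; it is imported here so the path helper above can be
    used without it.
    """
    import awswrangler as wr

    df = wr.s3.read_csv(path=source_path)      # load all CSV objects under the prefix
    df = df.dropna()                           # simple cleaning step for illustration
    wr.s3.to_parquet(df=df, path=target_path, dataset=True)  # unload as a Parquet dataset
    return len(df)


# Hypothetical bucket and prefixes; replace with your own before running.
source = build_s3_path("my-bucket", "/raw/sales/")
target = build_s3_path("my-bucket", "clean/sales")
# rows_written = copy_clean_dataset(source, target)  # needs AWS access to run
```

The final call is left commented out because it needs real AWS resources; the rest runs locally.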
