Step 2 – importing data
Before we can import data into SageMaker Data Wrangler, we need to create a connection to our data source. SageMaker Data Wrangler provides native, out-of-the-box connectors to Amazon S3, Amazon Athena, Amazon Redshift, Snowflake, Amazon EMR, and Databricks. You can also set up data sources for over 40 software as a service (SaaS) and web applications using Amazon AppFlow, a fully managed integration service that helps you securely transfer data between SaaS applications and AWS services. The Create connection screen shows the connectors in Data Wrangler, along with the additional data sources you can set up using Amazon AppFlow.
Figure 10.5: Data Wrangler data sources
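Of the native connectors listed above, Amazon S3 is often the simplest to try with a local file: once a CSV sits in a bucket, the S3 connector can browse to it. The following is a minimal sketch of staging such a file, not a required part of the Data Wrangler workflow. The helper names (`s3_uri`, `stage_csv`), the file name `my_dataset.csv`, the bucket `my-example-bucket`, and the `data-wrangler/input` prefix are all hypothetical; the only AWS call assumed is the boto3 S3 client's `upload_file` method.

```python
from pathlib import Path


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI that identifies an object in a bucket."""
    return f"s3://{bucket}/{key}"


def stage_csv(s3_client, local_path: str, bucket: str,
              prefix: str = "data-wrangler/input") -> str:
    """Upload a local CSV to S3 and return its s3:// URI.

    `s3_client` is assumed to behave like boto3.client("s3"); only its
    upload_file(Filename, Bucket, Key) method is used here.
    """
    # Join the (hypothetical) prefix and the file name, skipping an empty prefix.
    parts = [prefix.strip("/"), Path(local_path).name]
    key = "/".join(p for p in parts if p)
    s3_client.upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)


# Example usage (requires boto3 and configured AWS credentials; the bucket
# and file names are placeholders):
#
#   import boto3
#   uri = stage_csv(boto3.client("s3"), "my_dataset.csv", "my-example-bucket")
#   print(uri)
```

The returned URI points at the object you would then select through the Amazon S3 connector on the Create connection screen. Passing the client in as an argument keeps the helper easy to exercise without touching a real bucket.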
In this chapter, we will use a publicly available example, the Titanic dataset. The Titanic dataset is considered the “Hello World” of machine learning datasets due to the number of commonly used data processing and machine learning techniques...