Utilizing SageMaker for ETL
In this section, we describe how to set up an ETL process using SageMaker (the following screenshot shows the SageMaker web console). The main advantage of SageMaker is that it is a fully managed infrastructure for building, training, and deploying ML models. The downside is that it is more expensive than EMR and Glue.
SageMaker Studio is a web-based development environment for SageMaker. SageMaker was introduced as an all-in-one place for a data analytics pipeline. Every phase of an ML pipeline can be carried out in SageMaker Studio: data processing, algorithm design, job scheduling, experiment management, model development and training, inference endpoint creation, data drift detection, and model performance visualization. SageMaker Studio notebooks can also be connected to EMR for computations, with some restrictions: only a limited set of Docker images (such as Data Science or SparkMagic...