Data science environment architecture using SageMaker
Data scientists use data science environments to run iterative experiments with different datasets and algorithms. They need tools such as Jupyter Notebook for authoring and executing code, data processing engines for large-scale data processing and feature engineering, and model training services for large-scale model training. The environment also needs to provide utilities for managing and tracking different experimentation runs. To manage artifacts such as source code and Docker images, data scientists also need a code repository and a Docker container repository.
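To make the model training service concrete, the following is a minimal sketch of assembling a SageMaker `CreateTrainingJob` request with the parameters such a service expects (training image, execution role, input/output S3 locations, compute resources). The bucket name, role ARN, and image URI below are placeholder assumptions, not real resources, and the actual boto3 call is left commented out so the snippet runs without AWS credentials:

```python
import json

def build_training_job_request(job_name: str) -> dict:
    """Assemble a CreateTrainingJob request body (illustrative values only)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            # Placeholder image URI; real URIs vary by Region and framework.
            "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest",
            "TrainingInputMode": "File",
        },
        # Assumed IAM role that SageMaker assumes to access your data.
        "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-ml-bucket/train/",  # assumed bucket
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": "s3://my-ml-bucket/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

request = build_training_job_request("demo-training-job")
print(json.dumps(request, indent=2))

# To submit the job for real (requires AWS credentials and real resources):
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```

Keeping the request as a plain dictionary makes it easy to version-control job configurations alongside the source code in the repository mentioned above.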
Amazon SageMaker provides end-to-end ML capabilities covering data preparation and labeling, model training and tuning, model deployment, and model monitoring. It also provides supporting features such as experiment tracking, a model registry, a feature store, and pipelines. The following diagram illustrates a basic data...