Best practices for building a data science environment
Data science environments let data scientists experiment quickly with a wide range of ML frameworks and libraries. The following are some best practices to follow when providing such an environment for your data scientists:
- Run large-scale model training using the SageMaker Training service instead of Studio notebooks: SageMaker Studio notebooks are meant for quick experimentation with small datasets. While it is possible to provision a large EC2 instance for a notebook to handle certain large model training jobs, keeping that instance running all the time is not cost-effective.
- Abstract infrastructure configuration details from data scientists: There are many infrastructure configurations to consider when using SageMaker, such as networking configuration, IAM roles, encryption keys, EC2 instance types, and storage options. To make the lives of data scientists easier, abstract...
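As a rough illustration of the two practices above, the following sketch shows one way a platform team might hide infrastructure settings behind a small helper, so that a data scientist only chooses a job name, a container image, a data location, and an instance type. The `PlatformDefaults` class, the `build_training_job_request` function, and every ARN, subnet, key, and bucket value are hypothetical placeholders, not part of SageMaker. The helper only builds a request dictionary in the shape accepted by the boto3 `create_training_job` call; it does not contact AWS.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class PlatformDefaults:
    """Infrastructure settings owned by the platform team.

    Every value below is a made-up placeholder for illustration.
    """
    role_arn: str = "arn:aws:iam::111122223333:role/ExampleSageMakerRole"
    subnets: Tuple[str, ...] = ("subnet-EXAMPLE1",)
    security_group_ids: Tuple[str, ...] = ("sg-EXAMPLE1",)
    kms_key_id: str = "alias/example-ml-key"
    output_s3_uri: str = "s3://example-ml-bucket/training-output/"
    max_runtime_seconds: int = 3600


def build_training_job_request(
    job_name: str,
    image_uri: str,
    train_s3_uri: str,
    instance_type: str = "ml.m5.xlarge",
    instance_count: int = 1,
    defaults: PlatformDefaults = PlatformDefaults(),
) -> Dict[str, Any]:
    """Combine the data scientist's choices with the platform defaults."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        # Networking, IAM, and encryption come from the defaults,
        # so the caller never has to supply them.
        "RoleArn": defaults.role_arn,
        "VpcConfig": {
            "Subnets": list(defaults.subnets),
            "SecurityGroupIds": list(defaults.security_group_ids),
        },
        "OutputDataConfig": {
            "S3OutputPath": defaults.output_s3_uri,
            "KmsKeyId": defaults.kms_key_id,
        },
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        # The large instance exists only for the lifetime of the job.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,
            "VolumeKmsKeyId": defaults.kms_key_id,
        },
        "StoppingCondition": {
            "MaxRuntimeInSeconds": defaults.max_runtime_seconds,
        },
    }
```

With valid AWS credentials and real values in place of the placeholders, the resulting dictionary could be submitted with `boto3.client("sagemaker").create_training_job(**request)`. Because the training instance is released when the job finishes, the notebook itself can stay on a small instance.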