Managed data processing with SageMaker Processing in Python
In the previous recipe, we set up the prerequisites for the SageMaker Processing job we will run in this recipe, including placing the dummy dataset in a specified directory. Now, we will create a custom Python script and use SageMaker Processing to run it inside a managed environment. This environment is created and configured automatically when the processing job is launched, and it is torn down once the job finishes. If you are working on a requirement similar to one of the following, then this recipe is for you:
- Normalizing numerical features with sklearn (scikit-learn)
- Text preprocessing with nltk (Natural Language Toolkit)
- Automated feature engineering with pandas
- Performing post-training processing and evaluation steps
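To make the first item in this list more concrete, here is a minimal sketch of the kind of logic a processing script might contain. It is not the script used in this recipe. The function names (standardize, process) and the column names in the usage note are hypothetical, and the sketch uses only the Python standard library so that it stays self-contained. The scaling it performs (zero mean, unit variance, using the population standard deviation) is the same computation that scikit-learn's StandardScaler applies by default.

```python
import csv
import statistics
from pathlib import Path


def standardize(values):
    """Scale a list of numbers to zero mean and unit variance.

    Uses the population standard deviation. A constant column
    is mapped to all zeros to avoid dividing by zero.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def process(input_path, output_path, numeric_columns):
    """Read a CSV file, standardize the named columns, and write the result."""
    with open(input_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"no data rows found in {input_path}")

    for col in numeric_columns:
        scaled = standardize([float(row[col]) for row in rows])
        for row, value in zip(rows, scaled):
            row[col] = value

    # Create the output directory if it does not exist yet.
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Inside a processing job, a script like this would typically be called with paths under the container's processing directories, for example process("/opt/ml/processing/input/dataset.csv", "/opt/ml/processing/output/output.csv", ["age"]). Those file names and the "age" column are placeholders for illustration only; the actual paths depend on how the job's inputs and outputs are configured.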
Once we have completed this recipe, we will have the custom Python script executed inside an isolated and managed SageMaker Processing environment and...