Building a fully custom container for SageMaker Processing
We'll reuse the news headlines example from Chapter 6, Training Natural Language Processing Models:
- We start with a Dockerfile based on a minimal Python image. We install dependencies, add our processing script, and define it as our entry point:
# Start from a minimal Python image to keep the container small
FROM python:3.7-slim
# Install the libraries required by the processing script
RUN pip3 install --no-cache-dir gensim nltk sagemaker
# Download the NLTK data used by the script
RUN python3 -m nltk.downloader stopwords wordnet
# Add the processing script and run it as the container's entry point
ADD preprocessing-lda-ntm.py /
ENTRYPOINT ["python3", "/preprocessing-lda-ntm.py"]
- We build the image and tag it as sm-processing-custom:latest:
$ docker build -t sm-processing-custom:latest -f Dockerfile .
The resulting image is 497 MB. For comparison, it's 1.2 GB if we start from python:3.7 instead of python:3.7-slim. The smaller image is faster to push and download.
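If you want to double-check the size on your own machine, you can list the image with the Docker CLI (a quick verification step, not part of the book's walkthrough):
$ docker images sm-processing-custom:latest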
- Using the AWS CLI, we create a repository in Amazon ECR to host this image, and we log in to the repository:
$ aws ecr create-repository --repository-name sm-processing-custom...
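The login command is elided above. As a rough sketch, with AWS CLI v2 the typical sequence to authenticate with the registry, tag the local image, and push it looks like the following, where the account ID (123456789012) and region (eu-west-1) are placeholders rather than values from the book:
$ aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
$ docker tag sm-processing-custom:latest 123456789012.dkr.ecr.eu-west-1.amazonaws.com/sm-processing-custom:latest
$ docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/sm-processing-custom:latest
Once pushed, the full ECR image URI is what you pass to SageMaker Processing in place of a built-in container image.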