Building and testing the custom Python algorithm container image
In this recipe, we will prepare a Dockerfile for the custom Python container image. We will make use of the train and serve scripts that we prepared in the previous recipes. After that, we will run the docker build command to prepare the image before pushing it to an Amazon ECR repository.
Tip
Wait! What's a Dockerfile? It's a text document containing the directives (commands) used to prepare and build a container image. This container image then serves as the blueprint when running containers. Feel free to check out https://docs.docker.com/engine/reference/builder/ for more information on Dockerfiles.
Getting ready
Make sure you have completed the Preparing and testing the serve script in Python recipe.
How to do it…
The initial steps in this recipe focus on preparing a Dockerfile. Let's get started:
- Double-click the Dockerfile file in the file tree to open it in the Editor pane. Make sure that this is the same Dockerfile that's inside the ml-python directory.
  Here, we can see a Dockerfile inside the ml-python directory. Remember that we created an empty Dockerfile in the Setting up the Python and R experimentation environments recipe. Clicking it in the file tree should open an empty file in the Editor pane.
  Here, we have an empty Dockerfile. In the next step, we will update this by adding three lines of code.
- Update Dockerfile with the following block of configuration code:
      FROM arvslat/amazon-sagemaker-cookbook-python-base:1
      COPY train /usr/local/bin/train
      COPY serve /usr/local/bin/serve
  Here, we are planning to build on top of an existing image called amazon-sagemaker-cookbook-python-base. This image already has a few prerequisites installed. These include the Flask, pandas, and scikit-learn libraries so that you won't have to worry about getting the installation steps working properly in this recipe. For more details on this image, check out https://hub.docker.com/r/arvslat/amazon-sagemaker-cookbook-python-base.
  Here, we can see the Docker Hub page for the amazon-sagemaker-cookbook-python-base image.
Tip
You can access a working copy of this Dockerfile in the Machine Learning with Amazon SageMaker Cookbook GitHub repository: https://github.com/PacktPublishing/Machine-Learning-with-Amazon-SageMaker-Cookbook/blob/master/Chapter02/ml-python/serve.
With the Dockerfile ready, we will use the Terminal until the end of this recipe:
- You can use a new Terminal tab or an existing one to run the next set of commands.
  Here, we can see how to create a new Terminal. Note that the Terminal pane is under the Editor pane in the AWS Cloud9 IDE.
- Navigate to the ml-python directory containing our Dockerfile:
      cd /home/ubuntu/environment/opt/ml-python
- Specify the image name and the tag number:
      IMAGE_NAME=chap02_python
      TAG=1
- Build the Docker container image using the docker build command:
      docker build --no-cache -t $IMAGE_NAME:$TAG .
  The docker build command makes use of what is written inside our Dockerfile. We start with the image specified in the FROM directive and then proceed by copying the train and serve files into the container image.
- Use the docker run command to test if the train script works:
      docker run --name pytrain --rm -v /opt/ml:/opt/ml $IMAGE_NAME:$TAG train
  Let's quickly discuss some of the options that were used in this command. The --rm flag makes Docker clean up the container after the container exits. The -v flag allows us to mount the /opt/ml directory from the host system to the /opt/ml directory of the container.
  Here, we can see the results after running the docker run command. It should show logs similar to what we had in the Preparing and testing the train script in Python recipe.
- Use the docker run command to test if the serve script works:
      docker run --name pyserve --rm -v /opt/ml:/opt/ml $IMAGE_NAME:$TAG serve
  After running this command, the Flask API server starts successfully. We should see logs similar to what we had in the Preparing and testing the serve script in Python recipe.
  Here, we can see that the API is running on port 8080. In the base image we used, we added EXPOSE 8080 to document that the container listens on this port.
- Open a new Terminal tab:
As the API is running already in the first Terminal, we have created a new one.
- In the new Terminal tab, run the following commands to get the IP address of the running Flask app:
      SERVE_IP=$(docker network inspect bridge | jq -r ".[0].Containers[].IPv4Address" | awk -F/ '{print $1}')
      echo $SERVE_IP
  We should get an IP address that's equal or similar to 172.17.0.2. Of course, we may get a different IP address value.
- Next, test the ping endpoint URL using the curl command:
      curl http://$SERVE_IP:8080/ping
  We should get an OK after running this command.
- Finally, test the invocations endpoint URL using the curl command:
      curl -d "1" -X POST http://$SERVE_IP:8080/invocations
  We should get a value similar or close to 881.3428400857507 after invoking the invocations endpoint.
At this point, it is safe to say that the custom container image we have prepared in this recipe is ready. Now, let's see how this works!
How it works…
In this recipe, we built a custom container image using the Dockerfile configuration we specified. When you have a Dockerfile, the standard set of steps would be to use the docker build command to build the Docker image, authenticate with ECR to gain the necessary permissions, use the docker tag command to tag the image appropriately, and use the docker push command to push the Docker image to the ECR repository.
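To make the order of these four steps concrete, here is a minimal sketch that only assembles the command strings without running them. It assumes the standard ECR registry hostname format, an ECR repository that already exists and has the same name as the local image, and an AWS CLI version that provides aws ecr get-login-password. The account ID 123456789012 and the us-east-1 region are placeholders, and the function names are hypothetical.

```python
def ecr_registry(account_id, region):
    """Return the ECR registry hostname for an account and region (standard format assumed)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_push_commands(image_name, tag, account_id, region):
    """Return the build, authenticate, tag, and push commands in order.

    Hypothetical helper: nothing is executed, the commands are only composed as strings.
    Assumes an ECR repository named like the local image already exists.
    """
    registry = ecr_registry(account_id, region)
    local_image = f"{image_name}:{tag}"
    remote_image = f"{registry}/{image_name}:{tag}"
    return [
        # 1. Build the image from the Dockerfile in the current directory.
        f"docker build -t {local_image} .",
        # 2. Authenticate the Docker client with the ECR registry.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # 3. Tag the local image with the full repository URI.
        f"docker tag {local_image} {remote_image}",
        # 4. Push the tagged image to the ECR repository.
        f"docker push {remote_image}",
    ]


# Placeholder account ID and region, not values from this recipe.
for command in ecr_push_commands("chap02_python", "1", "123456789012", "us-east-1"):
    print(command)
```

Keeping the local name (chap02_python:1) separate from the remote name makes it easier to see that docker tag only adds a second name to the same image before docker push uploads it.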
Let's discuss what we have inside our Dockerfile. If this is your first time hearing about Dockerfiles, they are simply text files containing commands to build the image. In our Dockerfile, we did the following:
- We used arvslat/amazon-sagemaker-cookbook-python-base as the base image. Check out https://hub.docker.com/repository/docker/arvslat/amazon-sagemaker-cookbook-python-base for more details about this image.
- We copied the train and serve scripts to the /usr/local/bin directory inside the container image. These scripts are executed when we pass train or serve as the command to docker run.
Using the arvslat/amazon-sagemaker-cookbook-python-base image as the base image allowed us to write a shorter Dockerfile that focuses only on copying the train and serve files to the /usr/local/bin directory inside the container image. Behind the scenes, we have already pre-installed the flask, pandas, scikit-learn, and joblib packages, along with their prerequisites, inside this container image so that we will not run into issues when building the custom container image. Here is a quick look at the Dockerfile we used to build the base image for this recipe:
    FROM ubuntu:18.04
    RUN apt-get -y update
    RUN apt-get install -y python3.6
    RUN apt-get install -y --no-install-recommends python3-pip
    RUN apt-get install -y python3-setuptools
    RUN ln -s /usr/bin/python3 /usr/bin/python && \
        ln -s /usr/bin/pip3 /usr/bin/pip
    RUN pip install flask
    RUN pip install pandas
    RUN pip install scikit-learn
    RUN pip install joblib
    WORKDIR /usr/local/bin
    EXPOSE 8080
In this Dockerfile, we can see that we are using ubuntu:18.04 as the base image. Note that we can use other base images as well, depending on the libraries and frameworks we want to be installed in the container image.
Once we have the container image built, the next step will be to test if the train and serve scripts will work inside the container once we use docker run. Getting the IP address of the running container may be the trickiest part, as shown in the following block of code:
SERVE_IP=$(docker network inspect bridge | jq -r ".[0].Containers[].IPv4Address" | awk -F/ '{print $1}')
We can divide this into the following parts:
- docker network inspect bridge: This provides detailed information about the bridge network in JSON format. It should return an output with a structure similar to the following JSON value:
      [
        {
          ...
          "Containers": {
            "1b6cf4a4b8fc5ea5...": {
              "Name": "pyserve",
              "EndpointID": "ecc78fb63c1ad32f0...",
              "MacAddress": "02:42:ac:11:00:02",
              "IPv4Address": "172.17.0.2/16",
              "IPv6Address": ""
            }
          },
          ...
        }
      ]
- jq -r ".[0].Containers[].IPv4Address": This parses through the JSON response value from docker network inspect bridge. Piping this after the first command would yield an output similar to 172.17.0.2/16.
- awk -F/ '{print $1}': This splits the result from the jq command using the / separator and returns the value before /. After getting the AA.BB.CC.DD/16 value from the previous command, we get AA.BB.CC.DD after using the awk command.
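The same three-part extraction can be sketched in Python, which may help when reading the jq and awk stages. The snippet below works on a sample string shaped like the JSON output shown in this section instead of calling Docker. Note one deliberate difference: the jq filter prints one address for every container attached to the bridge network, while this sketch selects the container by its Name value (pyserve). The function name container_ip is hypothetical.

```python
import json

# Sample shaped like the docker network inspect bridge output shown in this section.
# The truncated identifiers are kept exactly as they appear there.
SAMPLE_INSPECT_OUTPUT = """
[
  {
    "Containers": {
      "1b6cf4a4b8fc5ea5...": {
        "Name": "pyserve",
        "EndpointID": "ecc78fb63c1ad32f0...",
        "MacAddress": "02:42:ac:11:00:02",
        "IPv4Address": "172.17.0.2/16",
        "IPv6Address": ""
      }
    }
  }
]
"""


def container_ip(inspect_json, container_name):
    """Return the IPv4 address (without the /16 suffix) of a named container.

    Hypothetical helper mirroring the jq and awk stages of the shell pipeline.
    """
    networks = json.loads(inspect_json)
    containers = networks[0]["Containers"]       # jq: .[0].Containers
    for details in containers.values():          # jq: []
        if details["Name"] == container_name:    # extra filter, not in the jq command
            address_with_prefix = details["IPv4Address"]   # jq: .IPv4Address
            return address_with_prefix.split("/")[0]       # awk -F/ '{print $1}'
    raise LookupError(f"No container named {container_name!r} on this network")


print(container_ip(SAMPLE_INSPECT_OUTPUT, "pyserve"))  # prints 172.17.0.2
```

Selecting by name is a design choice for this sketch: if another container is running on the bridge network at the same time, the original pipeline would place more than one address in SERVE_IP.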
Once we have the IP address of the running container, we can send requests to the /ping and /invocations endpoints, similar to how we did in the Preparing and testing the serve script in Python recipe.
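For readers who prefer Python over curl, the two requests can be sketched with the standard library only. This is an illustrative sketch, not the method used in the recipe: the function names are hypothetical, and the base URL in the usage note assumes the container received the 172.17.0.2 address shown earlier.

```python
import urllib.request


def build_ping_request(base_url):
    """Build the GET request for the /ping endpoint (hypothetical helper)."""
    return urllib.request.Request(f"{base_url}/ping", method="GET")


def build_invocations_request(base_url, payload):
    """Build the POST request for /invocations, similar to curl -d "1" -X POST."""
    return urllib.request.Request(
        f"{base_url}/invocations",
        data=payload.encode("utf-8"),   # request body, e.g. "1"
        method="POST",
    )


def send(request, timeout=5):
    """Send a prepared request and return (status_code, response_text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the serve container running, send(build_ping_request("http://172.17.0.2:8080")) should return a 200 status with an OK body, and send(build_invocations_request("http://172.17.0.2:8080", "1")) should return the predicted value as text. Building the request separately from sending it keeps the URL and payload easy to inspect without a running container.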
In the next recipes in this chapter, we will use this custom container image when we do training and deployment with the SageMaker Python SDK.