Building and testing the custom R algorithm container image
In the previous two recipes, we prepared and tested the `train`, `serve`, and `api.r` files. With these ready, we can now proceed with crafting the Dockerfile and building the custom algorithm container image.
Tip
Wait! What's a Dockerfile? It is a text document containing the directives (commands) used to prepare and build a container image. This container image then serves as the blueprint when running containers. Feel free to check out https://docs.docker.com/engine/reference/builder/ for more information.
In this recipe, we will prepare a Dockerfile for the custom R container image. We will make use of the `api.r` file, as well as the `train` and `serve` scripts we prepared in the Preparing and testing the train script in R and Preparing and testing the serve script in R recipes. After that, we will use the `docker build` command to prepare the image before pushing it to an Amazon ECR repository.
Getting ready
Make sure you have completed the Preparing and testing the serve script in R recipe.
How to do it...
The initial steps in this recipe focus on preparing the `Dockerfile`. Let's get started:
- Double-click the `Dockerfile` file in the file tree to open it in the Editor pane. Make sure that this is the same `Dockerfile` that's inside the `ml-r` directory. Remember that we created an empty `Dockerfile` in the Setting up the Python and R experimentation environments recipe. Clicking on it in the file tree should open an empty file in the Editor pane. In the next step, we will update this empty `Dockerfile` by adding four lines of code.
- Update the `Dockerfile` with the following block of configuration code:

  ```
  FROM arvslat/amazon-sagemaker-cookbook-r-base:1
  COPY train /usr/local/bin/train
  COPY serve /usr/local/bin/serve
  COPY api.r /usr/local/bin/api.r
  ```
Here, we are planning to build on top of an existing image called `amazon-sagemaker-cookbook-r-base`. This image already has a few prerequisites installed, including the `rjson`, `here`, and `plumber` packages, so you don't have to worry about getting the installation steps working properly in this recipe. For more details, check out the Docker Hub page for this image at https://hub.docker.com/r/arvslat/amazon-sagemaker-cookbook-r-base.

Tip
You can access a working copy of this `Dockerfile` in the Amazon SageMaker Cookbook GitHub repository: https://github.com/PacktPublishing/Machine-Learining-with-Amazon-SageMaker-Cookbook/blob/master/Chapter02/ml-r/Dockerfile.

With our `Dockerfile` ready, we will proceed by using the Terminal until the end of this recipe.

- You may use a new Terminal tab or an existing one to run the next set of commands. Note that the Terminal pane is right under the Editor pane in the AWS Cloud9 IDE.
- Navigate to the `ml-r` directory containing our `Dockerfile`:

  ```
  cd /home/ubuntu/environment/opt/ml-r
  ```
- Specify the image name and the tag number:

  ```
  IMAGE_NAME=chap02_r
  TAG=1
  ```
- Build the custom container image using the `docker build` command:

  ```
  docker build --no-cache -t $IMAGE_NAME:$TAG .
  ```
The `docker build` command makes use of what is written inside our `Dockerfile`. We start with the image specified in the `FROM` directive and then proceed by copying the files into the container image.

- Use the `docker run` command to test if the `train` script works:

  ```
  docker run --name rtrain --rm -v /opt/ml:/opt/ml $IMAGE_NAME:$TAG train
  ```
Let's quickly discuss some of the different options that are used in this command. The `--rm` flag makes Docker clean up the container after the container exits, while the `-v` flag allows us to mount the `/opt/ml` directory from the host system to the `/opt/ml` directory of the container. The logs and results of the `train` script appear in the Terminal after running the `docker run` command.

- Use the `docker run` command to test if the `serve` script works:

  ```
  docker run --name rserve --rm -v /opt/ml:/opt/ml $IMAGE_NAME:$TAG serve
  ```
After running this command, the `plumber` API server will start, and its logs will show that the API is running on port `8080`. In the base image we used, we added `EXPOSE 8080` to allow us to access this port in the running container.

- Open a new Terminal tab. As the API is already running in the first Terminal, we will run the next set of commands in the new one.
- In the new Terminal tab, run the following commands to get the IP address of the running Plumber API:

  ```
  SERVE_IP=$(docker network inspect bridge | jq -r ".[0].Containers[].IPv4Address" | awk -F/ '{print $1}')
  echo $SERVE_IP
  ```

  What happened here? Check out the How it works… section of this recipe for a detailed explanation of the previous block of code! In the meantime, think of this line as chaining multiple commands together to get the IP address of the running API server. We should get an IP address equal or similar to `172.17.0.2`, although we may get a different IP address value altogether.

- Next, test the `ping` endpoint URL using the `curl` command:

  ```
  curl http://$SERVE_IP:8080/ping
  ```
We should get an `OK` after running this command.

- Finally, test the `invocations` endpoint URL using the `curl` command:

  ```
  curl -d "1" -X POST http://$SERVE_IP:8080/invocations
  ```

  We should get a value similar or close to `881.342840085751` after invoking the `invocations` endpoint.
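Tying the Terminal steps together: the image reference passed to `docker build` and `docker run` is just the two variables joined with a colon, and both `curl` checks target the same host and port. Here is a minimal sketch of how those values compose (no Docker required; the IP value is the example from this recipe and may differ on your machine):

```shell
# Image reference used by both `docker build -t` and `docker run`.
IMAGE_NAME=chap02_r
TAG=1
echo "$IMAGE_NAME:$TAG"            # chap02_r:1

# Endpoint URLs used by the curl checks (example IP from this recipe).
SERVE_IP=172.17.0.2
echo "http://$SERVE_IP:8080/ping"
echo "http://$SERVE_IP:8080/invocations"
```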
Now, let's see how this works!
How it works…
In this recipe, we built a custom container image with our `Dockerfile`. In our `Dockerfile`, we did the following:
- We used the `arvslat/amazon-sagemaker-cookbook-r-base` image as the base image. Check out https://hub.docker.com/repository/docker/arvslat/amazon-sagemaker-cookbook-r-base for more details on this image.
- We copied the `train`, `serve`, and `api.r` files to the `/usr/local/bin` directory inside the container image. These scripts are executed when we use `docker run`.
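As for the one-liner that captured `SERVE_IP`: `docker network inspect bridge` prints a JSON description of Docker's default bridge network, `jq -r ".[0].Containers[].IPv4Address"` extracts each attached container's address as a raw string, and that address comes with a CIDR suffix (for example, `172.17.0.2/16`), which `awk -F/ '{print $1}'` strips off. Here is a small sketch of that final step, runnable without Docker (the sample address is the example value from this recipe):

```shell
# What `docker network inspect bridge | jq -r ".[0].Containers[].IPv4Address"`
# would emit for one running container: an address with a CIDR suffix.
SAMPLE_ADDRESS="172.17.0.2/16"

# awk -F/ splits on "/" and prints the first field, leaving the bare IP.
SERVE_IP=$(echo "$SAMPLE_ADDRESS" | awk -F/ '{print $1}')
echo "$SERVE_IP"                   # 172.17.0.2
```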
Using the `arvslat/amazon-sagemaker-cookbook-r-base` image as the base image allowed us to write a shorter `Dockerfile` that focuses only on copying the `train`, `serve`, and `api.r` files to the `/usr/local/bin` directory inside the container image. Behind the scenes, we have already pre-installed the `rjson`, `plumber`, and `here` packages, along with their prerequisites, inside this base image so that we will not run into issues when building the custom container image. Here is a quick look at the `Dockerfile` that was used for the base image we are using in this recipe:
```
FROM r-base:4.0.2
RUN apt-get -y update
RUN apt-get install -y --no-install-recommends wget
RUN apt-get install -y --no-install-recommends libcurl4-openssl-dev
RUN apt-get install -y --no-install-recommends libsodium-dev
RUN R -e "install.packages('rjson',repos='https://cloud.r-project.org')"
RUN R -e "install.packages('plumber',repos='https://cloud.r-project.org')"
RUN R -e "install.packages('here',repos='https://cloud.r-project.org')"
ENV PATH "/opt/ml:$PATH"
WORKDIR /usr/local/bin
EXPOSE 8080
```
In this `Dockerfile`, we can see that we are using `r-base:4.0.2` as the base image. If we were to use a higher version, there's a chance that the `plumber` package would not install properly, which is why we had to stick with a lower version of this base image.
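One way to catch this class of problem early (a sketch, not part of the original recipe's `Dockerfile`): add a build-time check so the image build fails immediately if `plumber` did not install, rather than failing later when `serve` runs.

```
# Fails the `docker build` at this layer if the package is missing,
# since library() raises an error when plumber cannot be loaded.
RUN R -e "library(plumber)"
```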
With these potential blockers out of the way, we were able to build a custom container image in a short amount of time. In the Using the custom R algorithm container image for training and inference with Amazon SageMaker Local Mode recipe of this chapter, we will use this custom container image when we do training and deployment with `reticulate` so that we can use the SageMaker Python SDK with our R code.