Currently, the most commonly adopted way to store and deliver Docker images is through Docker Registry, an open source application by Docker that hosts Docker repositories. This application can be deployed on-premises, as well as used as a service from multiple providers, such as Docker Hub, Quay.io, and AWS ECR.
This article is an excerpt taken from the book Kubernetes on AWS written by Ed Robinson. In this book, you will discover how to utilize the power of Kubernetes to manage and update your applications. In this article, you will learn how to use Docker for pushing images onto ECR.
The application is a simple, stateless service, where most of the maintenance work involves making sure that storage is available, safe, and secure. As any seasoned system administrator knows, that is far from easy, especially when there is a large data store. For that reason, and especially if you're just starting out, it is highly recommended to use a hosted solution and let someone else deal with keeping your images safe and readily available.
ECR is AWS's approach to a hosted Docker registry, where there's one registry per account. It uses AWS IAM to authenticate and authorize users to push and pull images. By default, each registry is limited to 1,000 repositories, and each repository to 1,000 images.
Creating a repository
Creating a repository is as simple as executing the following aws ecr command:
$ aws ecr create-repository --repository-name randserver
This will create a repository for storing our randserver application. Its output should look like this:
{
    "repository": {
        "repositoryArn": "arn:aws:ecr:eu-central-1:123456789012:repository/randserver",
        "registryId": "123456789012",
        "repositoryName": "randserver",
        "repositoryUri": "123456789012.dkr.ecr.eu-central-1.amazonaws.com/randserver",
        "createdAt": 1543162198.0
    }
}
A nice addition to your repositories is a lifecycle policy that cleans up older versions of your images, so that you don't eventually hit the image limit and get blocked from pushing a newer version. This can be achieved as follows, using the same aws ecr command:
$ aws ecr put-lifecycle-policy --registry-id 123456789012 --repository-name randserver --lifecycle-policy-text '{"rules":[{"rulePriority":10,"description":"Expire old images","selection":{"tagStatus":"any","countType":"imageCountMoreThan","countNumber":800},"action":{"type":"expire"}}]}'
This particular policy will start cleaning up once you have more than 800 images in the same repository. You could also clean up based on the age of the images, on their count, or on both, and you can restrict the cleanup to images with particular tags.
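The age-based and tag-aware variants mentioned above could be sketched as follows. This is a minimal example rather than a recommended policy: the write_lifecycle_policy helper, the lifecycle-policy.json file name, the 14-day window, the "v" tag prefix, and the count of 100 are all assumptions chosen for illustration.

```shell
# Hypothetical helper: writes an example lifecycle policy to the given path.
write_lifecycle_policy() {
  cat > "$1" <<'EOF'
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images pushed more than 14 days ago",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the 100 most recent images tagged v*",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["v"],
        "countType": "imageCountMoreThan",
        "countNumber": 100
      },
      "action": { "type": "expire" }
    }
  ]
}
EOF
}

write_lifecycle_policy lifecycle-policy.json

# With AWS credentials configured, apply it by reading the policy from the file:
#   aws ecr put-lifecycle-policy --repository-name randserver \
#     --lifecycle-policy-text file://lifecycle-policy.json
```

Keeping the policy in a file makes it easier to review and to keep under version control than a one-line JSON string.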
Pushing and pulling images from your workstation
In order to use your newly created ECR repository, we first need to authenticate your local Docker daemon against the ECR registry. Once again, aws ecr will help you achieve just that:
$ aws ecr get-login --registry-ids 123456789012 --no-include-email
This will output a docker login command that will add a new user-password pair for your Docker configuration. You can copy-paste that command, or you can just run it as follows; the results will be the same:
$(aws ecr get-login --registry-ids 123456789012 --no-include-email)
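Note that the get-login subcommand exists only in version 1 of the AWS CLI. Version 2 replaces it with get-login-password, which prints just the password so that it can be piped to docker login. A minimal sketch, in which the ecr_login function name is hypothetical and the account ID and region are the example values used in this article:

```shell
# Hypothetical helper: log the local Docker daemon in to an ECR registry
# using get-login-password. Arguments: AWS account ID, region.
ecr_login() {
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$1.dkr.ecr.$2.amazonaws.com"
}

# Usage (requires AWS credentials and a running Docker daemon):
#   ecr_login 123456789012 eu-central-1
```

Piping the password through standard input keeps it out of the shell history and the process list.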
Now, pushing and pulling images is just like using any other Docker registry, using the outputted repository URI that we got when creating the repository:
$ docker push 123456789012.dkr.ecr.eu-central-1.amazonaws.com/randserver:0.0.1
$ docker pull 123456789012.dkr.ecr.eu-central-1.amazonaws.com/randserver:0.0.1
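For the push to succeed, the local image must carry a name that begins with the repository URI. A minimal sketch of that step, assuming an image was already built locally as randserver:0.0.1; the ecr_image_ref helper is hypothetical:

```shell
# Hypothetical helper: compose a full ECR image reference.
# Arguments: account ID, region, repository name, tag.
ecr_image_ref() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

IMAGE=$(ecr_image_ref 123456789012 eu-central-1 randserver 0.0.1)
echo "$IMAGE"

# With Docker available, tag the local image and push it:
#   docker tag randserver:0.0.1 "$IMAGE"
#   docker push "$IMAGE"
```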
Setting up privileges for pushing images
IAM permissions should allow your users to perform only the operations they actually need, so that any mistake has as small an area of impact as possible. This is also true for ECR management, and to that effect, there are three AWS IAM managed policies that greatly simplify achieving it:
AmazonEC2ContainerRegistryFullAccess: This allows a user to perform any operation on your ECR repositories, including deleting them, and should therefore be left for system administrators and owners.
AmazonEC2ContainerRegistryPowerUser: This allows a user to push and pull images on any repositories, which is very handy for developers that are actively building and deploying your software.
AmazonEC2ContainerRegistryReadOnly: This allows a user to pull images on any repository, which is useful for scenarios where developers are not pushing their software from their workstation, and are instead just pulling internal dependencies to work on their projects.
All of these policies can be attached to an IAM user as follows, by replacing the policy name at the end of the ARN with a suitable policy and pointing --user-name to the user you are managing:
$ aws iam attach-user-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly --user-name johndoe
All these AWS managed policies do have an important characteristic—all of them add permissions for all repositories on your registry. You'll probably find several use cases where this is far from ideal—maybe your organization has several teams that do not need access over each other's repositories; maybe you would like to have a user with the power to delete some repositories, but not all; or maybe you just need access to a single repository for Continuous Integration (CI) setup.
If your needs match any of these described situations, you should create your own policies with as granular permissions as required.
First, we will create an IAM group for the developers of our randserver application:
$ aws iam create-group --group-name randserver-developers
{
    "Group": {
        "Path": "/",
        "GroupName": "randserver-developers",
        "GroupId": "AGPAJRDMVLGOJF3ARET5K",
        "Arn": "arn:aws:iam::123456789012:group/randserver-developers",
        "CreateDate": "2018-10-25T11:45:42Z"
    }
}
Then we'll add the johndoe user to the group:
$ aws iam add-user-to-group --group-name randserver-developers --user-name johndoe
Now we'll need to create our policy so that we can attach it to the group. The ecr:GetAuthorizationToken action does not support resource-level permissions, so it goes in its own statement with a wildcard resource, while the remaining actions are restricted to the randserver repository. Copy this JSON document to a file:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ecr:GetAuthorizationToken",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:DescribeImages",
                "ecr:BatchGetImage",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload",
                "ecr:PutImage"
            ],
            "Resource": "arn:aws:ecr:eu-central-1:123456789012:repository/randserver"
        }
    ]
}
To create the policy, execute the following, passing the appropriate path for the JSON document file:
$ aws iam create-policy --policy-name EcrPushPullRandserverDevelopers --policy-document file://./policy.json
{
    "Policy": {
        "PolicyName": "EcrPushPullRandserverDevelopers",
        "PolicyId": "ANPAITNBFTFWZMI4WFOY6",
        "Arn": "arn:aws:iam::123456789012:policy/EcrPushPullRandserverDevelopers",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2018-10-25T12:00:15Z",
        "UpdateDate": "2018-10-25T12:00:15Z"
    }
}
The final step is then to attach the policy to the group, so that johndoe and all future developers of this application can use the repository from their workstation:
$ aws iam attach-group-policy --group-name randserver-developers --policy-arn arn:aws:iam::123456789012:policy/EcrPushPullRandserverDevelopers
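To confirm that the attachment took effect, the policies attached to the group can be listed with aws iam list-attached-group-policies. A minimal sketch, where group_has_policy is a hypothetical wrapper name:

```shell
# Hypothetical helper: succeed if the named policy is attached to the group.
# Arguments: group name, policy name.
group_has_policy() {
  aws iam list-attached-group-policies --group-name "$1" \
    --query 'AttachedPolicies[].PolicyName' --output text |
    tr '\t' '\n' | grep -qx "$2"
}

# Usage (requires AWS credentials):
#   group_has_policy randserver-developers EcrPushPullRandserverDevelopers \
#     && echo "policy attached"
```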
Using images stored on ECR in Kubernetes
Attaching the AmazonEC2ContainerRegistryReadOnly IAM policy to the instance profile used by our cluster nodes allows the nodes to fetch any image in any repository in the AWS account where the cluster resides.
In order to use an ECR repository in this manner, you should set the image field of the pod template on your manifest to point to it, such as in the following example:
image: 123456789012.dkr.ecr.eu-central-1.amazonaws.com/randserver:0.0.1
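For context, a complete manifest around that image field might look like the following sketch. The Deployment name, labels, replica count, and file name are assumptions made for illustration and are not taken from the book.

```shell
# Write an example Deployment manifest that pulls randserver from ECR.
cat > randserver-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: randserver
spec:
  replicas: 2
  selector:
    matchLabels:
      app: randserver
  template:
    metadata:
      labels:
        app: randserver
    spec:
      containers:
        - name: randserver
          image: 123456789012.dkr.ecr.eu-central-1.amazonaws.com/randserver:0.0.1
EOF

# With access to a cluster, apply it:
#   kubectl apply -f randserver-deployment.yaml
```

No imagePullSecrets entry appears in this sketch because it assumes the nodes authenticate to ECR through their instance profile, as described above.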
Tagging images
Whenever a Docker image is pushed to a registry, we need to identify the image with a tag. A tag can be any string of letters, digits, underscores, periods, and dashes: latest, stable, v1.7.3, and even c31b1656da70a0b0b683b060187b889c4fd1d958 are all perfectly valid examples of tags that you might use to identify an image that you push to ECR.
Depending on how your software is developed and versioned, what you put in this tag might be different. There are three main strategies that might be adopted depending on different types of applications and development processes that we might need to generate images for.
Version Control System (VCS) references
When you build images from software where the source is managed in a version control system, such as Git, the simplest way of tagging your images is to use the commit ID (often referred to as a SHA when using Git) from your VCS. This gives you a very simple way to check exactly which version of your code is running at any one time.
This first strategy is often adopted for applications where small changes are delivered in an incremental fashion. New versions of your images might be pushed multiple times a day and automatically deployed to testing and production-like environments. Good examples of these kinds of applications are web applications and other software delivered as a service.
By pushing a commit ID through an automated testing and release pipeline, you can easily generate deployment manifests for an exact revision of your software.
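One simple way to generate such a manifest is to keep a template with a placeholder for the tag and substitute the commit ID in the pipeline. The sketch below assumes a template file that contains the literal placeholder IMAGE_TAG; both the placeholder and the render_manifest helper are hypothetical names.

```shell
# Hypothetical helper: replace the IMAGE_TAG placeholder in a manifest
# template with a commit ID and print the result.
# Arguments: template path, commit ID (hexadecimal, so safe to use in sed).
render_manifest() {
  sed "s/IMAGE_TAG/$2/g" "$1"
}

# Usage inside a Git checkout, with access to a cluster:
#   render_manifest deployment.tpl.yaml "$(git rev-parse HEAD)" | kubectl apply -f -
```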
Semantic versions
However, this strategy becomes more cumbersome and harder to deal with if you are building container images that are intended to be used by many users, whether that be multiple users within your organisation or third parties using images that you publish publicly. With applications like these, it can be helpful to use a semantic version number that has some meaning, helping those who depend on your image decide whether it is safe to move to a newer version.
A common scheme for these sorts of images is called Semantic Versioning (SemVer). This is a version number made up of three individual numbers separated by dots. These numbers are known as the MAJOR, MINOR, and PATCH version. A semantic version number lays out these numbers in the form MAJOR.MINOR.PATCH. When a number is incremented, the less significant numbers to the right are reset to 0.
These version numbers give downstream users useful information about how a new version might affect compatibility:
The PATCH version is incremented whenever a bug or security fix is implemented that maintains backwards compatibility
The MINOR version is incremented whenever a new feature is added that maintains backwards compatibility
Any changes that break backwards compatibility should increment the MAJOR version number
This is useful because users of your images know that MINOR or PATCH level changes are unlikely to break anything, so only basic testing should be required when upgrading to a new version. But if upgrading to a new MAJOR version, they ought to check and test the impact of the changes, which might require changes to configuration or integration code.
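The increment-and-reset rule can be sketched as a small shell function. This is an illustration only: bump_version is a hypothetical helper, and it assumes a plain MAJOR.MINOR.PATCH string with no pre-release or build suffix.

```shell
# Hypothetical helper: bump one part of a MAJOR.MINOR.PATCH version.
# Arguments: current version, part to bump (major, minor, or patch).
bump_version() {
  major=${1%%.*}        # text before the first dot
  rest=${1#*.}          # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$2" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown part: $2" >&2; return 1 ;;
  esac
  echo "$major.$minor.$patch"
}

bump_version 1.7.3 patch   # prints 1.7.4
bump_version 1.7.3 minor   # prints 1.8.0
bump_version 1.7.3 major   # prints 2.0.0
```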
Upstream version numbers
Often, when we build container images that repackage existing software, it is desirable to use the original version number of the packaged software itself. Sometimes, it can help to add a suffix that versions the configuration you're using to package that software.
In larger organizations, it can be common to package software tools with configuration files with organisation-specific default settings. You might find it useful to version the configuration files as well as the software tool.
If I were packaging the MySQL database for use in my organization, an image tag might look like 8.0.12-c15, where 8.0.12 refers to the upstream MySQL version and c15 is a version number I have created for the MySQL configuration files included in my container image.
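Composing such a tag, and splitting it again later, needs nothing more than shell parameter expansion. In this sketch the variable names are hypothetical, and the -c separator follows the example above.

```shell
UPSTREAM_VERSION=8.0.12   # version of the packaged MySQL release
CONFIG_VERSION=15         # our own counter for the configuration files
TAG="${UPSTREAM_VERSION}-c${CONFIG_VERSION}"
echo "$TAG"               # prints 8.0.12-c15

# Splitting an existing tag back into its two parts:
echo "${TAG%-c*}"         # prints 8.0.12 (upstream part)
echo "${TAG##*-c}"        # prints 15 (configuration part)
```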
Labelling images
If you have an even moderately complex workflow for developing and releasing your software, you might quickly find yourself wanting to pack more semantic information about your images into their tags than just a simple version number. This can quickly become unwieldy, as you will need to modify your build and deployment tooling whenever you want to add some extra information.
Thankfully, Docker images carry around labels that can be used to store whatever metadata is relevant to your image.
Adding a label to your image is done at build time, using the LABEL instruction in your Dockerfile. The LABEL instruction accepts multiple key-value pairs in this format:
LABEL <key>=<value> <key>=<value> ...
Using this instruction, we can store any arbitrary metadata that we find useful on our images. And because the metadata is stored inside the image, unlike tags, it can't be changed. By using appropriate image labels, we can discover the exact revision from our VCS, even if an image has been given an opaque tag, such as latest or stable.
If you want to set these labels dynamically at build time, you can also make use of the ARG instruction in your Dockerfile.
Let's look at an example of using build args to set labels. Here is an example Dockerfile:
FROM scratch
ARG SHA
ARG BEAR=Paddington
LABEL git-commit=$SHA \
      favorite-bear=$BEAR \
      marmalade="5 jars"
When we build the container, we can pass values for our labels using the --build-arg flag. This is useful when we want to pass dynamic values such as a Git commit reference:
$ docker build --build-arg SHA=`git rev-parse --short HEAD` -t bear .
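Once the image is built, its labels can be read back with docker inspect. A minimal sketch, where image_label is a hypothetical wrapper name:

```shell
# Hypothetical helper: print the value of one label on a local image.
# Arguments: image name, label key.
image_label() {
  docker inspect --format "{{ index .Config.Labels \"$2\" }}" "$1"
}

# Usage (requires Docker and the bear image built above):
#   image_label bear git-commit
#   image_label bear favorite-bear
```

The index template function is used here because label keys such as git-commit contain dashes and so cannot be reached with plain dot notation.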
As with the labels that Kubernetes allows you to attach to the objects in your cluster, you are free to label your images with whatever scheme you choose, and save whatever metadata makes sense for your organization.
The Open Container Initiative (OCI), an organization that promotes standards for container runtimes and their image formats, has proposed a standard set of labels that can be used to provide useful metadata that can then be used by other tools that understand them. If you decide to add labels to your container images, choosing to use part or all of this set of labels might be a good place to start. To know more about these labels, you can head over to our book.
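As an illustration, the OCI image specification defines annotation keys under the org.opencontainers.image prefix, including revision, version, source, and created. The sketch below passes three of them to docker build through its --label flag. The oci_label_args function name and the example source URL are hypothetical.

```shell
# Hypothetical helper: print docker build --label arguments for a few
# OCI-defined keys. Arguments: commit ID, version, source repository URL.
oci_label_args() {
  printf '%s %s %s\n' \
    "--label org.opencontainers.image.revision=$1" \
    "--label org.opencontainers.image.version=$2" \
    "--label org.opencontainers.image.source=$3"
}

oci_label_args c31b1656 0.0.1 https://example.com/randserver.git

# Usage inside a Git checkout (values must not contain spaces, because the
# output is deliberately left unquoted so that it splits into arguments):
#   docker build $(oci_label_args "$(git rev-parse HEAD)" 0.0.1 \
#     https://example.com/randserver.git) -t randserver:0.0.1 .
```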
Summary
In this article, we discovered how to push images from our own workstations, how to use IAM permissions to restrict access to our images, and how to allow Kubernetes to pull container images directly from ECR. To know more about how to deploy a production-ready Kubernetes cluster on the AWS platform, and more, head over to our book Kubernetes on AWS.