As the technology market moves toward containers, DevOps engineers face a crucial task – migrating applications running on virtual machines so that they can run in containers. This is in most DevOps engineers’ job descriptions and is one of the most critical things we do.
While containerizing an application is, in theory, as simple as writing a few steps, in practice it can be a complicated beast, especially if you are not using config management to set up your virtual machines. Virtual machines running in enterprises today are often the product of years of manual labor by toiling sysadmins who improved the servers piece by piece, leaving little or no paper trail of the hotfixes they applied along the way.
Since containers follow config management principles from the very beginning, migrating is not as simple as picking up the virtual machine image and converting it into a Docker container.
Migrating a legacy application running on virtual machines requires numerous steps. Let’s take a look at them in more detail.
Discovery
First, we start with the discovery phase:
- Understand the different parts of your applications
- Assess what parts of the legacy applications you can containerize and whether it is technically possible to do so
- Define a migration scope and agree on the clear goals and benefits of the migration with timelines
Application requirement assessment
Once the discovery phase is complete, we need to do the application requirement assessment:
- Assess if it is a better idea to break the application into smaller parts. If so, then what would the application parts be, and how will they interact with each other?
- Assess what aspects of your application’s architecture, performance, and security you need to cater to, and think about their equivalents in the container world.
- Understand the relevant risks and decide on mitigation approaches.
- Understand the migration principles and decide on a migration approach, such as which part of the application to containerize first. Start with the part that has the fewest external dependencies.
Container infrastructure design
Container infrastructure design involves creating a robust and scalable environment to support the deployment and management of containerized applications.
Designing a container infrastructure involves considering factors such as scalability, networking, storage, security, automation, and monitoring. It’s crucial to align the infrastructure design with the specific requirements and goals of the containerized applications and to follow best practices for efficient and reliable container deployment and management.
Once we’ve assessed all our requirements, architecture, and other aspects, we can move on to container infrastructure design:
- Understand the current and future scale of operations when you make this decision. You can choose from many options based on your application’s complexity. The right questions include: How many containers do we need to run on the platform? What dependencies do these containers have on each other? How frequently will we deploy changes to the components? How much traffic can the application receive? What is the application’s traffic pattern?
- Based on the answers to the preceding questions, you need to decide what sort of infrastructure you will run your application on. Will it be on-premises or in the cloud, and will you use a managed Kubernetes cluster or self-host and manage one? You can also look at options such as Containers as a Service (CaaS) for lightweight applications.
- How will you monitor and operate your containers? Will it require installing specialist tools? Will it require integrating with the existing monitoring tool stack? Understand the feasibility and make an appropriate design decision.
- How will you secure your containers? Are there any regulatory and compliance requirements regarding security? Does the chosen solution cater to them?
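Many of these design decisions end up encoded in deployment manifests. As a purely illustrative sketch (the component name, image, and numbers below are all hypothetical), a Kubernetes Deployment can capture scale, performance, and security choices in one place:

```yaml
# Hypothetical Deployment illustrating where infra design decisions land
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders                  # hypothetical application component
spec:
  replicas: 3                   # answers "how many containers do we need?"
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      containers:
        - name: orders
          image: registry.example.com/orders:1.0   # hypothetical image
          resources:
            requests:           # sized from benchmarking the original app
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          securityContext:      # compliance-driven hardening choices
            runAsNonRoot: true
            readOnlyRootFilesystem: true
```

The point is not the specific values but that scale, performance, and security requirements from this phase become concrete, reviewable configuration later.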
Containerizing the application
Containerizing an application involves packaging the application and its dependencies into a container image, which can be deployed and run consistently across different environments.
Containerizing an application offers benefits such as improved portability, scalability, and reproducibility. It simplifies the deployment process and allows for consistent application behavior across different environments.
Once we’ve considered all aspects of the design, we can now start containerizing the application:
- This is where we look into the application and create a Dockerfile containing the steps needed to build the container just as the application runs currently. This requires a lot of brainstorming and assessment, especially if your application isn’t installed on the virtual machine by config management tools such as Ansible. It can take a long time to figure out how the application was installed, and you need to write down the exact steps.
- If you plan to break your application into smaller parts, you may need to build your application from scratch.
- You must decide on a test suite that works against your parallel virtual machine-based application and improve it over time.
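To make the first step concrete, here is a minimal Dockerfile sketch for a hypothetical Python web application reconstructed from a virtual machine. Every package, path, and port here is an assumption – you would replace them with the exact steps you uncovered during discovery:

```dockerfile
# Base image roughly matching the VM's OS keeps behavior close to the original
FROM ubuntu:22.04

# Recreate the packages the sysadmins installed by hand on the VM
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy dependency list first so rebuilds reuse this cached layer
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# The application code itself
COPY . .

# The port and start command the VM's init scripts used to manage
EXPOSE 8080
CMD ["python3", "app.py"]
```

Each `RUN` line is effectively one entry from the paper trail of how the virtual machine was built; the harder the trail is to reconstruct, the longer this file takes to get right.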
Testing
Testing containerized applications is an important step to ensure their functionality, performance, and compatibility.
By implementing a comprehensive testing strategy, you can ensure the reliability, performance, and security of your containerized application. Testing at various levels, integrating automation, and closely monitoring the application’s behavior will help you identify and resolve issues early in the development life cycle, leading to a more robust and reliable containerized application.
Once we’ve containerized the application, the next step in the process is testing:
- To verify that your containerized application works exactly like the one on the virtual machine, you need to do extensive testing to ensure you haven’t missed any details or parts you should have considered previously. Run an existing test suite or the one you created for the container.
- Running an existing test suite can be the right approach, but you also need to consider the software’s non-functional aspects. Benchmarking the original application is a good start, as it helps you understand the overhead the container solution adds. You also need to fine-tune your application so that it meets the required performance metrics.
- You also need to consider security and how you can bring it into the container world. Penetration testing will reveal a lot of security loopholes that you might not be aware of.
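A typical test loop, assuming a local Docker daemon, the hypothetical image from the containerization step, and a `/health` endpoint that the VM version also serves, might look like this (all names are illustrative):

```shell
# Build the candidate image (tag is hypothetical)
docker build -t myapp:candidate .

# Run the existing test suite inside the container instead of on the VM
docker run --rm myapp:candidate python3 -m pytest tests/

# Quick smoke test: start the app and probe the same endpoint the VM serves
docker run -d --rm --name myapp-smoke -p 8080:8080 myapp:candidate
curl -fsS http://localhost:8080/health
docker stop myapp-smoke
```

Running the identical suite and probes against both the VM and the container is what lets you claim behavioral parity rather than assume it.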
Deployment and rollout
Deploying and rolling out a containerized application involves deploying the container images to the target environment and making the application available for use.
Once we’ve tested our containers and are confident enough, we can roll out our application to production:
- Finally, we roll out our application to production and learn whether further changes are needed. We then return to the discovery process and iterate until we have perfected our application.
- You must define and develop an automated runbook and a CI/CD pipeline to reduce cycle time and troubleshoot issues quickly.
- Doing A/B testing with the container applications running in parallel can help you realize any potential issues before you switch all the traffic to the new solution.
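One possible shape for such a pipeline, using GitHub Actions purely as an example (the registry, image name, and test command are all hypothetical), is a workflow that builds, tests, and pushes an image on every commit:

```yaml
# Hypothetical CI/CD workflow: build, test in-container, then push
name: containerize-and-deploy
on:
  push:
    branches: [main]
jobs:
  build-test-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t registry.example.com/myapp:${{ github.sha }} .
      - name: Run test suite inside the container
        run: docker run --rm registry.example.com/myapp:${{ github.sha }} python3 -m pytest tests/
      - name: Push image
        run: docker push registry.example.com/myapp:${{ github.sha }}
```

The same structure translates to Jenkins, GitLab CI/CD, or any other tool; what matters is that every image that reaches production was built and tested by the pipeline, not by hand.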
The following diagram summarizes these steps. As you can see, this process is cyclic, which means you may have to revisit these steps from time to time based on what you learn from operating the containers in production:
Figure 1.8 – Migrating from virtual machines to containers
Now, let’s understand what we need to do to migrate from virtual machines to containers with the least friction and the best possible outcome.
What applications should go in containers?
In your journey of moving from virtual machines to containers, you first need to assess what can and can’t go in containers. Broadly speaking, there are two kinds of application workloads you can have – stateless and stateful. While stateless workloads do not store state and are computing powerhouses, such as APIs and functions, stateful applications, such as databases, require persistent storage to function.
Though it is possible to containerize any application that can run on a Linux virtual machine, stateless applications are the low-hanging fruit you should look at first. It is relatively easy to containerize these workloads because they have no storage dependencies. The more storage dependencies you have, the more complex your application becomes in containers.
Secondly, you also need to assess what form of infrastructure you want to host your applications on. For example, if you plan to run your entire tech stack on Kubernetes, you will want to avoid a heterogeneous environment wherever possible. In that scenario, you may also wish to containerize stateful applications. Beyond the web services and middleware layers, most applications rely on some form of state to function correctly, so in any case, you would end up managing storage.
Though this might open up Pandora’s box, there is no standard agreement within the industry regarding containerizing databases. While some experts advise against it in production, a sizeable population sees no issues with it. The primary reason is that there is insufficient data to support or disprove running a containerized database in production.
I suggest that you proceed with caution regarding databases. While I am not opposed to containerizing databases, you must consider various factors, such as allocating proper memory, CPU, and disk, along with every dependency you had on virtual machines. You should also look into the behavioral aspects of the team. If you have a team of DBAs managing the database in production, they might not be very comfortable dealing with another layer of complexity – containers.
We can summarize these high-level assessment steps using the following flowchart:
Figure 1.9 – Virtual machine to container migration assessment
This flowchart accounts for the most common factors considered during the assessment. You also need to factor in situations unique to your organization, so take those into account as well before making any decisions.
Let’s look at some use cases that are suitable for containerization to get a fair understanding. The following types of applications are commonly deployed using containers:
- Microservices architecture: Applications that follow a microservices architecture, where the functionality is divided into small, independent services, are well-suited for containerization. Each microservice can be packaged as a separate container, enabling easier development, deployment, scaling, and management of the individual services.
- Web applications: Web applications, including frontend applications, backend APIs, and web services, can be containerized. Containers provide a consistent runtime environment, making it easier to package and deploy web applications across different environments, such as development, testing, and production.
- Stateful applications: Containers can also be used to run stateful applications that require persistent data storage. By leveraging container orchestration platforms’ features, such as persistent volumes or stateful sets, stateful applications such as databases, content management systems, or file servers can be containerized and managed effectively.
- Batch processing or scheduled jobs: Applications that perform batch processing tasks or scheduled jobs, such as data processing, periodic backups, or report generation, can benefit from containerization. Containers provide a controlled and isolated environment for running these jobs, ensuring consistent execution and reproducibility.
- CI/CD tools: Containerizing CI/CD tools such as Jenkins, GitLab CI/CD, or CircleCI allows for consistent and reproducible build, test, and deployment pipelines. Containers make it easier to manage dependencies, isolate build environments, and enable rapid deployment of CI/CD infrastructure.
- Development and testing environments: Containers are valuable for creating isolated and reproducible development and testing environments. Developers can use containers to package their applications along with the required dependencies, libraries, and development tools. This enables consistent development and testing experiences across different machines and team members.
- Internet of Things (IoT) applications: Containers can be used to deploy and manage applications in IoT scenarios. They provide lightweight and portable runtime environments for IoT applications, enabling easy deployment across edge devices, gateways, or cloud infrastructures.
- Machine learning and data analytics applications: Containerization is increasingly used to deploy machine learning models and data science applications. Containers encapsulate the necessary dependencies, libraries, and runtime environments, allowing for seamless deployment and scaling of data-intensive applications.
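For the stateful case mentioned above, a container orchestrator’s persistence primitives do the heavy lifting. As an illustrative sketch (the database, image, and sizes are hypothetical), a Kubernetes StatefulSet with a volume claim template gives each replica stable, persistent storage:

```yaml
# Hypothetical StatefulSet: a database container with durable storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:         # each replica gets its own persistent volume
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Unlike a stateless Deployment, the data here outlives any individual container, which is exactly what makes stateful workloads viable in containers.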
It’s important to note that not all applications are ideal candidates for containerization. Applications with heavy graphical interfaces, legacy monolithic architectures tightly coupled to the underlying infrastructure, or applications that require direct hardware access may not be suitable for containerization. Virtual machines or other deployment approaches may be more appropriate in such cases.
Breaking the applications into smaller pieces
You get the most out of containers if you run parts of your application independently of others.
This approach has numerous benefits, as follows:
- You can release your application more often because you can change one part of your application without impacting the rest; your deployments will also take less time to run.
- Your application parts can scale independently of each other. For example, if you have a shopping app and your orders module is jam-packed, it can scale more than the reviews module, which may be far less busy. With a monolith, your entire application would scale with traffic, and this would not be the most optimized approach from a resource consumption point of view.
- Something that impacts one part of the application does not compromise your entire system. For example, customers can still add items to their cart and check out orders if the reviews module is down.
However, you should also not break your application into components that are too tiny. This will result in considerable management overhead, as you will struggle to keep track of what is what. In terms of the shopping website example, it is OK to have an order container, a reviews container, a shopping cart container, and a catalog container. However, it is not OK to have create order, delete order, and update order containers – that would be overkill. Breaking your application into logical components that fit your business is the right approach.
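Returning to the shopping website example, a minimal Docker Compose sketch (service names, images, and ports are all hypothetical) shows what a sensible granularity looks like – one container per business component, not one per operation:

```yaml
# Hypothetical compose file: one service per logical business component
services:
  orders:
    image: shop/orders:1.0      # create, update, and delete order all live here
    ports: ["8081:8080"]
  reviews:
    image: shop/reviews:1.0     # can be down without blocking checkout
    ports: ["8082:8080"]
  cart:
    image: shop/cart:1.0
    ports: ["8083:8080"]
  catalog:
    image: shop/catalog:1.0
    ports: ["8084:8080"]
```

Each service can now be released, scaled, and allowed to fail independently, which is exactly the set of benefits listed above.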
But should you break your application into smaller parts as the very first step? Well, it depends. Most people will want to get a return on investment (ROI) out of their containerization work. If you do a lift and shift from virtual machines to containers, you are dealing with very few variables and can move to containers quickly, but you don’t get many benefits out of it – especially if your application is a massive monolith. Instead, you add some application overhead because of the container layer. So, rearchitecting your application to fit the container landscape is the key to moving ahead.