I think I will remember forever a very specific lunch back in 2013 at the local burrito shop, because it is the exact point at which my cloud-native journey began. My colleague, Tim Nee, was making the case that we were not doing cloud correctly. We were treating it like a data center and not taking advantage of its dynamic nature. We were making the classic mistake called lift and shift. We didn't call it that because I don't think that term was in the mainstream yet. We certainly did not use the phrase disposable infrastructure, because it was not in our vernacular yet. But that is absolutely what the conversation was about. And that conversation has forever changed how we think and reason about software systems.
We had handcrafted AMIs and beautiful snowflake EC2 instances that were named, as I recall, after Star Trek characters or something along those lines. These instances ran 24/7 at probably around 10% utilization, which is very typical. We could create new instances somewhat on demand because we had those handcrafted AMIs. But God forbid we terminate one of those instances because there were still lots of manual steps involved in hooking a new instance up to all the other resources, such as load balancers, elastic block storage, the database, and more. Oh, and what would happen to all the data stored on the now terminated instance?
This brings us to two key points. First, disposing of cloud resources is hard, because it takes a great deal of forethought. When we hear about the cloud we hear about how easy it is to create resources, but we don't hear about how easy it is to dispose of resources. We don't hear about it because it is not easy to dispose of resources. Traditional data center applications are designed to run on snowflake machines that are rarely, if ever, retired. They take up permanent residency on those machines and make massive assumptions about what is configured on those machines and what they can store on those machines. If a machine goes away then you basically have to start over from scratch. Sure, bits and pieces are automated, but since disposability is not a first-class requirement, many steps are left to operations staff to perform manually. When we lift and shift these applications into the cloud, all those assumptions and practices (aka baggage) come along with them.
Second, the machine images and the containers that we hear about are just the tips of the iceberg. There are so many more pieces of infrastructure, such as load balancers, databases, DNS, CDN, block storage, blob storage, certificates, virtual private cloud, routing tables, NAT instances, jump hosts, internet gateways, and so on. All of these resources must be created, managed, monitored, understood as dependencies, and, to varying degrees, disposable. Do not assume that you will only need to automate the AMIs and containers.
The bottom line is: if we can create a resource on demand, we should be able to destroy it on demand as well, and then rinse and repeat. This was a new way of thinking. This notion of disposable infrastructure is the fundamental concept that powers cloud-native. Without disposable infrastructure, the promises of speed, safety, and scale cannot even taxi to the runway, much less take flight, so to speak. To capitalize on disposable infrastructure, everything must be automated, every last drop. We will discuss cloud-native automation in Chapter 6, Deployment. But how do disposable infrastructure and automation help deliver on the promise of speed, safety, and scale?
There is no doubt that our first step on our cloud-native journey increased Dante's velocity. Prior to this step, we regularly delivered new functionality to production every 3 weeks. And every 3 weeks it was quite an event. It was not unusual for the largely manual deployment of the whole monolithic system to take upwards of 3 days before everyone was confident that we could switch traffic from the blue environment to the green environment. And it was typically an all-hands event, with pretty much every member of every team getting sucked in to assist with some issue along the way. This was completely unsustainable. We had to automate.
Once we automated the entire deployment process and once the teams settled into a rhythm with the new approach, we could literally complete an entire deployment in under 3 hours with just a few team members performing any unautomated smoke tests before we switched traffic over to the new stack. Having embraced disposable infrastructure and automation, we could deliver new functionality on any given day. We could deliver patches even faster. Now I admit that automating a monolith is a daunting endeavor. It is an all or nothing effort because it is an all or nothing monolith. Fortunately, the divide and conquer nature of cloud-native systems completely changes the dynamic of automation, as we will discuss in Chapter 6, Deployment.
But the benefits of disposable infrastructure encompass more than just speed. We were able to increase our velocity, not just because we had automated everything, but also because automating everything increased the quality of the system. We call it infrastructure as code for a reason. We develop the automation code using the exact same agile methodologies that we use to develop the rest of the system. Every automation code change is driven by a story, all the code is versioned in the same repository, and the code is continuously tested as part of the CI/CD pipeline, as test environments are created and destroyed with every test run.
The infrastructure becomes immutable because there is no longer a need to make manual changes. As a result, we can be confident that the infrastructure conforms to the requirements spelled out in the stories. This, in turn, leads to more secure systems, because we can assert that the infrastructure is in compliance with regulations, such as PCI and HIPAA. Thus, increased quality makes us more confident that we can safely deploy changes while controlling risk to the system as a whole.
Disposable infrastructure facilitates team scale and efficiency. Team members no longer spend a significant amount of time on deployments and fighting deployment-related fires. As a result, teams are more likely to stay on schedule, which increases team morale, which in turn increases the likelihood that teams can increase their velocity and deliver more value. Yet, disposable infrastructure alone does not provide for scalability in terms of system elasticity. It lays the groundwork for scalability and elasticity, but to fully achieve this a system must be architected as a composition of bounded and isolated components. Our soon-to-be legacy system was still a monolith, at this stage in our cloud maturity journey. It had been optimized a bit, here and there, out of necessity, but it was still a monolith and we were only going to get vertical scaleout of it until we broke it apart by strangling the monolith.