Before we look at some of the challenges you may be facing, I want to quickly take you through my own journey with Infrastructure as Code, back before it was really what we now know as Infrastructure as Code.
When I talk about Infrastructure as Code, I mean the following:
Infrastructure as Code is an approach to infrastructure management in which infrastructure is provisioned and managed using code and automation tools rather than by manually configuring resources through a user interface.
This allows you to version control, track, and manage your infrastructure in the same way you do with application code and, in many cases, use the same tooling, processes, and procedures you already have in place.
Infrastructure as Code can help improve your infrastructure operations’ efficiency, reliability, and reproducibility by introducing consistency across your deployments and reducing deployment times versus more traditional manual deployments.
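To make that definition a little more concrete, here is a deliberately minimal Python sketch of the idea – a desired state described as data that can live in version control, and a small routine that compares it with what currently exists. The resource names and fields are hypothetical and simply stand in for whatever your tooling of choice actually manages:

```python
# A deliberately simplified sketch of the Infrastructure as Code idea:
# the desired infrastructure is described as data that can live in version
# control, and a small "apply" step works out what needs to change.
# The resource names and fields here are hypothetical, not a real cloud API.

desired_state = {
    "web-server-01": {"size": "small", "os": "ubuntu-22.04", "public_ip": True},
    "db-server-01": {"size": "medium", "os": "ubuntu-22.04", "public_ip": False},
}


def apply_changes(desired: dict, current: dict) -> None:
    """Compare what we want with what exists and act on the difference."""
    for name, spec in desired.items():
        if name not in current:
            print(f"creating {name} with {spec}")
        elif current[name] != spec:
            print(f"updating {name} to match {spec}")
        else:
            print(f"{name} already matches the desired state - nothing to do")
    for name in current.keys() - desired.keys():
        print(f"removing {name} as it is no longer defined")


# First run: nothing exists yet, so everything gets created.
apply_changes(desired_state, current={})
```

The important part is not the code itself but the model: the definition lives alongside your application code, and the tooling – not a person clicking through a user interface – is responsible for making reality match it.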
My own journey
I have been working with servers of all types for longer than I care to remember; back when I first started working with servers, doing pretty much anything was very much a manual process.
The bare-metal days
Before virtualization became a common practice, I remember having to block out a whole day to build a customer’s server. This process would generally start with ensuring that the hardware I was given to work with was of the correct specification – if for some reason it wasn’t, which was quite common, then I would typically have to replace RAM and hard drives, and so on.
This was to ensure that I didn’t get too far into configuring the server, only to find that I had to tear it down and start from scratch; once the hardware was confirmed as being correct, it was time to start on the build itself.
Typically, to build the server, I sat in a tiny, hot, and noisy build room surrounded by equipment, bits of computer, and what felt like reams of paper, which contained not only instructions on how to manually install the various operating systems we supported but also build sheets containing configuration and information on the customer’s required software stack I was deploying.
Once built, the server was packed back into its box, put in the back of someone’s car, and taken to a data center. From there, the server was racked, cabled for both power and networking, and then powered on – if everything was configured correctly, it would spring into life and be available on the network.
After some quick testing, it was back to the comfort of the office to complete the build steps and, finally, hand the server over to the customer for them to deploy their software on.
While this process was fine when there were only one or two of these deployments once in a blue moon, as things got busier, it quickly became unmanageable.
The next logical step was to have a build server that contained drive images for all the supported operating systems and base software stack configurations, with some custom scripts that ran when the server first booted to customize the base configuration and get it onto the network when the server was racked in the data center.
Enter virtualization
Once we started to move from provisioning bare-metal servers for customers to virtualized servers, things got a lot easier – for a start, you didn’t have to physically install RAM, CPUs, or hard drives; assuming the cluster you were building the server in had the resources available, this made quite a dramatic difference to deployment times and also meant less time spent in the build room and the data center.
At this point, we had built up a collection of custom scripts that connected to both the virtualization hypervisors and virtual machines – these were shared between the team members in our Subversion repository and documented in our internal wiki.
This was my first, extremely basic by today’s standards, introduction to Infrastructure as Code.
Virtual machine configuration
The next logical step was to add remote configuration management into the mix by using a tool such as Puppet or Chef; we could deploy our virtual machines using our custom scripts and then have each server call back to our main management server and bootstrap itself as per the customer’s desired configuration state.
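The pattern behind tools such as Puppet and Chef can be sketched in a few lines of Python – what follows is a hugely simplified, hypothetical illustration rather than how either tool actually works, assuming a Debian/Ubuntu host and a package list that would normally come from the management server. The point it shows is that each server checks its own state and only changes what does not already match the desired configuration:

```python
import subprocess

# A hugely simplified, hypothetical sketch of the "call back and converge"
# pattern used by configuration management tools: the server fetches its
# desired configuration and only applies the pieces that are not already true.

desired_packages = ["nginx", "postgresql-client"]  # would come from the management server


def package_installed(name: str) -> bool:
    """Check whether a Debian/Ubuntu package is already present."""
    result = subprocess.run(["dpkg", "-s", name], capture_output=True)
    return result.returncode == 0


def converge(packages: list) -> None:
    """Install anything that is missing; leave everything else untouched."""
    for name in packages:
        if package_installed(name):
            print(f"{name} already installed - skipping")
        else:
            print(f"installing {name}")
            subprocess.run(["apt-get", "install", "-y", name], check=True)


if __name__ == "__main__":
    converge(desired_packages)
```

Running the same script twice is safe, which is exactly the property that made these tools such a good fit for bootstrapping servers against a central definition of their desired state.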
Putting it all together
This was the final piece of the puzzle, and it took our deployments from a few days per server down to an hour or so, with the bulk of that time spent waiting for automated tasks to complete – though, as a lot of the initial stages of a deployment were initiated by our in-house DIY scripts, we still had to keep a careful eye on progress.
This was because there wasn’t much logic built in to handle errors or other unexpected hiccups during the deployment, which, in some cases, resulted in some challenging post-deployment problems – but the least said about those, the better.
Today’s challenges
So, how do today’s challenges differ from my own experiences? During my day job, I get to work with a lot of internal and external teams who are nearly all technical and are very hands-on with the day-to-day management and development of their own applications.
It’s all documented
When discussing Infrastructure as Code with teams, one of the most common answers I get is as follows:
“We have the process to deploy our infrastructure documented, and anyone in the team can work through it to quickly deploy a resource.”
While it is great that there is documentation and that it is accessible by all of the members of the team, you would be surprised, even with comprehensive and easy-to-follow documentation, at just how much variance there is when people actually come to implement it. Often, they don’t fully understand it because it is simply a set of tasks that lack any context as to why those tasks are being actioned.
Another potential issue is that the process is followed so often by a member of the team that they simply get on with it from memory, missing any updates or steps that have since been added to the documentation.
Worse still – and this is more common than you may think – giving three technical people the same set of tasks to do can sometimes result in three very different outputs, as everyone has different experiences, and those experiences normally feed into how we do things – for example, “last time I did A, B, and then C, X happened, so now I do C, B, and then A” or “I think it would be better to do B, A, and then C – but I don’t have time at the moment to update the documentation.”
All of these can introduce inconsistencies in your deployments, which may go unnoticed as everyone thinks they are doing it correctly because they are all following the same set of documentation.
Next, next, next
The next (pun very much intended) answer I normally get is this:
“We don’t need to do it very often, and when we do, it’s just clicking ‘next, next, next’ in an interface – anyone can do it.”
When I get this answer, what I actually hear is, “The process to deploy the resource is so easy that we didn’t bother to document it.” While this might be the case for some members of the team, not everyone may have the experience or confidence to simply click next, next, next to deploy and configure the resources without a guide.
As I am sure you can imagine, if it is possible for inconsistencies to be present when everyone is following the same set of documentation, then doing the deployment without any of the guardrails that the documentation puts in place is going to introduce even more potential issues further down the line.
Just because a resource has been deployed without error and works does not mean that it has been deployed securely and in such a way that could support your production workloads.
We have everything we need
The last of the most common answers I get when discussing Infrastructure as Code is as follows:
“We have deployed everything we need and don’t need any further resources.”
Again, when I get this answer, I normally hear something slightly different – in my experience, it normally means that, a while ago, someone deployed something that is more than capable of the task and has since moved on, either to another project or department, or has left the company altogether.
While it is great that the resources are running fine, this approach can cause issues if you ever need to redeploy or, worse still, firefight an issue with production, as a lot of knowledge of the underlying configuration is missing.
So, while you know what’s there, you may not necessarily know why.
Conclusion
There are many more examples, but the previous ones are the most common I see when working with teams who may not have considered Infrastructure as Code to be a foundation of their deployment pipelines – and if you are reading this, then you may have already come across some of these scenarios yourself and want to move on to the next step.
So why would you take an Infrastructure-as-Code approach to your deployments? Well, there are several reasons, which include the following:
- Documentation: While we have already mentioned documentation, it’s important to note that if you employ Infrastructure as Code, your deployment is documented as part of your code, since the code defines the desired state of your infrastructure in a human-readable format.
- Repeatable and consistent: You should be able to pick up your code and deploy it repeatedly – sure, you may make some changes to things such as resource SKUs and names, but that should just be a case of updating some variables that are read at the time of execution rather than rewriting your entire code base (see the sketch that follows this list).
- Time-saving: As I mentioned, in my own experience, it sometimes took days to deploy resources – eventually, that got down to hours and, with more modern cloud-based resources, minutes.
- Secure: Because you have your infrastructure defined in code, you know that you will have a well-documented end-to-end configuration ready to review as needed. Because it is easily deployable, you can quickly spin up an environment to review or deploy your latest fixes into, safe in the knowledge that it is consistent with your production configuration, as you are not relying on someone manually working through a step-by-step document where something may get missed or misinterpreted.
- Cost savings: I think you should never approach an Infrastructure-as-Code deployment with cost savings being at the top of the list of things you would like to achieve – but it is a most welcome nice-to-have. Depending on your approach, cost savings can be a byproduct of the preceding points. For example, do you need to run your development or testing infrastructure 24/7 when your developers may only need it for a few days a week at most?
Well, if that infrastructure can be deployed as part of your build pipeline with little to no effort, then you may find yourself in the enviable position of only paying for the resources when you need them rather than paying for them to be available 24/7.
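To illustrate the repeatable and consistent point from the preceding list, here is a small, hypothetical Python sketch of the same idea: the deployment logic stays exactly the same, and only the variables – read at execution time – change between environments. The variable names and SKU values are examples rather than anything prescriptive:

```python
import os

# A hypothetical sketch of the "change the variables, not the code" idea:
# the same deployment logic is reused, and per-environment values such as
# resource names and SKUs are read at execution time.

defaults = {"environment": "dev", "vm_sku": "Standard_B1s", "vm_count": "1"}


def get_variable(name: str) -> str:
    """Read a variable from the environment, falling back to a default."""
    return os.environ.get(name.upper(), defaults[name])


def deploy() -> None:
    environment = get_variable("environment")
    sku = get_variable("vm_sku")
    count = int(get_variable("vm_count"))
    for index in range(1, count + 1):
        # In real tooling, this is where a resource would be created or updated.
        print(f"deploying {environment}-web-{index:02d} using SKU {sku}")


if __name__ == "__main__":
    # Deploying a different environment is just a case of supplying different
    # values, for example (assuming this file is saved as deploy.py):
    #   ENVIRONMENT=prod VM_SKU=Standard_D2s_v5 VM_COUNT=3 python deploy.py
    deploy()
```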
So, now that we have discussed my personal journey with Infrastructure as Code, gotten an idea of the different scenarios where it may come in useful, and looked at the potential reasons why you would want to incorporate it into your day-to-day workflows, let’s discuss some of the basic concepts you need to know before we start to talk about the tools we will be looking at for the remainder of the book.