What is infrastructure code specifically? It depends heavily on your particular infrastructure setup.
In the simplest case, it might be just a bunch of shell scripts and component-specific configuration files (Nginx configuration, cron jobs, and so on) stored in source control. Inside these shell scripts, you specify the exact steps a computer needs to take to achieve the state you need:
- Copy this file to that folder.
- Replace all occurrences of ADDRESS with mysite.com.
- Restart the Nginx service.
- Send an e-mail about successful deployment.
This is what we call procedural programming. It's not bad. For example, the build steps of Continuous Integration tools such as Jenkins are a perfect fit for a procedural approach; after all, a sequence of commands is exactly what you need in this case.
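To make the procedural style concrete, the steps listed above could be sketched as a small shell function. This is only an illustration: the function name deploy_config, the file paths, and the e-mail address are assumptions, not part of any real setup. The last two steps are left commented out because they need root access and a configured mailer.

```shell
#!/bin/sh
# Hypothetical procedural deployment: each step runs in order, top to bottom.
# The function name and all paths passed in are assumptions for illustration.
deploy_config() {
    src=$1        # configuration template containing the ADDRESS placeholder
    dest_dir=$2   # folder the file should end up in

    dest="$dest_dir/$(basename "$src")"

    # Step 1: copy this file to that folder.
    cp "$src" "$dest" || return 1

    # Step 2: replace all occurrences of ADDRESS with mysite.com.
    sed 's/ADDRESS/mysite.com/g' "$dest" > "$dest.tmp" || return 1
    mv "$dest.tmp" "$dest" || return 1

    # Step 3: restart the Nginx service (needs root, so it is commented out here).
    # systemctl restart nginx

    # Step 4: send an e-mail about the successful deployment.
    # echo "mysite.com deployed" | mail -s "Deployment finished" ops@example.com
}
```

Notice that the script describes how to get there, one command after another, and says nothing about what the machine should look like afterwards.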
However, you can only go so far with shell scripts when it comes to configuring servers and higher-level pieces. The more common and mature approach these days is to use tools that provide a declarative, rather than a procedural, way to define your infrastructure. With declarative definitions, you don't need to think about how to do something; you only write what should be there.
Perhaps the main benefit is that rerunning a declarative definition will never do the same job twice, whereas executing the same shell script a second time will most likely break something. A proper configuration management tool will ensure that the server is in exactly the state defined in your code, no matter how many times you apply it. This property of modern configuration and provisioning tools is called idempotency.
Let's look at an example. Let's say that you have a box in your network that hosts a package repository. For some reason, instead of using a DNS server, you want to hardcode the IP address of this box in the /etc/hosts file with the domain name repository.internal.
In Unix-like systems, the /etc/hosts file contains a local text database that maps host names to IP addresses. By default, the system typically tries to resolve a name by looking at this file first, and asks a DNS server only if no matching entry is found.
This is not a complex task, given that you only need to add a new line to the /etc/hosts file. To achieve this, you could have a script like the following:
echo 192.168.0.5 repository.internal >> /etc/hosts
Running it once will do the job: the required entry will be added to the end of the /etc/hosts file. But what will happen if you execute it again? You guessed right: exactly the same line will be appended again. And, even worse, what if the IP address of the repository box changes? Then, if you execute your script with the new address, you will end up with two different host entries for the same domain name.
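You can reproduce this safely without touching the real /etc/hosts. The sketch below appends to a scratch file created with mktemp; the second address, 192.168.0.7, is a made-up value standing in for the changed IP.

```shell
#!/bin/sh
# Scratch file standing in for /etc/hosts, so nothing on the system changes.
hosts_file=$(mktemp)

# First run: the entry is added, as intended.
echo "192.168.0.5 repository.internal" >> "$hosts_file"

# Second run of the very same script: the line is duplicated.
echo "192.168.0.5 repository.internal" >> "$hosts_file"

# The box moved to a new (hypothetical) address and the script was updated:
# the stale entry stays, and a conflicting one is added below it.
echo "192.168.0.7 repository.internal" >> "$hosts_file"

# Count the lines that mention the repository host.
grep -c 'repository.internal' "$hosts_file"   # prints 3
```

Three runs, three lines, and two of them disagree about where the repository lives.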
You can ensure idempotency yourself inside the script with heavy use of conditional checks. But why reinvent the wheel when there is already a tool to do exactly this job? It would be so much better to just define the end result without composing a sequence of commands to achieve it.
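To get a feel for what those conditional checks involve, here is a minimal sketch of a hand-rolled idempotent version. The helper name ensure_host_entry is hypothetical, and the sketch makes simplifying assumptions: the hosts file already exists, and the host name is the first name on its line (aliases further along a line are not handled).

```shell
#!/bin/sh
# Hypothetical helper: make sure exactly one line maps NAME to IP in HOSTS_FILE.
# Assumes the file exists and that NAME is the first host name on its line.
ensure_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3

    # Already in the desired state: nothing to do.
    if grep -qxF "$ip $name" "$hosts_file"; then
        return 0
    fi

    # Drop any stale entry for this name, then append the current one.
    tmp=$(mktemp) || return 1
    awk -v name="$name" '$2 != name' "$hosts_file" > "$tmp"
    printf '%s %s\n' "$ip" "$name" >> "$tmp"

    # Write the result back in place so the file keeps its owner and mode.
    cat "$tmp" > "$hosts_file"
    rm -f "$tmp"
}

# Usage (as root):
#   ensure_host_entry /etc/hosts 192.168.0.5 repository.internal
```

That is roughly twenty lines to manage a single line of a single file, and it still ignores edge cases. Multiply this by every file, package, and service on a server, and the appeal of a dedicated tool becomes obvious.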
And that is exactly what configuration management tools such as Puppet and Chef do by providing you with a special Domain-Specific Language (DSL) to define the desired state of the machine. One downside is the need to learn a new DSL: a small language focused on solving one particular task. It's not a complete programming language, nor does it need to be; in this case, its only job is to describe the state of your server.
Let's look at how the same task could be done with the help of a Puppet manifest:
host { 'repository.internal':
  ip => '192.168.0.5',
}
Applying this manifest multiple times will never add extra entries, and changing the IP address in the manifest will be reflected correctly in the hosts file, updating the existing entry rather than creating a new one.
There is an additional benefit I should mention: on top of idempotency, you often get platform agnosticism. What this means is that the same definition could be used for completely different operating systems without any change. For example, by using the package resource in Puppet, you don't care whether the underlying system uses rpm or deb.
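For comparison, here is a rough sketch of the branching you would have to write yourself in a shell script to install a package on different systems. The helper names detect_package_manager and install_command are hypothetical, and the list of package managers is only a sample, not an exhaustive one.

```shell
#!/bin/sh
# Hypothetical helpers showing the branching that a declarative
# package resource hides from you.

# Print the name of the first known package manager found on this machine.
detect_package_manager() {
    for candidate in apt-get dnf yum zypper; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Print the install command for a given package manager and package.
install_command() {
    manager=$1
    package=$2
    case "$manager" in
        apt-get) echo "apt-get install -y $package" ;;
        dnf|yum) echo "$manager install -y $package" ;;
        zypper)  echo "zypper --non-interactive install $package" ;;
        *)       echo "unsupported package manager: $manager" >&2
                 return 1 ;;
    esac
}

# Usage:
#   install_command "$(detect_package_manager)" nginx
```

With a declarative tool, this kind of detection lives inside the tool, and your definition only names the package you want installed.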
By now, it should be clearer why, when it comes to configuration management, tools that provide a declarative way of doing things are preferred.
Modern configuration management tools such as Chef or Puppet largely solve the problem of setting up a single machine. There is an increasing number of high-quality libraries (be it cookbooks or modules) for configuring all kinds of software in an (almost) OS-agnostic way. But configuring what goes inside a single server is only part of the picture. The other part, which is located a layer above, also requires new tooling.