Declarative vs Procedural tools for Infrastructure as Code
What is infrastructure code specifically? It highly depends on your particular infrastructure setup.
In the simplest case, it might be just a bunch of shell scripts and component-specific configuration files (Nginx configuration, cron jobs, and so on) stored in source control. Inside these shell scripts, you specify exact steps computer needs to take to achieve the state you need:
- Copy this file to that folder.
- Replace all occurrences of ADDRESS with mysite.com.
- Restart the
Nginx
service. - Send an e-mail about successful deployment.
This is what we call procedural programming. It's not bad. For example, build steps of Continuous Integration tools such as Jenkins that are a perfect fit for a procedural approach—after all the sequence of command is exactly what you need in this case.Â
However, you can only go that far with shell scripts when it comes to configuring servers and higher level pieces. The more common and mature approach these days is to use tools that provide a declarative, rather than a procedural way to define your infrastructure. With declarative definitions, you don't need to think how to do something; you only write what should be there.
Perhaps the main benefit of it is that rerunning a declarative definition will never do the same job twice, whereas executing the same shell script will most likely break something on the second run. Proper configuration management tool will ensure that the server will be in the exactly same state as defined in your code. This property of modern configuration and provisioning tools is named idempotency.
Let's look at an example. Let's say that you have a box in your network that hosts packages repository. For some reason, instead of using DNS server, you want to hardcode the IP address of this box to the  /etc/hosts
file with a domain name repository.internal
.
Note
In Unix-like systems, the  /etc/hosts
file contains a local text database of DNS records. The system tries to resolve DNS name by looking at this file first, and only asking DNS-server only after.
Not a complex task to do, given that you only need to add a new line to the  /etc/hosts
file. To achieve this, you could have a script like the following:
echo 192.168.0.5 repository.internal >> /etc/hosts/hosts
Running it once will do the job: required entry will be added to the end of the /etc/hosts
file. But what will happen if you execute it again? You guessed it right: exactly the same line will be appended again. And even worse, what if the IP address of repository box will change? Then, if you execute your script, you will end up with two different host entries for the same domain name.
You can ensure idempotency yourself inside the script, with the high usage of conditional checks. But why reinvent the wheel when there is already a tool to do exactly this job? It would be so much better to just define the end result, without composing sequence of commands to achieve this.
And that is exactly what configuration management tools such as Puppet and Chef do by providing you a special Domain Specific Language (DSL) for defining the desired state of the machine. The certain downside is the necessity to learn a new DSL: a special small language focused on solving one particular task. It's not a complete programming language, neither does it to be; in this case, its only job is to describe the state of your server.
Let's look at how the same task could be done with the help of a Puppet manifest:
host { 'repository.internal': ip => '192.168.0.5', }
Applying this manifest multiple times will never add extra entries, and changing the IP address in the manifest will be reflected correctly in host files changing the existing entry, and not creating a new one.
Note
There is an additional benefit I should mention: on top of idempotency, you often get platform agnosticism. What it means is that the same definition could be used for completely different operating systems without any change. For example, by using package resource in Puppet, you don't care whether the underlying system uses rpm
or deb
.
Now you should better understand that when it comes to configuration management tools that provide the declarative way of doing things are preferred.
Modern configuration management tools such as Chef or Puppet completely solved the problem of setting up a single machine. There is an increasing number of high-quality libraries (be it cookbooks or modules) for configuring all kinds of software in an (almost) OS-agnostic way. But configuring what goes inside single server is only part of the picture. The other part that is located a layer above also requires a new tooling.