In this article by Veselin Kantsev, author of the book Implementing DevOps on AWS, we turn our attention to programmable infrastructure. Ladies and gentlemen, put your hands in the atmosphere: Programmable Infrastructure is here!
Perhaps Infrastructure-as-Code (IaC) is not an entirely new concept, considering how long Configuration Management has been around. Codifying server, storage and networking infrastructure and their relationships, however, is a relatively recent trend brought about by the rise of cloud computing. But let us leave Config Management for later and focus our attention on that second aspect of IaC.
You will recall from the previous chapter some of the benefits of storing all-the-things as code.
That last point was a big win for me personally. Automated provisioning helped reduce the time it took to deploy a full-featured cloud environment from four hours down to one, and brought occurrences of human error down to almost zero (one shall not be trusted with an input field).
Being able to rapidly provision resources becomes a significant advantage when a team starts using multiple environments in parallel and needs those brought up or down on-demand. In this article we examine in detail how to describe (in code) and deploy one such environment on AWS with minimal manual interaction.
For implementing IaC in the Cloud, we will look at two tools or services: Terraform and CloudFormation.
We will go through examples of how to compose, validate, deploy, modify and destroy infrastructure templates with each of them.
For the purpose of these examples, let us assume our application requires a Virtual Private Cloud (VPC) which hosts a Relational Database Service (RDS) back-end and a couple of Elastic Compute Cloud (EC2) instances behind an Elastic Load Balancing (ELB) load balancer. We will keep most components behind Network Address Translation (NAT), allowing only the load balancer to be accessed externally.
One of the tools that can help deploy infrastructure on AWS is HashiCorp's Terraform (https://www.terraform.io). HashiCorp is that genius bunch which gave us Vagrant, Packer and Consul. I would recommend you look up their website if you have not already.
Using Terraform (TF), we will be able to write a template describing an environment, perform a dry run to see what is about to happen and whether it is expected, deploy the template and make any late adjustments where necessary—all of this without leaving the shell prompt.
Firstly, you will need to have a copy of TF (https://www.terraform.io/downloads.html) on your machine and available on the CLI. You should be able to query the currently installed version, which in my case is 0.6.15:
$ terraform --version
Terraform v0.6.15
Since TF makes use of the AWS APIs, it requires a set of authentication keys and some level of access to your AWS account. In order to deploy the examples in this article you could create a new Identity and Access Management (IAM) user with the following permissions:
"autoscaling:CreateAutoScalingGroup",
"autoscaling:CreateLaunchConfiguration",
"autoscaling:DeleteLaunchConfiguration",
"autoscaling:Describe*",
"autoscaling:UpdateAutoScalingGroup",
"ec2:AllocateAddress",
"ec2:AssociateAddress",
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateInternetGateway",
"ec2:CreateNatGateway",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVpc",
"ec2:Describe*",
"ec2:ModifySubnetAttribute",
"ec2:RevokeSecurityGroupEgress",
"elasticloadbalancing:AddTags",
"elasticloadbalancing:ApplySecurityGroupsToLoadBalancer",
"elasticloadbalancing:AttachLoadBalancerToSubnets",
"elasticloadbalancing:CreateLoadBalancer",
"elasticloadbalancing:CreateLoadBalancerListeners",
"elasticloadbalancing:Describe*",
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"rds:CreateDBInstance",
"rds:CreateDBSubnetGroup",
"rds:Describe*"
Please refer to:
${GIT_URL}/Examples/Chapter-2/Terraform/iam_user_policy.json
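These actions go into the Action list of a standard IAM policy document. A minimal skeleton (abbreviated here for illustration; the full list lives in the file referenced above) would look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:CreateAutoScalingGroup",
        "ec2:CreateVpc",
        "elasticloadbalancing:CreateLoadBalancer",
        "rds:CreateDBInstance"
      ],
      "Resource": "*"
    }
  ]
}
```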
One way to make the credentials of the IAM user available to TF is by exporting the following environment variables:
$ export AWS_ACCESS_KEY_ID='user_access_key'
$ export AWS_SECRET_ACCESS_KEY='user_secret_access_key'
This should be sufficient to get us started.
Before we get to coding, a few ground rules. A Terraform template generally consists of three sections: resources, variables and outputs. It is a matter of personal preference how you arrange these; however, for better readability I suggest we make use of the TF format and write each section to a separate file. Also, while the file extensions are of importance, the file names are up to you.
In a way, this file holds the main part of a template, as the resources represent the actual components that end up being provisioned. For example, we will be using a VPC resource, an RDS one, an ELB one and a few others.
Since template elements can be written in any order, Terraform determines the flow of execution by examining any references that it finds (for example, a VPC should exist before an ELB which is said to belong to it is created). Alternatively, explicit flow control attributes such as depends_on can be used, as we will observe shortly.
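As a quick sketch of the two mechanisms (the resource names here are illustrative, not taken from our template):

```hcl
# Implicit dependency: the subnet references the VPC's id attribute,
# so Terraform knows to create the VPC first.
resource "aws_subnet" "example" {
  vpc_id     = "${aws_vpc.example.id}"
  cidr_block = "10.0.1.0/24"
}

# Explicit dependency: force this instance to wait for the database,
# even though none of the database's attributes are referenced.
resource "aws_instance" "example" {
  ami           = "ami-08111162"
  instance_type = "t2.nano"
  depends_on    = ["aws_db_instance.example"]
}
```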
To find out more, let us go through the contents of the resources.tf file.
Please refer to:
${GIT_URL}/Examples/Chapter-2/Terraform/resources.tf
First we tell Terraform what provider to use for our infrastructure:
# Set a Provider
provider "aws" {
  region = "${var.aws-region}"
}
You will notice that no credentials are specified, since we set those as environment variables earlier.
Now we can add the VPC and its networking components:
# Create a VPC
resource "aws_vpc" "terraform-vpc" {
  cidr_block = "${var.vpc-cidr}"

  tags {
    Name = "${var.vpc-name}"
  }
}

# Create an Internet Gateway
resource "aws_internet_gateway" "terraform-igw" {
  vpc_id = "${aws_vpc.terraform-vpc.id}"
}

# Create NAT
resource "aws_eip" "nat-eip" {
  vpc = true
}
So far we have declared the VPC, its Internet and NAT gateways plus a set of public and private subnets with matching routing tables.
It will help clarify the syntax if we examine some of those resource blocks, line by line:
resource "aws_subnet" "public-1" {
The first argument is the type of the resource followed by an arbitrary name.
vpc_id = "${aws_vpc.terraform-vpc.id}"
The aws_subnet resource named public-1 has a property vpc_id which refers to the id attribute of a different resource of type aws_vpc named terraform-vpc. Such references to other resources implicitly define the execution flow, that is to say the VPC needs to exist before the subnet can be created.
cidr_block = "${cidrsubnet(var.vpc-cidr, 8, 1)}"
We will talk more about variables in a moment, but the format is var.var_name.
Here we use the cidrsubnet function with the vpc-cidr variable which returns a cidr_block to be assigned to the public-1 subnet. Please refer to the Terraform documentation for this and other useful functions.
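To illustrate the arithmetic: given our vpc-cidr value of 10.0.0.0/16, cidrsubnet extends the prefix by the given number of bits (8 here, producing /24 networks) and selects the network at the given index:

```hcl
# cidrsubnet(iprange, newbits, netnum)
cidr_block = "${cidrsubnet(var.vpc-cidr, 8, 1)}"   # 10.0.0.0/16 -> 10.0.1.0/24
# An index of 2 would yield 10.0.2.0/24, and so on.
```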
Next we add an RDS instance to the VPC:
resource "aws_db_instance" "terraform" {
  identifier             = "${var.rds-identifier}"
  allocated_storage      = "${var.rds-storage-size}"
  storage_type           = "${var.rds-storage-type}"
  engine                 = "${var.rds-engine}"
  engine_version         = "${var.rds-engine-version}"
  instance_class         = "${var.rds-instance-class}"
  username               = "${var.rds-username}"
  password               = "${var.rds-password}"
  port                   = "${var.rds-port}"
  vpc_security_group_ids = ["${aws_security_group.terraform-rds.id}"]
  db_subnet_group_name   = "${aws_db_subnet_group.rds.id}"
}
Here we see mostly references to variables with a few calls to other resources.
Following the RDS is an ELB:
resource "aws_elb" "terraform-elb" {
  name            = "terraform-elb"
  security_groups = ["${aws_security_group.terraform-elb.id}"]
  subnets         = ["${aws_subnet.public-1.id}", "${aws_subnet.public-2.id}"]

  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }

  tags {
    Name = "terraform-elb"
  }
}
Lastly we define the EC2 auto scaling group and related resources:
resource "aws_launch_configuration" "terraform-lcfg" {
  image_id        = "${var.autoscaling-group-image-id}"
  instance_type   = "${var.autoscaling-group-instance-type}"
  key_name        = "${var.autoscaling-group-key-name}"
  security_groups = ["${aws_security_group.terraform-ec2.id}"]
  user_data       = "#!/bin/bash \n set -euf -o pipefail \n exec 1>>(logger -s -t $(basename $0)) 2>&1 \n yum -y install nginx; chkconfig nginx on; service nginx start"

  lifecycle {
    create_before_destroy = true
  }
}
resource "aws_autoscaling_group" "terraform-asg" {
  name                 = "terraform"
  launch_configuration = "${aws_launch_configuration.terraform-lcfg.id}"
  vpc_zone_identifier  = ["${aws_subnet.private-1.id}", "${aws_subnet.private-2.id}"]
  min_size             = "${var.autoscaling-group-minsize}"
  max_size             = "${var.autoscaling-group-maxsize}"
  load_balancers       = ["${aws_elb.terraform-elb.name}"]
  depends_on           = ["aws_db_instance.terraform"]

  tag {
    key                 = "Name"
    value               = "terraform"
    propagate_at_launch = true
  }
}
The user_data shell script above will install and start NGINX on the EC2 node(s).
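If the escaped one-liner becomes hard to maintain, the same script could be supplied via a heredoc, which HCL supports (a stylistic alternative, not what the template above uses):

```hcl
user_data = <<USERDATA
#!/bin/bash
set -euf -o pipefail
exec 1>>(logger -s -t $(basename $0)) 2>&1
yum -y install nginx; chkconfig nginx on; service nginx start
USERDATA
```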
We have made great use of variables to define our resources, making the template as re-usable as possible. Let us now look inside variables.tf to study these further.
Similarly to the resources list, we start with the VPC:
Please refer to:
${GIT_URL}/Examples/Chapter-2/Terraform/variables.tf
variable "aws-region" {
type = "string"
description = "AWS region"
}
variable "aws-availability-zones" {
type = "string"
description = "AWS zones"
}
variable "vpc-cidr" {
type = "string"
description = "VPC CIDR"
}
variable "vpc-name" {
type = "string"
description = "VPC name"
}
The syntax is:
variable "variable_name" {
variable properties
}
Where variable_name is arbitrary, but needs to match relevant var.var_name references made in other parts of the template. For example, variable aws-region will satisfy the ${var.aws-region} reference we made earlier when describing the region of the provider aws resource.
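Note that a variable may also carry a default value, which takes effect when no other value is supplied (we will instead assign values via a file shortly):

```hcl
variable "aws-region" {
  type        = "string"
  description = "AWS region"
  default     = "us-east-1"  # used unless overridden elsewhere
}
```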
We will mostly use string variables, however there is another useful type called map which can hold lookup tables. Maps are queried in a similar way to looking up values in a hash/dict (Please see: https://www.terraform.io/docs/configuration/variables.html).
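A hypothetical map and its lookup would look like the following (the variable name and values below are illustrative, not part of our template):

```hcl
variable "rds-engine-versions" {
  type        = "map"
  description = "Engine version per RDS engine (illustrative values)"
  default = {
    postgres = "9.5.2"
    mysql    = "5.7.11"
  }
}

# Elsewhere in the template:
# engine_version = "${lookup(var.rds-engine-versions, var.rds-engine)}"
```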
Next comes RDS:
variable "rds-identifier" {
type = "string"
description = "RDS instance identifier"
}
variable "rds-storage-size" {
type = "string"
description = "Storage size in GB"
}
variable "rds-storage-type" {
type = "string"
description = "Storage type"
}
variable "rds-engine" {
type = "string"
description = "RDS type"
}
variable "rds-engine-version" {
type = "string"
description = "RDS version"
}
variable "rds-instance-class" {
type = "string"
description = "RDS instance class"
}
variable "rds-username" {
type = "string"
description = "RDS username"
}
variable "rds-password" {
type = "string"
description = "RDS password"
}
variable "rds-port" {
type = "string"
description = "RDS port number"
}
Finally, EC2:
variable "autoscaling-group-minsize" {
type = "string"
description = "Min size of the ASG"
}
variable "autoscaling-group-maxsize" {
type = "string"
description = "Max size of the ASG"
}
variable "autoscaling-group-image-id" {
type = "string"
description = "EC2 AMI identifier"
}
variable "autoscaling-group-instance-type" {
type = "string"
description = "EC2 instance type"
}
variable "autoscaling-group-key-name" {
type = "string"
description = "EC2 ssh key name"
}
We now have the type and description of all our variables defined in variables.tf, however no values have been assigned to them yet.
Terraform is quite flexible with how this can be done. We could pass values on the command line via -var, export them as TF_VAR_name environment variables, or keep them in a variables file.
Using a key=value pairs file proves to be quite convenient within teams, as each engineer can have a private copy (excluded from revision control). If the file is named terraform.tfvars it will be read automatically by Terraform, alternatively -var-file can be used on the command line to specify a different source.
Below is the content of our sample terraform.tfvars file:
Please refer to:
${GIT_URL}/Examples/Chapter-2/Terraform/terraform.tfvars
autoscaling-group-image-id = "ami-08111162"
autoscaling-group-instance-type = "t2.nano"
autoscaling-group-key-name = "terraform"
autoscaling-group-maxsize = "1"
autoscaling-group-minsize = "1"
aws-availability-zones = "us-east-1b,us-east-1c"
aws-region = "us-east-1"
rds-engine = "postgres"
rds-engine-version = "9.5.2"
rds-identifier = "terraform-rds"
rds-instance-class = "db.t2.micro"
rds-port = "5432"
rds-storage-size = "5"
rds-storage-type = "gp2"
rds-username = "dbroot"
rds-password = "donotusethispassword"
vpc-cidr = "10.0.0.0/16"
vpc-name = "Terraform"
A point of interest is aws-availability-zones: it holds multiple values, which we interact with using the element and split functions, as seen in resources.tf.
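As a sketch of that interaction (a reconstruction of the likely usage, not a verbatim excerpt from resources.tf):

```hcl
# split turns "us-east-1b,us-east-1c" into a list;
# element then picks an entry by index.
# In the public-1 subnet, for example:
availability_zone = "${element(split(",", var.aws-availability-zones), 0)}"  # us-east-1b
# and in public-2, an index of 1 would select us-east-1c.
```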
The third, mostly informational part of our template contains the Terraform Outputs. These allow for selected values to be returned to the user when testing, deploying or after a template has been deployed. The concept is similar to how echo statements are commonly used in shell scripts to display useful information during execution.
Let us add outputs to our template by creating an outputs.tf file:
Please refer to:
${GIT_URL}/Examples/Chapter-2/Terraform/outputs.tf
output "VPC ID" {
value = "${aws_vpc.terraform-vpc.id}"
}
output "NAT EIP" {
value = "${aws_nat_gateway.terraform-nat.public_ip}"
}
output "ELB URI" {
value = "${aws_elb.terraform-elb.dns_name}"
}
output "RDS Endpoint" {
value = "${aws_db_instance.terraform.endpoint}"
}
To configure an output you simply reference a given resource and its attribute. As shown in preceding code, we have chosen the ID of the VPC, the Elastic IP address of the NAT gateway, the DNS name of the ELB and the Endpoint address of the RDS instance.
The Outputs section completes the template in this example. You should now have four files in your template folder: resources.tf, variables.tf, terraform.tfvars and outputs.tf.
We shall examine five main Terraform operations: validating a template, performing a dry run, deploying, updating and destroying an environment.
In the following command line examples, terraform is run within the folder which contains the template files.
Before going any further, a basic syntax check should be done with the terraform validate command. After renaming one of the variables in resources.tf, validate returns an unknown variable error:
$ terraform validate
Error validating: 1 error(s) occurred:
* provider config 'aws': unknown variable referenced: 'aws-region-1'. define it with 'variable' blocks
Once the variable name has been corrected, re-running validate returns no output, meaning OK.
The next step is to perform a test/dry-run execution with terraform plan, which displays what would happen during an actual deployment. The command returns a colour-coded list of resources and their properties or, more precisely:
$ terraform plan
Resources are shown in alphabetical order for quick scanning. Green resources will be created (or destroyed and then created if an existing resource exists), yellow resources are being changed in-place, and red resources will be destroyed.
To literally get the picture of what the to-be-deployed infrastructure looks like, you could use terraform graph:
$ terraform graph > my_graph.dot
DOT files can be manipulated with the Graphviz open source software (Please see : http://www.graphviz.org) or many online readers/converters. Below is a portion of a larger graph representing the template we designed earlier:
If you are happy with the plan and graph, the template can now be deployed using terraform apply:
$ terraform apply
aws_eip.nat-eip: Creating...
allocation_id: "" => "<computed>"
association_id: "" => "<computed>"
domain: "" => "<computed>"
instance: "" => "<computed>"
network_interface: "" => "<computed>"
private_ip: "" => "<computed>"
public_ip: "" => "<computed>"
vpc: "" => "1"
aws_vpc.terraform-vpc: Creating...
cidr_block: "" => "10.0.0.0/16"
default_network_acl_id: "" => "<computed>"
default_security_group_id: "" => "<computed>"
dhcp_options_id: "" => "<computed>"
enable_classiclink: "" => "<computed>"
enable_dns_hostnames: "" => "<computed>"
Apply complete! Resources: 22 added, 0 changed, 0 destroyed.
The state of your infrastructure has been saved to the following path. This state is required to modify and destroy your infrastructure, so keep it safe. To inspect the complete state use the terraform show command.
State path: terraform.tfstate
Outputs:
ELB URI = terraform-elb-xxxxxx.us-east-1.elb.amazonaws.com
NAT EIP = x.x.x.x
RDS Endpoint = terraform-rds.xxxxxx.us-east-1.rds.amazonaws.com:5432
VPC ID = vpc-xxxxxx
At the end of a successful deployment, you will notice the Outputs we configured earlier and a message about another important part of Terraform: the state file (please refer to: https://www.terraform.io/docs/state/):
Terraform stores the state of your managed infrastructure from the last time Terraform was run. By default this state is stored in a local file named terraform.tfstate, but it can also be stored remotely, which works better in a team environment.
Terraform uses this local state to create plans and make changes to your infrastructure. Prior to any operation, Terraform does a refresh to update the state with the real infrastructure.
In a sense, the state file contains a snapshot of your infrastructure and is used to calculate any changes when a template has been modified. Normally you would keep the terraform.tfstate file under version control alongside your templates. In a team environment however, if you encounter too many merge conflicts you can switch to storing the state file(s) in an alternative location such as S3 (Please see: https://www.terraform.io/docs/state/remote/index.html).
Allow a few minutes for the EC2 node to fully initialize, then try loading the ELB URI from the preceding Outputs in your browser. You should be greeted by NGINX as shown in the following screenshot:
As per Murphy's Law, as soon as we deploy a template, a change to it will become necessary. Fortunately, all that is needed is to update and re-deploy the given template.
Let us say we need to add a new rule to the ELB security group (the second ingress block in the following code).
resource "aws_security_group" "terraform-elb" {
  name        = "terraform-elb"
  description = "ELB security group"
  vpc_id      = "${aws_vpc.terraform-vpc.id}"

  ingress {
    from_port   = "80"
    to_port     = "80"
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = "443"
    to_port     = "443"
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
$ terraform plan
...
~ aws_security_group.terraform-elb
ingress.#: "1" => "2"
ingress.2214680975.cidr_blocks.#: "1" => "1"
ingress.2214680975.cidr_blocks.0: "0.0.0.0/0" => "0.0.0.0/0"
ingress.2214680975.from_port: "80" => "80"
ingress.2214680975.protocol: "tcp" => "tcp"
ingress.2214680975.security_groups.#: "0" => "0"
ingress.2214680975.self: "0" => "0"
ingress.2214680975.to_port: "80" => "80"
ingress.2617001939.cidr_blocks.#: "0" => "1"
ingress.2617001939.cidr_blocks.0: "" => "0.0.0.0/0"
ingress.2617001939.from_port: "" => "443"
ingress.2617001939.protocol: "" => "tcp"
ingress.2617001939.security_groups.#: "0" => "0"
ingress.2617001939.self: "" => "0"
ingress.2617001939.to_port: "" => "443"
Plan: 0 to add, 1 to change, 0 to destroy.
$ terraform apply
...
aws_security_group.terraform-elb: Modifying...
ingress.#: "1" => "2"
ingress.2214680975.cidr_blocks.#: "1" => "1"
ingress.2214680975.cidr_blocks.0: "0.0.0.0/0" => "0.0.0.0/0"
ingress.2214680975.from_port: "80" => "80"
ingress.2214680975.protocol: "tcp" => "tcp"
ingress.2214680975.security_groups.#: "0" => "0"
ingress.2214680975.self: "0" => "0"
ingress.2214680975.to_port: "80" => "80"
ingress.2617001939.cidr_blocks.#: "0" => "1"
ingress.2617001939.cidr_blocks.0: "" => "0.0.0.0/0"
ingress.2617001939.from_port: "" => "443"
ingress.2617001939.protocol: "" => "tcp"
ingress.2617001939.security_groups.#: "0" => "0"
ingress.2617001939.self: "" => "0"
ingress.2617001939.to_port: "" => "443"
aws_security_group.terraform-elb: Modifications complete
...
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
Some update operations can be destructive (please refer to: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html). You should always check the CloudFormation documentation on the resource you are planning to modify to see whether a change is going to cause any interruption. Terraform provides some protection via the prevent_destroy lifecycle property (please refer to: https://www.terraform.io/docs/configuration/resources.html#prevent_destroy).
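Applied to our RDS instance, for example, prevent_destroy would cause Terraform to error out on any plan that would delete the database (a sketch; this flag is not set in our template):

```hcl
resource "aws_db_instance" "terraform" {
  # ... arguments as before ...

  lifecycle {
    prevent_destroy = true
  }
}
```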
This is a friendly reminder to always remove AWS resources after you are done experimenting with them to avoid any unexpected charges.
Before performing any delete operations, we will need to grant such privileges to the (terraform) IAM user we created at the beginning of this article. As a shortcut, you could temporarily attach the AdministratorAccess managed policy to the user via the AWS Console, as shown in the following figure:
To remove the VPC and all associated resources that we created as part of this example, we will use terraform destroy:
$ terraform destroy
Do you really want to destroy?
Terraform will delete all your managed infrastructure.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
Terraform asks for a confirmation then proceeds to destroy resources, ending with:
Apply complete! Resources: 0 added, 0 changed, 22 destroyed.
Next, we remove the temporary admin access we granted to the IAM user by detaching the AdministratorAccess managed policy, as shown in the following screenshot:
Then verify that the VPC is no longer visible in the AWS Console.
In this article we looked at the importance and usefulness of Infrastructure as Code and ways to implement it using Terraform or AWS CloudFormation.
We examined the structure and individual components of both a Terraform and a CF template, then practiced deploying those onto AWS using the CLI. I trust that the examples we went through have demonstrated the benefits and immediate gains from the practice of deploying infrastructure as code.
So far however, we have only done half the job. With the provisioning stage completed, you would naturally want to start configuring your infrastructure.