Terraform: Up & Running, 3rd edition is now published!

Part 2 of a blog post series that covers the top 10 problems that have been fixed in Terraform since the 2nd edition

Yevgeniy Brikman
Published in Gruntwork
12 min read · Sep 28, 2022


I’m excited to announce that the 3rd edition of Terraform: Up & Running has been published! It includes over 100 pages of new content: two entirely new chapters, one on managing secrets with Terraform and one on working with multiple providers, plus major changes to all the existing chapters, updating the book from Terraform 0.12 all the way to Terraform 1.2. To give you a preview of all this new content, I’ve created a two-part blog post series with quick walkthroughs of the top 10 problems that have been fixed in Terraform over the last few years.

In the first part of the blog post series, I walked through the first 5 of these problems: how to use multiple providers (multiple regions, accounts, clouds, including Kubernetes), how to manage Terraform provider versions (lock files, required_providers), how to securely manage secrets with Terraform (KMS, Vault), how to set up a secure CI / CD pipeline for Terraform (OIDC, isolated workers), and how to iterate over modules (count, for_each). In this second part of the blog post series, I’ll go over 5 more problems and their solutions, based on snippets from the 3rd edition of the book:

  1. Validation. How to use validation, precondition, and postcondition blocks to perform checks before and after deployment.
  2. Refactoring. How to use moved blocks to safely refactor your Terraform code without having to do state surgery manually.
  3. Static analysis. How to test your Terraform code without deploying anything, using static analysis tools such as terraform validate, tfsec, and tflint.
  4. Policy enforcement. How to enforce company policies and compliance requirements using tools such as Terratest, OPA, and Sentinel.
  5. Maturity. How Terraform has become more stable due to the Terraform 1.0 release, the growth of the community, and the HashiCorp IPO.

Input validation

The problem

When you build a module in Terraform, you can allow users to configure that module using input variables. But how do you enforce that the inputs users pass in are valid? For example, you might have a module that deploys EC2 instances in AWS and you allow users to specify the type of instance to deploy using an input variable called ec2_type:

variable "ec2_type" {
  description = "The type of EC2 instance to deploy"
  type        = string
}

resource "aws_instance" "instance" {
  ami           = "ami-abcd1234"
  instance_type = var.ec2_type
}

For a long time, Terraform has supported type constraints on variables, such as enforcing that ec2_type is set to a string. But how do you enforce requirements that go beyond type constraints, such as only allowing users to use instance types from a pre-approved list or from the AWS Free Tier (e.g., t2.micro, t3.micro)?

The solution

As of Terraform 0.13, you can add validation blocks to any input variable to perform checks that go beyond basic type constraints. For example, you can add a validation block to the ec2_type variable to ensure not only that the value the user passes in is a string (which is enforced by the type constraint) but that the string has one of two allowed values from the AWS Free Tier:

variable "ec2_type" {
  description = "The type of EC2 instance to deploy"
  type        = string

  validation {
    condition     = contains(["t2.micro", "t3.micro"], var.ec2_type)
    error_message = "Only Free Tier instance types are allowed."
  }
}

The way a validation block works is that the condition parameter should evaluate to true if the value is valid and false otherwise. The error_message parameter allows you to specify the message to show the user if they pass in an invalid value. For example, here’s what happens if you try to set ec2_type to m4.large, which is not part of the AWS Free Tier:

$ terraform apply -var ec2_type="m4.large"

│ Error: Invalid value for variable
│
│   on main.tf line 17:
│   17: variable "ec2_type" {
│     ├────────────────
│     │ var.ec2_type is "m4.large"
│
│ Only Free Tier instance types are allowed.
│
│ This was checked by the validation rule at main.tf:21,3-13.

Validation blocks are a great way to catch basic input errors, but they have a major limitation: the condition in a validation block can only reference the surrounding input variable. If you try to reference any other input variables, local variables, resources, or data sources, you will get an error.
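For example, suppose you wanted to validate one variable against another, such as requiring that a maximum cluster size be at least as large as the minimum. A sketch like the following (the min_size and max_size variables are hypothetical, not from the book) fails with an error in the Terraform versions covered here, because the condition refers to a second variable:

variable "min_size" {
  description = "The minimum number of instances"
  type        = number
}

variable "max_size" {
  description = "The maximum number of instances"
  type        = number

  validation {
    # Invalid: this condition references var.min_size, but a validation
    # block may only reference the variable it is attached to
    condition     = var.max_size >= var.min_size
    error_message = "max_size must be greater than or equal to min_size."
  }
}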

To perform more complicated checks, you can use precondition and postcondition blocks, which were introduced in Terraform 1.2. For example, you could use a precondition block to do a more robust check that the ec2_type the user passes in is in the AWS Free Tier. Instead of a hardcoded list of instance types, which can quickly go out of date, you can use the aws_ec2_instance_type data source to get up-to-date information from AWS:

data "aws_ec2_instance_type" "check" {
instance_type = var.ec2_type
}

And then you can figure out if the instance type is part of the Free Tier as follows:

locals {
  is_free_tier = data.aws_ec2_instance_type.check.free_tier_eligible
}

Now you can add a precondition block to your aws_instance resource to show an error if is_free_tier evaluates to false:

resource "aws_instance" "instance" {
  ami           = "ami-abcd1234"
  instance_type = var.ec2_type

  lifecycle {
    precondition {
      condition     = local.is_free_tier
      error_message = "Only Free Tier instance types are allowed."
    }
  }
}

You can add precondition blocks to any resource or data source, and, unlike validation blocks, precondition blocks can reference input variables, local variables, resources, and data sources, which allows you to do more dynamic checks. Here’s what happens if you try to set ec2_type to m4.large, which is not part of the AWS Free Tier:

$ terraform apply -var ec2_type="m4.large"

│ Error: Resource precondition failed
│
│   on main.tf line 25, in resource "aws_instance" "instance":
│   25:       condition     = local.is_free_tier
│     ├────────────────
│     │ local.is_free_tier is false
│
│ Only Free Tier instance types are allowed.

As you can see, precondition blocks are checked before apply, so they help you check basic assumptions and catch errors before any changes have been deployed. In an analogous way, you can add postcondition blocks to any resource or data source to enforce basic guarantees and notify the user of any errors after deployment.
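For example, here’s a minimal sketch of a postcondition block that checks a property of the instance after it has been created (the specific check, that the root volume came up encrypted, is my own illustration, not an example from the book):

resource "aws_instance" "instance" {
  ami           = "ami-abcd1234"
  instance_type = var.ec2_type

  lifecycle {
    postcondition {
      # self refers to this resource's attributes, as known after apply
      condition     = self.root_block_device[0].encrypted
      error_message = "The root volume of this instance must be encrypted."
    }
  }
}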

Refactoring

The problem

Refactoring Terraform code can be tricky. For example, consider again the Terraform code to deploy an EC2 instance:

resource "aws_instance" "instance" {
  ami           = "ami-abcd1234"
  instance_type = var.ec2_type
}

Let’s say you ran apply on this code, using it to deploy a Jenkins server. Later on, to make the intent of the code clearer, you decided to change the name of the aws_instance resource from instance to jenkins:

resource "aws_instance" "jenkins" {
  ami           = "ami-abcd1234"
  instance_type = var.ec2_type
}

This seems like a tiny, inconsequential refactor, but what would happen if you were to run apply again? The answer: Terraform would terminate the old EC2 instance, deleting any data on it (including all your Jenkins configuration!), and then create a totally new EC2 instance. Whoops.

This happens because Terraform maintains a state file so that it knows what it deployed in the past and can therefore update it in the future (for more info, see How to manage Terraform state). Within the state file, Terraform identifies resources by their names: e.g., the state for the EC2 instance you deployed was under the name aws_instance.instance. If you rename a resource, as far as Terraform knows, you are asking it to delete the old resource (aws_instance.instance) and add a totally new, unrelated resource (aws_instance.jenkins).

In the past, the only solution was to remember every refactor you did and manually execute terraform state mv commands to tell Terraform that a resource now lives at a new address. For example, to avoid Terraform terminating and replacing your EC2 instance after the rename, you would have to run:

terraform state mv aws_instance.instance aws_instance.jenkins

Having to remember to run CLI commands manually is error-prone, especially if you refactored a module used by dozens of teams in your company, and each of those teams needs to remember to run terraform state mv to avoid downtime and data loss.

The solution

Terraform 1.1 added a way to handle state updates automatically: moved blocks. Any time you refactor your code, you should add a moved block to capture how the state should be updated. You can add the moved block in any .tf file in your Terraform code, though to make them easier to find, you may wish to pick a convention, such as putting all moved blocks in a moved.tf file. For example, to capture that the aws_instance resource was renamed from instance to jenkins, you would add the following moved block:

moved {
  from = aws_instance.instance
  to   = aws_instance.jenkins
}

Now, whenever anyone runs apply on this code, Terraform will automatically detect if it needs to update the state file:

Terraform will perform the following actions:

  # aws_instance.instance has moved to
  # aws_instance.jenkins
    resource "aws_instance" "jenkins" {
        ami           = "ami-abcd1234"
        instance_type = "t2.micro"
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:

If you enter yes, Terraform will update the state automatically, and as the plan shows no resources to add, change, or destroy, Terraform will make no other changes (no instances will be destroyed or recreated) — which is exactly what you want!
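Note that moved blocks aren’t limited to resource renames: the from and to addresses can also refer to modules. For example, if you had renamed a module call from web to jenkins (module names hypothetical), the corresponding moved block would be:

moved {
  from = module.web
  to   = module.jenkins
}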

Static analysis

The problem

The previous edition of Terraform: Up & Running added an entire chapter on testing Terraform code. Most of that chapter was focused on unit, integration, and end-to-end testing, with lots of examples that use the open source Terratest library to go through the full testing lifecycle of deploy, validate, and undeploy.

While this style of testing is the gold standard for Terraform, many readers were also interested in other, more lightweight testing approaches. Is there some way to test your Terraform code without actually going through the full deploy cycle?

The solution

In the third edition of the book, I do a brief overview of static analysis for Terraform code. The idea behind static analysis is to parse the code and analyze it without actually executing it in any way.

For example, the simplest static analysis tool is terraform validate, which is built into Terraform itself and allows you to catch basic syntax issues. Imagine you took the code to deploy an EC2 instance from the preceding sections, put it in a folder called ec2-instance, and then, from another folder, tried to use ec2-instance as a module as follows:

module "instance" {
  source = "../ec2-instance"
}

Notice how this code does not set the ec2_type input variable, even though it’s required. You can catch this bug by running validate on the code:

$ terraform validate

│ Error: Missing required argument
│
│   on main.tf line 1, in module "instance":
│    1: module "instance" {
│
│ The argument "ec2_type" is required, but no definition was found.

The validate command is limited solely to syntactic checks, but there are other static analysis tools out there that allow you to perform more advanced checks, such as enforcing that security groups cannot be too open or enforcing that all EC2 Instances follow a specific tagging convention.
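For example, here’s a minimal sketch of a .tflint.hcl configuration that enables tflint’s AWS ruleset (the plugin version shown is an assumption; check the tflint-ruleset-aws releases for the current one):

# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.21.1" # assumption: pin to whatever version is current
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

# Example rule from the AWS ruleset: flag EC2 instance types that don't exist
rule "aws_instance_invalid_type" {
  enabled = true
}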

Here’s a table from the book that shows some of the more popular static analysis tools that work with Terraform, including terraform validate, tfsec, tflint, and Terrascan, and how they compare in terms of popularity and maturity, based on stats I gathered from GitHub in February 2022.

Policy enforcement

The problem

Over the last few years, a number of policy-as-code tools have emerged that allow you to define and enforce business requirements as code. The question is, how do you use policy-as-code tools with Terraform?

The solution

In the third edition of the book, I show how to use policy-as-code tools to analyze your terraform plan output. Since terraform plan partially executes your code (it executes the read steps, but not the write steps), this style of testing can catch more errors than static analysis, but still fewer than a unit or integration test.

For example, consider again the Terraform code to deploy an EC2 instance:

resource "aws_instance" "instance" {
  ami           = "ami-abcd1234"
  instance_type = var.ec2_type
}

A common policy most companies want to enforce is that all resources are tagged: for example, you may want to ensure that every resource that is managed by Terraform has a ManagedBy = terraform tag (note that the preceding EC2 instance code is missing this tag!).

One way to enforce such a policy is to use a tool called Open Policy Agent (OPA). You can define a policy to check for the ManagedBy tag in a file called enforce_tagging.rego as follows:

package terraform

allow {
  resource_change := input.resource_changes[_]
  resource_change.change.after.tags["ManagedBy"]
}

To run this policy against your Terraform code, the first step is to run terraform plan and save the output to a file:

terraform plan -out tfplan.binary

OPA only operates on JSON, so the next step is to convert the plan file to JSON using the terraform show command:

terraform show -json tfplan.binary > tfplan.json

Finally, you can run the opa eval command to check this plan file against the enforce_tagging.rego policy:

opa eval \
  --data enforce_tagging.rego \
  --input tfplan.json \
  --format pretty \
  data.terraform.allow

undefined

Since the ManagedBy tag was not set, the output from OPA is undefined; you could detect this in your CI / CD pipeline and fail the build, forcing the developer to add the tag to their code:

resource "aws_instance" "instance" {
  ami           = "ami-abcd1234"
  instance_type = var.ec2_type

  tags = {
    ManagedBy = "terraform"
  }
}

If you re-run terraform plan, terraform show, and opa eval, this time, you’ll get true, which means the policy has passed:

opa eval \
  --data enforce_tagging.rego \
  --input tfplan.json \
  --format pretty \
  data.terraform.allow

true

Here’s a table from the book that shows some of the more popular policy-as-code tools that work with Terraform, including OPA, Terratest, Sentinel, Checkov, and terraform-compliance, and how they compare in terms of popularity and maturity, based on stats I gathered from GitHub in February 2022.

Maturity

The problem

When I wrote the first edition of Terraform: Up & Running, Terraform was an immature, pre-1.0.0 tool. That meant it was hard to hire people who had expertise in it; it was hard to find good documentation, guides, blog posts, and other online resources to learn the tool (which is why I wrote the book in the first place!); it was hard to find good off-the-shelf modules and plugins so you didn’t have to spend months building everything yourself (which is why we started Gruntwork!); and it was hard to keep your code working over the long term, as there were frequent bugs, testing practices were poorly developed, and new releases of Terraform usually included breaking changes, so you had to follow long upgrade guides and make sweeping changes across your whole codebase to stay up to date.

So what has changed since then? Has Terraform gotten any more mature?

The solution

The short answer is: yes!

In 2021, Terraform hit a big milestone: the 1.0 release. This not only signified that Terraform had reached a certain level of maturity but also brought with it a number of compatibility promises: all 1.x releases will be backward compatible, so upgrading between v1.x releases no longer requires changes to your code, workflows, or state files. The tooling for managing Terraform versions is more mature these days, too: there is a required_providers block and a lock file for managing provider versioning (as covered in part 1 of this series), and you can use tools like tfenv to manage the version of Terraform itself (and tgswitch if you’re a Terragrunt user).
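For example, here’s a minimal sketch of pinning both the Terraform core version and the AWS provider version in your code (the exact version constraints are illustrative):

terraform {
  # Allow any 1.x release: the 1.x compatibility promises make
  # upgrades within this range safe
  required_version = ">= 1.0.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}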

Moreover, HashiCorp announced in 2021 that Terraform has been downloaded more than 100 million times, has had over 1,500 open source contributors, and is in use at roughly 79% of Fortune 500 companies, so it’s safe to say that the ecosystem has grown and matured significantly over the last several years. There are now more developers, providers, reusable modules, tools, plugins, classes, books, and tutorials for Terraform than ever before. And HashiCorp, the company that created Terraform, had its IPO (initial public offering) in 2021, so Terraform is no longer backed by a small startup but by a large, stable, publicly traded company, for which Terraform is the biggest business line.

While there is still plenty of room for Terraform to grow and mature, it’s no longer a new, unproven tool. In fact, it’s arguably the de facto standard for infrastructure as code these days, and has proven itself mature enough to be used in production across thousands of companies.

Conclusion

You’ve now seen 5 more problems that have been solved in the Terraform world in the last few years and are now covered by the 3rd edition of Terraform: Up & Running, including how to validate module inputs, how to safely refactor your code, how to test your code with static analysis and policy-as-code tools, and how Terraform’s maturity has improved over the last few years.

If you enjoyed this content and want to go deeper, grab yourself a copy of Terraform: Up & Running, 3rd edition, and let me know what you think!


Co-founder of Gruntwork, Author of “Hello, Startup” and “Terraform: Up & Running”