A guide to automating HashiCorp Vault #3: Authenticating with an IAM user or role

Etiene Dalcol
Published in Gruntwork, 6 min read, Jan 14, 2019

This is the 3rd part of the automating HashiCorp Vault series. In part 2, we talked about how we can authenticate to a Vault cluster using instance metadata, after spinning it up and auto-unsealing, which was addressed in the first post. In this third and final post, we’ll talk about an alternative way to authenticate to Vault that you can use with IAM users and roles.

One of the limitations of the ec2 method is that it does not work for many other types of AWS resources, such as Lambda functions and ECS tasks. The same limitation separates the gce and iam methods on GCP. Although we’ve been walking through AWS examples in this series of blog posts (you can find the full code examples here), the methods in the two clouds are analogous, and you can find the GCP-specific code here.

While only EC2 Instances have an Instance Metadata endpoint, almost all AWS resources can call the AWS Security Token Service (STS) to look up their own identity. Vault’s AWS iam auth method takes advantage of this by allowing you to create a signed request to STS, but instead of sending the request yourself, you send that signed request data to Vault. Vault executes the request and finds out your real identity from AWS (again, our trusted 3rd party).

A brief visualization of Vault’s AWS iam authentication method workflow

The Vault cluster should have policies to query the necessary information from AWS, especially if you use wildcards for configuring the IAM user or role. You can do this with Terraform, using Gruntwork’s vault-cluster module:

resource "aws_iam_role_policy" "vault_iam" {
  name   = "vault_iam"
  role   = "${module.vault_cluster.iam_role_id}"
  policy = "${data.aws_iam_policy_document.vault_iam.json}"
}

data "aws_iam_policy_document" "vault_iam" {
  statement {
    effect  = "Allow"
    actions = ["iam:GetRole", "iam:GetUser"]

    # List of ARNs Vault machines can query
    # For more security, it could be set to specific roles or users:
    # resources = ["${aws_iam_role.example_client_instance_role.arn}"]
    resources = [
      "arn:aws:iam::*:user/*",
      "arn:aws:iam::*:role/*",
    ]
  }

  statement {
    effect    = "Allow"
    actions   = ["sts:GetCallerIdentity"]
    resources = ["*"]
  }
}

module "vault_cluster" {
  source = "github.com/hashicorp/terraform-aws-vault.git//modules/vault-cluster?ref=v0.11.3"
  # ... other Gruntwork's vault-cluster module vars
}

Normally, the iam method would be ignorant of the specifics of EC2 instances, but through the method AssumeRole of AWS STS, Vault can infer that the IAM principal is attached specifically to an EC2 instance.

data "aws_iam_policy_document" "example_instance_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "example_client_instance_role" {
  name_prefix        = "auth-example-iam-role"
  assume_role_policy = "${data.aws_iam_policy_document.example_instance_role.json}"
}

resource "aws_iam_instance_profile" "example_instance_profile" {
  path = "/"
  role = "${aws_iam_role.example_client_instance_role.name}"
}

resource "aws_instance" "example_client_auth_to_vault" {
  iam_instance_profile = "${aws_iam_instance_profile.example_instance_profile.name}"
  # ... other instance vars
}

This means that you can also limit the Vault Role to some instance attributes even when using the iam auth method instead of the ec2 method. You can do that by setting an inferred entity type when configuring your Vault Role. If you need more granular controls for EC2 instances, however, you should still use the ec2 method. And if you’re authenticating a different type of resource, such as a Lambda function, the iam method is the only method you can use.

vault auth enable aws

vault policy write "example-policy" -<<EOF
path "secret/example_*" {
  capabilities = ["create", "read"]
}
EOF

vault write \
  auth/aws/role/example-role-name \
  auth_type=iam \
  policies=example-policy \
  max_ttl=500h \
  bound_iam_principal_arn=$client_instance_role_arn \
  inferred_entity_type="ec2_instance" \
  bound_ami_id=$ami_id # only when EC2 instance is inferred
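To make the inference step more concrete: credentials obtained through an instance profile show up in STS as an assumed-role ARN whose session name is the EC2 instance ID. The Python sketch below shows how an instance ID could be read out of such an ARN. The function name and the instance ID pattern are assumptions made for illustration, and this is not Vault's actual implementation.

```python
import re

# Assumed pattern for short (8 hex) and long (17 hex) EC2 instance IDs.
INSTANCE_ID = re.compile(r"^i-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def inferred_instance_id(caller_arn):
    """Return the instance ID implied by an assumed-role ARN, or None."""
    # arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    resource = caller_arn.split(":", 5)[-1]
    pieces = resource.split("/")
    if len(pieces) != 3 or pieces[0] != "assumed-role":
        return None
    session_name = pieces[2]
    # A Lambda function or ECS task uses a different kind of session name,
    # so nothing can be inferred for those callers.
    return session_name if INSTANCE_ID.match(session_name) else None
```

With an instance ID in hand, attributes such as the AMI can be looked up and compared with the bounds set on the Vault Role.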

The client trying to authenticate will create a request to the method GetCallerIdentity of the AWS STS API (but not yet send it). This method basically answers the question “Who am I?” This request is then signed with the AWS credentials of the client and the signed result is sent with the login request to the Vault Server. As we mentioned before, AWS has already done the hard work of distributing credentials to things running on it and, even better, it also has the notion of temporary credentials. EC2 instances can get their instance profile via the metadata service, Lambda functions can get their credentials through environment variables and ECS tasks have their ECS-specific metadata service.

Still, this is the most complex step of the iam authentication process. Creating the correct canonical request involves many pieces that can go wrong, and signing the correct parts to include in the authorization header can be very time-consuming to get right, because the failure responses from Vault are often unhelpful, which is probably intentional. For this reason, I strongly discourage trying to do this yourself from scratch. I’d recommend instead using either the Vault cli tool (preferable), which already does a lot of the hard work for you, or the AWS SDK for your programming language.
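To give a sense of how many pieces have to line up, here is a minimal Python 3 sketch of AWS Signature Version 4 applied to a GetCallerIdentity request, using only the standard library. It is an illustration under assumptions, not a replacement for the SDK: the function name is hypothetical, the credentials passed in would be placeholders, and it assumes the global sts.amazonaws.com endpoint signed for us-east-1.

```python
import hashlib
import hmac

STS_HOST = "sts.amazonaws.com"
STS_BODY = "Action=GetCallerIdentity&Version=2011-06-15"


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_get_caller_identity(access_key, secret_key, server_id, now,
                             session_token=None, region="us-east-1"):
    """Return the headers of a signed (but unsent) STS POST request."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    headers = {
        "content-type": "application/x-www-form-urlencoded; charset=utf-8",
        "host": STS_HOST,
        "x-amz-date": amz_date,
        "x-vault-aws-iam-server-id": server_id,
    }
    if session_token:  # temporary credentials also carry a session token
        headers["x-amz-security-token"] = session_token

    # 1. Canonical request: lowercase header names, sorted, one per line.
    names = sorted(headers)
    signed_headers = ";".join(names)
    canonical_headers = "".join(f"{n}:{headers[n].strip()}\n" for n in names)
    payload_hash = hashlib.sha256(STS_BODY.encode("utf-8")).hexdigest()
    canonical_request = "\n".join(
        ["POST", "/", "", canonical_headers, signed_headers, payload_hash])

    # 2. String to sign, scoped to date, region and service.
    scope = f"{date_stamp}/{region}/sts/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # 3. Derive the signing key, then sign.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "sts", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}")
    return headers
```

A single wrong byte in any of the three steps produces a different signature, which is why leaning on the Vault cli tool or an SDK is the safer route.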

For a Go example, you can just look at Vault’s source code. Here is a Python 2 example using botocore, adapted from an example by J. Thompson posted to the Vault mailing list:

import base64
import json
import sys

import botocore.session


def headers_to_go_style(headers):
    # Vault expects every header value as a list, as in Go's http.Header
    retval = {}
    for k, v in headers.iteritems():
        retval[k] = [v]
    return retval


def generate_vault_request(awsIamServerId):
    session = botocore.session.get_session()
    client = session.create_client('sts')
    endpoint = client._endpoint
    operation_model = client._service_model.operation_model('GetCallerIdentity')
    request_dict = client._convert_to_request_dict({}, operation_model)
    # Add the server ID header before signing so it is covered by the signature
    request_dict['headers']['X-Vault-AWS-IAM-Server-ID'] = awsIamServerId
    # Builds and signs the request, but does not send it
    request = endpoint.create_request(request_dict, operation_model)
    # It's a CaseInsensitiveDict, which is not JSON-serializable
    headers = json.dumps(headers_to_go_style(dict(request.headers)))
    return {
        'iam_http_request_method': request.method,
        'iam_request_url': base64.b64encode(request.url),
        'iam_request_body': base64.b64encode(request.body),
        'iam_request_headers': base64.b64encode(headers),
    }


if __name__ == "__main__":
    awsIamServerId = sys.argv[1]
    print json.dumps(generate_vault_request(awsIamServerId))

You can pass the Vault server’s address as an argument to this script, and it will be included in the signed request as a server ID. This is useful as a security boundary. For example, if a signed login request intended for a dev Vault cluster is compromised, it won’t be useful for breaching the prod Vault cluster.
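The boundary only holds if the receiving side checks two things: that the header carries the value it expects, and that the header is listed among the signed headers, so it cannot be added or changed after signing. Here is a small Python sketch of that idea. The function name is hypothetical and this is an illustration of the check, not Vault's own code.

```python
import re

SERVER_ID_HEADER = "x-vault-aws-iam-server-id"


def server_id_is_valid(headers, expected_value):
    """Check the server ID header value and that it was part of the signature."""
    lowered = {name.lower(): value for name, value in headers.items()}

    # The header must be present and match what this server expects.
    if lowered.get(SERVER_ID_HEADER) != expected_value:
        return False

    # The header must also appear in the SignedHeaders list of the
    # Authorization header, otherwise anyone could attach it afterwards.
    match = re.search(r"SignedHeaders=([^,\s]+)", lowered.get("authorization", ""))
    if match is None:
        return False
    return SERVER_ID_HEADER in match.group(1).lower().split(";")
```

A request signed for the dev cluster fails the first check on prod, and a request where the header was added after signing fails the second.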

signed_request=$(python /opt/vault/scripts/sign-request.py vault.service.consul)
iam_request_url=$(echo "$signed_request" | jq -r .iam_request_url)
iam_request_body=$(echo "$signed_request" | jq -r .iam_request_body)
iam_request_headers=$(echo "$signed_request" | jq -r .iam_request_headers)

# The role name necessary here is the Vault Role name
# not the AWS IAM Role name
data=$(cat <<EOF
{
  "role": "example-role-name",
  "iam_http_request_method": "POST",
  "iam_request_url": "$iam_request_url",
  "iam_request_body": "$iam_request_body",
  "iam_request_headers": "$iam_request_headers"
}
EOF
)

curl --fail \
  --request POST \
  --data "$data" \
  "https://vault.service.consul:8200/v1/auth/aws/login"
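If you’d rather stay in one language, the same login call can be assembled in Python 3 with the standard library. This is a sketch under assumptions: the helper names are hypothetical, and the method, URL, body and headers are expected to come from an already signed STS request. Note that in Python 3 base64.b64encode works on bytes, so strings are encoded first.

```python
import base64
import json
import urllib.request


def _b64(text):
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_login_request(vault_addr, role, method, url, body, headers):
    """Build (but do not send) the POST to Vault's AWS login endpoint."""
    # Vault expects every header value as a list.
    go_style = {name: [value] for name, value in headers.items()}
    payload = {
        "role": role,  # the Vault Role name, not the AWS IAM Role name
        "iam_http_request_method": method,
        "iam_request_url": _b64(url),
        "iam_request_body": _b64(body),
        "iam_request_headers": _b64(json.dumps(go_style)),
    }
    return urllib.request.Request(
        vault_addr.rstrip("/") + "/v1/auth/aws/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def client_token(login_response):
    """Pull the Vault token out of a decoded login response."""
    return login_response["auth"]["client_token"]
```

Passing the result of build_login_request to urllib.request.urlopen would send it, and the decoded JSON response can then be handed to client_token.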

When the Vault server receives a login request with the iam method, it can execute the STS request without actually knowing the contents of the signed part. Amazon identifies who signed it, which the Vault Server can then check against the IAM principal bound to a previously created Vault Role. It is important to note that although the Vault Role is configured with the IAM principal ARN, what Vault actually checks against is a unique internal ID from AWS. So if you destroy and recreate your IAM Role, Vault will reject the login attempt, even if the new role has the same ARN.
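The difference between the ARN and the unique ID can be modelled in a few lines of Python. This is a simplified sketch, not Vault’s code: the function names and the example IDs are made up, and it relies on the shape of the GetCallerIdentity response, where an assumed role is reported as an sts assumed-role ARN and a UserId of the form unique ID, colon, session name.

```python
def canonical_principal_arn(caller_arn):
    """Map an STS assumed-role ARN back to the IAM role ARN it came from."""
    parts = caller_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError("not an ARN: %r" % caller_arn)
    partition, service, account, resource = parts[1], parts[2], parts[4], parts[5]
    if service == "sts" and resource.startswith("assumed-role/"):
        role_name = resource.split("/")[1]
        return "arn:%s:iam::%s:role/%s" % (partition, account, role_name)
    # IAM users (and anything else) are returned unchanged.
    return caller_arn


def matches_bound_principal(bound_unique_id, caller_user_id):
    """Compare the stored unique ID with the one STS reports for the caller."""
    # For assumed roles the UserId looks like "<unique-id>:<session-name>".
    return caller_user_id.split(":", 1)[0] == bound_unique_id
```

A recreated role yields the same canonical ARN but a new unique ID, so matches_bound_principal returns False for it until the Vault Role is configured again.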

As mentioned previously, the Vault cli tool makes this much simpler, so the last example can be reduced to a single command. Internally, however, the workflow is exactly the same.

vault login \
  -address=https://vault.service.consul:8200 \
  -method=aws \
  header_value=vault.service.consul \
  role=vault-role-name

Next steps

You may have noticed that whenever we used the address of the Vault server in our examples, we used vault.service.consul. That’s because we are using HashiCorp Consul not only as a storage backend but also as a service discovery mechanism. For the complete code of Vault clusters running with Consul and multiple examples of common use cases, check our open source repositories for AWS and GCP.

Thanks

Many thanks to Joel Thompson, who contributed the iam authentication method to Vault and gave a fantastic and very detailed talk during HashiConf’17. His talk was immensely helpful for understanding the inner workings of Vault’s authentication workflows, and my work would have been much harder without it.

Founder of Polygloss, the language learning app for expressing yourself, not memorizing. Software Engineer and NLP researcher, feminist, polyglot, chaotic good.