Whitelisting AWS Roles in Kubernetes

As we migrate applications running in AWS to containers and Kubernetes, we also need to accommodate each application’s AWS permissions, as supplied by roles. If the applications assume AWS IAM roles that allow them to perform AWS operations, these roles should still work inside Kubernetes running inside AWS. At issue is how we allow pods to assume the roles for which they are authorized, while preventing them from assuming roles for which they are not.

When running Kubernetes in AWS we’re fortunate to be able to choose between two mature AWS role-assumption solutions: kiam and kube2iam. Both of these solutions allow Kubernetes pods to assume AWS IAM roles, provided that the appropriate underlying Kubernetes node role is trusted by the target role that is needed by the pod. In kiam, the node role that needs to be trusted is that of a master node. In kube2iam, it is the role assumed by the node on which the pod is running. Both solutions act like proxies to perform the sts:AssumeRole operation in AWS. Since running containers are actually Linux processes, containers can make the needed AWS EC2 metadata calls to get credentials based on roles assumed by the cluster node.

Note: I have used both kiam and kube2iam, and I am not going to deep dive into their respective approaches, nor am I going to choose one over the other.

Both kiam and kube2iam use pod-level annotations to specify the role (AWS ARN) that a pod needs in order to interact with AWS. Below is an example of a Kubernetes deployment resource that specifies the AWS IAM role to be used by the pod.

The spec.template.metadata.annotations section contains the iam.amazonaws.com/role element that specifies the AWS IAM role that this pod needs to assume. In a kube2iam implementation, the role assumed by the underlying node on which the pod is running must be trusted by the target role that the pod assumes. In a kiam implementation, the master node role must be trusted by the target role needed by the pod in question.
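To make the location of that annotation concrete, here is a minimal Python sketch that builds a Deployment manifest as a dictionary and reads the role back from spec.template.metadata.annotations. The deployment name, image, and role ARN are placeholders chosen for illustration, and the helper names are hypothetical.

```python
import json

# Annotation key that kiam and kube2iam read from the pod template.
ROLE_ANNOTATION = "iam.amazonaws.com/role"


def build_deployment(name, namespace, image, role_arn):
    """Return a minimal Deployment manifest whose pod template carries the role annotation."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {
                    "labels": labels,
                    # Pod-level annotation: this is what the role proxy inspects.
                    "annotations": {ROLE_ANNOTATION: role_arn},
                },
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def pod_role(deployment):
    """Return the role annotated on the pod template, or None when absent."""
    metadata = deployment["spec"]["template"]["metadata"]
    return metadata.get("annotations", {}).get(ROLE_ANNOTATION)


if __name__ == "__main__":
    manifest = build_deployment(
        name="example-app",            # placeholder name
        namespace="role-test-ns",
        image="example/image:latest",  # placeholder image
        role_arn="arn:aws:iam::012345678901:role/role-name",
    )
    print(json.dumps(manifest, indent=2))
    print(pod_role(manifest))
```

Note that the annotation sits on the pod template, not on the Deployment's own metadata, which is why the lookup walks through spec.template.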

Note: AWS IAM roles have three parts:

1. The role that is assumed by the resource wanting to perform AWS work.

2. At least one attached policy that specifies the permissions that the role will have and convey to the assuming principal.

3. A trust document that specifies trusted resources that are allowed to assume the role.

The Roles

When running Kubernetes on AWS EC2, your nodes will each be assigned roles, with attached policies, that provide permissions to perform AWS operations needed to survive as a cluster node. A typical Kubernetes worker policy is seen below.

The role to which this policy is attached can be assumed by the Kubernetes cluster worker EC2 node, because the trust document, also attached to the role, trusts the AWS EC2 service.

kiam and kube2iam act as role-assumption proxies and allow pods with role annotations to pull credentials through the AWS EC2 metadata, based on the role annotated on the pod spec. This works only if the desired role is set up to trust the appropriate Kubernetes cluster node role. Roles can trust other roles by supplying the trusted role’s Amazon Resource Name (ARN) as a trusted entity. This configures a transitive trust relationship.
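The chain of trust described above can be sketched in Python by writing out the two trust documents involved: the node role trusts the EC2 service, and the pod's target role trusts the node role's ARN. The node role ARN is a made-up placeholder, and the `trusts` helper is a simplified illustration that only handles a single string `Action`, not the full IAM policy grammar.

```python
import json

# Placeholder ARN for the role assumed by the cluster node (kube2iam)
# or by the master node (kiam).
NODE_ROLE_ARN = "arn:aws:iam::012345678901:role/k8s-node-role"


def trust_policy(principal):
    """Build a trust document that lets `principal` call sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Principal": principal, "Action": "sts:AssumeRole"}
        ],
    }


def trusts(policy, principal_type, value):
    """Return True if the trust document allows the given principal to assume the role."""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow" or statement["Action"] != "sts:AssumeRole":
            continue
        allowed = statement["Principal"].get(principal_type, [])
        if isinstance(allowed, str):
            allowed = [allowed]
        if value in allowed:
            return True
    return False


# The node role is assumable by EC2 instances.
node_role_trust = trust_policy({"Service": "ec2.amazonaws.com"})
# The pod's target role is assumable by the node role.
pod_role_trust = trust_policy({"AWS": NODE_ROLE_ARN})

if __name__ == "__main__":
    print(json.dumps(pod_role_trust, indent=2))
    print(trusts(node_role_trust, "Service", "ec2.amazonaws.com"))
    print(trusts(pod_role_trust, "AWS", NODE_ROLE_ARN))
```

Nothing in the pod role's trust document mentions a pod or a namespace. The only principal it knows about is the node role, which is the gap the rest of this article is concerned with.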

With these solutions, individual pods can be given access to roles, as needed, without overtly granting all roles to all pods in the cluster. The downside is that the Kubernetes cluster nodes are the trusted principals, through their assumed AWS IAM roles. Potentially, any user who knows of a role and has privileges to change cluster state could annotate a pod with that role, even if the pod is not authorized to use it.

Role Shopping

Role shopping, a.k.a. privilege escalation, is when a pod is annotated with a role that was not meant for its use. Since the role that is trusted by the pod's target role is actually assumed by the underlying AWS EC2 instance, there is potential for pods to be annotated to assume roles for which they are not authorized. To prevent this from happening, both kiam and kube2iam use the concept of namespace restrictions to restrict which roles can be used by pods. At the time of this writing, kiam uses a namespace annotation that contains a regular expression to whitelist roles allowed in the respective namespace. Below is an example of that configuration, taken from the kiam GitHub repository.

In this kiam example above, the RegEx pattern used is very permissive.
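To see why a pattern such as `.*` is so permissive, the following Python sketch models a regular-expression namespace restriction. It is only a model: the annotation key `iam.amazonaws.com/permitted` is kiam's namespace annotation as I understand it, while the unanchored search and the default-deny behavior for an unannotated namespace are assumptions, and Python's regex dialect is not identical to the one kiam uses.

```python
import re

# Namespace annotation key (as I understand kiam's convention).
PERMITTED_ANNOTATION = "iam.amazonaws.com/permitted"


def role_permitted(namespace_annotations, role):
    """Return True if the namespace's regex permits the role.

    Assumptions: a namespace without the annotation permits nothing,
    and the pattern may match anywhere in the role name.
    """
    pattern = namespace_annotations.get(PERMITTED_ANNOTATION)
    if not pattern:
        return False
    return re.search(pattern, role) is not None


if __name__ == "__main__":
    permissive = {PERMITTED_ANNOTATION: ".*"}
    scoped = {PERMITTED_ANNOTATION: "^app-team-.*$"}  # hypothetical naming scheme

    print(role_permitted(permissive, "admin"))        # any role matches ".*"
    print(role_permitted(scoped, "app-team-reader"))  # fits the naming scheme
    print(role_permitted(scoped, "admin"))            # does not fit
```

Anchoring the pattern with `^` and `$`, as in the scoped example, keeps the outcome the same whether or not the matcher anchors patterns itself.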

Kube2iam has a similar approach with namespace restrictions, except that kube2iam uses an array structure to capture lists of roles allowed in the namespace, as seen below.

The above kube2iam namespace annotation uses wildcards to allow groups of roles.
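The list-with-wildcards approach can be modeled in a few lines of Python. This is a sketch, not kube2iam's implementation: I am assuming the namespace annotation `iam.amazonaws.com/allowed-roles` holds a JSON array of role patterns, and I use Python's `fnmatchcase` as a stand-in for whatever glob matching kube2iam performs. The role path in the example is invented.

```python
import json
from fnmatch import fnmatchcase

# Namespace annotation key (as I understand kube2iam's convention).
ALLOWED_ROLES_ANNOTATION = "iam.amazonaws.com/allowed-roles"


def role_allowed(namespace_annotations, role_arn):
    """Return True if any pattern in the namespace's allowed-roles list matches the role."""
    raw = namespace_annotations.get(ALLOWED_ROLES_ANNOTATION)
    if raw is None:
        return False  # assumption: no list means nothing is allowed
    patterns = json.loads(raw)
    return any(fnmatchcase(role_arn, pattern) for pattern in patterns)


if __name__ == "__main__":
    namespace_annotations = {
        # Hypothetical path-based grouping of roles.
        ALLOWED_ROLES_ANNOTATION: '["arn:aws:iam::012345678901:role/team-a/*"]'
    }
    print(role_allowed(namespace_annotations, "arn:aws:iam::012345678901:role/team-a/reader"))
    print(role_allowed(namespace_annotations, "arn:aws:iam::012345678901:role/admin"))
```

Grouping roles under an IAM path, as in the example pattern, is one way to keep such a list short while still scoping it to a team.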

Note: The kube2iam namespace restrictions functionality is optional.

The solutions used by kiam and kube2iam would seem to imply that role whitelists are better managed at each namespace. The customary security model would also dictate that users of a namespace cannot edit the configuration of that namespace. In this model, whitelisted roles are managed centrally by cluster admins, who configure each namespace.

While I appreciate both the kiam and kube2iam approaches to preventing role shopping, I propose a third approach that is more centrally managed, policy driven, and implementation agnostic. In this solution kiam and kube2iam are augmented with a policy driven solution that uses centralized whitelists of roles for each namespace. To build this approach, I will consider the requirements for centralized role-to-namespace whitelisting, and a preventative control, applied in a generic way, to the API server events before cluster state is changed.

Requirements for Centralized Control to Prevent Role Shopping

To prevent role shopping, a preventative control should be in place that prevents roles from being used by unauthorized pods, even if permissive kiam or kube2iam configurations exist. This control should prevent the Kubernetes API server from making any cluster state changes that would result in a pod being annotated with an unauthorized role. An unauthorized role would be any role not specifically whitelisted for a given namespace.

This control should also be policy-driven for ease of use and targeted enforcement, and should be based on least-privilege whitelisting. It should rely on centrally stored and managed policies and data, and it should be implementation agnostic. To be clear, this centralized whitelisting is in addition to any kiam or kube2iam namespace configuration.

Kubernetes Validating Admission Controllers and OPA

Per requirements, we need to prevent unwanted Kubernetes cluster state changes; we only want to allow (or admit) valid changes to the cluster state. This is the job of the Kubernetes ValidatingAdmissionWebhook, configurable in the API server bootstrap settings. This will seem familiar to followers of my previous article Policy Enabled Kubernetes with Open Policy Agent. Details for setting up the ValidatingAdmissionWebhook with OPA are covered in that article.

Along with the ValidatingAdmissionWebhook, we need Open Policy Agent (OPA). OPA is:

“…a lightweight general-purpose policy engine that can be co-located with your service. You can integrate OPA as a sidecar, host-level daemon, or library.

Services offload policy decisions to OPA by executing queries. OPA evaluates policies and data to produce query results (which are sent back to the client). Policies are written in a high-level declarative language and can be loaded into OPA via the filesystem or well-defined APIs.”

Again, since I went into such detail about the Kubernetes/OPA configuration in my previous article, I will not be covering the duplicative content here. The concepts of the Kubernetes integration to OPA can also be found here.

Kubernetes Admission Validation with OPA

After successful configuration, a Kubernetes webhook will listen for events from the API server before any cluster state change is applied. The webhook will make HTTP POST calls to the RESTful OPA server to submit the API server event payload for evaluation. A policy placed on the OPA server will match the metadata of the inbound API server event; this policy will evaluate the event payload and return an {"allowed": true} or {"allowed": false} element to the API server. A true response signals the API server to apply the change; a false response prevents the API server from applying the requested cluster state change. Upon a failed state change, the API server will respond to the client with a message explaining why the API event was not applied.
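As a rough illustration of the response side of that exchange, the Python sketch below turns a list of policy denial messages into an AdmissionReview-style response. The `admission.k8s.io/v1` version string and the use of `status.message` are my assumptions about the envelope; the exact shape depends on the Kubernetes and OPA versions in use, and the function name is hypothetical.

```python
import json


def admission_response(uid, deny_messages):
    """Build an AdmissionReview response: allowed only when no policy denied the request."""
    allowed = len(deny_messages) == 0
    response = {"uid": uid, "allowed": allowed}
    if not allowed:
        # The message is relayed to the client that attempted the change.
        response["status"] = {"message": "; ".join(deny_messages)}
    return {
        "apiVersion": "admission.k8s.io/v1",  # assumed version
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    ok = admission_response("example-uid-1", [])
    rejected = admission_response(
        "example-uid-2",
        ["invalid deployment: role is not whitelisted for namespace"],
    )
    print(json.dumps(ok, indent=2))
    print(json.dumps(rejected, indent=2))
```

The point to take away is that the decision is binary, and that a denial carries a human-readable reason back to whoever ran the failed command.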

Policy to Whitelist Roles

With OPA set up to listen for and evaluate API server events for validity, we can now write policies to whitelist roles. Since pods are arranged in namespaces, whitelists need a principal, and kiam and kube2iam use namespaces, my approach is to whitelist roles per namespace. The namespace is the logical choice as it is a Kubernetes resource used to organize related resources, isolate non-related resources, and manage security boundaries.

Note: It is generally accepted practice to organize applications under unique labels or organization units. These labels/organization units can be used to manage application risk by setting up logical boundaries for AWS resources, such as roles, security groups, etc. Matching these organization units, one-to-one, to Kubernetes namespaces makes it easy to maintain the application-specific risk/security boundaries within Kubernetes.

The role-to-namespace whitelist policy (seen below) is written as a cluster-wide resource to be managed by cluster admins. The policy is written in Rego, the OPA policy language, and is stored in a ConfigMap resource in the opa namespace. This enforces another control point, as namespace owners cannot simply opt into using a role they desire. They must request that cluster admins whitelist the role for their namespace. While kiam and kube2iam partially support this approach in their own ways, the policy-driven approach is more centrally managed, in the opa namespace.

package kubernetes.admission

import data.kubernetes.namespaces

# deny CREATE requests for Deployments whose pod template asks for a role
# that is not whitelisted for the target namespace
deny[msg] {
    input.request.kind.kind = "Deployment"
    input.request.operation = "CREATE"
    role := input.request.object.spec.template.metadata.annotations["iam.amazonaws.com/role"]
    name := input.request.object.metadata.name
    namespace := input.request.object.metadata.namespace
    not ns_roles_whitelisted(namespace, role)
    msg := sprintf("invalid deployment: role is not whitelisted for namespace, name=%q, namespace=%q, role=%q", [name, namespace, role])
}

ns_roles_whitelisted(n, r) {
    # a dictionary mapping each namespace to the set of permitted roles for that namespace
    whitelist := {
        "prod-ns": {"ec2:readonly", "iam:readonly"},
        "dev-ns": {"ec2:full", "iam:full", "ec2:readonly", "iam:readonly", "admin"},
        "role-test-ns": {"arn:aws:iam::012345678901:role/role-name"}
    }
    # assumed completion of the truncated listing: succeed only when role r
    # is in the set permitted for namespace n
    whitelist[n][r]
}

The OPA policy above is written to match API server event metadata for Kubernetes deployment resources that are being created. It will be used to evaluate deployment resources being created, where the spec.template.metadata.annotations.iam.amazonaws.com/role element contains an AWS IAM role value. The ns_roles_whitelisted() function will search the whitelist dictionary to validate that a requested role is whitelisted by the requesting namespace. If the annotated role is not whitelisted for the requesting namespace, the API server event will not be permitted to apply the cluster state change.

In practice, this solution should be written to listen for CREATE/UPDATE changes to any Kubernetes resource (Deployment, Pod, etc.) that could result in an annotated pod.
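One way to picture that broader coverage is a lookup from resource kind to the place where the pod-level metadata lives. The Python sketch below is an illustration of the idea rather than a policy; the set of kinds is my own selection and is not exhaustive, and `requested_role` is a hypothetical helper.

```python
ROLE_ANNOTATION = "iam.amazonaws.com/role"

# Path from the object root to the metadata of the pod that will be created.
POD_METADATA_PATH = {
    "Pod": ("metadata",),
    "Deployment": ("spec", "template", "metadata"),
    "ReplicaSet": ("spec", "template", "metadata"),
    "StatefulSet": ("spec", "template", "metadata"),
    "DaemonSet": ("spec", "template", "metadata"),
    "Job": ("spec", "template", "metadata"),
    "CronJob": ("spec", "jobTemplate", "spec", "template", "metadata"),
}

WATCHED_OPERATIONS = {"CREATE", "UPDATE"}


def requested_role(request):
    """Return the role a request would put on a pod, or None if it is out of scope."""
    if request.get("operation") not in WATCHED_OPERATIONS:
        return None
    path = POD_METADATA_PATH.get(request.get("kind", {}).get("kind"))
    if path is None:
        return None
    node = request.get("object") or {}
    for key in path:
        node = node.get(key) or {}
    return (node.get("annotations") or {}).get(ROLE_ANNOTATION)


if __name__ == "__main__":
    pod_update = {
        "operation": "UPDATE",
        "kind": {"kind": "Pod"},
        "object": {"metadata": {"annotations": {ROLE_ANNOTATION: "admin"}}},
    }
    print(requested_role(pod_update))
```

In Rego, the same effect would typically come from several rule bodies, one per kind or path, all feeding the same deny set.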

It’s important to point out that OPA matches policies to apply evaluations based on the JSON structure of the inbound data to be evaluated. So, this policy will only be used when an attempt is made to CREATE a Deployment resource, and the input.request.object.spec.template.metadata.annotations["iam.amazonaws.com/role"] element is present, in the API server event payload, with a value. API events to CREATE a deployment resource whose payloads do not contain the AWS role annotation will not be evaluated by this policy.
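The matching behavior described here can be mimicked in plain Python to make the three outcomes explicit: not applicable, allowed, and denied. This is a hand-written mirror of the policy's logic for illustration, not how OPA evaluates Rego; the `sample_review` helper and its deployment name are invented for the example.

```python
import json

ROLE_ANNOTATION = "iam.amazonaws.com/role"

# Same namespace-to-roles mapping as the policy's whitelist.
WHITELIST = {
    "prod-ns": {"ec2:readonly", "iam:readonly"},
    "dev-ns": {"ec2:full", "iam:full", "ec2:readonly", "iam:readonly", "admin"},
    "role-test-ns": {"arn:aws:iam::012345678901:role/role-name"},
}


def deny(review):
    """Return a list of denial messages; an empty list means nothing objected."""
    request = review["request"]
    if request["kind"]["kind"] != "Deployment" or request["operation"] != "CREATE":
        return []
    try:
        obj = request["object"]
        role = obj["spec"]["template"]["metadata"]["annotations"][ROLE_ANNOTATION]
        name = obj["metadata"]["name"]
        namespace = obj["metadata"]["namespace"]
    except (KeyError, TypeError):
        # A missing element means the rule does not apply, so it cannot deny.
        return []
    if role in WHITELIST.get(namespace, set()):
        return []
    return [
        "invalid deployment: role is not whitelisted for namespace, "
        "name=%s, namespace=%s, role=%s"
        % (json.dumps(name), json.dumps(namespace), json.dumps(role))
    ]


def sample_review(namespace, role=None):
    """Build a minimal CREATE Deployment event, optionally with a role annotation."""
    annotations = {} if role is None else {ROLE_ANNOTATION: role}
    return {
        "request": {
            "kind": {"kind": "Deployment"},
            "operation": "CREATE",
            "object": {
                "metadata": {"name": "example-app", "namespace": namespace},
                "spec": {"template": {"metadata": {"annotations": annotations}}},
            },
        }
    }


if __name__ == "__main__":
    print(deny(sample_review("prod-ns")))                  # no annotation: not applicable
    print(deny(sample_review("prod-ns", "ec2:readonly")))  # whitelisted
    print(deny(sample_review("prod-ns", "admin")))         # not whitelisted: denied
```

A namespace that is absent from the whitelist behaves like one with an empty set of roles, so any annotated role is denied there.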


This OPA policy-driven solution is used to enforce specific policies to prevent unwanted cluster changes, and to authorize applications, arranged in Kubernetes namespaces, to use AWS IAM roles during normal operation. Since OPA policies are stored as Kubernetes ConfigMap resources, changing these policies is very easy. While kiam and kube2iam both have their own namespace restrictions, this approach is more centrally managed, policy driven, and less dependent on the individual implementations of kiam or kube2iam. An added advantage is that it notifies the user of potential misconfigurations before they result in runtime issues. Given its domain-agnostic approach, OPA is a suitable solution to manage Kubernetes cluster governance and compliance, as well as AuthZ concerns.


The ideas in this article are my own thoughts and do not reflect or represent those of my employers, customers, or students.

