EKS, Cluster Authentication and Autoscaling of nodes/pods

With this post you'll get a better understanding of the Amazon Elastic Kubernetes Service offering, how authentication to the cluster works, and which configuration steps to perform in order to have a cluster ready to handle workloads.

What to expect

We will quickly go through the definition of Kubernetes, the description of the Elastic Kubernetes Service (EKS) service, and then I will show you how to configure authentication through IAM, enable cluster autoscaling and add worker nodes to a Kubernetes cluster deployed in EKS.

What is Kubernetes?

Kubernetes is an open source platform for managing containerized applications and services that facilitates both declarative configuration and automation.

According to Dan Kohn, executive director of the Cloud Native Computing Foundation (CNCF), in a podcast with Gordon Haff:

“Containerization is this trend that’s taking over the world to allow people to run all kinds of different applications in a variety of different environments. When they do that, they need an orchestration solution in order to keep track of all of those containers and schedule them and orchestrate them. Kubernetes is an increasingly popular way to do that.”

What is AWS EKS?

Quoting the official documentation:[1]

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that makes it easy for you to run Kubernetes on AWS without needing to stand up or maintain your own Kubernetes control plane.

Amazon EKS runs Kubernetes control plane instances across multiple Availability Zones to ensure high availability. Amazon EKS automatically detects and replaces unhealthy control plane instances, and it provides automated version upgrades and patching for them.

It is important to clarify that EKS provides only the control plane as a managed service: the worker nodes where your workloads are deployed are not part of this service.

The worker nodes are EC2 instances that have to be registered with the cluster somehow; but before getting into that, we need to understand how cluster authentication is done and which components are involved in that process.

Cluster Authentication

Amazon EKS uses IAM to provide authentication to the cluster[2], but it still relies on the native Kubernetes Role Based Access Control (RBAC) for authorization. This means that IAM is only used for authentication of valid IAM entities. All permissions for interacting with your cluster's Kubernetes API are managed through the native RBAC system.

The following diagram explains the authentication process:

Authentication process via AWS IAM

AWS IAM Authenticator for Kubernetes

IAM Authenticator is a tool designed to use AWS IAM credentials to authenticate to a Kubernetes cluster. It has two main use cases:

  • Simplify the credentials management for Kubernetes access
    If you are an administrator running a Kubernetes cluster on AWS, you already need to manage AWS IAM credentials to provision and update the cluster. By using AWS IAM Authenticator for Kubernetes, you avoid having to manage a separate credential for Kubernetes access.

  • Simplify the bootstrap process
    If you are building a Kubernetes installer on AWS, AWS IAM Authenticator for Kubernetes can simplify your bootstrap process. You won’t need to somehow smuggle your initial admin credential securely out of your newly installed cluster. Instead, you can create a dedicated role at cluster provisioning time and set up Authenticator to allow cluster administrator logins.

How does AWS IAM Authenticator work?

It works using the AWS sts:GetCallerIdentity API endpoint. This endpoint returns information about whatever AWS IAM credentials you use to connect to it.

Client side (aws-iam-authenticator token)

This API is used in a somewhat unusual way by having the Authenticator client generate and pre-sign a request to the endpoint. That request is serialized into a token that can pass through the Kubernetes authentication system.

Server side (aws-iam-authenticator server)

The token is passed through the Kubernetes API server and into the Authenticator server’s /authenticate endpoint via a webhook configuration. The Authenticator server validates all the parameters of the pre-signed request to make sure nothing looks funny. It then submits the request to the real https://sts.amazonaws.com server, which validates the client’s HMAC signature and returns information about the user. Now that the server knows the AWS identity of the client, it translates this identity into a Kubernetes user and groups via a simple static mapping.
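As a sketch, the client-side token can be generated with either of the following commands, assuming valid IAM credentials in your environment and a hypothetical cluster named my-eks-cluster. The resulting token embeds the pre-signed sts:GetCallerIdentity request described above:

```shell
# Generate a pre-signed authentication token for the cluster.
# "my-eks-cluster" is a hypothetical cluster name; valid IAM credentials
# must be available in the environment for either command to succeed.
aws eks get-token --cluster-name my-eks-cluster   # AWS CLI 1.16.283 or later
aws-iam-authenticator token -i my-eks-cluster     # standalone authenticator
```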

kubectl and kubeconfig

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can use kubectl to deploy applications, inspect and manage cluster resources, and view logs. For a complete list of kubectl operations, see Overview of kubectl.

The kubectl command-line tool uses kubeconfig files to find the information it needs to choose a cluster and communicate with the API server of a cluster.
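For an EKS cluster, kubectl delegates token generation to the authenticator through an exec entry in the kubeconfig. A minimal sketch of such a user entry follows (the entry and cluster name my-eks-cluster are hypothetical; aws eks update-kubeconfig generates a similar entry for you):

```yaml
# kubeconfig user entry: kubectl runs the command below on demand and
# sends the resulting token to the API server as a bearer token.
users:
- name: my-eks-cluster          # hypothetical entry name
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: aws
      args: ["eks", "get-token", "--cluster-name", "my-eks-cluster"]
```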

Managing Users or IAM Roles for EKS

When you create an Amazon EKS cluster, the IAM entity that creates the cluster (a user or role, such as a federated user)[3] is automatically granted system:masters permissions in the cluster’s RBAC configuration. The group system:masters allows super-user access to perform any action on any resource.

To grant additional AWS users or roles the ability to interact with your cluster, you must edit the aws-auth ConfigMap within Kubernetes.

The aws-auth ConfigMap is initially created to allow your worker nodes to join your cluster, but you also use this ConfigMap to add RBAC access to IAM users and roles.

For more information about different IAM identities, see Identities (Users, Groups, and Roles) in the IAM User Guide. For more information on Kubernetes RBAC configuration, see Using RBAC Authorization.
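As a sketch, granting an additional IAM user admin access through the aws-auth ConfigMap could look like this (the account ID and username are hypothetical; the mapRoles entries used by worker nodes are omitted here for brevity):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapUsers: |
    - userarn: arn:aws:iam::111122223333:user/alice   # hypothetical IAM user
      username: alice
      groups:
        - system:masters                              # full admin access
```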

IAM Roles for Service Accounts

We just discussed how users and roles are allowed to interact with the cluster, but not yet how to gain fine-grained control over the different AWS services our workloads have access to.

With IAM roles for service accounts[4] on Amazon EKS clusters, you can associate an IAM role with a Kubernetes service account. This service account can then provide AWS permissions to the containers in any pod that uses that service account. With this feature, you no longer need to provide extended permissions to the worker node IAM role so that pods on that node can call AWS APIs.

Applications must sign their AWS API requests with AWS credentials. This feature provides a strategy for managing credentials for your applications, similar to the way that Amazon EC2 instance profiles provide credentials to Amazon EC2 instances. The applications in the pod’s containers can then use an AWS SDK or the AWS CLI to make API requests to authorized AWS services.
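A minimal sketch of such a service account follows, assuming an IAM role arn:aws:iam::111122223333:role/s3-reader already exists with a trust policy for the cluster's OIDC provider (all names here are hypothetical):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader                 # hypothetical service account name
  namespace: default
  annotations:
    # Pods using this service account receive credentials for this role:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/s3-reader
```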

To wrap up: kubectl is used to interact with Kubernetes clusters. EKS uses IAM for cluster authentication, through either the AWS IAM Authenticator or the AWS CLI (version 1.16.283 or later, whose aws eks get-token command provides the same functionality). Finally, with IAM Roles for Service Accounts, you can grant fine-grained AWS permissions to the workloads running on your worker nodes. But how are worker nodes added to the cluster? We will discuss that now.

Adding worker nodes to your cluster

Again, quoting the AWS documentation:[5]

Worker machines in Kubernetes are called nodes. Amazon EKS worker nodes run in your AWS account and connect to your cluster control plane via the cluster API server endpoint.

Amazon EKS worker nodes are standard Amazon EC2 instances, and you are billed for them based on normal EC2 prices.[6]

AWS provides a specific AMI that is optimized for EKS. The AMI is configured to work with Amazon EKS out of the box. It includes Docker, the kubelet and the AWS IAM Authenticator. The AMI also contains a specialized bootstrap script that allows it to discover and connect to your cluster's control plane automatically.

The registration process of the worker nodes consists of two steps:

  • allow the EC2 instances (workers) to join the cluster and
  • authenticate them with the cluster.

Allow the workers to join the cluster

To allow the worker nodes to join the cluster, you need to apply a ConfigMap that associates the instance role used with the worker nodes to the Kubernetes RBAC groups system:bootstrappers and system:nodes.

The group system:bootstrappers allows access to the resources required to perform Kubelet TLS bootstrapping.

The system:nodes group allows access to resources required by the kubelet, including read access to all secrets and write access to all pod status objects.

This is how the ConfigMap looks:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <ARN of instance role (not instance profile)>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

Once the ConfigMap is applied, the nodes that match the ARN listed in the ConfigMap will be allowed to authenticate with the cluster.

Have your worker nodes authenticated by the cluster

The EKS cluster is provisioned in a way that allows authentication with bootstrap tokens. A Bootstrap token is a simple bearer token that is meant to be used when creating new clusters or joining new nodes to a cluster.

This token is used to temporarily authenticate against the EKS control plane in order to submit a certificate signing request (CSR) for a locally created key pair. The request is then automatically approved, and the operation ends by saving a ca.crt file and a kubelet.conf file for the kubelet to use when joining the cluster.

This is the kubelet bootstrap process workflow:

Managed Kubernetes Blog - bootstrap process workflow

The authentication of your worker nodes is performed by executing the bootstrapping script that comes within the AMI. This is done at boot time.

The EKS bootstrapping script requires a single parameter: the EKS cluster name. You can find details about the script in the EKS documentation.

Once the worker nodes have been bootstrapped, they will join the EKS cluster and be ready to accept workloads.
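As a sketch, the EC2 user data for a worker node could look like this (the cluster name is hypothetical; the bootstrap script ships with the EKS-optimized AMI at /etc/eks/bootstrap.sh):

```shell
#!/bin/bash
# EC2 user data: configures the kubelet and joins the node to the cluster.
# "my-eks-cluster" is a hypothetical cluster name.
set -o errexit
/etc/eks/bootstrap.sh my-eks-cluster
```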

Since the worker nodes are EC2 instances, you can take advantage of other EC2 features such as Auto Scaling, creating fleets of worker nodes in Auto Scaling Groups (ASGs). This provides auto-healing when one of the EC2 instances becomes unresponsive; however, it does not add nodes when you have more traffic than usual and your workload requires additional capacity to handle it.

To enable autoscaling of worker nodes, you must deploy a service designed for this purpose: the Cluster Autoscaler.

Autoscaling of worker nodes

Cluster Autoscaler[7] is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:

  • there are pods that failed to run in the cluster due to insufficient resources,
  • there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.

The Cluster Autoscaler on AWS scales worker nodes within any specified autoscaling group. It will run as a Deployment in your cluster.

IMPORTANT! Define one ASG per Availability Zone: a multi-AZ ASG can get you into trouble with Kubernetes workloads that require EBS storage, because EBS volumes are always bound to a single Availability Zone.
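A sketch of the relevant container arguments in the Cluster Autoscaler Deployment, using ASG auto-discovery by tags (cluster name and image version are hypothetical; for auto-discovery the ASGs must carry the k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/my-eks-cluster tags):

```yaml
# Fragment of the Cluster Autoscaler Deployment spec (not a complete manifest).
containers:
- name: cluster-autoscaler
  image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.17.3  # hypothetical version
  command:
    - ./cluster-autoscaler
    - --cloud-provider=aws
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-eks-cluster
```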

It is worth mentioning that this does not provide autoscaling of pods. For that purpose there are two other components that need to be configured in Kubernetes itself: the Horizontal Pod Autoscaler (HPA), and the Vertical Pod Autoscaler (VPA).

Horizontal Pod Autoscaler

The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or replica set based on that resource’s CPU utilization.

The HPA is part of the core functionality of Kubernetes, but it requires that a metrics source (such as the Kubernetes metrics server) is installed in the cluster.
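A minimal HPA sketch targeting a hypothetical Deployment called my-app, scaling between 2 and 10 replicas based on average CPU utilization:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                       # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                     # hypothetical target Deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50 # scale out above 50% average CPU
```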

Vertical Pod Autoscaler

The Kubernetes Vertical Pod Autoscaler automatically adjusts the CPU and memory reservations for your pods to help “right size” your applications. This adjustment can improve cluster resource utilization and free up CPU and memory for other pods. The VPA is not part of the core functionality of Kubernetes but, like the HPA, it requires a metrics source.
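A minimal VPA sketch for the same hypothetical Deployment, letting the autoscaler apply its recommendations automatically (this assumes the VPA components have been installed in the cluster):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa                   # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                     # hypothetical target Deployment
  updatePolicy:
    updateMode: "Auto"               # apply recommendations by evicting pods
```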

Combining vertical and horizontal scaling

In principle it may be possible to use both vertical and horizontal scaling[8] for a single workload (group of pods), as long as the two mechanisms operate on different resources. The right approach is to let the Horizontal Pod Autoscaler scale the group based on the bottleneck resource. The Vertical Pod Autoscaler could then control other resources.

Examples:

  1. A CPU-bound workload can be scaled horizontally based on the CPU utilization while using vertical scaling to adjust memory.
  2. An IO-bound workload can be scaled horizontally based on the IO throughput while using vertical scaling to adjust both memory and CPU.

However, this is a more advanced form of autoscaling and it is not well supported by the current version of the Vertical Pod Autoscaler. The difficulty comes from the fact that changing the number of instances affects not only the utilization of the bottleneck resource (which is the principle of horizontal scaling), but potentially also the non-bottleneck resources that are controlled by the VPA. To allow combining it with the HPA, the VPA model would have to be extended to take the size of the group into account when aggregating the historical resource utilization and when producing a recommendation. Long story short: it is not recommended to have both enabled for the same workload.

Conclusion

AWS EKS is the managed Kubernetes offering from Amazon: it provides the control plane as a managed service. Access to the cluster is managed by IAM, and the IAM Authenticator is used in combination with kubectl to validate your identity and grant or deny access. Worker nodes are EC2 instances that must be associated with the cluster in a two-step process: 1) allow the workers to join the cluster and 2) have the workers authenticated by the cluster. If you want autoscaling, you must deploy the Cluster Autoscaler for worker nodes and configure the Horizontal Pod Autoscaler (backed by a metrics source) for pods. Finally, if you want fine-grained control over the AWS services your deployed workloads have access to, you must define IAM Roles for Service Accounts.

This seems to be a good place to take a break and digest all this information.

Stay tuned! In the next blog post we will get our hands dirty and, with the help of terraform, we’ll build an EKS cluster with multiple sets of worker nodes and will deploy the Cluster Autoscaler and metrics server; hopefully learning a few terraform tricks along the way.


Credits for cover image go to: Rinson Chory on Unsplash

Notes


  1. https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html

  2. through the aws eks get-token command (AWS CLI version 1.16.283 or later), or via the AWS IAM Authenticator for Kubernetes

  3. https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html

  4. https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html

  5. https://docs.aws.amazon.com/eks/latest/userguide/worker.html

  6. https://aws.amazon.com/ec2/pricing/

  7. https://aws.amazon.com/autoscaling/

  8. https://github.com/kubernetes/community/blob/master/contributors/design-proposals/autoscaling/vertical-pod-autoscaler.md#combining-vertical-and-horizontal-scaling