Brief Primer on CDK

Some might say that the AWS Cloud Development Kit (CDK) is just a “glorified wrapper” on top of AWS CloudFormation, but it is more than that. This article gives a brief introduction to the CDK and acts as a companion piece to Infrastructure Tests with CDK.
29.07.2021

This article is a companion piece to the Infrastructure Tests with CDK article. It is not meant as an exhaustive introduction to the CDK, but should give just enough background information to follow the other article without much prior knowledge of the CDK. For both articles, basic familiarity with AWS CloudFormation is assumed.

Some people might say that the AWS Cloud Development Kit (CDK) is just a “glorified wrapper” on top of CloudFormation. And to a certain extent this might be true. Basically, CDK is a set of libraries - available for six well-known general-purpose programming languages - that provide APIs for creating CloudFormation resources. Instead of defining the resources statically in YAML or JSON files as required by CloudFormation, you write actual code to define your infrastructure in TypeScript, JavaScript, Python, Java, C# or Go. This code is then “synthesized” into CloudFormation templates and deployed as CloudFormation stacks via the CDK CLI.

Being able to use an actual programming language comes with several advantages:

  1. You have the rich ecosystem of the chosen language available, providing you with battle-tested libraries for almost every use case, well-established package managers and auxiliary tooling such as linters.
  2. You have better ways to structure and define your stacks and their resources by using loops, conditional statements, etc.
  3. The learning curve for developers is less steep since they are already familiar with most of the technologies being used and CDK is basically just yet another library.
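
To make the second point more concrete, here is a minimal sketch of how a loop and a conditional can generate the static template that CloudFormation expects. It is a toy model, not the CDK API: the types and the functions synthesizeBuckets and capitalize are hypothetical names made up for this illustration. Only the resource type AWS::S3::Bucket and its VersioningConfiguration property are taken from CloudFormation itself.

```typescript
// Toy model only: these types and helpers are hypothetical and NOT the CDK API.
// They show how ordinary code can produce a static CloudFormation-style template.

interface ToyResource {
  Type: string;
  Properties?: Record<string, unknown>;
}

interface ToyTemplate {
  Resources: Record<string, ToyResource>;
}

// Build one S3 bucket resource per environment name.
function synthesizeBuckets(environments: string[]): ToyTemplate {
  const result: ToyTemplate = { Resources: {} };
  for (const env of environments) {
    const resource: ToyResource = { Type: "AWS::S3::Bucket" };
    // Conditional logic: only the production bucket gets versioning.
    if (env === "prod") {
      resource.Properties = { VersioningConfiguration: { Status: "Enabled" } };
    }
    result.Resources[`${capitalize(env)}Bucket`] = resource;
  }
  return result;
}

// Turn "dev" into "Dev" so the logical IDs read like typical template keys.
function capitalize(value: string): string {
  return value.charAt(0).toUpperCase() + value.slice(1);
}

// Three environments yield three resources: DevBucket, StagingBucket, ProdBucket.
const bucketTemplate = synthesizeBuckets(["dev", "staging", "prod"]);
const bucketTemplateJson = JSON.stringify(bucketTemplate, null, 2);
```

In plain CloudFormation, the three bucket definitions would have to be written out by hand. Here, adding a fourth environment is a one-word change.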

Additionally, CDK simplifies certain aspects compared to plain CloudFormation, especially in terms of stack management and deployment.

However, the biggest benefit in my opinion - and the reason why “glorified wrapper” falls short - is the fact that by following best software development practices, you can create infrastructure code that is readable, maintainable and testable. And - most importantly - you can modularize the code base and provide reusable components that can be used by different teams and across different accounts.

Before going into detail on how this can be achieved, I want to briefly define some key terminology.

Key Terminology

The main entry point in CDK is the app.

An app may contain one or more stacks which map directly to CloudFormation stacks.

A stack contains constructs (and is a construct itself). Constructs are the main building blocks of any CDK app and come in three different levels: L1, L2 and L3. Built-in constructs for all AWS services are provided by the CDK’s Construct Library. Additionally, you can write your own.
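
The relationship between app, stacks and constructs can be pictured as a tree. The following sketch is a heavily simplified toy model, not the CDK API: the classes ToyConstruct, ToyApp and ToyStack are hypothetical. It only mirrors the convention that a construct is created with a scope (its parent) and an id.

```typescript
// Toy model only: a hypothetical, heavily simplified construct tree, NOT the CDK API.

class ToyConstruct {
  readonly scope: ToyConstruct | undefined;
  readonly id: string;
  readonly children: ToyConstruct[] = [];

  constructor(scope: ToyConstruct | undefined, id: string) {
    this.scope = scope;
    this.id = id;
    // Register with the parent so the tree can be walked from the root.
    if (scope) scope.children.push(this);
  }

  // Path from the root of the tree down to this construct.
  get path(): string {
    return this.scope ? `${this.scope.path}/${this.id}` : this.id;
  }
}

// The app is the root of the tree and therefore has no scope.
class ToyApp extends ToyConstruct {
  constructor(id: string) {
    super(undefined, id);
  }
}

// A stack is itself a construct; it simply groups other constructs.
class ToyStack extends ToyConstruct {}

const toyApp = new ToyApp("MyApp");
const storageStack = new ToyStack(toyApp, "StorageStack");
const uploadsBucket = new ToyConstruct(storageStack, "Uploads");

const uploadsPath = uploadsBucket.path; // "MyApp/StorageStack/Uploads"
```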

Combining built-in constructs with your own custom constructs allows you to create the reusable components mentioned before. Since I deem this one of the most important aspects on a conceptual level, I’m going to dedicate the remainder of this article to a more in-depth explanation of the constructs.

Constructs

L1 Constructs

L1 constructs are the lowest-level constructs and map 1:1 to CloudFormation resources. In theory, you could “lift-and-shift” your existing CloudFormation stacks to CDK by mapping the CloudFormation resources directly to their matching CDK L1 constructs. These constructs don’t provide much in terms of abstraction, but are sometimes still needed since higher-level constructs are not available for all AWS services.

An example of an L1 construct is CfnBucket, matching the CloudFormation resource AWS::S3::Bucket. One reason to use it is to set the ObjectLockConfiguration, since at the time of writing this is not yet available in the higher-level L2 construct Bucket from the S3 module.

L2 Constructs

L2 constructs make up a large part of the built-in constructs. They vastly improve the usability of the low-level CloudFormation API. This is done in two ways: First, the API itself is more user-friendly in terms of naming, abstractions and default assignments. Second, a lot of glue code is created automatically, not only by adding additional resources, but also by making use of custom resources that take care of common tasks not achievable with plain CloudFormation.

An example of an L2 construct is the previously mentioned S3 construct Bucket. Like the L1 construct CfnBucket, it represents an S3 bucket, but with a higher level of abstraction. For example, you can enable versioning via a simple boolean value instead of configuring a whole object. Under the hood, CDK then maps this simple value to the underlying CloudFormation JSON structure. Having this abstraction in place makes it easier to get started and also to reason about the code.
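
The versioning example can be sketched as follows. This is a toy model and not the CDK’s implementation: ToyBucketProps, CfnBucketProperties and toCfnBucketProperties are hypothetical names. The target structure, a VersioningConfiguration object with a Status of Enabled or Suspended, is the one CloudFormation defines for AWS::S3::Bucket.

```typescript
// Toy model only: hypothetical names, NOT the CDK API or its implementation.

// The friendly, high-level input: a single boolean.
interface ToyBucketProps {
  versioned?: boolean;
}

// The low-level shape that the AWS::S3::Bucket resource expects.
interface CfnBucketProperties {
  VersioningConfiguration?: { Status: "Enabled" | "Suspended" };
}

// Translate the boolean into the nested CloudFormation structure.
function toCfnBucketProperties(props: ToyBucketProps): CfnBucketProperties {
  if (!props.versioned) {
    // Nothing requested: leave the property out of the template entirely.
    return {};
  }
  return { VersioningConfiguration: { Status: "Enabled" } };
}

const versionedProperties = toCfnBucketProperties({ versioned: true });
const defaultProperties = toCfnBucketProperties({});
```

The caller only has to reason about true or false, while the nested object and its allowed string values stay hidden behind the abstraction.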

An example of the previously mentioned glue code is the creation of an Elasticsearch cluster with a custom domain. Using CDK, it is sufficient to set the desired custom domain name and pass the hosted zone. CDK will automatically add the required CNAME record and certificate to the stack. With plain CloudFormation, this would need to be done manually, requiring a deeper understanding of how Route 53 and ACM work. CDK automatically chooses the correct record type and picks the sensible default of DNS validation for the certificate. Thus, the developer does not need to know how the custom domain is internally represented. Additionally, the core purpose of the stack is easier to grasp since this boilerplate code is created under the hood, resulting in fewer lines of (visible) code.

L3 Constructs

The highest level constructs are the L3 constructs, also called patterns. They are made up of other constructs and provide an easy way to achieve bigger common tasks.

An example of this is the ApplicationLoadBalancedFargateService which creates an ECS Service behind a load balancer and everything else needed for it, optionally including the ECS cluster and a VPC. Thus, the developer does not need to know much about the underlying infrastructure concepts and can work with a simplified configuration interface. This makes it fairly easy to get started with infrastructure automation on AWS.
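
To give a rough feeling for how much a pattern takes care of, the following sketch lists resource types that a pattern-like helper would have to create. It is a toy model with hypothetical names (ToyFargatePatternProps, plannedResourceTypes), not the CDK API, and the list is deliberately incomplete. The point is the optional part: VPC and cluster are only created if none are passed in.

```typescript
// Toy model only: hypothetical names, NOT the CDK API. The resource list is
// deliberately incomplete and only illustrates the idea of a pattern.

interface ToyFargatePatternProps {
  existingVpcId?: string;       // reuse a VPC, but still create a cluster in it
  existingClusterName?: string; // reuse a cluster (and therefore its VPC)
}

function plannedResourceTypes(props: ToyFargatePatternProps): string[] {
  const types: string[] = [];
  const reusesCluster = props.existingClusterName !== undefined;

  // Networking and cluster are only created when nothing suitable was passed in.
  if (!reusesCluster && props.existingVpcId === undefined) {
    types.push("AWS::EC2::VPC");
  }
  if (!reusesCluster) {
    types.push("AWS::ECS::Cluster");
  }

  // The core of the pattern is always created.
  types.push(
    "AWS::ECS::TaskDefinition",
    "AWS::ECS::Service",
    "AWS::ElasticLoadBalancingV2::LoadBalancer",
    "AWS::ElasticLoadBalancingV2::Listener",
    "AWS::ElasticLoadBalancingV2::TargetGroup",
  );
  return types;
}

// Quick start: nothing is passed in, so a VPC and a cluster are created as well.
const quickStartTypes = plannedResourceTypes({});

// Deliberate setup: the service is placed into an existing cluster.
const sharedClusterTypes = plannedResourceTypes({ existingClusterName: "shared" });
```

The quick-start variant is convenient, but it silently adds a VPC to the stack, which is worth keeping in mind once several such services exist.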

The Cost of Abstraction

Of course, like any other abstraction layer, this one comes with drawbacks as well. Not only are configuration options limited, but sooner or later you might need a deeper understanding of the underlying resources, whether for troubleshooting or to prevent suboptimal usage.

One example is the previously mentioned automatic creation of VPCs. While this can be useful to get started quickly, it can accidentally result in a messy network setup. And having to change something as crucial and central as networking at a later point can require significant effort.

The abstraction layer exists not only in the L2 and L3 constructs, but also on top of general CloudFormation functionality. One example is sharing values across stacks via CloudFormation’s Import/Export functionality. CDK uses this mechanism under the hood, but that fact and its implications might not be obvious to new users, potentially leading to unwanted side effects and issues down the road. Then again, it also makes it really easy to get started and to understand stack dependencies.
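
For readers who have not used Import/Export directly, the sketch below shows what the mechanism looks like at the template level. The two templates are hand-written for illustration and are not actual CDK output. The export name shared-bucket-name and the helper collectImports are hypothetical. Only the Outputs/Export section and the Fn::ImportValue function are taken from CloudFormation itself.

```typescript
// Illustration only: hand-written templates and a hypothetical helper,
// NOT actual CDK output.

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// The producing stack exports a value under a name.
const producerTemplate: Json = {
  Resources: { SharedBucket: { Type: "AWS::S3::Bucket" } },
  Outputs: {
    SharedBucketName: {
      Value: { Ref: "SharedBucket" },
      Export: { Name: "shared-bucket-name" },
    },
  },
};

// The consuming stack imports the value by that name.
const consumerTemplate: Json = {
  Resources: {
    BucketNameParameter: {
      Type: "AWS::SSM::Parameter",
      Properties: {
        Type: "String",
        Value: { "Fn::ImportValue": "shared-bucket-name" },
      },
    },
  },
};

// Walk a template and collect every export name it imports.
function collectImports(node: Json, found: string[] = []): string[] {
  if (Array.isArray(node)) {
    for (const item of node) collectImports(item, found);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "Fn::ImportValue" && typeof value === "string") {
        found.push(value);
      } else {
        collectImports(value, found);
      }
    }
  }
  return found;
}

const importedExportNames = collectImports(consumerTemplate); // ["shared-bucket-name"]
```

The implication that is easy to miss: as long as another stack imports an export, CloudFormation refuses to change or remove it, so the producing stack cannot simply be modified or deleted on its own.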

Conclusion

As shown by the examples above, just using the built-in higher-level CDK constructs already comes with a lot of benefits compared to plain CloudFormation. It reduces the amount of required boilerplate code and results in a much gentler learning curve thanks to the provided abstraction layer.

But the real benefit actually comes with the possibility to write and share constructs or patterns tailored to the use cases of your own organization. Developing, testing and releasing them following a process already well-established in almost all software development teams and using the equally well-established tooling from the chosen language’s ecosystem makes a huge difference.

Of course, there are more advantages to using CDK and obviously several disadvantages as well. One main point to be aware of when starting out with CDK is that the learning curve can be quite steep without much prior software development knowledge. Whether this is worth the investment depends - as always - on the circumstances. However, I do believe that with infrastructure setups growing larger and more complicated, the cost of maintaining them becomes an ever more important factor. Thus, there is no way around treating infrastructure setups as software systems, deserving and requiring the same treatment as any other larger piece of software. This obviously includes automated testing. To learn more about that, head over to the main article: Infrastructure Tests with CDK. And if you’re interested in further details about the CDK or in sharing your experiences, please drop us a line. We’d love to hear from you!