When it comes to restaurants, consistency is key. Creating a salad once is simple, but recreating the exact same salad many times tends to be more difficult. Small variances between them may cause issues, like too much or too little dressing, ruining the balance of the composed salad. Infrastructure behaves the same way: while most manually created environments are similar, they aren’t exactly the same, and this can cause difficulties when troubleshooting issues. AWS CloudFormation allows you to codify your complex infrastructure graphs in a single template and explicitly dictate the desired state of all your resources. CloudFormation will then create your resources exactly as described in a safe, consistent, and repeatable way. Updating your resource graphs is simple, too. You update the template to your desired configuration, create a change set, and we make the live resources’ configuration match the desired state.
Sometimes, though, users make out-of-band changes to resources, outside of the purview of CloudFormation. For example, a user might change the read/write capacity of an Amazon DynamoDB table through the DynamoDB console. This might be in response to a late-night alarm, or it may even happen automatically because of automatic scaling configurations. The drift between the template’s desired state and the resource’s actual state introduces undesired inconsistencies, which may cause issues. For example, you lose some of the benefits of CloudFormation, such as environment reproducibility and auditing. In some severe cases, drift may cause CloudFormation to misinterpret changes and fail to update or delete drifted stacks altogether.
Drift detection and remediation
By combining the drift detection and resource import features, we’ve given you the ability to remediate some drift cases in a safe and efficient way. You can use drift detection on either particular resources or the entire stack. Drift detection compares the desired resource states defined in the template against the live state values reported by the resources themselves.
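As a minimal sketch of what stack-level drift detection could look like through the API rather than the console, the function below starts a detection run, polls until it finishes, and collects the property differences per resource. It assumes the `cfn` argument behaves like `boto3.client("cloudformation")`; the function name `detect_drift`, the `poll_seconds` parameter, and any stack name you pass in are illustrative and not part of the original walkthrough. Only the first page of results is read here.

```python
import time


def detect_drift(cfn, stack_name, poll_seconds=5):
    """Run stack drift detection and return property differences per resource.

    `cfn` is assumed to behave like boto3.client("cloudformation").
    """
    detection_id = cfn.detect_stack_drift(StackName=stack_name)[
        "StackDriftDetectionId"
    ]

    # Detection is asynchronous, so poll until it leaves the in-progress state.
    while True:
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if status["DetectionStatus"] != "DETECTION_COMPLETE":
        raise RuntimeError(
            status.get("DetectionStatusReason", "drift detection failed")
        )

    # Only the first page of drifted resources is read in this sketch.
    drifts = cfn.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )["StackResourceDrifts"]

    # Map each logical ID to (property path, expected value, actual value).
    return {
        d["LogicalResourceId"]: [
            (p["PropertyPath"], p["ExpectedValue"], p["ActualValue"])
            for p in d.get("PropertyDifferences", [])
        ]
        for d in drifts
    }
```

Passing the client in as an argument keeps the sketch easy to exercise with a stub; in real use you would likely call it as `detect_drift(boto3.client("cloudformation"), "my-stack")`, where `my-stack` stands in for your own stack name.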
Once drift is detected, you have a few options. The simplest option tends to be updating the template itself to match the current live state. However, applying that update may require potentially undesirable changes to your resource. Another option is to update the resource itself to match the template, but sometimes the live resource state is the desired state. Resource import affords us a third option. By decoupling the resource from the stack with a deletion policy of Retain and then importing the same resource back, we can adopt the updated property values without having to execute a resource update that requires replacement.
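To make the third option more concrete, here is a small sketch of the three template revisions that the decouple-and-import approach moves through: one that adds `DeletionPolicy: Retain`, one with the resource removed, and one that re-adds the resource with properties mirroring the live state. The helper name `abandon_and_reimport_templates` and its parameters are hypothetical, and the templates are treated as plain Python dictionaries rather than YAML files.

```python
import copy


def abandon_and_reimport_templates(template, logical_id, live_properties):
    """Return the three templates applied in turn: retain, removed, re-import."""
    # Revision 1: mark the resource so that removing it from the stack
    # leaves the underlying resource in place.
    retain = copy.deepcopy(template)
    retain["Resources"][logical_id]["DeletionPolicy"] = "Retain"

    # Revision 2: drop the resource from the template so the stack forgets it.
    removed = copy.deepcopy(retain)
    resource = removed["Resources"].pop(logical_id)

    # Revision 3: re-add the resource with properties matching the live state.
    # The DeletionPolicy set in revision 1 is carried over.
    reimport = copy.deepcopy(removed)
    resource["Properties"] = copy.deepcopy(live_properties)
    reimport["Resources"][logical_id] = resource

    return retain, removed, reimport
```

The first two revisions would be applied as ordinary stack updates, and the third would be supplied to the import operation. Note that a template must declare at least one resource, so the second revision only works as-is when the stack contains other resources besides the one being decoupled.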
Remediation in Action
Let’s consider the following scenario.
Our on-call engineer was woken up in the middle of the night by an alarm on our DynamoDB table that reported throttling issues. The engineer diagnosed the root cause as insufficient provisioned capacity on the table. This table in particular didn’t have consistent load, and the alarm fired frequently. The engineer quickly resolved the issue by enabling Pay Per Request billing, allowing the table to scale to meet demand without the need for extensive capacity planning.
The engineer believed they had taken the appropriate corrective action by resolving the alarm. However, the change caused drift between the resource and the CloudFormation template that created it. We have been assigned to remediate this drift.
The original DynamoDB table has the following template snippet:
    Resources:
      GamesTable:
        Type: AWS::DynamoDB::Table
        Properties:
          TableName: Games
          AttributeDefinitions:
            - AttributeName: key
              AttributeType: S
          KeySchema:
            - AttributeName: key
              KeyType: HASH
          BillingMode: PROVISIONED
          ProvisionedThroughput:
            ReadCapacityUnits: 5
            WriteCapacityUnits: 10
We can run drift detection on the stack, which reports the table as drifted because its live billing configuration no longer matches the template.
We want to remediate the consistency issues, as we don’t want a state where our resource’s values differ from our template’s values. To do this, we can follow these steps:
1. Update our CloudFormation template to add a DeletionPolicy: Retain on the resource, and update our stack.
    Resources:
      GamesTable:
        Type: AWS::DynamoDB::Table
        DeletionPolicy: Retain
        Properties:
          TableName: Games
          AttributeDefinitions:
            - AttributeName: key
              AttributeType: S
          KeySchema:
            - AttributeName: key
              KeyType: HASH
          BillingMode: PROVISIONED
          ProvisionedThroughput:
            ReadCapacityUnits: 5
            WriteCapacityUnits: 10
2. Remove the resource from our template and update our stack once more. Check to make sure that the resource no longer exists in our CloudFormation stack (the Games table in this example).
3. Re-add the resource to our template, but this time set the billing mode to match the live state of the resource and remove the provisioned throughput settings.
    Resources:
      GamesTable:
        Type: AWS::DynamoDB::Table
        DeletionPolicy: Retain
        Properties:
          TableName: Games
          AttributeDefinitions:
            - AttributeName: key
              AttributeType: S
          KeySchema:
            - AttributeName: key
              KeyType: HASH
          BillingMode: PAY_PER_REQUEST
4. Select our stack and choose Import resource into stack from the Stack actions menu. Follow the wizard to import our updated template.
5. Make sure to enter the correct identifier value to identify the resource to import (the table name, Games, in this example).
6. Finally, click on Import resources to import the resources back to the stack in the current state.
CloudFormation will now import the resource into our stack.
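The console steps above can also be approximated through the API. The following is a sketch only: it assumes `cfn` behaves like `boto3.client("cloudformation")`, that the template is available as a dictionary, and that a change set name of the form `reimport-<logical id>` is acceptable. The function name `import_resource` and that naming scheme are hypothetical. It creates a change set of type `IMPORT`, waits for it to be ready, and then executes it.

```python
import json


def import_resource(cfn, stack_name, template, logical_id, resource_type, identifier):
    """Create and execute an IMPORT change set for a single resource.

    `cfn` is assumed to behave like boto3.client("cloudformation").
    """
    change_set_name = "reimport-" + logical_id  # hypothetical naming scheme

    cfn.create_change_set(
        StackName=stack_name,
        ChangeSetName=change_set_name,
        ChangeSetType="IMPORT",
        TemplateBody=json.dumps(template),  # JSON is a valid template format
        ResourcesToImport=[
            {
                "ResourceType": resource_type,
                "LogicalResourceId": logical_id,
                # For a DynamoDB table this is {"TableName": "<name>"}.
                "ResourceIdentifier": identifier,
            }
        ],
    )

    # Wait until the change set has been created before executing it.
    cfn.get_waiter("change_set_create_complete").wait(
        StackName=stack_name, ChangeSetName=change_set_name
    )
    cfn.execute_change_set(StackName=stack_name, ChangeSetName=change_set_name)
    return change_set_name
```

For the scenario in this post, the call might look like `import_resource(cfn, "my-stack", template, "GamesTable", "AWS::DynamoDB::Table", {"TableName": "Games"})`, where `my-stack` stands in for your own stack name. Reviewing the change set before executing it is a sensible extra step that this sketch skips.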
Resource import and drift detection are available now and you can start using them today. They are available in US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Canada (Central), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), EU (Frankfurt), EU (Ireland), EU (London), EU (Paris), EU (Stockholm), South America (São Paulo), Middle East (Bahrain), Asia Pacific (Hong Kong), China (Ningxia), China (Beijing), and AWS GovCloud (US).
We walked you through a use case of detecting drift on a DynamoDB table resource, as well as abandoning and re-importing the resource to preserve the intended resource configuration. For stateful resources like databases and queues, this ‘abandon and re-import’ strategy is the safest way to ensure that your templates continue to reflect the most accurate, up-to-date state over time. We look forward to hearing your feedback!
About the Authors
Dan is a Developer Advocate for AWS CloudFormation based in Seattle. Dan writes blogs, templates, code, and tips to consistently improve the developer experience for CloudFormation users. When Dan’s not bolted to his laptop, you can find him playing a new board game or cooking up something in the kitchen. You can find more of Dan on Twitter (@TheDanBlanco) or on the AWS Developers #cloudformation Slack Channel.
Prabhu is a Sr. Software Development Manager on the AWS CloudFormation team. He is passionate about all things distributed and enjoys growing customer-centric development teams at Amazon. Before joining AWS, Prabhu led various teams delivering monitoring solutions for IT systems and network infrastructure, data center automation and build systems.