Amazon DevOps Guru is a machine learning (ML) powered service that helps developers and operators automatically detect anomalies and improve application availability. DevOps Guru utilizes machine learning models informed by years of Amazon.com and AWS operational excellence in order to identify anomalous application behavior (e.g., increased latency, error rates, resource constraints) and surface critical issues that could cause potential outages or service disruptions. When DevOps Guru identifies critical issues, it automatically sends an alert to Amazon Simple Notification Service (SNS) and Systems Manager OpsCenter, and then provides a summary of related anomalies on the service console. The summary, or insight, includes the likely root cause, as well as context regarding when and where the issue occurred. When possible, DevOps Guru also provides recommendations for remediating the issue.

Since DevOps Guru was announced at re:Invent 2020, AWS has released additional features and expanded the resources covered. Now that it is generally available (GA), DevOps Guru supports CloudWatch Agent Container Insights and offers an improved dashboard experience that includes a system health summary based on resources, in addition to the existing CloudFormation Stacks view.

As a best practice, AWS recommends multi-region and multi-account deployments as your workloads expand in size and complexity. Enabling Amazon DevOps Guru across multiple accounts with AWS Organizations offers benefits such as quick, centralized deployment and central control over which AWS resources DevOps Guru monitors across your organization. It also lets you configure how DevOps Guru notifies the operations team when it detects anomalies and generates insights. You can send the notifications to an SNS topic, create OpsItems in Systems Manager OpsCenter, or both.

As seen in this post, configuring DevOps Guru for your multi-account environment using AWS CloudFormation stacks was already possible. The new integration with AWS Systems Manager Quick Setup makes it easier and quicker to configure DevOps Guru across accounts in your AWS Organizations. This post walks you through the steps to use this new feature and utilize the real-time visualization of your configuration status provided by Quick Setup.

Solution overview

Quick Setup is a Systems Manager feature that lets you configure and deploy AWS services quickly with the recommended best practices. With Quick Setup, you can set up services in an individual account or across multiple AWS accounts and regions within your organization in AWS Organizations.

Instead of manually logging in to individual accounts and configuring each region where your applications run, you can use Quick Setup to enable DevOps Guru across multiple organizational units (OUs) and regions, following AWS best practices. Once you enable DevOps Guru for the selected OUs and regions, it monitors your applications and provides insights when it detects anomalous behavior. DevOps Guru surfaces the issue by creating an OpsItem in Systems Manager OpsCenter and by publishing a message to an SNS topic.
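Quick Setup configures both notification paths for you, but it can help to see what they look like at the API level. The following is a minimal sketch, not the mechanism Quick Setup itself uses: it builds the request payloads for the DevOps Guru AddNotificationChannel and UpdateServiceIntegration operations as plain dictionaries. The topic ARN and the helper function names are hypothetical and exist only for illustration.

```python
"""Sketch: request payloads for DevOps Guru's two notification paths."""


def sns_channel_request(topic_arn: str) -> dict:
    """Build keyword arguments for the AddNotificationChannel operation."""
    # Light sanity check only; this does not prove that the topic exists.
    if not topic_arn.startswith("arn:") or ":sns:" not in topic_arn:
        raise ValueError(f"not an SNS topic ARN: {topic_arn!r}")
    return {"Config": {"Sns": {"TopicArn": topic_arn}}}


def opscenter_integration_request(enabled: bool) -> dict:
    """Build keyword arguments for the UpdateServiceIntegration operation."""
    status = "ENABLED" if enabled else "DISABLED"
    return {"ServiceIntegration": {"OpsCenter": {"OptInStatus": status}}}


# Hypothetical topic ARN, for illustration only.
EXAMPLE_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:devops-guru-insights"

if __name__ == "__main__":
    print(sns_channel_request(EXAMPLE_TOPIC_ARN))
    print(opscenter_integration_request(True))
```

Assuming boto3 is installed and credentials are configured in a target account, these dictionaries could be passed as keyword arguments to `add_notification_channel` and `update_service_integration` on a `devops-guru` client. With Quick Setup you do not need to make these calls yourself.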

The following diagram shows a typical AWS Organization setup, with multiple organizational units containing different AWS accounts. It shows an administrator account where you enable DevOps Guru from Systems Manager Quick Setup, and a dedicated OpsCenter dashboard on each target account in order to collect the anomalies detected by DevOps Guru through OpsItems.

Architecture diagram depicting how to use Systems Manager Quick Setup to enable DevOps Guru in your Organizational Units from a central account

Figure 1: Target Architecture diagram

Specify which AWS resources you want DevOps Guru to analyze by using AWS CloudFormation stacks, or enable it to monitor every resource in your AWS account and region. Either way, in order to detect the anomalies and create insights, DevOps Guru monitors Amazon CloudWatch metrics along with AWS CloudTrail logs that your AWS resources generate.
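As a rough illustration of these two coverage modes, the sketch below builds the payload for the DevOps Guru UpdateResourceCollection operation. It assumes that a single `"*"` stack name stands for "every resource in the account and region"; verify this against the current API reference before relying on it. The helper name and the stack names in the usage example are hypothetical.

```python
"""Sketch: choosing what DevOps Guru analyzes in one account and region."""

ALL_RESOURCES = "*"  # assumption: wildcard meaning "analyze everything"


def resource_collection_request(stack_names=None, action="ADD"):
    """Build keyword arguments for the UpdateResourceCollection operation.

    With no stack names, the request covers all resources (wildcard).
    Otherwise it covers only the named CloudFormation stacks.
    """
    if action not in ("ADD", "REMOVE"):
        raise ValueError(f"unsupported action: {action!r}")
    names = list(stack_names) if stack_names else [ALL_RESOURCES]
    return {
        "Action": action,
        "ResourceCollection": {"CloudFormation": {"StackNames": names}},
    }


if __name__ == "__main__":
    # Hypothetical stack names, for illustration only.
    print(resource_collection_request(["orders-service", "payments-service"]))
    print(resource_collection_request())  # whole account and region
```

Keeping the scope explicit in one place makes it easier to reason about cost later, because the number of analyzed resources follows directly from this choice.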

Prerequisites

Before getting started, make sure you have these prerequisites:

  • An organization in AWS Organizations. If you are not familiar with AWS Organizations terminology, refer to AWS Organizations terminology and concepts.
  • Two or more organizational units (OUs)
  • One or more target AWS accounts in each OU
  • One central (or administrator) account with privileges to manage the target accounts

Enabling DevOps Guru with Systems Manager Quick Setup

Follow these steps to enable DevOps Guru by using Quick Setup:

  1. Choose a home region: If you are using AWS Quick Setup for the first time, choose a home region, which is the region where Quick Setup creates the AWS resources used to deploy your configurations.

Note: You can’t change the home region after you choose it.

On the Systems Manager Quick Setup console, select the home region to get started

Figure 2: Get Started with Quick Setup

  2. Choose a configuration type: To get started with Quick Setup for DevOps Guru, choose the DevOps Guru service from the available configuration types. These configuration types help you set up the service or feature according to the recommended best practices.

Under Quick Setup create configuration page, the DevOps Guru configuration type is selected

Figure 3: Quick Setup Configuration Type

  3. Specify configuration options: After choosing a configuration type, you can see the Customize DevOps Guru configuration options. By default, the SNS notifications and AWS Systems Manager OpsItems options are enabled. If you want DevOps Guru to analyze all of your organization’s AWS resources in all of your accounts, then select Analyze all AWS resources in all the accounts in my organization.

Note: The configuration options that you choose apply to every selected AWS account in the organizational units and regions. You are charged by the number of AWS resource hours that DevOps Guru analyzes, so turning on DevOps Guru with such a broad scope can increase your bill. To understand how DevOps Guru will affect your bill, see Estimate Amazon DevOps Guru resource analysis costs to learn how to use the cost estimation tool. Learn more about the costs on DevOps Guru pricing.

The Configuration options section displays options to select whether you want Quick Setup to enable SNS notifications and Systems Manager OpsItems

Figure 4: Customize DevOps Guru Configuration options

  4. In the Schedule section, choose how frequently you want Quick Setup to remediate changes made to resources that differ from your configuration. If you want to apply those changes only once, then select the option Default. If you require a different frequency, then select the option Custom, and choose the frequency in the dropdown. If you don’t want Quick Setup to remediate changes, then choose Disable remediation under Custom.

The schedule section allows you to choose the default schedule to apply configuration options only once, or select the custom option to apply configurations every day, every 7 days, every 14 days, every 30 days, or disable it

Figure 5: Quick Setup Schedule

  5. In the Targets section, choose to allow Amazon DevOps Guru to analyze either your organizational units (OUs), or the account you’re logged in to. If you choose Custom, then in the Target OUs section select the check boxes of the OUs and Regions where you want to enable Amazon DevOps Guru.

Under the Targets section, you can select the option to apply the configurations only on the current account, or select the custom option to choose which OUs and Regions Quick Setup will deploy the configurations

Figure 6: Quick Setup Targets Organizational Units (OUs) for deployment

  6. If you choose Current account, then select either Current Region or Choose Regions, and pick from the list the regions where you want to enable Amazon DevOps Guru.

Under the Targets section, if you select the option Current account, you can select the options to either apply the configurations to the current region, or choose specific regions within the account

Figure 7: Quick Setup Target AWS Regions for deployment

  7. At the end, you receive a summary based on your configuration selections. Choose Create to create the configuration.

The Summary section describes which options you’ve chosen for review before you confirm the configuration creation

Figure 8: Quick Setup Configurations Summary

  8. After creating your configurations, you will see the details of every configuration.

On the Quick Setup console, you can visualize a list of all the configurations you have created, including information about the configuration type, OUs, regions, and deployment status

Figure 9: Quick Setup Configuration List

  9. To see more configuration details, select any configuration from the list and choose View details. On the Configuration details page, you can Edit or Delete your configuration. You can also Add or Remove OUs and Regions for your configuration.

The Configuration details page displays a summary of the target OUs and regions. It also displays the configuration options you have chosen when you created the configuration. From this page, you can edit the configuration by editing the target OUs and regions

Figure 10: Quick Setup Configurations Details
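To get a feel for how far a configuration reaches, it can help to expand the selected OUs and regions into concrete account and region pairs. The sketch below is only an illustration of that expansion, not how Quick Setup works internally. It assumes you already have a mapping from OU to member accounts; the OU IDs, account IDs, and function name are hypothetical.

```python
"""Sketch: expand selected OUs and regions into (account, region) targets."""
from itertools import product


def deployment_targets(accounts_by_ou, selected_ous, regions):
    """Return every (account_id, region) pair covered by the selection."""
    accounts = []
    for ou in selected_ous:
        for account_id in accounts_by_ou[ou]:
            # Keep first-seen order and skip duplicates.
            if account_id not in accounts:
                accounts.append(account_id)
    return list(product(accounts, regions))


if __name__ == "__main__":
    # Hypothetical organization layout, for illustration only.
    org = {
        "ou-example-prod": ["111111111111", "222222222222"],
        "ou-example-dev": ["333333333333"],
    }
    targets = deployment_targets(org, ["ou-example-prod"], ["us-east-1", "eu-west-1"])
    for account_id, region in targets:
        print(account_id, region)
```

Each pair is an account and region in which DevOps Guru would analyze resources, so the length of this list is a quick indicator of the scope (and potential cost) of a configuration.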

Integrating DevOps Guru with Systems Manager OpsCenter

AWS Systems Manager OpsCenter provides a centralized location for the operations or IT professionals team to view, investigate, and resolve operational work items, or OpsItems, that are related to AWS resources or services. Each OpsItem provides contextually relevant information, such as the name and ID of the AWS resource that generated the OpsItem, alarm or event details, alarm history, and an alarm timeline graph.

You can enable or disable OpsItems when you create a DevOps Guru configuration (as shown above in step 3). By default, AWS Systems Manager OpsItems is enabled for a DevOps Guru configuration. After creating a configuration, you can still enable or disable OpsItems by using Edit configuration options:

Under the Edit configuration options page, the option to enable Systems Manager OpsItem is selected

Figure 11: Quick Setup Edit Configuration Options

If you are setting up OpsCenter for the first time, Systems Manager creates IAM roles and default rules for OpsItems, as shown below:

The confirmation page displays a notification Systems Manager will create IAM roles for OpsCenter to enable OpsItems

Figure 12: AWS SSM OpsCenter Setup

Once DevOps Guru detects anomalous behavior on your application, it generates an insight on the target account, i.e., the account where the affected services run. If you have enabled the OpsItems integration, you will see a link to the OpsItem on the DevOps Guru insight:

The DevOps Guru insights page contains a link to the OpsItem created when the anomaly was detected. The link opens the OpsItem page

Figure 13: Sample DevOps Guru Insights

The OpsItem also appears in the OpsCenter console:

The OpsCenter console displays the OpsItem related to the DevOps Guru insight

Figure 14: Sample OpsItem created in OpsCenter

Select the OpsItem to see more information about it:

The OpsItem details page displays a description of the incident, title, creation date, severity, status, source, and last updated date. It also contains links to the related resources

Figure 15: Sample OpsItem details part-I

When you identify high priority issues, utilize OpsCenter to run Automation runbooks and quickly resolve those issues. By default, OpsCenter tags OpsItems using key:DevOps-GuruInsightSsmOpsItemRelated and value:true. Create or update OpsItem tags as documented on Tagging OpsItems.

On the OpsCenter console, there is a list of available runbooks that can be applied to remediate the incident

Figure 16: Sample OpsItem details part-II
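The default tag mentioned above can be used to separate DevOps Guru OpsItems from other OpsItems in your own tooling. The following is a minimal sketch that assumes tags arrive as a list of `{"Key": ..., "Value": ...}` dictionaries, which is the shape the Systems Manager ListTagsForResource operation returns in its TagList. The OpsItem IDs and function names are hypothetical.

```python
"""Sketch: pick out OpsItems that carry the default DevOps Guru tag."""

DEVOPS_GURU_TAG_KEY = "DevOps-GuruInsightSsmOpsItemRelated"
DEVOPS_GURU_TAG_VALUE = "true"


def is_devops_guru_ops_item(tags):
    """Return True if the tag list contains the default DevOps Guru tag."""
    return any(
        tag.get("Key") == DEVOPS_GURU_TAG_KEY
        and tag.get("Value") == DEVOPS_GURU_TAG_VALUE
        for tag in tags
    )


def select_devops_guru_ops_items(tags_by_ops_item_id):
    """Return the IDs of OpsItems whose tags mark them as DevOps Guru related."""
    return [
        ops_item_id
        for ops_item_id, tags in tags_by_ops_item_id.items()
        if is_devops_guru_ops_item(tags)
    ]


if __name__ == "__main__":
    # Hypothetical OpsItem IDs and tags, for illustration only.
    sample = {
        "oi-example00001": [{"Key": DEVOPS_GURU_TAG_KEY, "Value": "true"}],
        "oi-example00002": [{"Key": "team", "Value": "payments"}],
    }
    print(select_devops_guru_ops_items(sample))
```

If you change the default tags, as described in Tagging OpsItems, adjust the two constants accordingly.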

The Similar OpsItems and Related OpsItems features are useful for investigating operational issues while providing context regarding an issue’s scope. The Similar OpsItems feature is a system-generated list of OpsItems that might be related or of interest to you.

The OpsItem details page displays a list of other OpsItems that are potentially related to the current issue

Figure 17: Sample OpsItem details part-III

Also, you can update the status of individual OpsItems as shown below:

The OpsItem details page displays an option to update its status to open, in progress, or resolved

Figure 18: Sample OpsItem Set Status Options

OpsCenter shows the summary for all open and in progress OpsItems:

The OpsCenter console displays the total number of OpsItems, how many are open, and how many are in progress

Figure 19: OpsCenter Summary Dashboard

Conclusion

This post demonstrated how you can easily enable DevOps Guru on your multi-account organization by utilizing Systems Manager Quick Setup. From a central (administrator) account, you can configure DevOps Guru to monitor your AWS resources in specific organizational units (OUs) and regions. Combined with the Systems Manager OpsCenter integration, this gives you dashboards that collect the anomalies detected by DevOps Guru.

Now that Amazon DevOps Guru is generally available, you can try it on the free tier for the first three months. The free tier includes 7,200 AWS resource hours per month for each of resource groups A and B. You can also use Estimate Amazon DevOps Guru resource analysis costs in the AWS console, which scans selected resources to automatically generate a monthly cost estimate. Furthermore, refer to Gaining operational insights with AIOps using Amazon DevOps Guru to learn more about how DevOps Guru helps you increase your applications’ availability, and check out this workshop for a hands-on walkthrough of DevOps Guru’s main features and capabilities.
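For a back-of-the-envelope view of the free tier, note that 7,200 resource hours is roughly ten resources analyzed continuously for a 30-day month (10 × 720 hours). The sketch below works through that arithmetic. It is not a replacement for the cost estimation tool: the 30-day month is a simplifying assumption, and the hourly rates are placeholders that you must replace with the values on the DevOps Guru pricing page.

```python
"""Sketch: resource-hour arithmetic for the DevOps Guru free tier."""

HOURS_PER_MONTH = 30 * 24      # simplifying assumption: a 30-day month
FREE_HOURS_PER_GROUP = 7_200   # free tier allowance per resource group


def billable_hours(resource_count, hours=HOURS_PER_MONTH,
                   free_hours=FREE_HOURS_PER_GROUP):
    """Resource hours left after subtracting the free allowance."""
    return max(0, resource_count * hours - free_hours)


def estimate_monthly_cost(resources_by_group, rate_by_group,
                          free_hours=FREE_HOURS_PER_GROUP):
    """Sum billable hours times the hourly rate for each resource group."""
    return sum(
        billable_hours(count, free_hours=free_hours) * rate_by_group[group]
        for group, count in resources_by_group.items()
    )


if __name__ == "__main__":
    # Placeholder rates, NOT real prices; look up current rates before use.
    placeholder_rates = {"A": 0.5, "B": 0.25}
    resources = {"A": 11, "B": 5}
    print(billable_hours(10))  # ten always-on resources fit the allowance
    print(estimate_monthly_cost(resources, placeholder_rates))
```

Passing `free_hours=0` to `estimate_monthly_cost` models a month after the free tier period has ended.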

About the authors

Rafael Ramos

Rafael is a Solutions Architect at AWS, where he helps ISVs on their journey to the cloud. He spent over 13 years working as a software developer, and is passionate about DevOps and serverless. Outside of work, he enjoys playing tabletop RPGs, cooking, and running marathons.

Rahul Sharad Gaikwad

Rahul is a Cloud Migration Specialist with Amazon Web Services. He helps customers and partners on their Cloud and DevOps adoption journey. He is passionate about technology and enjoys collaborating with customers. In his spare time, he focuses on his PhD research work. He also enjoys going to the gym and spending time with his family.