Organizations of all sizes and across all industries gather and analyze metrics or key performance indicators (KPIs) to help their businesses run effectively and efficiently. Operational metrics are used to evaluate performance, compare results, and track relevant data to improve business outcomes. For example, you can use operational metrics to determine application performance (the average time it takes to render a page for an end user) or application availability (the duration of time the application was operational). One challenge that most organizations face today is detecting anomalies in operational metrics, which is key to ensuring continuity of IT system operations.

Traditional rule-based methods rely on manually defined thresholds: they flag data that falls outside arbitrarily chosen numerical ranges, such as an alert when transactions per hour fall below a certain number. This results in false alarms if the range is too narrow, or missed anomalies if it's too broad. These ranges are also static; they don't change based on evolving conditions like the time of day, day of the week, seasons, or business cycles. When anomalies are detected, developers, analysts, and business owners can spend weeks trying to identify the root cause of the change before they can take action.

Amazon Lookout for Metrics uses machine learning (ML) to automatically detect and diagnose anomalies without any prior ML experience. In a couple of clicks, you can connect Lookout for Metrics to popular data stores like Amazon Simple Storage Service (Amazon S3), Amazon Redshift, and Amazon Relational Database Service (Amazon RDS), as well as third-party software as a service (SaaS) applications (such as Salesforce, Dynatrace, Marketo, Zendesk, and ServiceNow) via Amazon AppFlow and start monitoring metrics that are important to your business.

This post demonstrates how you can connect to your IT operational infrastructure monitored by Dynatrace using Amazon AppFlow and set up an accurate anomaly detector across metrics and dimensions using Lookout for Metrics. The solution allows you to set up a continuous anomaly detector and optionally set up alerts to receive notifications when anomalies occur.

Lookout for Metrics integrates seamlessly with Dynatrace to detect anomalies within your operational metrics. Once connected, Lookout for Metrics uses ML to start monitoring data and metrics for anomalies and deviations from the norm. Dynatrace enables monitoring of your entire infrastructure, including your hosts, processes, and network. You can perform log monitoring and view information such as the total traffic of your network, the CPU usage of your hosts, the response time of your processes, and more.

Amazon AppFlow is a fully managed service that provides integration capabilities by enabling you to transfer data between SaaS applications like Datadog, Salesforce, Marketo, and Slack and AWS services like Amazon S3 and Amazon Redshift. It provides capabilities to transform, filter, and validate data to generate enriched and usable data in a few easy steps.

Solution overview

In this post, we demonstrate how to integrate with an environment monitored by Dynatrace and detect anomalies in the operational metrics. We also show how application availability and performance (resource contention) are impacted.

The source data is a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances that is monitored by Dynatrace. Dynatrace OneAgent is installed on each EC2 instance to collect all monitored telemetry data (CPU utilization, memory, network utilization, and disk I/O). Amazon AppFlow enables you to securely integrate SaaS applications like Dynatrace and automate data flows, while providing options to configure and connect to such services natively from the AWS Management Console or via API. In this post, we focus on connecting to Dynatrace as our source and Lookout for Metrics as the target, both of which are natively supported applications in Amazon AppFlow.

The solution enables you to create an Amazon AppFlow data flow from Dynatrace to Lookout for Metrics. You can then use Lookout for Metrics to detect any anomalies in the telemetry data, as shown in the following diagram. Optionally, you can send automated anomaly alerts to AWS Lambda functions, webhooks, or Amazon Simple Notification Service (Amazon SNS) topics.

(Architecture diagram: Dynatrace telemetry flows through Amazon AppFlow into Lookout for Metrics, which can optionally send anomaly alerts to Lambda functions, webhooks, or Amazon SNS topics.)

The following are the high-level steps to implement the solution:

  1. Set up Amazon AppFlow integration with Dynatrace.
  2. Create an anomaly detector with Lookout for Metrics.
  3. Add a dataset to the detector and integrate Dynatrace metrics.
  4. Activate the detector.
  5. Create an alert.
  6. Review the detector and data flow status.
  7. Review and analyze any anomalies.

Set up Amazon AppFlow integration with Dynatrace

To set up the data flow, complete the following steps:

  1. On the Amazon AppFlow console, choose Create flow.
  2. For Flow name, enter a name.
  3. For Flow description, enter an optional description.
  4. In the Data encryption section, you can choose or create an AWS Key Management Service (AWS KMS) key.
  5. Choose Next.
  6. For Source name, choose Dynatrace.
  7. For Choose Dynatrace Connection, create a new connection (or choose an existing one).
  8. For API token, generate an API token from the Dynatrace console and enter it.
  9. For Subdomain, enter your Dynatrace portal URL address.
  10. For Data encryption, choose the AWS KMS key.
  11. For Connection Name, enter a name.
  12. Choose Connect.
  13. For Choose Dynatrace object, choose Problems (this is the only object supported as of this writing).

For more information about Dynatrace problems, see Problem overview page.

  14. For Destination name, choose Amazon Lookout for Metrics.
  15. For Flow trigger, select Run flow on schedule.
  16. For Repeats, choose Minutes (alternatively, you can choose hourly or daily).
  17. Set the trigger to repeat every 5 minutes.
  18. Enter a starting time.
  19. Enter a start date.

Dynatrace requires a between date range filter to be set.

  20. For Field name, choose Date range.
  21. For Condition, choose is between.
  22. For Criteria 1, choose your start date.
  23. For Criteria 2, choose your end date.
  24. Review your settings and choose Create flow.
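If you prefer to automate this setup instead of using the console, you can create a similar flow with the AWS SDK. The following is a minimal boto3 sketch, not the exact configuration the console generates: the flow name, connection name, date-range field name, and filter bounds are placeholder assumptions, and it assumes a Dynatrace connector profile (with your API token and subdomain) already exists in Amazon AppFlow.

```python
import boto3
from datetime import datetime, timedelta

appflow = boto3.client("appflow")
now = datetime.utcnow()

# Minimal sketch: a scheduled flow from Dynatrace (Problems object) to
# Lookout for Metrics. Names, bounds, and the filter field are placeholders.
appflow.create_flow(
    flowName="dynatrace-problems-to-l4m",  # placeholder flow name
    description="Dynatrace problems to Lookout for Metrics",
    triggerConfig={
        "triggerType": "Scheduled",
        "triggerProperties": {
            "Scheduled": {
                "scheduleExpression": "rate(5minutes)",  # match the detector interval
                "dataPullMode": "Incremental",
                "scheduleStartTime": now + timedelta(minutes=10),
            }
        },
    },
    sourceFlowConfig={
        "connectorType": "Dynatrace",
        "connectorProfileName": "my-dynatrace-connection",  # existing connection (assumption)
        "sourceConnectorProperties": {"Dynatrace": {"object": "problems"}},
    },
    destinationFlowConfigList=[
        {
            "connectorType": "LookoutMetrics",
            "destinationConnectorProperties": {"LookoutMetrics": {}},
        }
    ],
    tasks=[
        # Dynatrace requires a BETWEEN date-range filter; the field name and
        # the epoch-millisecond bounds below are assumptions for illustration.
        {
            "taskType": "Filter",
            "sourceFields": ["date-range"],
            "connectorOperator": {"Dynatrace": "BETWEEN"},
            "taskProperties": {
                "DATA_TYPE": "datetime",
                "LOWER_BOUND": str(int(now.timestamp() * 1000)),
                "UPPER_BOUND": str(int((now + timedelta(days=30)).timestamp() * 1000)),
            },
        },
        # Map all source fields through unchanged.
        {
            "taskType": "Map_all",
            "sourceFields": [],
            "connectorOperator": {"Dynatrace": "NO_OP"},
            "taskProperties": {"EXCLUDE_SOURCE_FIELDS_LIST": "[]"},
        },
    ],
)
```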

Create an anomaly detector with Lookout for Metrics

To create your anomaly detector, complete the following steps:

  1. On the Lookout for Metrics console, choose Create detector.
  2. For Detector name, enter a name.
  3. For Description, enter an optional description.
  4. For Interval, choose the time between each analysis. This should match the interval set on the flow.
  5. For Encryption, create or choose an existing AWS KMS key.
  6. Choose Create.
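You can also create the detector with the AWS SDK. The following boto3 sketch uses a placeholder name and KMS key ARN; the 5-minute frequency matches the flow schedule configured earlier.

```python
import boto3

l4m = boto3.client("lookoutmetrics")

# Minimal sketch: create a detector that analyzes data every 5 minutes to
# match the Amazon AppFlow schedule. Name and KMS key ARN are placeholders.
response = l4m.create_anomaly_detector(
    AnomalyDetectorName="dynatrace-ops-detector",  # placeholder
    AnomalyDetectorDescription="Anomalies in Dynatrace problem metrics",
    AnomalyDetectorConfig={"AnomalyDetectorFrequency": "PT5M"},
    KmsKeyArn="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE",  # placeholder
)
detector_arn = response["AnomalyDetectorArn"]
print(detector_arn)
```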

Add a dataset to the detector and integrate Dynatrace metrics

The next step in activating your anomaly detector is to add a dataset and integrate the Dynatrace metrics.

  1. On the detector details, choose Add a dataset.
  2. For Name, enter the data source name.
  3. For Description, enter an optional description.
  4. For Timezone, choose the time zone relevant to your dataset. This should match the time zone used in Amazon AppFlow (AppFlow picks it up from your browser).
  5. For Datasource, choose Dynatrace.
  6. For Amazon AppFlow flow, choose the flow that you created.
  7. For Permissions, choose a service role.
  8. Choose Next.
  9. For Map fields, choose the measures for the detector to track (it can monitor up to five); in this example, I choose impactLevel and hasRootCause.

The map fields are the primary fields (measures) that the detector monitors. Choose the fields that are most relevant to your operational KPIs.

  10. For Dimensions, choose the fields used to segment the measure values. For this post, I choose severityLevel.
  11. Review the settings and choose Save dataset.
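The equivalent SDK call attaches the Amazon AppFlow flow as the dataset's metric source. The following boto3 sketch uses placeholder ARNs, a placeholder service role that Lookout for Metrics can assume, and the flow name from earlier; the measures, dimension, and aggregation functions simply mirror the fields chosen in the console walkthrough.

```python
import boto3

l4m = boto3.client("lookoutmetrics")

# Placeholder ARN; use the one returned by create_anomaly_detector.
detector_arn = "arn:aws:lookoutmetrics:us-east-1:111122223333:AnomalyDetector:dynatrace-ops-detector"

# Minimal sketch: add a dataset (metric set) backed by the AppFlow flow.
l4m.create_metric_set(
    AnomalyDetectorArn=detector_arn,
    MetricSetName="dynatrace-problems",  # placeholder
    MetricSetDescription="Dynatrace problem metrics via Amazon AppFlow",
    MetricList=[
        {"MetricName": "impactLevel", "AggregationFunction": "SUM"},
        {"MetricName": "hasRootCause", "AggregationFunction": "SUM"},
    ],
    DimensionList=["severityLevel"],
    Timezone="America/Los_Angeles",  # match the time zone used in AppFlow
    MetricSetFrequency="PT5M",       # match the detector interval
    MetricSource={
        "AppFlowConfig": {
            "RoleArn": "arn:aws:iam::111122223333:role/L4MAppFlowRole",  # placeholder
            "FlowName": "dynatrace-problems-to-l4m",                      # placeholder
        }
    },
)
```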

Activate the detector

You’re now ready to activate the newly created detector.
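Activation is a single call in the SDK as well; this sketch assumes the placeholder detector ARN used above.

```python
import boto3

l4m = boto3.client("lookoutmetrics")

# Placeholder ARN; use the one returned by create_anomaly_detector.
detector_arn = "arn:aws:lookoutmetrics:us-east-1:111122223333:AnomalyDetector:dynatrace-ops-detector"

# Activate the detector. It goes through a learning period before it starts
# flagging anomalies on new intervals.
l4m.activate_anomaly_detector(AnomalyDetectorArn=detector_arn)
```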

Create an alert

You can create an alert to send automated anomaly alerts to Lambda functions; webhooks; cloud applications like Slack, PagerDuty, and Datadog; or SNS topics with subscribers that use SMS, email, or push notifications.

  1. On the detector details, choose Add alerts.
  2. For Alert Name, enter a name.
  3. For Sensitivity threshold, enter a threshold at which the detector sends anomaly alerts.
  4. For Channel, choose either Amazon SNS or Lambda as the notification method. For this post, I use Amazon SNS.
  5. For SNS topic, create or choose an existing SNS topic.
  6. For Service role, choose an execution role.
  7. Choose Add alert.
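The same alert can be created programmatically. The following boto3 sketch assumes an existing SNS topic and a role that Lookout for Metrics can use to publish to it; the ARNs are placeholders and the threshold of 70 is just an example.

```python
import boto3

l4m = boto3.client("lookoutmetrics")

# Minimal sketch: send anomaly notifications to an SNS topic.
l4m.create_alert(
    AlertName="dynatrace-anomaly-alert",  # placeholder
    AlertDescription="Notify on high-severity anomalies",
    AlertSensitivityThreshold=70,         # only anomalies scoring at or above 70
    AnomalyDetectorArn="arn:aws:lookoutmetrics:us-east-1:111122223333:AnomalyDetector:dynatrace-ops-detector",
    Action={
        "SNSConfiguration": {
            "RoleArn": "arn:aws:iam::111122223333:role/L4MSnsPublishRole",      # placeholder
            "SnsTopicArn": "arn:aws:sns:us-east-1:111122223333:anomaly-alerts",  # placeholder
        }
    },
)
```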

Review the detector and flow status

On the Run history tab, you can confirm that the flows are running successfully for the interval chosen.

On the Detector log tab, you can confirm that the detector records the results after each interval.
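You can check both sides from the SDK as well. This sketch assumes the placeholder flow name and detector ARN used in the earlier examples.

```python
import boto3

appflow = boto3.client("appflow")
l4m = boto3.client("lookoutmetrics")

# Recent Amazon AppFlow runs for the flow (placeholder flow name).
runs = appflow.describe_flow_execution_records(
    flowName="dynatrace-problems-to-l4m", maxResults=5
)
for run in runs["flowExecutions"]:
    print(run["executionStatus"], run.get("startedAt"))

# Recent detector executions (placeholder detector ARN).
executions = l4m.describe_anomaly_detection_executions(
    AnomalyDetectorArn="arn:aws:lookoutmetrics:us-east-1:111122223333:AnomalyDetector:dynatrace-ops-detector",
    MaxResults=5,
)
for execution in executions["ExecutionList"]:
    print(execution["Timestamp"], execution["Status"])
```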

Review and analyze any anomalies

On the main detector page, choose View anomalies to review and analyze any anomalies.

On the Anomalies page, you can adjust the severity score on the threshold dial to filter anomalies above a given score.
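The same threshold filter is available programmatically when listing anomaly groups. The following boto3 sketch uses a placeholder detector ARN and an example threshold of 70.

```python
import boto3

l4m = boto3.client("lookoutmetrics")

# Minimal sketch: list anomaly groups scoring at or above the threshold.
groups = l4m.list_anomaly_group_summaries(
    AnomalyDetectorArn="arn:aws:lookoutmetrics:us-east-1:111122223333:AnomalyDetector:dynatrace-ops-detector",
    SensitivityThreshold=70,
    MaxResults=10,
)
for group in groups["AnomalyGroupSummaryList"]:
    print(group["AnomalyGroupId"], group["AnomalyGroupScore"], group["PrimaryMetricName"])
```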


The following analysis shows the severity level and the impacted metrics. The graph shows anomalies detected by the detector, with availability and resource contention impacted. The anomaly was detected on June 28 at 14:30 PDT with a severity score of 98, indicating a high-severity anomaly that needs immediate attention.


Lookout for Metrics also allows you to provide real-time feedback on the relevance of the detected anomalies, which enables a powerful human-in-the-loop mechanism. This information is fed back to the anomaly detection model to improve its accuracy continuously, in near-real time.
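The same feedback can be submitted with the SDK. In this sketch, the detector ARN, anomaly group ID, and time series ID are placeholders you would take from the anomaly details (for example, from list_anomaly_group_time_series).

```python
import boto3

l4m = boto3.client("lookoutmetrics")

# Minimal sketch: mark a flagged time series as not anomalous (or anomalous)
# so the detector can learn from the feedback. All IDs are placeholders.
l4m.put_feedback(
    AnomalyDetectorArn="arn:aws:lookoutmetrics:us-east-1:111122223333:AnomalyDetector:dynatrace-ops-detector",
    AnomalyGroupTimeSeriesFeedback={
        "AnomalyGroupId": "11111111-2222-3333-4444-555555555555",  # placeholder
        "TimeSeriesId": "example-time-series-id",                  # placeholder
        "IsAnomaly": False,
    },
)
```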


Conclusion

Anomaly detection can be very useful in identifying anomalies that could signal potential issues within your operational environment. Timely detection of anomalies can aid in troubleshooting, help avoid loss in revenue, and help maintain your company’s reputation. Lookout for Metrics automatically inspects and prepares the data, selects the best-suited ML algorithm, begins detecting anomalies, groups related anomalies together, and summarizes potential root causes.

To get started with this capability, see Amazon Lookout for Metrics. You can use this capability in all Regions where Lookout for Metrics is publicly available. For more information about Region availability, see AWS Regional Services.


About the Author

Sumeeth Siriyur is a Solutions Architect at AWS, based in Sydney. He is passionate about infrastructure services and uses AI services to influence IT infrastructure observability and management. In his spare time, he likes binge-watching and continually improving at his outdoor sports.