Today, AWS X-Ray is pleased to announce the general availability of Insights, a feature that helps you proactively detect performance issues in your applications. AWS X-Ray helps developers and DevOps engineers analyze and debug production environments and distributed applications, such as those built using a microservices architecture. Using anomaly detection, X-Ray Insights determines if the fault rates for your services are outside the normal range and creates actionable insights that address these questions:
- What is the underlying issue?
- What is the root cause?
- What is the impact on end users?
You can use Insights to create notifications that alert your operations team in real time. Notifications help your team respond more quickly when troubleshooting is required, thus improving application availability.
When you’re troubleshooting applications that are based on a distributed microservices architecture, it can be challenging to correlate user-reported issues with problems in the underlying services. Operations teams often spend a lot of time looking at the metrics, logs, and trace data of multiple microservices to determine the root cause. Even after the root cause is identified, it is hard to determine which other services are affected and how many end-user workflows or business transactions have been impacted.
You want this information upfront so that your engineers can focus on resolving the issue. You also want to detect issues proactively instead of waiting for end users to report problems.
AWS X-Ray Insights uses statistical modeling to train on fault rate data for all services and creates a prediction band of acceptable fault rates. Any seasonal and cyclical variations are taken into account while training. If the fault rates go beyond the acceptable range, an insight that contains information about the root cause service, affected services, and impact to user requests is generated. X-Ray also creates an incident timeline and records all important events as the incident progresses. Any changes to the overall end user impact or affected services are captured as an event in the incident. Notifications for these events are sent to Amazon EventBridge and can be forwarded to any target or integrated into your internal operational processes to notify your operations team or on-call engineers. With X-Ray Insights, your teams are no longer burdened with triage tasks because Insights automates the process of identifying the issue, its root cause, and its impact.
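To make the idea of a prediction band concrete, here is a minimal Python sketch. It is not the model X-Ray uses, which this post does not describe in detail. It assumes a simple per-slot mean and standard deviation, so that a slot that is routinely busier gets a wider band. The function names and the `period` and `width` parameters are hypothetical.

```python
from statistics import mean, pstdev

def prediction_band(history, period, width=3.0):
    """Return one (lower, upper) band per slot in the cycle.

    history: fault rates (0.0 to 1.0) sampled at a fixed interval.
    period:  number of samples in one seasonal cycle (hypothetical parameter).
    width:   band half-width in standard deviations (hypothetical parameter).
    """
    bands = []
    for slot in range(period):
        samples = history[slot::period]  # same position in every cycle
        centre = mean(samples)
        spread = pstdev(samples)
        bands.append((max(0.0, centre - width * spread),
                      min(1.0, centre + width * spread)))
    return bands

def is_anomalous(fault_rate, index, bands):
    """True when the observed fault rate exceeds the upper bound for its slot."""
    _, upper = bands[index % len(bands)]
    return fault_rate > upper

# Three cycles of four samples each; the third slot is routinely busier.
history = [0.01, 0.02, 0.10, 0.02,
           0.02, 0.01, 0.12, 0.01,
           0.01, 0.02, 0.11, 0.02]
bands = prediction_band(history, period=4)
```

With this training data, a fault rate of 0.11 is within the band for the busy third slot but outside it for the first slot, which is the kind of seasonal allowance described above.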
X-Ray Insights use cases
Let’s look at some use cases for Insights.
Figure 1 shows a list of insights with details for each, including a description, duration, root cause service, anomalous services, and more. You can use this page to quickly identify issues in your application. Insights are generated per X-Ray group to help you pin down the affected user flows or parts of the application.
Identify root cause and affected services
When you choose an insight, you’ll see a trace map that highlights the anomalous services and the root cause of the issue. This eliminates the grunt work common in the triage process. A developer or DevOps engineer can quickly start troubleshooting the service identified as the root cause.
You can also see fault rates over time for all anomalous services. In Figure 3, the pink band shows the acceptable range of fault rates for a service based on training data. The red line shows the actual fault rate. The www and API services showed anomalous behavior. When the fault rate for the www service went beyond the prediction band, it was recorded as an insight. Similar behavior was observed for the API service.
Understand overall user impact
In Figure 4, you can see the impact of the incident on the client or end user. In this case, the failure rate for client requests was between 40% and 70% as the incident progressed.
Send real-time alerts
X-Ray records events for each insight to capture all changes to the incident. By default, each insight has at least two events: one when the insight is created and another when it is closed. Any major change in user or service impact is captured as an event.
X-Ray can send these events to EventBridge, and you can configure EventBridge to forward them to any destination you choose. You can use these notifications to alert your operations team in real time or to take automated actions to fix the issue. To learn more about how to use Insights notifications, see the Send real-time alerts about application anomalies using AWS X-Ray Insights blog post.
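As a rough illustration of what a forwarding target might do, the following Python sketch turns an insight event into a one-line alert. The event pattern values and the `detail` field names (`State`, `InsightId`, `GroupName`, `Summary`) are assumptions and are not taken from this post, so check them against a real event before relying on them. The function names are hypothetical.

```python
# Assumed EventBridge rule pattern for X-Ray insight notifications.
INSIGHT_EVENT_PATTERN = {
    "source": ["aws.xray"],
    "detail-type": ["AWS X-Ray Insight Update"],
}

def format_alert(event):
    """Build a one-line alert from an insight event (field names are assumptions)."""
    detail = event.get("detail", {})
    return "[{state}] X-Ray insight {insight_id} in group {group}: {summary}".format(
        state=detail.get("State", "UNKNOWN"),
        insight_id=detail.get("InsightId", "unknown"),
        group=detail.get("GroupName", "unknown"),
        summary=detail.get("Summary", "no summary provided"),
    )

def handler(event, context=None):
    """Lambda-style entry point that returns alert text for a downstream notifier."""
    if event.get("source") != "aws.xray":
        return None  # ignore events from other sources
    return format_alert(event)
```

The handler only formats text. Delivering it to a chat channel, pager, or ticketing system is left to whatever notifier your team already uses.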
To start using AWS X-Ray Insights, go to the AWS X-Ray console, choose Groups, and then choose Enable insights. You do not need additional instrumentation to use this feature. X-Ray will run the anomaly detection algorithm on incoming traces to generate insights.
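If you prefer to script this step instead of using the console, a sketch along the following lines may work. It assumes the boto3 X-Ray client and its `update_group` call with an `InsightsConfiguration` argument. The `enable_insights` helper and the group name in the usage comment are hypothetical.

```python
def enable_insights(xray_client, group_name, notifications=True):
    """Turn on Insights (and optionally notifications) for one X-Ray group.

    xray_client is expected to behave like boto3.client("xray").
    """
    config = {
        "InsightsEnabled": True,
        "NotificationsEnabled": notifications,
    }
    return xray_client.update_group(
        GroupName=group_name,
        InsightsConfiguration=config,
    )

# Usage (requires AWS credentials and an existing group):
#   import boto3
#   enable_insights(boto3.client("xray"), "my-group")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.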
X-Ray Insights is available in all commercial regions. For information about the cost of using X-Ray Insights, see the pricing page.
About the Author
Nikhil Shetty is a Sr. Product Manager in AWS focused on monitoring distributed applications built using microservices architecture. Currently, he is working on developing various features for the distributed tracing service, AWS X-Ray.