At re:Invent 2020, we launched Amazon Managed Service for Prometheus, a fully managed Prometheus-compatible service, in Preview on AWS. It is a secure and scalable service that customers can use to collect infrastructure and application metrics from workloads hosted in environments such as Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Compute Cloud (Amazon EC2), as well as from hybrid environments such as on-premises virtual machines.

Metrics are a key pillar of observability, and Prometheus is a highly popular open-source metrics monitoring solution that has graduated from the Cloud Native Computing Foundation (CNCF). Prometheus is widely supported in modern, container-based environments, and many popular workloads such as NGINX, Java/JMX, Envoy, and dozens of others natively export metrics in the Prometheus format. This makes it easy for customers to collect metrics and monitor health and performance.

Customers using Amazon Managed Service for Prometheus benefit from a robust, Prometheus-compatible backend that they do not have to manage. Simply deploying a collection agent such as the AWS Distro for OpenTelemetry Collector (ADOT Collector) or a Prometheus server lets customers securely remote-write metrics to Amazon Managed Service for Prometheus.

Since we launched the service in Preview, customers have been using it to monitor a wide variety of workloads, and they have provided invaluable feedback that shaped new features and improvements. We are excited to announce that Amazon Managed Service for Prometheus is now Generally Available.

What’s New

  • Alert Manager and Rules management
    • Customers can use the Prometheus-compatible Alert Manager and rules management capabilities in Amazon Managed Service for Prometheus, removing the need to host their own alerting and rules management infrastructure. Configuration and feature details can be found below in the “Alert Manager deep dive” section.
  • Tagging support
    • Customers can now tag their workspaces to help manage, identify, organize, and filter them. Tags let customers identify their resources, apply granular access control to workspaces, and attribute cost utilization by workspace.
  • CloudFormation and CDK (AWS Cloud Development Kit) support
    • Automating the creation and lifecycle management of workspaces, rules, and alert manager configuration is critical for operations. With the GA launch, customers can use AWS CloudFormation and the CDK to create and manage Amazon Managed Service for Prometheus workspaces, associate AWS tags, and create and manage alerting and recording rules.
  • Availability in additional AWS Regions
    • We are adding three new Regions in Asia Pacific and Europe. Customers can now create Amazon Managed Service for Prometheus workspaces in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), Europe (Ireland), Europe (Stockholm), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Asia Pacific (Singapore).
  • CloudTrail integration
    • Amazon Managed Service for Prometheus users can now see an expanded set of AWS CloudTrail logs covering workspace operations, as well as alerting and recording rule operations such as creation, modification, and deletion.

Configuring Recording Rules & Alert Manager

The steps below walk through the configuration of rules and the alert manager in Amazon Managed Service for Prometheus. Alert Manager currently supports Amazon Simple Notification Service (SNS) as a receiver destination.

Prerequisites

Ensure you have the following installed and configured in order to follow the setup instructions:

Setup Amazon Managed Service for Prometheus

Customers running Prometheus in container environments face challenges in managing a highly available, scalable, and secure Prometheus server environment, infrastructure for long-term storage, and access control. Amazon Managed Service for Prometheus solves these problems by providing a fully managed service that is tightly integrated with AWS Identity and Access Management (IAM) for authentication and authorization, and that provides auditing via CloudTrail logs. Start using Amazon Managed Service for Prometheus with these two steps:

  • Create an Amazon Managed Service for Prometheus workspace.
  • Configure your AWS Distro for OpenTelemetry Collector to remote-write into the Amazon Managed Service for Prometheus workspace. You can also use a Prometheus server to scrape the metrics and send them to Amazon Managed Service for Prometheus by following this link.

Create a workspace

A workspace is a logical space dedicated to the storage, alerting, and querying of metrics from one or more Prometheus servers. A workspace supports fine-grained access control for authorizing its management, such as update, list, describe, and delete, as well as metrics ingestion and querying.

To set up a workspace using the AWS CLI, run the following command:

aws amp create-workspace --alias my-first-workspace

This command returns the following data:

  • workspaceId is the unique ID for this workspace. Note this ID.
  • arn is the ARN for this workspace.
  • status is the current workspace status. Immediately after you create the workspace, this will probably be CREATING.
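To automate against this response, you can extract those fields programmatically. Below is a minimal sketch that parses a create-workspace response of the shape described above; the sample values are illustrative, not real identifiers.

```python
import json

# Illustrative response in the shape described above (values are made up).
sample_response = json.dumps({
    "workspaceId": "ws-0123abcd-ab12-cd34-ef56-0123456789ab",
    "arn": "arn:aws:aps:us-west-2:123456789012:workspace/ws-0123abcd-ab12-cd34-ef56-0123456789ab",
    "status": {"statusCode": "CREATING"},
})

def parse_workspace(response_text: str) -> dict:
    """Pull out the fields worth noting from a create-workspace response."""
    body = json.loads(response_text)
    return {
        "workspace_id": body["workspaceId"],
        "arn": body["arn"],
        "status": body["status"]["statusCode"],
    }

info = parse_workspace(sample_response)
print(info["workspace_id"], info["status"])
```

You would pipe the actual CLI output (for example, `aws amp create-workspace ... | python parse.py`) into a helper like this to capture the workspace ID for later steps.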

Using CloudFormation to create a workspace

The AWS::APS::Workspace type specifies an Amazon Managed Service for Prometheus workspace. To declare this entity in your AWS CloudFormation template, use the following syntax:

Resources:
  APSWorkspace:
    Type: AWS::APS::Workspace
    Properties:
      Alias: <your workspace name>
      Tags:
        - Key: Key1
          Value: Value1

Add a tag to a workspace

Adding tags to a workspace can help you identify and organize your AWS resources as well as manage access to them. First, add one or more tags (key-value pairs) to a workspace. Then, create IAM policies that manage workspace access based on these tags. You can add tags from the console or with the AWS CLI:

aws amp tag-resource --resource-arn arn:aws:aps:us-west-2:<your-account-id>:workspace/IDstring --tags Status=Secret,Team=My-Team
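Such tags can then drive access control. The sketch below builds an illustrative IAM policy that allows query actions only on workspaces carrying a given tag, using the standard `aws:ResourceTag` condition key; the exact set of actions you grant is up to you, and this shape is an assumption, not a prescribed policy.

```python
import json

def tag_based_policy(tag_key: str, tag_value: str) -> str:
    """Build an illustrative IAM policy that permits querying only
    workspaces carrying the given tag (aws:ResourceTag condition)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["aps:QueryMetrics", "aps:GetLabels"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
            },
        }],
    }
    return json.dumps(policy, indent=2)

print(tag_based_policy("Team", "My-Team"))
```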

Ingest Prometheus metrics to a workspace using AWS Distro for OpenTelemetry

This section describes how to configure the AWS Distro for OpenTelemetry (ADOT) Collector to scrape from a Prometheus-instrumented application, and then send the metrics to Amazon Managed Service for Prometheus.

Collecting Prometheus metrics with ADOT involves two OpenTelemetry components: the Prometheus Receiver and the AWS Prometheus Remote Write Exporter. Configure the Prometheus Receiver by using your existing Prometheus configuration to conduct service discovery and metric scraping. The Prometheus Receiver scrapes metrics in the Prometheus exposition format. Any applications or endpoints you want to scrape should be configured with the Prometheus client library. The Prometheus Receiver supports the full set of Prometheus scraping and re-labeling configurations as described in Configuration in the Prometheus documentation. Paste these configurations directly into your ADOT Collector configurations.
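To make the exposition format concrete, here is a minimal sketch of a parser for simple sample lines of the kind the Prometheus Receiver scrapes. It deliberately ignores comments, timestamps, and escaping, so it is an illustration of the format rather than a spec-complete parser.

```python
import re

# Matches a simple sample line: metric_name{label="value",...} 42
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')

def parse_sample(line: str):
    """Parse one exposition-format sample into (name, labels, value).
    Returns None for lines that are not simple samples (e.g. comments)."""
    m = LINE.match(line)
    if not m:
        return None
    labels = {l.group("key"): l.group("val")
              for l in LABEL.finditer(m.group("labels") or "")}
    return m.group("name"), labels, float(m.group("value"))

print(parse_sample('http_requests_total{method="post",code="200"} 1027'))
# -> ('http_requests_total', {'method': 'post', 'code': '200'}, 1027.0)
```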

The AWS Prometheus Remote Write Exporter uses the remote_write endpoint to send the scraped metrics to your Amazon Managed Service for Prometheus workspace. The HTTP requests that export data are signed with AWS SigV4, the AWS protocol for secure authentication. For more information, see Signature Version 4 signing process.
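The signing-key derivation at the heart of SigV4 can be sketched with the standard library alone. A full implementation also builds a canonical request and string-to-sign; the snippet below shows only the HMAC key-derivation chain described in the SigV4 documentation, with made-up credential and date values.

```python
import hashlib
import hmac

def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def derive_signing_key(secret_key: str, date_stamp: str,
                       region: str, service: str = "aps") -> bytes:
    """Derive the SigV4 signing key: an HMAC-SHA256 chain over the
    date stamp, region, service, and the literal 'aws4_request'."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")

# Hypothetical inputs for illustration only.
key = derive_signing_key("EXAMPLE-SECRET", "20210927", "us-west-2")
print(key.hex())
```

In practice the exporter (or tools like awscurl) performs this for you; you only supply credentials via the usual AWS SDK mechanisms.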

Before beginning the following ingestion setup steps, set up your IAM role for the service account and its trust policy. Create the IAM role for the service account by following the steps in Set up service roles for the ingestion of metrics from Amazon EKS clusters. The ADOT Collector will use this role when scraping and exporting metrics.

Next, edit the trust policy for the amp-iamproxy-ingest-role created in the previous step, and then replace aws-amp with adot-col. The final policy will appear as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::account-id:oidc-provider/oidc.eks.aws_region.amazonaws.com/id/openid"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.aws_region.amazonaws.com/id/openid:sub": "system:serviceaccount:adot-col:amp-iamproxy-ingest-service-account"
        }
      }
    }
  ]
}

Use a sample application to demonstrate metric scraping with ADOT and remote-writing the metrics to the workspace. Fork and clone the sample app from the repository at aws-otel-community.

Execute the following commands:

cd ./sample-apps/prometheus
docker build . -t prometheus-sample-app:latest

Push the image to a registry such as Amazon ECR or DockerHub. Then download the following manifest, set the image property in the prometheus-sample-app.yaml file to the container image repository URL you just pushed, and apply it:

curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/examples/eks/aws-prometheus/prometheus-sample-app.yaml -o prometheus-sample-app.yaml
kubectl apply -f prometheus-sample-app.yaml

Now, run the following command to download the ADOT Collector deployment manifest:

curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/examples/eks/aws-prometheus/prometheus-daemonset.yaml -o prometheus-daemonset.yaml

Then, edit the prometheus-daemonset.yaml file, substituting the remote_write endpoint of your Amazon Managed Service for Prometheus workspace for YOUR_ENDPOINT and your Region for YOUR_REGION. Use the remote_write endpoint displayed in the Amazon Managed Service for Prometheus console when viewing your workspace details.

Also change YOUR_ACCOUNT_ID in the service account section of the Kubernetes configuration to your AWS account ID.

In this example, the ADOT Collector configuration uses an annotation (scrape=true) to indicate which target endpoints to scrape. This lets the ADOT Collector distinguish the sample app endpoint from kube-system endpoints in your cluster. Remove the annotation from the re-label configurations if you want to scrape a different sample app.

Run the following command to deploy the ADOT Collector and verify that it is running successfully:

kubectl apply -f prometheus-daemonset.yaml

Verify that the metrics have been received by Amazon Managed Service for Prometheus by using awscurl. This tool lets you send HTTP requests from the command line with AWS SigV4 authentication, so you must have AWS credentials set up locally with the correct permissions to query Amazon Managed Service for Prometheus.

In the following command, replace Amazon_Managed_Service_for_Prometheus_ENDPOINT with the information for your Amazon Managed Service for Prometheus workspace:

awscurl --service="aps" --region="Amazon_Managed_Service_for_Prometheus_REGION" "https://Amazon_Managed_Service_for_Prometheus_ENDPOINT/api/v1/query?query=adot_test_gauge0"
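The query URL that awscurl signs above follows the standard Prometheus HTTP API shape. As a sketch, here is how that URL can be assembled with the standard library; the host and workspace ID below are placeholders, not real values.

```python
from urllib.parse import urlencode, urlunsplit

def build_query_url(endpoint_host: str, workspace_id: str, promql: str) -> str:
    """Build an instant-query URL against the workspace's
    Prometheus-compatible /api/v1/query endpoint."""
    path = f"/workspaces/{workspace_id}/api/v1/query"
    return urlunsplit(("https", endpoint_host, path,
                       urlencode({"query": promql}), ""))

# Hypothetical host and workspace ID for illustration.
url = build_query_url("aps-workspaces.us-west-2.amazonaws.com",
                      "ws-0123abcd", "adot_test_gauge0")
print(url)
```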

Alert Manager deep dive

This section walks you through configuring rules and the alert manager in Amazon Managed Service for Prometheus from the AWS Management Console.

Recording Rules & Alerting Rules

Amazon Managed Service for Prometheus integrates the collection and processing of time series data with an alerting system. Prometheus supports two types of rules that can be configured and then evaluated at regular intervals: recording rules and alerting rules. Rules are written in PromQL, the same query language used for ad hoc queries and dashboarding.

Recording rules let you precompute frequently needed or computationally expensive expressions and save the results as a new set of time series. Querying the precomputed result is often much faster than executing the original expression each time it is needed.

Alerting rules let you define alert conditions based on Prometheus expression language expressions and send notifications to the Alert Manager. Whenever the alert expression produces one or more vector elements at a given point in time, the alert counts as active for those elements’ label sets.

Recording and alerting rules exist in a rule group. Rules within a group run sequentially at a regular interval, with the same evaluation time. Recording rule names must be valid metric names, and alerting rule names must be valid label values.
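The metric-name constraint on recording rule names can be checked with the regular expression from the Prometheus data model, as in this small sketch:

```python
import re

# Metric names must match this pattern per the Prometheus data model.
METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

def is_valid_recording_rule_name(name: str) -> bool:
    """Recording rule names must be valid metric names."""
    return METRIC_NAME.match(name) is not None

print(is_valid_recording_rule_name("metric:recording_rule"))  # True
print(is_valid_recording_rule_name("5xx-error-rate"))         # False
```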

In the following rules file example, we define an alerting rule that causes the Alert Manager to send a notification if a certain condition (defined in expr) holds true for a specified time period (for).

cat << EOF > rules.yaml
groups:
  - name: test
    rules:
      - record: metric:recording_rule
        expr: rate(adot_test_counter0[5m])
  - name: alert-test
    rules:
      - alert: metric:alerting_rule
        expr: rate(adot_test_counter0[5m]) > 0.014
        for: 5m
EOF

This configuration computes the rate of the metric “adot_test_counter0” over five-minute windows, and fires an alert when that value exceeds the threshold “0.014” for five minutes. For more examples of configuring alerting rules for different use cases, refer to the link.

Once the file is created, upload the YAML file in the Amazon Managed Service for Prometheus console:

Rules management tab on Amazon Managed Service for Prometheus service console

Figure 1: Rules management

Alert Manager

The Alert Manager handles alerts sent by client applications, such as the Prometheus server, deduplicating and grouping them and routing them to the correct SNS receiver. Customers can silence alerts, for example during planned maintenance. Inhibition rules let customers suppress alerts to reduce alert fatigue, for example by firing one cluster-level alert and muting the individual node or container alerts.

Configured alerting rules send alerts to the Alert Manager, which can route notifications to SNS via the SNS receiver. Amazon Managed Service for Prometheus’s Alert Manager currently supports SNS as its destination; SNS, in turn, can send notifications onward to destinations such as Slack, PagerDuty, and OpsGenie.

The following YAML shows an example alert manager definition that sends notifications to SNS. Make sure you have created the SNS topic and its subscriptions before proceeding by following this link.

cat << EOF > alertmanager.yaml
alertmanager_config: |
  route:
    receiver: 'default'
  receivers:
    - name: 'default'
      sns_configs:
        - topic_arn: arn:aws:sns:us-east-1:209466312345:Amazon_Managed_Service_for_Prometheus-AlertManager
          sigv4:
            region: us-east-1
          attributes:
            key: severity
            value: SEV2
EOF

Once the file is created, it can be uploaded in the Amazon Managed Service for Prometheus workspace under the Alert manager tab:

Alert manager tab showing alert configuration on Amazon Managed Service for Prometheus service console

Figure 2: Alert configuration details

The SNS topic specified in the configuration must have an access policy that allows Amazon Managed Service for Prometheus to send messages to the topic:

{
  "Sid": "Allow_Publish_Alarms",
  "Effect": "Allow",
  "Principal": {
    "Service": "aps.amazonaws.com"
  },
  "Action": [
    "sns:Publish",
    "sns:GetTopicAttributes"
  ],
  "Resource": "arn:aws:sns:<region-code>:<account_id>:<topic_name>"
}

Once the above steps are complete, verify the setup end to end by using Amazon Managed Service for Prometheus as a data source in an Amazon Managed Grafana instance. Look for the metric “metric:recording_rule”; if you find it, the recording rule was created successfully:

Grafana console showing a metric created as a result of recording rule in Amazon Managed Service for Prometheus

Figure 3: Recording rule metric on Grafana console

Validate the setup by querying the rules endpoint with the following command. Replace $WORKSPACE_ID with the ID of your Amazon Managed Service for Prometheus workspace:

awscurl https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/rules  --service="aps"

You should see a successful response confirming that the rules are loaded and the alert has fired. You can query the Alert Manager endpoint to confirm the same:

awscurl https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID/alertmanager/api/v2/alerts --service="aps" -H "Content-Type: application/json"
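To check the rules response programmatically rather than by eye, you can parse it and look for firing alerting rules. The sketch below assumes a payload in the shape of the Prometheus /api/v1/rules HTTP API; the sample JSON is illustrative, not captured from a real workspace.

```python
import json

# Illustrative response in the shape of Prometheus's /api/v1/rules API.
sample = json.dumps({
    "status": "success",
    "data": {"groups": [
        {"name": "alert-test", "rules": [
            {"name": "metric:alerting_rule", "type": "alerting",
             "state": "firing"}
        ]}
    ]},
})

def firing_rules(response_text: str) -> list:
    """Return the names of alerting rules currently in the 'firing' state."""
    body = json.loads(response_text)
    return [r["name"]
            for g in body["data"]["groups"]
            for r in g.get("rules", [])
            if r.get("type") == "alerting" and r.get("state") == "firing"]

print(firing_rules(sample))  # ['metric:alerting_rule']
```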

Clean up

Run the following command to delete the Amazon Managed Service for Prometheus workspace. Make sure you also delete the EKS cluster you created:

aws amp delete-workspace --workspace-id <your-workspace-id>

Next steps

Try Amazon Managed Service for Prometheus today! With the service now Generally Available and in additional Regions, we are excited to see what our customers will build with it.

You can also engage AWS Partners such as AppDynamics, Contino, Dynatrace, kubecost, NTT Data, Rafay, Reply, Tech Mahindra, and Wipro for experience, knowledge, and additional services around ingesting or querying operational metrics from Amazon Managed Service for Prometheus. To learn more about our partners, visit our partner page.

We also published additional blogs covering other topics as part of the GA launch:

Customers can also use Amazon Managed Grafana to visualize metrics from Amazon Managed Service for Prometheus and many other data sources. Amazon Managed Grafana became Generally Available in late August, and we encourage you to read this blog post that dives into that service. For hands-on experience with the service, check out the Observability Workshop.

To learn more and see Amazon Managed Service for Prometheus in action, register for our webinar and join us on October 6, 2021 at 9:30am PST. We look forward to seeing you there.

About the authors


Imaya Kumar Jagannathan

Imaya is a Principal Solutions Architect focused on AWS observability services, including Amazon CloudWatch, AWS X-Ray, Amazon Managed Service for Prometheus, Amazon Managed Grafana, and AWS Distro for OpenTelemetry. He is passionate about monitoring and observability and has a strong application development and architecture background. He likes working on distributed systems and is excited to talk about microservice architecture design. He loves programming in C# and working with containers and serverless technologies. Find him on Twitter & LinkedIn – @imaya.


Vikram Venkataraman

Vikram Venkataraman is a Senior Technical Account Manager at Amazon Web Services and a container enthusiast. He helps organizations adopt best practices for running workloads on AWS. In his spare time, he loves to play with his two kids and follows cricket.


Marc Chene

Marc is a Principal Product Manager focused on monitoring microservices and containers for modern application environments. Marc works with customers to understand, build trust, and deliver the best user experience in an agile way. Currently he is focused on delivering the best observability experience across time series data such as metrics, logs, and distributed tracing using CloudWatch and open source tooling such as Grafana and Prometheus.