Prometheus is a popular open-source metrics monitoring solution used across a wide variety of workloads. Although customers commonly use Prometheus to monitor container workloads, it’s also used to monitor Amazon Elastic Compute Cloud (Amazon EC2) instances, as well as virtual machines (VMs) and servers in on-premises environments.
Amazon Managed Service for Prometheus (AMP) is a Prometheus-compatible monitoring service for infrastructure and application metrics that makes it easy for customers to securely monitor their workloads at scale. Customers using Prometheus in self-hosted environments face challenges in managing a highly available, scalable, and secure Prometheus server environment, infrastructure for long-term storage, and access control. AMP solves these problems by providing a fully managed environment that is tightly integrated with AWS Identity and Access Management (IAM) to control authentication and authorization.
To start using AMP, complete these two simple steps:
- Create an AMP workspace.
- Configure your Prometheus server to remote write into the AMP workspace.
To remote write into your workspace, you need an IAM role with IAM permissions and policies. This poses a challenge for on-premises environments where IAM roles aren’t available to the instance. A common solution to this problem is to use programmatic access keys that are essentially long-term credentials stored in a secure location and retrieved by the application during startup. This approach makes it difficult to comply with best practices like the rotation of the credentials.
A better approach is to use temporary credentials issued by AWS Security Token Service (AWS STS), but this usually requires identity federation (SAML, OIDC, and so on) and changes to the remote write configuration in Prometheus.
You can use AWS Systems Manager to manage your infrastructure on AWS and your on-premises resources. You can use the Systems Manager console to view operational data from AWS services and automate operational tasks across your AWS resources. Systems Manager helps you maintain security and compliance by scanning your managed instances and reporting on (or taking corrective action on) any policy violations it detects.
A managed instance is a machine configured for use with Systems Manager. Supported machine types include EC2 instances, on-premises servers, and VMs, including VMs in other cloud environments. Supported operating system types include Windows Server, macOS, Raspbian, and multiple distributions of Linux.
When Systems Manager is configured to manage hybrid environments, the SSM Agent is deployed to those instances and an IAM role must be created for them. The SSM Agent goes through an activation process that uses TLS with either Amazon certificates or private certificates issued through AWS Certificate Manager (ACM). Most modern operating systems (Windows and Linux) already include Amazon certificates. (Only one certificate is required.) For information about installing a certificate manually, see Install a TLS certificate on on-premises servers and VMs.
What about Prometheus?
During the registration process of the SSM Agent, a credential file is created in the home path of the user running the SSM Agent (by default, root). The SSM Agent keeps this file updated by requesting temporary credentials through AWS STS. These credentials belong to the role that you specified for the instance during the activation process. The same credentials can be used for remote write operations into your AMP workspace by configuring the required permissions.
Figure 1: Solution architecture
Configure the SSM Agent
The SSM Agent is an open-source project. You can access the public repository on GitHub. In this blog post, I’ll follow the steps in Setting up AWS Systems Manager for hybrid environments. I assume that Systems Manager is already configured in your environment, as described in Step 1: Complete general Systems Manager setup steps.
Create an AMP workspace
The following script will create an AMP workspace in the us-east-1 Region. If you prefer, you can change the WORKLOAD_REGION variable to use another AWS Region where AMP is supported.
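The original script is not reproduced in this text, so the following is a minimal sketch of what it could look like, not the author's exact commands. The workspace alias is a hypothetical name chosen for illustration, and the AWS call is skipped when the AWS CLI or credentials are not available.

```shell
# Region for the workspace; change it to any Region where AMP is supported.
WORKLOAD_REGION=us-east-1
# Hypothetical alias, used here only for illustration.
WORKSPACE_ALIAS=onprem-prometheus-demo

# Only call AWS when the CLI is installed and credentials are configured.
if aws sts get-caller-identity >/dev/null 2>&1; then
  # Keep the workspace ID; later steps and the cleanup commands need it.
  WORKSPACE_ID=$(aws amp create-workspace \
    --alias "$WORKSPACE_ALIAS" \
    --region "$WORKLOAD_REGION" \
    --query workspaceId --output text)
  echo "Created AMP workspace: $WORKSPACE_ID"
else
  echo "AWS CLI or credentials not available; skipping workspace creation." >&2
fi
```

Storing the ID in WORKSPACE_ID matches the variable name used by the cleanup commands at the end of this post.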
Create an IAM service role for the hybrid environment
Use the following commands to create an IAM role with a policy that allows Systems Manager to assume the role on behalf of your VM.
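As a sketch of this step, the commands below write a trust policy that lets the Systems Manager service assume the role, and then create the role. The file name SSMService-Trust.json and the role name SSMServiceRoleRemoteWrite are taken from the cleanup commands at the end of this post; the guard around the AWS call is an addition so the snippet is safe to run without credentials.

```shell
ROLE_NAME=SSMServiceRoleRemoteWrite

# Trust policy: allows the Systems Manager service to assume the role.
cat > SSMService-Trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ssm.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Only call AWS when the CLI is installed and credentials are configured.
if aws sts get-caller-identity >/dev/null 2>&1; then
  aws iam create-role \
    --role-name "$ROLE_NAME" \
    --assume-role-policy-document file://SSMService-Trust.json
else
  echo "AWS CLI or credentials not available; skipping role creation." >&2
fi
```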
Assign some policies to this empty role. The first policy, AmazonSSMManagedInstanceCore, is needed for basic operations performed by the SSM Agent. The second policy, AmazonPrometheusRemoteWriteAccess, allows the role to perform remote write operations into the AMP workspace you created earlier.
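A minimal sketch of attaching both managed policies follows. The policy ARNs and the role name come from the cleanup commands at the end of this post; the loop and the credential guard are illustrative choices.

```shell
ROLE_NAME=SSMServiceRoleRemoteWrite

# Only call AWS when the CLI is installed and credentials are configured.
if aws sts get-caller-identity >/dev/null 2>&1; then
  for POLICY_ARN in \
    arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore \
    arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
  do
    # Attach each AWS managed policy to the role.
    aws iam attach-role-policy \
      --role-name "$ROLE_NAME" \
      --policy-arn "$POLICY_ARN"
  done
else
  echo "AWS CLI or credentials not available; skipping policy attachment." >&2
fi
```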
Create a managed instance activation for a hybrid environment
To set up servers and VMs in your hybrid environment as managed instances, you need to create a managed instance activation. After you successfully complete the activation, you immediately receive an activation code and activation ID. You specify this code and ID combination when you install the SSM Agent on servers and VMs in your hybrid environment. The code and ID provide secure access to Systems Manager from your managed instances. For more information, see Setting up AWS Systems Manager for hybrid environments.
This credential pair is used to register the VM in Systems Manager. It will not be preserved or used to communicate with the service. After the instance is registered, the SSM Agent will generate an asymmetric key pair and use it to obtain the temporary credentials required to function properly. This pair is uniquely tied to this machine. You can remove the registration from Systems Manager at any time, which makes it a better option than long-term credentials.
I won’t dive deep into the options for creating this activation. You should set restrictive values for the sensitive options in this command, such as the number of instances that can be registered with this combination of code and ID (in this case, one), the expiration date of the activation window (the time during which this pair can be used to activate new servers), and proper tagging.
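The activation command could look like the following sketch. The instance name, description, and tag are hypothetical values chosen for illustration; the registration limit of one follows the text above. An expiration can also be set with the --expiration-date option, which is left out here to keep the example short.

```shell
WORKLOAD_REGION=us-east-1
ROLE_NAME=SSMServiceRoleRemoteWrite
# Hypothetical display name for the managed instance.
INSTANCE_NAME=onprem-prometheus-vm

# Only call AWS when the CLI is installed and credentials are configured.
if aws sts get-caller-identity >/dev/null 2>&1; then
  # The response contains the ActivationId and ActivationCode values.
  aws ssm create-activation \
    --default-instance-name "$INSTANCE_NAME" \
    --description "Activation for an on-premises Prometheus VM" \
    --iam-role "$ROLE_NAME" \
    --registration-limit 1 \
    --tags "Key=Environment,Value=onprem" \
    --region "$WORKLOAD_REGION"
else
  echo "AWS CLI or credentials not available; skipping activation." >&2
fi
```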
Make a note of the activation ID and code. You’ll need them in the next step.
Install the SSM Agent in a hybrid environment
Run the rest of the commands in this post on the on-premises VM.
I’m using an Ubuntu 20.04 instance running on VirtualBox. The steps to install and configure this instance are beyond the scope of this post. I installed the instance with the minimum requirements and updated it before starting. For instructions for Linux, see Install SSM Agent for a hybrid environment (Linux). For instructions for Windows, see Install SSM Agent for a hybrid environment (Windows).
On the VM, install the SSM Agent using the prebuilt Debian package:
Next, register the SSM Agent with your account using the activation ID and code:
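A sketch of the registration step is shown below. It assumes the activation ID and code are supplied through the ACTIVATION_ID and ACTIVATION_CODE environment variables (names chosen for this example) and that the agent was installed from the Debian package, so the amazon-ssm-agent binary is on the PATH.

```shell
WORKLOAD_REGION=us-east-1
# Values returned by the activation step; read from the environment here.
ACTIVATION_ID="${ACTIVATION_ID:-}"
ACTIVATION_CODE="${ACTIVATION_CODE:-}"

if [ -z "$ACTIVATION_ID" ] || [ -z "$ACTIVATION_CODE" ]; then
  echo "Set ACTIVATION_ID and ACTIVATION_CODE before registering." >&2
elif command -v amazon-ssm-agent >/dev/null 2>&1; then
  # Stop the agent, register it with the activation, then start it again.
  sudo service amazon-ssm-agent stop
  sudo amazon-ssm-agent -register \
    -code "$ACTIVATION_CODE" \
    -id "$ACTIVATION_ID" \
    -region "$WORKLOAD_REGION"
  sudo service amazon-ssm-agent start
else
  echo "amazon-ssm-agent is not installed on this machine." >&2
fi
```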
If the process is successful, you’ll see a message like the following that includes the managed instance ID:
To confirm that the instance is reporting properly, in the Systems Manager console, choose Fleet Manager. The instance should be displayed and the SSM Agent status should be Online. After a few seconds, the information about the instance should be populated along with the tags passed to the activation request.
Figure 2: Instance overview in Fleet Manager
The SSM Agent will manage the credentials in the home folder of the user that executed the agent (by default, root).
To check if the file exists and is not empty:
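One way to perform this check is sketched below. The path /root/.aws/credentials is an assumption derived from the default user (root) and the $HOME/.aws/credentials location mentioned later in this post; the helper function name is hypothetical. Run it as root, because other users normally cannot read the root home folder.

```shell
# Returns success only when the file exists and has a size greater than zero.
credentials_ready() {
  [ -s "$1" ]
}

# Assumed location of the file maintained by the SSM Agent for the root user.
CRED_FILE="${CRED_FILE:-/root/.aws/credentials}"

if credentials_ready "$CRED_FILE"; then
  echo "Credential file found: $CRED_FILE"
else
  echo "Credential file missing, empty, or not readable: $CRED_FILE"
fi
```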
Install a Prometheus server
Now that the SSM Agent is providing a set of credentials to the root user on the instance, you can install the Prometheus server to start exporting metrics to your AMP workspace. Use the following commands to download Prometheus into a new folder:
Now, configure Prometheus to send metrics (remote_write) to your AMP workspace and then start Prometheus.
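The configuration could be generated as in the following sketch, which assumes a Prometheus version with SigV4 support in remote_write. The workspace ID defaults to a placeholder (ws-EXAMPLE) that you need to replace with your own, and the self-scrape job and 15-second interval are illustrative choices.

```shell
WORKLOAD_REGION=us-east-1
# Placeholder: replace with the ID of the AMP workspace you created.
WORKSPACE_ID="${WORKSPACE_ID:-ws-EXAMPLE}"

# Write a minimal Prometheus configuration into the current folder.
cat > prometheus.yml <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  # Prometheus scrapes its own metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

remote_write:
  # Requests are signed with SigV4 using the credentials in ~/.aws/credentials.
  - url: https://aps-workspaces.${WORKLOAD_REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/remote_write
    sigv4:
      region: ${WORKLOAD_REGION}
EOF

echo "Wrote prometheus.yml for workspace $WORKSPACE_ID"
```

Prometheus can then be started from its folder with ./prometheus --config.file=prometheus.yml, run as the same user that owns the credential file.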
Note: In this sample, I’m running Prometheus in the foreground from a temporary folder. This won’t be practical for most scenarios; you will likely have to run Prometheus as a system service. In those cases, be aware that Prometheus must be able to read the same credential file. The AWS SDK will look for it in the home folder of the user ($HOME/.aws/credentials). For simplicity, I’m running both processes as the root user. Depending on your OS, you might have to take precautions to avoid running both processes as the same user and to apply least-privilege permissions.
After the Prometheus server is up and running, the metrics will be sent to the AMP remote_write destination. You can visualize the metrics by installing Grafana on your local environment or by creating an Amazon Managed Grafana (AMG) workspace.
The following figure shows how to visualize metrics by querying AMP through an AMG workspace.
Figure 3: go_gc_duration_seconds_count
Use Grafana Agent instead of Prometheus server
The Grafana Cloud Agent is an open-source, lightweight alternative to running a full Prometheus server. It keeps the parts required for discovering and scraping Prometheus exporters and sending metrics to the backend (in this case, AMP), and removes subsystems such as the storage, query, and alerting engines.
In this section, I’ll show you how you can deploy the Grafana Cloud Agent to collect metrics as an alternative to the Prometheus server. If the Prometheus server is still running, press Ctrl+C in the console to stop it, and then execute the following commands to install the Grafana Cloud Agent:
This Grafana Cloud Agent configuration enables the node_exporter module. If you check the metrics available in AMG now, you’ll find additional metrics sent by the Grafana Cloud Agent.
Figure 4: node_netstat_Tcp_InSegs
Use the OpenTelemetry Collector
AWS Distro for OpenTelemetry Collector is an AWS-supported version of the upstream OpenTelemetry Collector. It’s distributed by Amazon and supports selected components from the OpenTelemetry community. It is fully compatible with AWS computing platforms, including Amazon EC2, Amazon Elastic Container Service, and Amazon Elastic Kubernetes Service. It enables users to send telemetry data to Amazon CloudWatch metrics, traces, and logs, as well as to other supported backends.
In this section, I’ll show you how you can deploy the AWS Distro for OpenTelemetry Collector to collect metrics as an alternative to using the Prometheus server and Grafana Cloud Agent. If the Grafana Cloud Agent is still running, press Ctrl+C in the console to stop it, and then execute the following commands to install the AWS Distro for OpenTelemetry Collector:
I am running the AWS Distro for OpenTelemetry (ADOT) Collector as the root user in order to reuse the shared credential file. The default user for the OTEL Collector (aot) will not have access to the shared credential file in the root user’s home folder. You can see in AMG that the OTEL Collector is sending metrics:
Figure 5: otelcol_process_runtime_total_alloc_bytes
To avoid ongoing charges in your AWS account, run the following commands to delete the resources you created. You will also need to clean up or terminate your VM.
rm -f SSMService-Trust.json
aws iam detach-role-policy --role-name SSMServiceRoleRemoteWrite --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
aws iam detach-role-policy --role-name SSMServiceRoleRemoteWrite --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
aws iam delete-role --role-name SSMServiceRoleRemoteWrite
aws amp delete-workspace --workspace-id $WORKSPACE_ID --region $WORKLOAD_REGION
In this blog post, I showed you how you can set up a secure environment to collect Prometheus metrics from an on-premises VM and remote write metrics to AMP. The SSM Agent plays a key role here by providing temporary credentials to the Prometheus server and rotating the authentication keys regularly. For more information, see About SSM Agent.
You can easily collect Prometheus metrics from Amazon EKS, Amazon ECS, and EC2 instances. For more information, see these resources:
- Getting Started with Amazon Managed Service for Prometheus
- Using Amazon Managed Service for Prometheus to monitor EC2 environments
- Automating the installation and configuration of Prometheus using Systems Manager documents
- Metrics collection from Amazon ECS using Amazon Managed Service for Prometheus
- Hands-on experience using the Observability Workshop