A key challenge for any developer operations team is to gain full observability of a service’s health. You may already use great monitoring products from providers such as Amazon, Google, Splunk, and others. However, most of these vendors define their own data specification for metrics, traces, and logs. It is difficult for customers to switch their monitoring data to different vendors and visualization consoles. Doing so requires spending a great deal of effort changing code in their applications. Also, applications are increasingly using complex distributed services to solve real problems. A distributed monitoring framework, such as OpenTelemetry, is necessary to correlate monitoring data in an easy way and is critical for greater service visibility and maintenance.

OpenTelemetry is an open source framework that defines a monitoring data model and data correlation capabilities that any monitoring vendor can adopt. AWS’s leading monitoring service, Amazon CloudWatch, provides an open source distribution of OpenTelemetry called AWS Distro for OpenTelemetry to make getting started with OpenTelemetry easier. In this post, we introduce how to send application metrics to CloudWatch with AWS Distro for OpenTelemetry.

Deep dive on CloudWatch support with OpenTelemetry metrics

OpenTelemetry provides different language SDKs to instrument code for collecting metrics data in an application. To enhance automation abilities, the data collector software can run as a daemon service to process monitoring data from various applications. AWS Distro for OpenTelemetry extends the original OpenTelemetry with native support for CloudWatch and AWS X-Ray services.

Figure: CloudWatch Metrics data workflow with AWS Distro for OpenTelemetry.

Metrics instrumentation in application

OpenTelemetry does not yet support auto-instrumentation for metrics. For now, we have to instrument our code manually to generate application metrics. In a Java application, for example, we need the following OpenTelemetry SDK dependencies to create metrics instruments:

  • implementation ("io.opentelemetry:opentelemetry-sdk-metrics:0.9.0-SNAPSHOT")
  • implementation ("io.opentelemetry:opentelemetry-exporters-otlp:0.9.0-SNAPSHOT")

You can find the sample code for setting up application metrics instrumentation in our AWS Distro for OpenTelemetry tutorials.

In the next section, we discuss how to use AWS Distro for OpenTelemetry Collector to send OpenTelemetry metrics to CloudWatch with CloudWatch EMF logs.

Support CloudWatch metrics with AWS Distro for OpenTelemetry Collector

AWS Distro for OpenTelemetry Collector (AWS OTel Collector) is the component of AWS Distro for OpenTelemetry that AWS builds on top of the OpenTelemetry Collector. It inherits all of the OpenTelemetry Collector’s features, including configuration setup. In this section, we explain how AWS OTel Collector supports OpenTelemetry metrics.

To learn more about AWS Distro for OpenTelemetry Collector, visit the AWS Distro for OpenTelemetry Collector installation guide. Once installed, AWS OTel Collector automatically provides the following default configuration:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:55680
      http:
        endpoint: localhost:55681
processors:
  batch/traces:
    timeout: 1s
    send_batch_size: 50
  batch/metrics:
    timeout: 60s
exporters:
  awsemf:
  awsxray:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch/traces]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [awsemf]

The default configuration enables the AWS EMF exporter (awsemf) in the metrics pipeline. The exporter converts OpenTelemetry metrics to batched CloudWatch EMF logs and sends them to the CloudWatch backend. In CloudWatch, we can then query the EMF logs with CloudWatch Logs Insights, visualize the metrics on dashboards, and create metric alarms.

Extending CloudWatch EMF logs to support OpenTelemetry metrics

OpenTelemetry defines two types of metrics aggregations, Sum and MinMaxSumCount. Each aggregation has its own semantics and suits different use cases. Previously, CloudWatch EMF logs only supported metrics with the Sum aggregation format. To embrace OpenTelemetry fully, CloudWatch has extended the EMF schema to support both MinMaxSumCount and Sum metrics aggregations. The extended schema accommodates all the possible OpenTelemetry metrics formats.
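To make the difference between the two aggregations concrete, here is a minimal Python sketch. It does not use the OpenTelemetry SDK: the `MinMaxSumCount` dataclass and both helper functions are hypothetical names introduced only for illustration. The sample latencies are chosen so that the summary matches the latency values in the EMF example in this section.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MinMaxSumCount:
    """Hypothetical summary type for illustration; not the OpenTelemetry SDK class."""
    minimum: float
    maximum: float
    total: float
    count: int


def aggregate_sum(values: Iterable[float]) -> float:
    """Sum aggregation: collapse all measurements into a single running total."""
    return sum(values)


def aggregate_min_max_sum_count(values: Iterable[float]) -> MinMaxSumCount:
    """MinMaxSumCount aggregation: keep four summary statistics per interval."""
    values = list(values)
    if not values:
        raise ValueError("at least one measurement is required")
    return MinMaxSumCount(min(values), max(values), sum(values), len(values))


# Fourteen latency measurements in milliseconds (made-up sample data).
latencies_ms = [1] * 9 + [0] * 5
summary = aggregate_min_max_sum_count(latencies_ms)
```

A Sum aggregation fits counters such as request counts, while the four-value summary keeps enough information to derive an average (`total / count`) and to alarm on the extremes.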

An OpenTelemetry MinMaxSumCount metric in EMF representation looks similar to the following:

{
  "OTLib": "cloudwatch-otel",
  "_aws": {
    "CloudWatchMetrics": [
      {
        "Namespace": "AOT/AOTServiceDemo",
        "Dimensions": [
          ["apiName", "statusCode"],
          ["apiName"],
          ["statusCode"]
        ],
        "Metrics": [
          { "Name": "latency", "Unit": "ms" }
        ]
      }
    ],
    "Timestamp": 1601258104342
  },
  "apiName": "/querySpan",
  "latency": { "Max": 1, "Min": 0, "Count": 14, "Sum": 9 },
  "statusCode": "200"
}
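As a rough illustration of how such a record is put together, the following Python sketch assembles the same structure from its parts. The `build_emf_record` function and its parameters are hypothetical and exist only for this example; the exporter's real implementation lives in the collector and is not shown in this post. The field names and values are copied from the example above.

```python
import json
from typing import Dict, List


def build_emf_record(
    namespace: str,
    metric_name: str,
    unit: str,
    summary: Dict[str, float],
    dimensions: Dict[str, str],
    dimension_sets: List[List[str]],
    timestamp_ms: int,
) -> dict:
    """Assemble an EMF-style log record (illustrative helper, not the exporter's code)."""
    record = {
        "_aws": {
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # Each inner list is one dimension combination to publish.
                    "Dimensions": dimension_sets,
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
            "Timestamp": timestamp_ms,
        }
    }
    # Dimension values and the metric value sit at the top level of the record.
    record.update(dimensions)
    record[metric_name] = summary
    return record


record = build_emf_record(
    namespace="AOT/AOTServiceDemo",
    metric_name="latency",
    unit="ms",
    summary={"Max": 1, "Min": 0, "Count": 14, "Sum": 9},
    dimensions={"apiName": "/querySpan", "statusCode": "200"},
    dimension_sets=[["apiName", "statusCode"], ["apiName"], ["statusCode"]],
    timestamp_ms=1601258104342,
)
emf_log_line = json.dumps(record)
```

The `_aws` metadata block tells CloudWatch which top-level keys are metrics and which are dimensions, so the same log line can be queried as a log and graphed as a metric.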

More metric dimension rollup options

CloudWatch EMF logs enable us to ingest metrics data as logs into CloudWatch metrics, including metrics with high cardinality. High cardinality means that a metric’s dimensions can take many distinct values across the instruments in an application. Because every distinct combination of dimension values produces a separate metric, high-cardinality dimensions generate many metrics, which can increase costs at scale. To help customers achieve the same alerting actions at lower cost, we allow them to use a single dimension rollup. This limits the dimension combinations and generates fewer metrics, while maintaining our ability to set the desired alarms.

The available dimension rollup options in EMF exporter are:

  • ZeroAndSingleDimensionRollup: Enables both zero dimension rollup and single dimension rollup.
  • SingleDimensionRollupOnly: Enables single dimension rollup.
  • NoDimensionRollup: No dimension rollup (only keep original metrics that contain all dimensions).

For example, let’s say we emit a metric named latency to record API latency, and it has three dimensions: apiName, statusCode, and userId. With the SingleDimensionRollupOnly option, the exporter generates metrics with one dimension, thereby limiting the number of dimension combinations if dimension values vary with each API call.
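The three options can be sketched in Python as a function that returns the dimension sets to publish for a metric. This is an illustration under stated assumptions, not the exporter's actual code: the `rollup_dimension_sets` name is hypothetical, and we assume that the original full dimension set is always kept alongside the rollups, which mirrors the `Dimensions` array in the EMF example earlier in this post.

```python
from typing import List

ROLLUP_OPTIONS = (
    "ZeroAndSingleDimensionRollup",
    "SingleDimensionRollupOnly",
    "NoDimensionRollup",
)


def rollup_dimension_sets(dimensions: List[str], option: str) -> List[List[str]]:
    """Return the dimension sets to publish for one metric (illustrative sketch)."""
    if option not in ROLLUP_OPTIONS:
        raise ValueError(f"unknown rollup option: {option}")

    # Assumption: the original set with all dimensions is always kept.
    sets = [list(dimensions)]

    # Single dimension rollup: one extra set per individual dimension.
    if option != "NoDimensionRollup" and len(dimensions) > 1:
        sets.extend([name] for name in dimensions)

    # Zero dimension rollup: one extra set with no dimensions at all.
    if option == "ZeroAndSingleDimensionRollup" and dimensions:
        sets.append([])

    return sets


latency_dimensions = ["apiName", "statusCode", "userId"]
single_only = rollup_dimension_sets(latency_dimensions, "SingleDimensionRollupOnly")
```

For the latency metric, `single_only` contains the full set plus one set each for apiName, statusCode, and userId, so alarms can target a single dimension without depending on every combination of the other two.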

Metric batching

AWS Distro for OpenTelemetry Collector inherits all the data processors from the OpenTelemetry Collector. The batch processor is enabled by default to improve the throughput of CloudWatch EMF logs requests. With the following settings, OpenTelemetry metrics data is batched in memory until either the timeout threshold (default 30s) or the batch size (default 8192) is reached. The batched metrics are then sent to the CloudWatch backend in fewer, larger requests.

  • send_batch_size (default = 8192): Number of spans or metrics after which a batch is sent.
  • timeout (default = 30s): Time duration after which a batch is sent regardless of size.
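The interaction between the two settings can be sketched with a small Python class. This is a simplified model, not the collector's batch processor: `MetricBatcher`, its `tick` method, and the injectable `clock` are hypothetical names used for illustration, and the real processor runs its own timer rather than relying on a caller to poll.

```python
import time
from typing import Callable, List, Optional


class MetricBatcher:
    """Buffers items and sends them when the batch is full or has waited too long."""

    def __init__(
        self,
        send: Callable[[List[object]], None],
        send_batch_size: int = 8192,
        timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._send_batch_size = send_batch_size
        self._timeout_s = timeout_s
        self._clock = clock
        self._items: List[object] = []
        self._batch_started: Optional[float] = None

    def add(self, item: object) -> None:
        """Buffer one item; send immediately once the size threshold is reached."""
        if not self._items:
            self._batch_started = self._clock()
        self._items.append(item)
        if len(self._items) >= self._send_batch_size:
            self.flush()

    def tick(self) -> None:
        """Send a partial batch if it has waited at least the timeout."""
        if self._items and self._clock() - self._batch_started >= self._timeout_s:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered and start a new, empty batch."""
        if self._items:
            batch, self._items = self._items, []
            self._send(batch)
```

In this model a busy service sends as soon as `send_batch_size` items accumulate, while a quiet one still sends within roughly `timeout_s`, provided `tick` is called periodically.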

Conclusion

With the launch of AWS Distro for OpenTelemetry, Amazon CloudWatch fully supports OpenTelemetry metrics. New features are developed regularly to increase support for OpenTelemetry in AWS. For more updates, visit the AWS Distro for OpenTelemetry repos.