In this post, two AWS interns, Eric Lee and Connor Lindsey, describe their experience building a Prometheus remote write exporter for the popular open source observability project OpenTelemetry.

As an engineer, understanding the performance and health of your applications and services is crucial. However, this can be challenging, especially when monitoring across various languages, infrastructures, and services. OpenTelemetry, a Cloud Native Computing Foundation (CNCF) open source project, has created an open standard specification that “makes robust, portable telemetry a built-in feature of cloud-native software.” OpenTelemetry provides a single set of APIs, libraries, agents, and collector services to capture distributed traces and metrics from your application. You can analyze them using Prometheus, Jaeger, and other observability tools.

The project is targeting General Availability (GA) before the end of 2020. Open source contributors from all over the world are rapidly adding functionality to OpenTelemetry. Each OpenTelemetry component, including SDKs, exporters, and instrumentation libraries, is important in connecting observability data across services, applications, and observability backends.

Prometheus is a popular backend used to visualize and alert based on metrics data. Prometheus offers a remote write API that extends Prometheus’ functionality. It does so by exporting metrics data from Prometheus to other services such as Graphite, InfluxDB, and Cortex. We designed and developed an in-process exporter to send metrics data from Go services instrumented by the OpenTelemetry Go SDK to Cortex. Cortex is a horizontally scalable and long-term storage solution commonly used as a remote write destination for Prometheus.

Diagram comparing Cortex export pipelines with and without the exporter.

Before we implemented our project, users could send metrics data to Cortex by collecting the data in their applications through the OpenTelemetry-Go SDK. They would then send it to another service, such as Prometheus and/or the OpenTelemetry Collector. Finally, one of those services exported the data to Cortex.

There was an opportunity to simplify the export process for users by reducing component complexity. Our exporter acts on this opportunity by removing the Prometheus and/or OpenTelemetry Collector components and exports data from the Go SDK directly to Cortex. It aims to be the simplest way for users to export metrics data from a Go application to Cortex. In building this project, we hoped to both improve developer efficiency and lower operational costs.

Exporter design and implementation

When we began our project, there was only a draft specification for Metrics Exporters. Community feedback from contributors and users drives rapid evolution, which meant we could jump in and push the specification forward. Discussion around OpenTelemetry takes place in Gitter channels, in issues and pull requests on the OpenTelemetry GitHub repos, and in weekly special interest group meetings.

We used a comprehensive development workflow, starting with a requirements analysis to identify features we needed to support. We followed up on the requirements analysis with a design in which we mapped out the components and general structure of the exporter. After that, we completed implementation and testing docs that outlined how we would turn our design into code. Along the way, we received guidance and reviews from both AWS employees and OpenTelemetry maintainers.

To meet the needs of OpenTelemetry users, we set the following project goals:

  1. Implement the Metrics Exporter interface from the OpenTelemetry Go SDK.
  2. Convert OpenTelemetry metric data to the TimeSeries format accepted by Cortex.
  3. Send metrics data via HTTP to Cortex through Prometheus’s remote_write API.
  4. Use Go, AWS, open source software, and OpenTelemetry best practices regarding design, development, and testing.

Exporter entity relationship diagram

 

Diagram illustrating the Exporter sequence.

 

The Exporter is written in Go and implements the OpenTelemetry-Go SDK’s Exporter interface. The Exporter works with the SDK’s Push Controller, which periodically aggregates check-pointed data from any created instruments. The Controller passes the data to the Exporter by calling the Export() function. The check-pointed data, processed by the OpenTelemetry-Go SDK, includes information such as timestamps, labels, and values.
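The push model described above can be sketched in a few lines of Go. This is a minimal illustration, not the SDK's actual code: `Record`, `Exporter`, `pushLoop`, `countingExporter`, and `runTicks` are hypothetical names, and the real SDK interface carries richer check-pointed data than this simplified one.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Record is a hypothetical, simplified check-pointed data point.
type Record struct {
	Name  string
	Value float64
}

// Exporter is a simplified stand-in for the SDK's exporter interface.
type Exporter interface {
	Export(ctx context.Context, records []Record) error
}

// pushLoop calls Export with a fresh checkpoint on every tick. It returns
// when ctx is cancelled, when ticks is closed, or when an export fails.
func pushLoop(ctx context.Context, ticks <-chan time.Time, checkpoint func() []Record, exp Exporter) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case _, ok := <-ticks:
			if !ok {
				return nil
			}
			if err := exp.Export(ctx, checkpoint()); err != nil {
				return err
			}
		}
	}
}

// countingExporter records how many times Export was called.
type countingExporter struct{ calls int }

func (c *countingExporter) Export(_ context.Context, records []Record) error {
	c.calls++
	fmt.Printf("export %d: %d record(s)\n", c.calls, len(records))
	return nil
}

// runTicks drives pushLoop with n synthetic ticks so the example
// finishes immediately instead of waiting on a real timer.
func runTicks(n int, checkpoint func() []Record, exp Exporter) error {
	ticks := make(chan time.Time, n)
	for i := 0; i < n; i++ {
		ticks <- time.Now()
	}
	close(ticks)
	return pushLoop(context.Background(), ticks, checkpoint, exp)
}

func main() {
	exp := &countingExporter{}
	checkpoint := func() []Record { return []Record{{Name: "requests", Value: 42}} }
	if err := runTicks(3, checkpoint, exp); err != nil {
		fmt.Println("push loop failed:", err)
	}
}
```

In a long-running service, the tick channel would typically come from `time.NewTicker(interval).C`, with the context cancelled on shutdown.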

Within the Exporter, the check-pointed data is iterated over and converted to TimeSeries based on each record's aggregation type. The OpenTelemetry-Go SDK provides multiple aggregation types that reflect different types of data, and each piece of data in an aggregation becomes an individual TimeSeries. For example, a MinMaxSumCount aggregation is converted to four TimeSeries: Min, Max, Sum, and Count. As their names suggest, the four TimeSeries contain the minimum, maximum, sum, and count of the values recorded to an instrument. Other aggregation types include Sum, Histogram, Distribution, and LastValue.
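To make the MinMaxSumCount case concrete, here is a rough sketch of the conversion. The `Label`, `Sample`, `TimeSeries`, and `MinMaxSumCount` types are simplified stand-ins we define for illustration (the real Exporter works with the protobuf-generated Prometheus types), and the `_min`, `_max`, `_sum`, and `_count` name suffixes are an assumption rather than a documented naming scheme. The only Prometheus convention relied on is that the metric name is stored in the `__name__` label.

```go
package main

import (
	"fmt"
	"time"
)

// Label, Sample, and TimeSeries are simplified, hypothetical stand-ins
// for the Prometheus remote write types.
type Label struct{ Name, Value string }

type Sample struct {
	Value     float64
	Timestamp int64 // milliseconds since the Unix epoch
}

type TimeSeries struct {
	Labels  []Label
	Samples []Sample
}

// MinMaxSumCount is a hypothetical snapshot of one check-pointed aggregation.
type MinMaxSumCount struct {
	Name          string
	Labels        []Label
	Min, Max, Sum float64
	Count         int64
	End           time.Time
}

// convertMinMaxSumCount returns four TimeSeries, one per statistic.
// The name suffixes are an illustrative assumption.
func convertMinMaxSumCount(agg MinMaxSumCount) []TimeSeries {
	ts := agg.End.UnixNano() / int64(time.Millisecond)
	stats := []struct {
		suffix string
		value  float64
	}{
		{"_min", agg.Min},
		{"_max", agg.Max},
		{"_sum", agg.Sum},
		{"_count", float64(agg.Count)},
	}
	out := make([]TimeSeries, 0, len(stats))
	for _, s := range stats {
		// Each series gets its own label slice: the metric name first,
		// followed by a copy of the instrument's labels.
		labels := make([]Label, 0, len(agg.Labels)+1)
		labels = append(labels, Label{Name: "__name__", Value: agg.Name + s.suffix})
		labels = append(labels, agg.Labels...)
		out = append(out, TimeSeries{
			Labels:  labels,
			Samples: []Sample{{Value: s.value, Timestamp: ts}},
		})
	}
	return out
}

func main() {
	agg := MinMaxSumCount{
		Name:   "request_latency",
		Labels: []Label{{Name: "service", Value: "checkout"}},
		Min:    1.5, Max: 9, Sum: 15.5, Count: 3,
		End: time.Now(),
	}
	for _, ts := range convertMinMaxSumCount(agg) {
		fmt.Println(ts.Labels[0].Value, ts.Samples[0].Value)
	}
}
```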

These TimeSeries are grouped together in an array and wrapped in a WriteRequest, a struct defined by Prometheus for use in its remote_write API. The WriteRequest is serialized as a Protocol Buffer message, compressed with Snappy, attached to an HTTP request, and sent to Cortex. The Exporter can be configured to handle various use cases such as different endpoints, custom push intervals, HTTP authentication, and more.
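The final HTTP step can be sketched with only the Go standard library. This example assumes the payload has already been serialized and Snappy-compressed elsewhere, so it takes raw bytes. `Config` and `buildRequest` are hypothetical names, not the Exporter's actual API, and the endpoint URL is a placeholder. The headers are the ones the Prometheus remote write protocol expects.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// Config is a hypothetical, trimmed-down exporter configuration.
type Config struct {
	Endpoint string // remote write URL of the Cortex instance
	Username string // optional basic auth credentials
	Password string
}

// buildRequest wraps an already Snappy-compressed, protobuf-encoded
// WriteRequest in an HTTP POST with the remote write headers.
func buildRequest(cfg Config, compressed []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, cfg.Endpoint, bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Encoding", "snappy")
	req.Header.Set("Content-Type", "application/x-protobuf")
	req.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
	if cfg.Username != "" {
		req.SetBasicAuth(cfg.Username, cfg.Password)
	}
	return req, nil
}

func main() {
	// Placeholder bytes; a real payload would be the compressed WriteRequest.
	payload := []byte("compressed-write-request")
	cfg := Config{Endpoint: "http://localhost:9009/api/prom/push"} // placeholder URL
	req, err := buildRequest(cfg, payload)
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Content-Encoding"))
	// Sending would then be: resp, err := http.DefaultClient.Do(req)
}
```

Keeping request construction separate from sending makes the headers and authentication easy to unit test without a running Cortex instance.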

Testing

We followed test-driven development best practices while implementing the Exporter. We designed and wrote the tests before implementing the necessary code to pass said tests. Individual components were tested with unit tests, and we used integration tests to verify that the different components worked together as expected.

For our tests, we used Go’s built-in testing library in addition to the libraries testify and go-cmp.

Sample usage

The Cortex Exporter requires an instrumented Go application that is using our Exporter and a running instance of Cortex. End users can set up the Exporter in their application, create instruments, and collect metrics as shown in the following image.

We also created an example project in our documentation that uses Docker Compose to set up multiple containers. These containers handle creating the Cortex instance and running the example Go application for the user. The example project also creates a Docker container for Grafana, which lets the user visualize the exported metrics data. Docker Compose allows the user to run the whole project with a single command: docker-compose up.

Grafana dashboard displaying metrics from the Exporter.

Conclusion

While working on the design and implementation of this project, we learned about the process of building, documenting, and testing high-quality code. We also learned how to work with a large community of open source contributors. We embraced open source principles by fostering transparency and discussion. We also contributed documentation, examples, and tests, which are all fundamental for successful open source development.

Working with the OpenTelemetry community has been an amazing experience. We’d like to extend the invitation to join us to each of you. You can learn more about OpenTelemetry at the links below. Join the discussion, test new features, and contribute your ideas and experience. We hope to see you there!

OpenTelemetry resources:

Eric Lee

Eric Lee is a rising senior majoring in computer science at UC Davis and current software engineering intern at AWS. He is interested in machine learning and cloud computing.

Connor Lindsey

Connor Lindsey is a senior at Brigham Young University currently interning as a software developer at AWS. He is interested in React, React Native, and UI/UX design.

The content and opinions in this post are those of the third-party authors and AWS is not responsible for the content or accuracy of this post.