Amazon QuickSight recently added support for Amazon Athena Federated Query, which allows you to query data in place from various data sources. With this capability, QuickSight can extend support to query additional data sources like Amazon CloudWatch Logs, Amazon DynamoDB, and Amazon DocumentDB (with MongoDB compatibility) via their existing Amazon Athena data source. You can also use the Athena Query Federation SDK to write custom connectors and query any source accessible with a Java API, whether it is relational, non-relational, object, or a custom data endpoint.
A common analytics use case is to access data from a REST API endpoint and blend it with information from other sources. In this post, I walk you through the process of setting up a custom federated query connector in Athena to query data from a REST API endpoint and build a QuickSight dashboard that blends data from the REST API endpoint with other data sources.
To illustrate this use case, we work with Slack, the makers of a leading channel-based messaging platform, to test their Member Analytics API, which can help our mock company, Example Corp, understand Slack adoption and member engagement across different teams.
How the Slack Member Analytics API works
The following diagram illustrates the Slack Member Analytics API.
The Slack Member Analytics API is a REST API endpoint available for Slack Enterprise Grid customers. Authorized users and services can access the member usage stats dataset via the admin.analytics.getFile endpoint of the Slack Web API. The data consists of a newline-delimited JSON file with daily Slack activity stats at the member level. A record looks like the following code:
To request data, you must provide a date argument in YYYY-MM-DD format, a type argument with the value member, and an OAuth bearer token in the request header. The response is a compressed (.gzip) JSON file with data for the requested date. See the following code of a sample request:
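A minimal Python sketch of such a request, using only the standard library, is shown below. It assumes the type and date parameters are sent in the query string of a GET request (the Slack Web API also accepts form-encoded POST bodies); the function names are hypothetical, and error responses are not handled.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Slack Web API method that returns the Member Analytics file.
SLACK_GETFILE_URL = "https://slack.com/api/admin.analytics.getFile"


def build_member_analytics_request(token: str, date: str) -> urllib.request.Request:
    """Build (but do not send) a request for one day of member stats.

    Assumption: parameters go in the query string of a GET request.
    """
    query = urllib.parse.urlencode({"type": "member", "date": date})
    return urllib.request.Request(
        f"{SLACK_GETFILE_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},  # OAuth bearer token
        method="GET",
    )


def parse_member_analytics(gzipped_body: bytes) -> list[dict]:
    """Decompress the .gzip body and parse one JSON object per line."""
    text = gzip.decompress(gzipped_body).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

To actually call the API, pass the request to urllib.request.urlopen and hand the bytes from response.read() to parse_member_analytics. Because the file is newline-delimited, each line parses independently, which keeps memory use proportional to one record at a time if you later switch to streaming.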
Building the solution for Example Corp
For our use case, Example Corp recently purchased Slack for 1,000 users. As the Collaboration team onboards new teams to Slack, they want to measure Slack adoption and engagement within each new team. If they see low adoption or engagement within a group at the company, they can work with that group to understand why they aren’t using Slack and provide education and support, as needed.
Example Corp wants to provide analysts access to the Slack member usage stats to run ad hoc queries in place (directly from the source) without maintaining a new extract, transform, and load (ETL) pipeline. They use the QuickSight cross data source join feature to blend their Slack usage stats with their HR dataset.
To achieve this, Example Corp implements the following steps:
- Authorize the custom federated query connector with Slack to access the Member Analytics API.
- Develop and deploy a custom federated query connector in the Example Corp AWS account.
- Create a dataset in the Example Corp QuickSight environment that reads Slack member usage data for the last 30 days and blends it with an HR dataset.
- Create a QuickSight dashboard that shows usage trends of provisioned vs. active users.
Example Corp program managers can now monitor Slack engagement using their QuickSight dashboard (see the following screenshot).
The following diagram illustrates the overall architecture of the solution.
The following sections describe the components in detail and provide sample code to implement the solution in your environment.
Authorizing the custom federated query connector to access the Slack Analytics API
REST API endpoints that serve data typically require authentication, such as standard HTTP authentication or a bearer token. The Slack Web API requires a bearer token on every request. The Slack Member Analytics API uses the OAuth protocol to authorize an application’s read access to data from an organization’s Slack environment.
To perform the OAuth handshake, Example Corp deploys a custom web application on Amazon Elastic Compute Cloud (Amazon EC2) and registers it as a new Slack application. When it’s deployed, Example Corp Slack admins can access the web application UI to authenticate with Slack and authorize read access to the custom federated query connector. After successful authentication, the custom web application stores the bearer token as a secret in AWS Secrets Manager. Only the custom application server and the federated query connector have access to this secret.
The following is an architecture diagram and brief description of the OAuth authorization workflow between Slack.com and the custom web application. As a prerequisite, you need to register your custom application with Slack.
- The Slack admin accesses the custom application UI from their browser and chooses Add to Slack to begin the authorization process.
- The custom application redirects the admin to Slack.com to authenticate and authorize the client with admin.analytics:read access for the Example Corp Slack Enterprise Grid.
- Slack.com redirects the admin back to the custom application UI, passing a temporary authorization code in the request.
- On the backend, the custom application retrieves Slack client secrets from a Secrets Manager secret. The Slack client secrets are obtained during the Slack application registration.
- The custom application server makes a request for a bearer token to the Slack API, passing both the temporary authorization code and the Slack client secrets.
- If both the temporary authorization code and the client secrets are valid, then the Slack API returns a bearer token to the custom application server.
- The custom application saves the bearer token in the Secrets Manager secret.
- Finally, the application sends a confirmation of successful authorization to the admin.
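The server-side part of this workflow (steps 5 through 7) can be sketched in Python as follows. This is an illustrative sketch, not the code in the GitHub repo: it assumes Slack's OAuth v2 token endpoint (oauth.v2.access), assumes a user-scoped grant such as admin.analytics:read returns its token under authed_user, and stores the token in a hypothetical JSON secret shape {"token": ...}. The Secrets Manager client is passed in (for example, boto3.client("secretsmanager")) so the logic can be exercised without AWS credentials.

```python
import json
import urllib.parse
import urllib.request

SLACK_OAUTH_URL = "https://slack.com/api/oauth.v2.access"


def build_token_exchange_request(code, client_id, client_secret, redirect_uri=None):
    """Step 5: trade the temporary authorization code for a bearer token."""
    form = {"code": code, "client_id": client_id, "client_secret": client_secret}
    if redirect_uri:
        form["redirect_uri"] = redirect_uri
    return urllib.request.Request(
        SLACK_OAUTH_URL,
        data=urllib.parse.urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_token(response: dict) -> str:
    """Step 6: pull the bearer token out of the parsed JSON response."""
    if not response.get("ok"):
        raise RuntimeError(f"Slack OAuth failed: {response.get('error')}")
    # Assumption: user-scoped grants put the token under authed_user;
    # bot grants put it at the top level.
    user = response.get("authed_user") or {}
    token = user.get("access_token") or response.get("access_token")
    if not token:
        raise RuntimeError("No access_token in Slack OAuth response")
    return token


def store_token(secrets_client, secret_id: str, token: str) -> None:
    """Step 7: save the token; the {"token": ...} shape is hypothetical."""
    secrets_client.put_secret_value(
        SecretId=secret_id, SecretString=json.dumps({"token": token})
    )
```

Keeping the exchange and the storage in separate functions mirrors the workflow above: only the application server ever sees the client secrets, while the connector later needs read access to the stored token alone.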
Slack admins can revoke access to the application from the organization’s console at any time.
You can find the source code and detailed instructions to deploy this sample OAuth web application in the GitHub repo. When the authorization workflow is complete, you can pause or stop the resources running the web application. Going forward, the federated query connector accesses the token from Secrets Manager.
Deploying the custom federated query connector
When the OAuth workflow is complete, we can deploy the custom federated query connector in the Example Corp AWS environment. For Example Corp, we develop a custom AWS Lambda function using the Athena Query Federation Java SDK and a Java HTTP client to connect with the Slack Member Analytics REST API. Finally, we register it as a new data source within Athena.
The following is a diagram of how the custom connector workflow operates.
The workflow includes the following steps:
- Users submit a query to Athena, such as select * from <catalog_name>.slackanalytics.member_analytics where date='2020-11-10', where <catalog_name> is the name specified when creating the Athena data source.
- Athena compiles the query and runs the Lambda function to retrieve the Slack authorization token from Secrets Manager and determine the number of partitions based on the query predicates (where clause).
- The Slack Member Analytics Connector partitions the data by date and runs a Lambda function for each partition (date) specified in the query. For example, if the predicate is WHERE date IN ('2020-11-10', '2020-11-12'), Athena runs two instances of the Lambda function. When no dates are specified in the where clause, the connector gets data for the last 30 days.
- Each instance of the Lambda function makes a request to the Slack Member Analytics API to retrieve data for its day.
- Finally, Athena performs any aggregation and computation specified in the query and returns the results to the client.
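The date-selection rule in the steps above can be sketched as follows. The real connector implements this in Java with the Athena Query Federation SDK; this Python version only mirrors the rule, and it assumes the 30-day default window ends yesterday (the post doesn't say whether today is included). The function name is hypothetical.

```python
from datetime import date, timedelta

DEFAULT_LOOKBACK_DAYS = 30


def dates_to_partitions(requested, today=None):
    """Return one ISO date per partition; Athena runs one Lambda per entry.

    requested: dates taken from the WHERE clause, e.g. {"2020-11-10", "2020-11-12"}.
    With no dates, fall back to the previous 30 days
    (assumption: the window ends yesterday).
    """
    if requested:
        return sorted(set(requested))  # deduplicate and order the predicate dates
    today = today or date.today()
    # Offsets 30, 29, ..., 1 produce dates in ascending order.
    return [
        (today - timedelta(days=offset)).isoformat()
        for offset in range(DEFAULT_LOOKBACK_DAYS, 0, -1)
    ]
```

For the WHERE date IN ('2020-11-10', '2020-11-12') example this yields two partitions, so two Lambda invocations; an unfiltered query yields thirty.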
You can deploy this sample Slack Member Analytics Lambda function in your AWS environment via AWS CloudFormation with the following template. If you want to modify and build the connector from scratch, you can find the source code and instructions in the GitHub repo.
After the Lambda function has been deployed, create a new data source in Athena. For step-by-step instructions, see Deploying a Connector and Connecting to a Data Source.
- On the Athena console, in the query editor, choose Connect data source.
- Select All other data sources.
- Point your catalog to your new Lambda function.
You should be able to browse your new catalog within Athena from the Athena console and query the Slack Member Analytics API using SQL.
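If you prefer to script these console steps, the following sketch uses the Athena CreateDataCatalog and StartQueryExecution APIs through a client you pass in (for example, boto3.client("athena")). The catalog name, Lambda ARN, and S3 output location are placeholders you supply; the description string is hypothetical. The date is validated before it is placed in the SQL string.

```python
from datetime import date


def register_slack_catalog(athena_client, catalog_name, function_arn):
    """Register the connector Lambda as an Athena data catalog."""
    return athena_client.create_data_catalog(
        Name=catalog_name,
        Type="LAMBDA",
        Description="Slack Member Analytics connector",  # hypothetical text
        Parameters={"function": function_arn},
    )


def query_member_analytics(athena_client, catalog_name, day, output_location):
    """Start a query for one day and return the query execution ID."""
    # Reject anything that is not a YYYY-MM-DD date before building the SQL.
    day = date.fromisoformat(day).isoformat()
    sql = f"SELECT * FROM slackanalytics.member_analytics WHERE date = '{day}'"
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Catalog": catalog_name, "Database": "slackanalytics"},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

StartQueryExecution is asynchronous; poll GetQueryExecution until the state is SUCCEEDED before reading results with GetQueryResults.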
Creating a dataset that reads Slack member usage data and blends it with an HR dataset
As a prerequisite to querying the Slack Member Analytics API from QuickSight, we must grant QuickSight the permissions it needs to access the federated query data source in Athena. We do this directly from the QuickSight admin UI with the following steps:
- As an admin, on the Admin menu, choose Manage QuickSight.
- Under Security & Permissions, choose QuickSight access to AWS services.
- Choose Add or Remove services.
- Select Athena.
- Choose Next when prompted to set the Amazon Simple Storage Service (Amazon S3) bucket and Lambda function permissions.
QuickSight browses the Athena catalogs and displays any Lambda functions associated with your account. If you don’t see a Lambda function, it means you haven’t mapped a data source within Athena.
- Select the function.
- Choose Finish.
When the Example Corp QuickSight environment has the proper permissions, analysts can query the Slack Member Analytics API using their existing Athena data source. For instructions on creating your own dataset, see Creating a Dataset Using Amazon Athena Data.
The custom connector appears as a new Catalog, Database, and Tables option.
- In QuickSight, on the Datasets page, choose New dataset.
- Choose Athena as your data source.
- Choose Create dataset.
- Choose your table or, for this use case, choose Use custom SQL.
For this analysis, we write a custom SQL query that gets member activity for the last 30 days:
With the QuickSight cross data source join feature, analysts can enrich the Slack member stats with their HR info. For this use case, we imported a local HR_dataset.csv file containing the list of subscribed users with their respective Example Corp department, and joined them via the
The result is a dataset with Slack activity by employee and department. We’ve also updated the date field from a String type to a Date type using the QuickSight Data Prep page to take advantage of additional visualization features with Date type fields.
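Outside QuickSight, the same blend can be sketched in a few lines of Python. The join field used in the post isn't shown, so this sketch assumes a hypothetical email_address column present in both the Slack records and the HR file, plus a department column in the HR file. It performs an inner join and converts the date string to a date, matching the Data Prep change described above.

```python
import csv
import io
from datetime import date


def load_hr(csv_text):
    """Index HR rows by the (hypothetical) email_address join key."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["email_address"]: row for row in reader}


def blend(slack_records, hr_by_email):
    """Inner-join Slack rows to HR rows and convert date from String to Date."""
    blended = []
    for rec in slack_records:
        hr = hr_by_email.get(rec.get("email_address"))
        if hr is None:
            continue  # inner join: skip Slack users missing from the HR file
        blended.append({
            **rec,
            "department": hr["department"],  # hypothetical HR column name
            "date": date.fromisoformat(rec["date"]),
        })
    return blended
```

An inner join drops users who appear in only one source; if you want to spot provisioned users missing from HR, a left join from the Slack side would keep them with an empty department instead.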
Creating a QuickSight dashboard that shows usage trends of provisioned vs. active users
Example Corp analysts want to visualize the trend of provisioned users vs. active users and understand Slack adoption by department. To support these visualizations, we created the following calculated fields within our QuickSight analysis:
- active –
- provisioned –
You can also create these calculated fields when you create your dataset. This way, you can reuse them in other QuickSight analyses.
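The exact calculated-field formulas aren't shown above, so the following Python sketch assumes hypothetical definitions to illustrate what the visuals aggregate: every member row counts as one provisioned user for that day, and a row counts as active when its is_active flag is true. Dividing the two per department gives an adoption rate that makes a gap like the one discussed later easy to spot.

```python
from collections import defaultdict


def daily_provisioned_vs_active(rows):
    """Count provisioned and active users per (date, department).

    Hypothetical definitions: each row is one provisioned user;
    a row is active when is_active is true.
    """
    counts = defaultdict(lambda: {"provisioned": 0, "active": 0})
    for row in rows:
        key = (row["date"], row["department"])
        counts[key]["provisioned"] += 1
        counts[key]["active"] += 1 if row.get("is_active") else 0
    return dict(counts)


def adoption_rate(counts):
    """Active divided by provisioned for each (date, department) bucket."""
    return {
        key: c["active"] / c["provisioned"]
        for key, c in counts.items()
        if c["provisioned"]
    }
```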
We use QuickSight narrative insights, a line chart, a bar chart, and a pivot table with conditional formatting to create the following analysis.
From this analysis, Example Corp can see that the adoption trend is positive; however, there is an adoption gap within the Marketing team. The program managers can engage the Marketing department leads and focus their training resources to improve their adoption.
This dashboard can now be published to stakeholders within the organization as needed—either within the QuickSight app or embedded within existing enterprise applications.
With the recent integration of QuickSight and Athena Federated Query, organizations can access additional data sources beyond those already supported by QuickSight. Analysts can leverage QuickSight capabilities to analyze and build dashboards that blend data from a variety of data sources, and with the Athena Query Federation SDK, you can build custom connectors to access relational, non-relational, object, and custom data endpoints using standard SQL.
To get started, try the lab Athena Deploying Custom Connector.
About the Author
Pablo Redondo Sanchez is a Senior Solutions Architect at Amazon Web Services. He is a data enthusiast and works with customers to help them achieve better insights and faster outcomes from their data analytics workflows. In his spare time, Pablo enjoys woodworking and spending time outdoors with his family in Northern California.