This post is written by Dhiraj Mahapatro, Senior Specialist SA, Serverless.
In October 2021, AWS announced visualizing AWS Step Functions from the AWS Batch console. Now you can also visualize Step Functions from the Amazon Athena console.
Amazon Athena is an interactive query service that makes it easier to analyze Amazon S3 data using standard SQL. Athena is a serverless service and can interact directly with data stored in S3. Athena can process unstructured, semistructured, and structured datasets.
AWS Step Functions is a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications. Step Functions workflows manage failures, retries, parallelization, service integrations, and observability so builders can focus on business logic. Athena is one of the service integrations that are available for Step Functions.
This blog walks through the Step Functions integration in the Amazon Athena console. It shows how you can visualize and operate Athena queries at scale using Step Functions.
Introducing workflow orchestration in Amazon Athena console
AWS customers store large amounts of historical data on S3 and query the data using Athena to get results quickly. They also use Athena to process unstructured data or analyze structured data as part of a data processing pipeline.
Data processing involves discrete steps for ingesting, processing, storing the transformed data, and post-processing, such as visualizing or analyzing the transformed data. Each step involves multiple AWS services. With Step Functions workflow integration, you can orchestrate these steps. This helps to create repeatable and scalable data processing pipelines as part of a larger business application and visualize the workflows in the Athena console.
With Step Functions, you can run queries on a schedule or based on an event by using Amazon EventBridge. You can poll long-running Athena queries before moving to the next step in the process, and handle errors without writing custom code. Combining these two services provides developers with a single method that is scalable and repeatable.
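As a rough illustration of polling and error handling without custom code, the following sketch builds an Amazon States Language definition as a Python dictionary. The `.sync` suffix on the Athena task resource asks Step Functions to wait for the query to finish, and the `Retry` and `Catch` fields declare the error handling. The query string, workgroup, state names, and retry values are hypothetical placeholders, not values from this post.

```python
import json

# Hypothetical placeholders; substitute values from your own account.
WORKGROUP = "primary"
QUERY = "SELECT 1"

definition = {
    "Comment": "Sketch: run one Athena query, wait for it, and handle failures",
    "StartAt": "RunQuery",
    "States": {
        "RunQuery": {
            "Type": "Task",
            # The .sync suffix makes Step Functions wait until the query completes.
            "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
            "Parameters": {"QueryString": QUERY, "WorkGroup": WORKGROUP},
            # Retry the task a few times with exponential backoff (illustrative values).
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 5,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # If retries are exhausted, route to a dedicated failure state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "QueryFailed"}],
            "End": True,
        },
        "QueryFailed": {"Type": "Fail", "Error": "AthenaQueryFailed"},
    },
}

# Print the JSON that could be pasted into the state machine definition editor.
print(json.dumps(definition, indent=2))
```

Running the workflow on a schedule or in response to an event would be configured separately, for example with an EventBridge rule that targets the state machine.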
Step Functions workflows in the Amazon Athena console allow orchestration of Athena queries with Step Functions state machines:
Using Athena query patterns from Step Functions
Execute multiple queries
In Athena, you run SQL queries in the Athena console against Athena workgroups. With Step Functions, you can run Athena queries in sequence or run independent queries in parallel using a Parallel state. Step Functions also natively handles errors and retries related to Athena query tasks.
Workflow orchestration in the Athena console provides these capabilities to run and visualize multiple queries in Step Functions. For example:
- Choose Get Started from Execute multiple queries.
- From the pop-up, choose Create your own workflow and select Continue.
A new browser tab opens with the Step Functions Workflow Studio. The designer shows a workflow pattern template pre-created. The workflow loads data from a data source running two Athena queries in parallel. The results are then published to an Amazon SNS topic.
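The pattern described above can be approximated in Amazon States Language. The sketch below is a simplified stand-in, not the exact template that Workflow Studio loads: it runs two Athena queries in a Parallel state and then publishes the combined output to an SNS topic. The queries, workgroup, topic ARN, and state names are hypothetical.

```python
import json

# Hypothetical placeholders; replace with your own resources.
WORKGROUP = "primary"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-topic"


def athena_branch(name: str, query: str) -> dict:
    """Return a single-state branch that runs one Athena query and waits for it."""
    return {
        "StartAt": name,
        "States": {
            name: {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {"QueryString": query, "WorkGroup": WORKGROUP},
                "End": True,
            }
        },
    }


definition = {
    "Comment": "Sketch: two Athena queries in parallel, then an SNS notification",
    "StartAt": "RunQueriesInParallel",
    "States": {
        "RunQueriesInParallel": {
            "Type": "Parallel",
            # Each branch runs independently; the state finishes when both finish.
            "Branches": [
                athena_branch("QueryA", "SELECT 1"),
                athena_branch("QueryB", "SELECT 2"),
            ],
            "Next": "PublishResults",
        },
        "PublishResults": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": TOPIC_ARN,
                # Serialize the Parallel state's output array into the message body.
                "Message.$": "States.JsonToString($)",
            },
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```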
Alternatively, choosing Deploy a sample project from the Get Started pop-up deploys a sample Step Functions workflow.
This option creates a state machine. You then review the workflow definition, deploy an AWS CloudFormation stack, and run the workflow in the Step Functions console.
Once deployed, the state machine is visible in the Step Functions console.
Select the AthenaMultipleQueriesStateMachine to land on the details page:
To see the Step Functions workflow in action, choose Start execution. Keep the optional name and input and choose Start execution:
The state machine completes the tasks successfully by Executing multiple queries in parallel using Amazon Athena and Sending query results using the SNS topic:
The state machine used the Amazon Athena GetQueryResults tasks. The Workflow orchestration in Athena console now highlights this newly created Step Functions state machine:
Any state machine that uses this task in Step Functions in this account is listed here as a state machine that orchestrates Athena queries.
Query large datasets
You can also ingest an extensive dataset in Amazon S3, partition it using AWS Glue crawlers, then run Amazon Athena queries against that partition.
Select Get Started from the Query large datasets pop-up, then choose Create your own workflow and Continue. This action opens the Step Functions Workflow Studio with the following pattern. The Glue crawler starts and partitions large datasets for Athena to query in the subsequent query execution task:
Step Functions allows you to combine Glue crawler tasks and Athena queries to partition where necessary before querying and publishing the results.
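One way to combine a crawler with a query is sketched below. It assumes the Glue crawler is started and polled through the AWS SDK service integration, which may differ from what the console template uses. The crawler name, workgroup, query, state names, and 30-second wait are hypothetical.

```python
import json

# Hypothetical names; replace with resources from your own account.
CRAWLER_NAME = "example-crawler"
WORKGROUP = "primary"
QUERY = "SELECT * FROM example_table LIMIT 10"

definition = {
    "Comment": "Sketch: crawl a dataset, wait for the crawler, then query it",
    "StartAt": "StartCrawler",
    "States": {
        "StartCrawler": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
            "Parameters": {"Name": CRAWLER_NAME},
            "Next": "WaitForCrawler",
        },
        # Pause between status checks instead of polling in a tight loop.
        "WaitForCrawler": {"Type": "Wait", "Seconds": 30, "Next": "GetCrawler"},
        "GetCrawler": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
            "Parameters": {"Name": CRAWLER_NAME},
            "Next": "CrawlerFinished?",
        },
        "CrawlerFinished?": {
            "Type": "Choice",
            # A crawler reports READY again once its run has completed.
            "Choices": [
                {
                    "Variable": "$.Crawler.State",
                    "StringEquals": "READY",
                    "Next": "RunQuery",
                }
            ],
            "Default": "WaitForCrawler",
        },
        "RunQuery": {
            "Type": "Task",
            "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
            "Parameters": {"QueryString": QUERY, "WorkGroup": WORKGROUP},
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```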
Keeping data up to date
You can also use Athena to query a target table to fetch data, then update it with new data from other sources using the Step Functions Choice state. The Choice state provides branching logic for a state machine.
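To make the branching idea concrete, here is a small sketch of a Choice state together with a deliberately simplified Python emulation of how it picks the next state. The `newRowCount` input field and the state names are hypothetical, and the emulation only handles top-level `$.field` variables with `NumericGreaterThan`; the real service supports many more comparison operators.

```python
# A Choice state that updates the target table only when new rows exist.
choice_state = {
    "Type": "Choice",
    "Choices": [
        {
            "Variable": "$.newRowCount",  # hypothetical input field
            "NumericGreaterThan": 0,
            "Next": "UpdateTargetTable",
        }
    ],
    "Default": "NothingToUpdate",
}


def next_state(state: dict, data: dict) -> str:
    """Return the name of the state a Choice state would transition to.

    Simplified emulation: supports only "$.field" variables and the
    NumericGreaterThan comparison.
    """
    for rule in state["Choices"]:
        field = rule["Variable"][2:]  # strip the leading "$."
        if field in data and data[field] > rule["NumericGreaterThan"]:
            return rule["Next"]
    # No rule matched, so fall through to the default transition.
    return state["Default"]


print(next_state(choice_state, {"newRowCount": 3}))
print(next_state(choice_state, {"newRowCount": 0}))
```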
You are not limited to the previous three patterns shown in workflow orchestration in the Athena console. You can start from scratch and build a Step Functions state machine by navigating to the bottom right and using Create state machine:
Create State Machine in the Athena console opens a new tab showing the Step Functions console’s Create state machine page.
Refer to building a state machine in AWS Step Functions Workflow Studio for additional details.
Step Functions integrates with all Amazon Athena API actions
In September 2021, Step Functions announced integration support for 200 AWS services to enable easier workflow automation. With this announcement, Step Functions can integrate with all Amazon Athena API actions today.
Step Functions can automate the lifecycle of an Athena query and its supporting resources: create, read, update, delete, and list workgroups; create, read, update, delete, and list data catalogs; and more.
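As a sketch of what calling these API actions can look like, the helper below builds Task states for the AWS SDK service integration, whose resource names take the form `arn:aws:states:::aws-sdk:<service>:<action>`. The helper name `sdk_task` is hypothetical, and the catalog name is only an example value.

```python
import json


def sdk_task(service: str, action: str, parameters: dict) -> dict:
    """Build a terminal Task state for an AWS SDK service integration.

    The API action name is written with a lower-case first letter in the
    resource ARN, for example ListWorkGroups becomes listWorkGroups.
    """
    action_name = action[0].lower() + action[1:]
    return {
        "Type": "Task",
        "Resource": f"arn:aws:states:::aws-sdk:{service}:{action_name}",
        "Parameters": parameters,
        "End": True,
    }


# List the workgroups in the account.
list_workgroups = sdk_task("athena", "ListWorkGroups", {})

# Read one data catalog by name (example value).
get_catalog = sdk_task("athena", "GetDataCatalog", {"Name": "AwsDataCatalog"})

print(json.dumps(list_workgroups, indent=2))
print(json.dumps(get_catalog, indent=2))
```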
Other AWS service integrations
Step Functions’ integration with the AWS SDK provides support for 200 AWS services and over 9,000 API actions. Athena tasks in Step Functions can evolve by integrating available AWS services in the workflow for pre- and post-processing needs.
For example, you can read Athena query results that are written to an S3 bucket by using an S3 GetObject task through the AWS SDK integration in Step Functions. You can combine different AWS services into a single business process that ingests data through Amazon Kinesis, processes it with AWS Lambda or Amazon EMR jobs, and sends notifications to interested parties via Amazon SNS, Amazon SQS, or Amazon EventBridge to trigger other parts of the business application.
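A minimal sketch of such an S3 GetObject task follows. It assumes the bucket and key of the results file arrive in the state input under the hypothetical fields `resultsBucket` and `resultsKey`; how those values are derived from the Athena query output is left out.

```python
import json

read_results = {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
    "Parameters": {
        # The ".$" suffix tells Step Functions to resolve the value from the input.
        "Bucket.$": "$.resultsBucket",  # hypothetical input field
        "Key.$": "$.resultsKey",        # hypothetical input field
    },
    # Keep only the object body and drop the rest of the API response.
    "ResultSelector": {"Body.$": "$.Body"},
    "End": True,
}

print(json.dumps(read_results, indent=2))
```

Because state input and output are subject to the Step Functions payload size limit, this approach suits small result files; larger results are better processed by a downstream service that reads from S3 directly.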
Workflow orchestrations in the Athena console only show Step Functions state machines that use Athena’s optimized API integrations. This includes StartQueryExecution, StopQueryExecution, GetQueryExecution, and GetQueryResults.
Step Functions state machines do not show in the Athena console when:
- A state machine uses any other AWS SDK Athena API integration task.
- The APIs are invoked inside a Lambda function task using an AWS SDK client (like Boto3 or Node.js or Java).
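The distinction above can be illustrated with a small check over a state machine definition. This is only a sketch of the rule as described here, not how the Athena console is implemented: it treats a definition as using the optimized integration when any Task resource starts with `arn:aws:states:::athena:`. The function names and sample definitions are hypothetical.

```python
OPTIMIZED_PREFIX = "arn:aws:states:::athena:"


def task_resources(definition: dict):
    """Yield the Resource of every Task state, including nested ones."""
    for state in definition.get("States", {}).values():
        if state.get("Type") == "Task":
            yield state.get("Resource", "")
        # Parallel states nest full sub-workflows under "Branches".
        for branch in state.get("Branches", []):
            yield from task_resources(branch)
        # Map states nest a sub-workflow under "Iterator" or "ItemProcessor".
        for key in ("Iterator", "ItemProcessor"):
            if key in state:
                yield from task_resources(state[key])


def uses_optimized_athena(definition: dict) -> bool:
    """Return True if any task uses the optimized Athena integration."""
    return any(r.startswith(OPTIMIZED_PREFIX) for r in task_resources(definition))


def single_task(resource: str) -> dict:
    """Build a one-state definition for demonstration purposes."""
    return {"StartAt": "T", "States": {"T": {"Type": "Task", "Resource": resource, "End": True}}}


optimized = single_task("arn:aws:states:::athena:startQueryExecution.sync")
sdk_only = single_task("arn:aws:states:::aws-sdk:athena:listWorkGroups")
via_lambda = single_task("arn:aws:states:::lambda:invoke")

print(uses_optimized_athena(optimized))
print(uses_optimized_athena(sdk_only))
print(uses_optimized_athena(via_lambda))
```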
To delete the sample application stack, first empty DataBucket and AthenaWorkGroup so that the stack deletes successfully. Then use the latest version of the AWS CLI and run:
aws cloudformation delete-stack --stack-name <stack-name>
Alternatively, delete the sample application stack in the CloudFormation console by selecting the stack and choosing Delete.
The Amazon Athena console now provides an integration with AWS Step Functions workflows. You can use the provided patterns to create and visualize Step Functions workflows directly from the Amazon Athena console. Step Functions workflows that use Athena’s optimized API integration appear in the Athena console. To learn more about Amazon Athena, read the user guide.
For more serverless learning resources, visit Serverless Land.