This post is written by Dhiraj Mahapatro, Senior Specialist SA, Serverless.

AWS Step Functions is a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications. Step Functions workflows manage failures, retries, parallelization, service integrations, and observability so builders can focus on business logic.

AWS Batch is one of the service integrations available for Step Functions. AWS Batch enables users to more easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources based on the volume and specific resource requirements of the batch jobs submitted. AWS Batch plans, schedules, and runs batch computing workloads across the full range of AWS compute services and features, such as AWS Fargate, Amazon EC2, and Spot Instances.

Now, Step Functions is available to AWS Batch users through the AWS Batch console. This feature enables AWS Batch users to augment compute options and have additional orchestration capabilities to manage their batch jobs.

This blog post walks through the Step Functions integration in the AWS Batch console and shows how AWS Batch users can use Step Functions to orchestrate batch workloads. A sample application also highlights the use of AWS Lambda as a compute option alongside AWS Batch.

Introducing workflow orchestration in AWS Batch console

Today, AWS users use AWS Batch for high performance computing, post-trade analytics, fraud surveillance, screening, DNA sequencing, and more. AWS Batch minimizes human error, increases speed and accuracy, and reduces costs with automation so that users can refocus on evolving the business.

In addition to running compute-intensive tasks, users sometimes need Lambda for simpler, less intensive processing. Users also want to combine the two in a single business process that is scalable and repeatable.

Workflow orchestration (powered by Step Functions) in the AWS Batch console allows orchestration of batch jobs with a Step Functions state machine:

Workflow orchestration in Batch console

Using batch-related patterns from Step Functions

Error handling

Step Functions natively handles errors and retries of its workflows. Users rely on this native error handling mechanism to focus on building business logic.

Workflow orchestration in AWS Batch console provides common batch-related patterns that are present in Step Functions. Handling errors while submitting batch jobs in Step Functions is one of them.

Getting started with orchestration in Batch

  1. Choose Get Started from Handle complex errors.
  2. From the pop-up, choose Start from a template and choose Continue.

A new browser tab opens with Step Functions Workflow Studio. The Workflow Studio designer has a workflow pattern template pre-created. A closer look at the workflow shows that it submits a batch job and then handles the success and error scenarios by sending a corresponding Amazon SNS notification for each.
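
As a rough illustration of what such a template expresses, the following Python sketch builds a comparable Amazon States Language definition as a dictionary. It is not the template that the console generates: the state names, the job name `ExampleJob`, the retry settings, and the function name `build_error_handling_workflow` are assumptions made for this example. Only the task resource ARNs for the AWS Batch and Amazon SNS integrations are taken from the Step Functions service integrations.

```python
import json


def build_error_handling_workflow(job_definition_arn, job_queue_arn, topic_arn):
    """Return a sketch of a state machine that submits a batch job and
    publishes an SNS message on success or on failure.

    All names and retry values here are illustrative assumptions.
    """
    def notify(message):
        # Task state that publishes a fixed message to the SNS topic.
        return {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {"TopicArn": topic_arn, "Message": message},
            "End": True,
        }

    return {
        "Comment": "Submit a batch job and notify on success or failure",
        "StartAt": "Submit Batch Job",
        "States": {
            "Submit Batch Job": {
                "Type": "Task",
                # ".sync" makes the workflow wait for the job to finish.
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    "JobName": "ExampleJob",
                    "JobDefinition": job_definition_arn,
                    "JobQueue": job_queue_arn,
                },
                # Retry a failed task twice before giving up.
                "Retry": [{
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 30,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }],
                # Any remaining error routes to the failure notification.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Notify Failure"}],
                "Next": "Notify Success",
            },
            "Notify Success": notify("Batch job submitted through Step Functions succeeded"),
            "Notify Failure": notify("Batch job submitted through Step Functions failed"),
        },
    }


if __name__ == "__main__":
    definition = build_error_handling_workflow(
        "<JobDefinitionArnHere>", "<JobQueueArnHere>", "<TopicArnHere>"
    )
    print(json.dumps(definition, indent=2))
```

The `Retry` and `Catch` fields are what let the workflow, rather than the job code, own the error handling.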

Alternatively, choosing Deploy a sample project from the Get Started pop-up deploys a sample Step Functions workflow.

Deploying a sample project

This option allows creating a state machine from scratch, reviewing the workflow definition, deploying an AWS CloudFormation stack, and running the workflow in Step Functions console.

Deploy and run from console

Once deployed, the state machine is visible in the Step Functions console as:

Viewing the state machine in the AWS Step Functions console

Select the BatchJobNotificationStateMachine to land on the details page:

View the state machine's details

The CloudFormation template has already provisioned the required batch job in AWS Batch and the SNS topic for success and failure notification.

To see the Step Functions workflow in action, use Start execution. Keep the optional name and input as is and choose Start execution:

Run the Step Function

The state machine completes the tasks successfully by Submitting Batch Job using AWS Batch and Notifying Success using the SNS topic:

The successful results in the console

The state machine used the AWS Batch Submit Job task. The Workflow orchestration in AWS Batch console now highlights this newly created Step Functions state machine:

The state machine is listed in the Batch console

Therefore, any state machine that uses this task in Step Functions for this account is listed here as a state machine that orchestrates batch jobs.

Combine Batch and Lambda

Another pattern to use in Step Functions is the combination of a Lambda function and a batch job.

Choose Get Started from Combine Batch and Lambda. From the pop-up, choose Start from a template and then Continue. This opens Step Functions Workflow Studio with the following pattern. The Lambda task generates input for the subsequent batch job task. The Submit Batch Job task takes that input and submits the batch job:

Combining AWS Lambda with AWS Step Functions

Step Functions enables AWS Batch users to combine Batch and Lambda functions to optimize compute spend while using the power of the different compute choices.
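
The Lambda-then-Batch pattern can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the console template: the handler `lambda_handler`, the `batch_input` key it returns, the state names, and the job name `ExampleJob` are all hypothetical. The sketch assumes the Lambda function returns a flat map of strings that is passed to the batch job as its `Parameters`.

```python
def lambda_handler(event, context):
    """Hypothetical Lambda function that prepares input for the batch job."""
    return {"batch_input": {"inputFile": event.get("inputFile", "example.csv")}}


def build_lambda_then_batch_workflow(function_name, job_definition_arn, job_queue_arn):
    """Return a sketch of a state machine where a Lambda task feeds a batch job."""
    return {
        "StartAt": "Generate Batch Job Input",
        "States": {
            "Generate Batch Job Input": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
                # Keep only the function's return value as the state output.
                "OutputPath": "$.Payload",
                "Next": "Submit Batch Job",
            },
            "Submit Batch Job": {
                "Type": "Task",
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    "JobName": "ExampleJob",
                    "JobDefinition": job_definition_arn,
                    "JobQueue": job_queue_arn,
                    # Assumption: the Lambda output carries a "batch_input" map.
                    "Parameters.$": "$.batch_input",
                },
                "End": True,
            },
        },
    }
```

Keeping the lightweight preparation step in Lambda means the batch compute environment is only used for the heavy part of the process.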

Fan out to multiple Batch jobs

In addition to error handling and combining Lambda with AWS Batch jobs, a user can fan out to multiple batch jobs using the Step Functions Map state. The Map state in Step Functions provides dynamic parallelism.

With dynamic parallelism, a user can submit multiple batch jobs based on a collection of batch job input data. With visibility to each iteration’s input and output, users can easily navigate and troubleshoot in case of failure.
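
A minimal Python sketch of this fan-out pattern is shown below. It is not the definition used by the sample application: the input shape (a `jobs` array whose items carry `jobName` and `parameters`), the state names, and the default concurrency limit are assumptions for illustration. The sketch uses the `Iterator` field of the Map state; newer Amazon States Language definitions can use `ItemProcessor` for the same purpose.

```python
def build_fan_out_workflow(job_definition_arn, job_queue_arn, max_concurrency=5):
    """Return a sketch of a state machine that submits one batch job per input item."""
    return {
        "StartAt": "Fan Out Batch Jobs",
        "States": {
            "Fan Out Batch Jobs": {
                "Type": "Map",
                # Assumption: the execution input has a "jobs" array.
                "ItemsPath": "$.jobs",
                "MaxConcurrency": max_concurrency,
                "Iterator": {
                    "StartAt": "Submit Batch Job",
                    "States": {
                        "Submit Batch Job": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::batch:submitJob.sync",
                            "Parameters": {
                                # Each field is read from the current array item.
                                "JobName.$": "$.jobName",
                                "JobDefinition": job_definition_arn,
                                "JobQueue": job_queue_arn,
                                "Parameters.$": "$.parameters",
                            },
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


# Hypothetical execution input: one batch job is submitted per item.
EXAMPLE_INPUT = {
    "jobs": [
        {"jobName": "job-1", "parameters": {"inputFile": "part-1.csv"}},
        {"jobName": "job-2", "parameters": {"inputFile": "part-2.csv"}},
    ]
}
```

Because each iteration runs the same inner workflow with its own item, a failed job can be traced back to the exact input that produced it.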

Easily navigate and troubleshoot in case of failure

AWS Batch users are not limited to the three patterns shown in Workflow orchestration in the AWS Batch console. AWS Batch users can also start from scratch and build a Step Functions state machine by navigating to the bottom right and using Create state machine:

Create a state machine from the Step Functions console

Choosing Create state machine in the AWS Batch console opens a new tab with the Step Functions console’s Create state machine page.

Design a workflow visually

Refer to building a state machine in AWS Step Functions Workflow Studio for additional details.

Deploying the application

The sample application shows the fan out to multiple batch jobs pattern. Before deploying the application, you need:

To deploy:

  1. From a terminal window, clone the GitHub repo:
    git clone git@github.com:aws-samples/serverless-batch-job-workflow.git
  2. Change directory:
    cd ./serverless-batch-job-workflow
  3. Download and install dependencies:
    sam build
  4. Deploy the application to your AWS account:
    sam deploy --guided

To run the application using the AWS CLI, replace the placeholders with the state machine ARN from the output of the deployment steps and the Region where the application is deployed:

aws stepfunctions start-execution \
  --state-machine-arn <StepFunctionArnHere> \
  --region <RegionWhereApplicationDeployed> \
  --input "{}"
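
The same execution can also be started from Python. The following is a minimal sketch that assumes Boto3 is installed and AWS credentials are configured. The helper names `build_start_execution_kwargs` and `start_execution` are made up for this example, while `start_execution` on the Boto3 Step Functions client is the call that corresponds to the CLI command above.

```python
import json


def build_start_execution_kwargs(state_machine_arn, payload=None):
    """Build the keyword arguments for the StartExecution API call."""
    return {
        "stateMachineArn": state_machine_arn,
        # The API expects the execution input as a JSON string.
        "input": json.dumps(payload or {}),
    }


def start_execution(state_machine_arn, region, payload=None):
    """Start the state machine and return the ARN of the new execution."""
    import boto3  # imported here so the helper above works without Boto3

    client = boto3.client("stepfunctions", region_name=region)
    response = client.start_execution(
        **build_start_execution_kwargs(state_machine_arn, payload)
    )
    return response["executionArn"]
```

Calling `start_execution("<StepFunctionArnHere>", "<RegionWhereApplicationDeployed>")` with the real values substituted sends the same empty JSON object as the CLI example.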

Step Functions is not limited to AWS Batch’s Submit Job API action

In September 2021, Step Functions announced integration support for 200 AWS services to enable easier workflow automation. With this announcement, Step Functions is no longer limited to integrating with AWS Batch’s SubmitJob API and can integrate with any AWS Batch SDK API.

Step Functions can automate the lifecycle of an AWS Batch job, starting from creating a compute environment, creating job queues, registering job definitions, submitting a job, and finally cleaning up.
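
As a rough sketch of that lifecycle, the Python code below chains AWS SDK integration tasks for AWS Batch into one state machine skeleton. It relies on the AWS SDK integration resource format `arn:aws:states:::aws-sdk:batch:<apiAction>`. The state names, the helper `build_lifecycle_workflow`, and the selection of cleanup actions are assumptions for illustration, and the `Parameters` of each task are left empty as placeholders. A real workflow needs the request fields of each API and likely extra steps, such as disabling the job queue and waiting for resources to become valid before deleting them.

```python
SDK_PREFIX = "arn:aws:states:::aws-sdk:batch:"

# (state name, task resource) pairs in execution order. Illustrative only.
LIFECYCLE_STEPS = [
    ("Create Compute Environment", SDK_PREFIX + "createComputeEnvironment"),
    ("Create Job Queue", SDK_PREFIX + "createJobQueue"),
    ("Register Job Definition", SDK_PREFIX + "registerJobDefinition"),
    # The optimized integration waits for the job to complete (".sync").
    ("Submit Batch Job", "arn:aws:states:::batch:submitJob.sync"),
    ("Deregister Job Definition", SDK_PREFIX + "deregisterJobDefinition"),
    ("Delete Job Queue", SDK_PREFIX + "deleteJobQueue"),
    ("Delete Compute Environment", SDK_PREFIX + "deleteComputeEnvironment"),
]


def build_lifecycle_workflow(steps=LIFECYCLE_STEPS):
    """Chain the given steps into a linear state machine skeleton."""
    states = {}
    for index, (name, resource) in enumerate(steps):
        # Parameters are placeholders to be filled per API request.
        state = {"Type": "Task", "Resource": resource, "Parameters": {}}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}
```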

Other AWS service integrations

Step Functions support for 200 AWS services equates to integration with more than 9,000 API actions across these services. AWS Batch tasks in Step Functions can evolve by integrating with available services in the workflow for their pre- and post-processing needs.

For example, batch job input data can be sanitized inside a Lambda function and then pushed to an Amazon SQS queue or stored as an object in Amazon S3 for auditability.

Similarly, Amazon SNS, Amazon Pinpoint, or Amazon SES can send a notification once the AWS Batch job task is complete.

There are multiple ways to extend a workflow around an AWS Batch job task. Refer to AWS SDK service integrations and optimized integrations for Step Functions for additional details.

Important considerations

Workflow orchestrations in the AWS Batch console only show Step Functions state machines that use AWS Batch’s Submit Job task. Step Functions state machines do not show in the AWS Batch console when:

  1. A state machine uses any other AWS SDK Batch API integration task
  2. AWS Batch’s SubmitJob API is invoked inside a Lambda function task using an AWS SDK client (such as Boto3 or the AWS SDKs for Node.js or Java)
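
To check a definition locally against this rule, a small helper can look for the optimized Submit Job task resource. The following Python sketch is only an approximation of the behavior described above and not how the console itself makes the decision. The function name `uses_submit_job_task` and the prefix constant are assumptions for this example.

```python
SUBMIT_JOB_PREFIX = "arn:aws:states:::batch:submitJob"


def uses_submit_job_task(definition):
    """Return True if any Task state, including nested ones, uses the
    optimized AWS Batch Submit Job integration."""
    for state in definition.get("States", {}).values():
        resource = str(state.get("Resource", ""))
        if state.get("Type") == "Task" and resource.startswith(SUBMIT_JOB_PREFIX):
            return True
        # Map states nest a workflow under Iterator or ItemProcessor,
        # and Parallel states nest a list of workflows under Branches.
        nested = [state.get("Iterator"), state.get("ItemProcessor")]
        nested.extend(state.get("Branches", []))
        if any(uses_submit_job_task(sub) for sub in nested if sub):
            return True
    return False
```

A task that uses `arn:aws:states:::aws-sdk:batch:submitJob` or `arn:aws:states:::lambda:invoke` does not match the prefix, which mirrors the two cases listed above.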

Cleanup

The sample application provisions AWS Batch resources (the job definition, job queue, and ECS compute environment inside a VPC). It also creates subnets, route tables, and an internet gateway. Clean up the stack after testing the application to avoid the ongoing cost of running these services.

To delete the sample application stack, use the latest version of AWS SAM CLI and run:

sam delete

Conclusion

To learn more about AWS Batch, read the Orchestrating Batch jobs section in the Batch developer guide.

To get started, open the workflow orchestration page in the Batch console. If you are new to Step Functions, select Orchestrate Batch jobs with Step Functions Workflows to deploy a sample project.

This feature is available in all Regions where both Step Functions and AWS Batch are available. View the AWS Regions table for details.

To learn more about Step Functions patterns, visit Serverless Land.