By Allan Denot, CTO – DNX Solutions
By Sriwantha Attanayake, Sr. Partner Solution Architect – AWS

Many DNX Solutions customers struggle with different facets of cloud platform optimization, but scalability is usually one of the main drivers of a cloud modernization project.

In a recent engagement, DNX Solutions supported ScanX—a Tokyo-based company specializing in 3D laser scan data processing—to improve scalability of their data platform, migrating from a monolithic architecture to a set of distributed services.

Concurrently, DNX Solutions modernized the customer’s heavy image processing jobs, moving them from a traditional polling mechanism to an event-driven architecture supported by AWS Batch.

This case study describes how DNX Solutions improved the scalability of ScanX’s platform and reduced maintenance overhead by combining AWS Batch with other AWS services, including Amazon EventBridge, to scale Laravel jobs.

The key objective was to enable ScanX to scale their data platform and support the expansion of an already fast-growing business.

DNX Solutions is an AWS Advanced Consulting Partner with the DevOps Competency that is focused on cloud-native concepts, application modernization, and data and analytics.

Why We Selected AWS Batch

AWS Batch enables organizations to run multiple computing jobs on Amazon Web Services (AWS). It dynamically provisions the optimal quantity and type of compute resources without needing to install and manage additional software.

AWS Batch essentially creates and runs Amazon Elastic Container Service (Amazon ECS) tasks, which allows it to scale down to zero when no jobs are waiting to be processed.

For the team at DNX Solutions, it was important to understand the core component of ScanX’s platform, which is responsible for the image processing. The customer was struggling to keep up with the growing number of jobs and of customers subscribing to their software-as-a-service (SaaS) platform, and this was the deciding factor in selecting AWS Batch as a core component of the new architecture.

ScanX’s jobs are intermittent, and the workload can vary depending on the day and time. Instead of deploying a pull architecture with idle workers waiting to process jobs, AWS Batch manages the compute environments automatically and provisions compute resources only when jobs are pending in the queue. Consequently, no compute cost is incurred during periods of inactivity.

AWS Batch also supports AWS Fargate and Amazon EC2 Spot Instances, which helps to diversify your job strategy while you stay focused on developing your core product, scalability, and cost optimization.

Leveraging AWS to Scale a Monolithic Application

In the original architecture, there was a single Amazon Elastic Compute Cloud (Amazon EC2) instance supporting the website, APIs, and also the data platform utilized for the image processing jobs.

Without an auto scaling strategy and with limited resources, ScanX needed to throttle the number of jobs triggered per day to avoid overloading the server and application. The first step proposed by DNX Solutions was to decouple some of the services into separate containers and make use of Amazon ECS as the new home for their website and additional APIs.

The main application is based on PHP and uses the Laravel framework. Laravel is a web application framework which provides a foundation for PHP developers with an extensive ecosystem of features and service integrations. The framework already provides integration for different AWS services out-of-the-box like Amazon DynamoDB, Amazon Simple Queue Service (SQS), and Amazon ElastiCache for Redis.

Laravel has a job dispatching system as part of the standard framework, and the ScanX data platform was constructed on top of this feature. It’s possible to set different queues such as database, Redis, or SQS, but all of these scenarios still need a worker to pull pending jobs from the specified queue.

As noted previously, due to the nature of ScanX image processing jobs, DNX Solutions decided to move away from the traditional pull architecture and deploy an event-driven approach. AWS Batch by itself wouldn’t be enough for this transition because the team needed to dispatch the jobs from the website, and AWS Batch is not supported in Laravel queues by default.

To integrate the services, DNX Solutions used the AWS PHP SDK, but there was still a challenge regarding the way in which jobs should transition between the decoupled services.

An additional Laravel component was needed to allow the push of individual jobs to AWS Batch queues, and to change the way workers select the job ID which has triggered the initial request.

After researching some open source projects, DNX Solutions found an interesting option which proposed an integration between Laravel 5.x and AWS Batch by creating a custom Laravel connector. However, ScanX was using Laravel 6.x, so the team forked the original project, updated the PHP code, and published the resulting package among its open source projects.

You can find the Laravel dispatcher on GitHub.

With the new Laravel dispatcher in place, each job was dispatched as an individual entry in the AWS Batch queue, which in turn triggered the automatic provisioning of compute resources.

When the container was ready for processing, AWS Batch specified the exact job ID in the entry point of the application, allowing the selection and processing of a single job per container.
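As a rough illustration of this hand-off, the sketch below builds the arguments for an AWS Batch SubmitJob call in Python. It is not the Laravel connector itself. The queue name, job definition name, and the `batch:work` artisan command are hypothetical placeholders chosen for this example.

```python
import json


def build_submit_job_request(laravel_job_id, job_queue, job_definition,
                             vcpus=1, memory_mib=2048):
    """Build keyword arguments for the AWS Batch SubmitJob API.

    The Laravel job ID is passed on the container command line so that
    the container processes exactly one job and then exits.
    """
    return {
        "jobName": f"laravel-job-{laravel_job_id}",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Hypothetical entry point: the real command depends on the
            # Laravel package and application in use.
            "command": ["php", "artisan", "batch:work", str(laravel_job_id)],
            # SubmitJob expects resource values as strings.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


# Placeholder names, not taken from the ScanX environment.
request = build_submit_job_request(42, "image-processing", "scanx-worker")
print(json.dumps(request, indent=2))
```

With the AWS SDK for Python installed and credentials configured, a request like this could be sent with `boto3.client("batch").submit_job(**request)`.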

The New State of the Data Platform

The diagram below illustrates part of the new architecture DNX Solutions developed for ScanX. The focus here is to understand how the website interacts with AWS Batch and exchanges information between the decoupled services.

Figure 1 – Event-driven architecture with AWS Batch.

With this architecture, not only can the website dispatch jobs via AWS Batch, but jobs can also dispatch subsequent jobs. In a complex data processing workflow where some steps depend on earlier ones, this feature can help to control the execution order of tasks.
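The chaining idea can be sketched as follows, assuming a simple linear workflow. The step names, queue name, and job definition names are hypothetical, and the `submit` argument stands in for a real SubmitJob call so that the example runs without AWS access.

```python
# Hypothetical processing steps, in the order they must run.
PIPELINE = ["convert", "tile", "publish"]


def next_step(current):
    """Return the step that follows `current`, or None for the last step."""
    position = PIPELINE.index(current)
    if position + 1 < len(PIPELINE):
        return PIPELINE[position + 1]
    return None


def run_step(step, scan_id, submit):
    """Run one step, then dispatch the following step as a new job."""
    # ... the actual processing for `step` would happen here ...
    following = next_step(step)
    if following is not None:
        submit(
            jobName=f"{following}-{scan_id}",
            jobQueue="image-processing",
            jobDefinition=f"scanx-{following}",
            parameters={"scanId": str(scan_id)},
        )
    return following


# Stand-in for a real submit call: records requests instead of sending them.
submitted = []


def fake_submit(**kwargs):
    submitted.append(kwargs)


run_step("convert", 7, fake_submit)
print(submitted[0]["jobName"])
```

Because each job dispatches its successor only after its own work completes, the execution order follows from the jobs themselves rather than from a central scheduler.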

The permission to dispatch new jobs is controlled by AWS Identity and Access Management (IAM) roles associated with the ECS task definition of the website, and additionally by other AWS Batch job definitions.
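A minimal sketch of such a permission is shown below as an IAM policy document built in Python. The ARNs are placeholders, and the exact form of the job definition ARN (for example, whether it carries a revision suffix) should be checked against the AWS Batch IAM documentation before use.

```python
import json


def batch_submit_policy(job_queue_arn, job_definition_arn):
    """Return an IAM policy document that allows submitting jobs
    to one job queue using one job definition."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "batch:SubmitJob",
                "Resource": [job_queue_arn, job_definition_arn],
            }
        ],
    }


# Placeholder account ID, region, and resource names.
policy = batch_submit_policy(
    "arn:aws:batch:us-east-1:111122223333:job-queue/image-processing",
    "arn:aws:batch:us-east-1:111122223333:job-definition/scanx-worker",
)
print(json.dumps(policy, indent=2))
```

A policy of this shape would be attached to the task role of the website, and to the job role of any job definition whose jobs dispatch further jobs.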

Storing the images uploaded via the website and converted by the image processing jobs in Amazon Simple Storage Service (Amazon S3) allows us to maintain a stateless component. This also makes it possible to use an auto scaling strategy that spins up additional containers when demand surges.

With the move from a pull architecture to an event-driven architecture, a separate container is started for each job dispatched by the main application. Each job is configured with the number of vCPUs and the amount of memory it requires, which AWS Batch uses to provision the containers and EC2 hosts automatically.

You can also set the minimum and maximum number of vCPUs in the compute environment. Together with the selected instance types, these limits bound the number of hosts, giving you control over the balance between cost and scalability.
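The sketch below shows how these limits might be expressed for a managed EC2 compute environment, assuming that the limits are given in vCPUs as in the AWS Batch API. All names, subnet IDs, and security group IDs are placeholders. The `max_hosts` helper is only an illustration of the arithmetic, and assumes every host is of a single instance size.

```python
def build_compute_environment(name, min_vcpus, max_vcpus, instance_types,
                              subnets, security_group_ids, instance_role):
    """Build keyword arguments for the AWS Batch CreateComputeEnvironment API."""
    if min_vcpus < 0 or max_vcpus < min_vcpus:
        raise ValueError("expected 0 <= min_vcpus <= max_vcpus")
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "EC2",
            # A minimum of 0 lets the environment scale down to no hosts.
            "minvCpus": min_vcpus,
            "maxvCpus": max_vcpus,
            "instanceTypes": instance_types,
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role,
        },
    }


def max_hosts(max_vcpus, vcpus_per_instance):
    """Upper bound on the number of hosts when all hosts are the same size."""
    return max_vcpus // vcpus_per_instance


# Placeholder values throughout.
request = build_compute_environment(
    name="image-processing",
    min_vcpus=0,
    max_vcpus=16,
    instance_types=["c5.xlarge"],
    subnets=["subnet-00000000"],
    security_group_ids=["sg-00000000"],
    instance_role="ecsInstanceRole",
)
# A c5.xlarge has 4 vCPUs, so a 16 vCPU ceiling allows at most 4 hosts.
print(max_hosts(request["computeResources"]["maxvCpus"], 4))
```

Raising the maximum increases how many jobs can run in parallel, while keeping the minimum at zero avoids paying for idle hosts.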

Scheduling cron Jobs with AWS Batch

Due to specific business requirements from ScanX, a small number of jobs could not be adapted directly to the new Laravel dispatcher. These were jobs that weren’t triggered by a certain action in the system, but instead were started at fixed time intervals.

With this exception in mind, DNX Solutions spent some time discussing what to do for jobs that run on a recurring schedule. Is there still use for AWS Batch in this scenario?

These particular jobs were triggered by cron rules associated with the Laravel scheduler component and couldn’t be associated with a specific event or trigger dispatched by the system.

Also, the Laravel scheduler works fine on a single instance, but when you begin scaling horizontally this kind of architecture presents a lot of challenges for jobs that are not idempotent, creating a risk of generating inaccurate data and unnecessary reprocessing.
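The problem can be reproduced with a toy simulation, assuming each application replica runs its own copy of the scheduler. The job below is a made-up example of a non-idempotent task: every run appends a record.

```python
def run_scheduler_tick(replicas, job):
    """Simulate one cron tick: every replica fires the same scheduled job."""
    for _ in range(replicas):
        job()


report_rows = []


def append_daily_row():
    # Non-idempotent: each run adds another row for the same day.
    report_rows.append("daily-summary")


run_scheduler_tick(1, append_daily_row)
rows_single_instance = len(report_rows)

report_rows.clear()
run_scheduler_tick(3, append_daily_row)
rows_three_replicas = len(report_rows)

print(rows_single_instance, rows_three_replicas)
```

With one instance the job runs once per tick. With three replicas it runs three times, producing duplicate rows unless the job is made idempotent or the schedule is moved outside the application.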

Moving the solution towards AWS services, DNX Solutions decided to remove this application dependency by combining Amazon EventBridge and AWS Batch. EventBridge is a serverless event bus that makes it easier to build event-driven applications at scale.

EventBridge allowed the team to create specific rules to trigger AWS Batch jobs automatically based on the same cron rules configured initially through the application. AWS Batch continued to process each job individually and scale based on demand. EventBridge supplied the missing event trigger, taking over the role of the cron rules in the original architecture and removing a single point of failure.

Like the other jobs, there’s no need to maintain an idle worker, as resources are only provisioned when it’s time to process the job. Through AWS Batch queues and compute environments, DNX Solutions was able to assign different instance types to the different types of recurring jobs. Lighter recurring jobs no longer had to run on the same instances as the heavy image processing jobs, which demand more computing power and therefore cost more, and this reduced overall costs.

Figure 2 – Flow of events generated by Amazon EventBridge.

The diagram above illustrates the configuration made for ScanX. With EventBridge, the team was able to create separate rules for different periods of time, and two main rules were created based on the customer’s business requirements (hourly and daily basis).

For each rule, it’s possible to assign one or more targets, so different AWS Batch jobs can be associated with the same trigger.

When the time comes, EventBridge places a new job in the selected queue, at which point AWS Batch handles the provisioning of the environment and the spin-up of the containers automatically. After the job finishes, the instance is automatically cleaned up by the managed compute environment provided by AWS Batch.
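A configuration along these lines could be sketched as below. The function builds the arguments for the EventBridge PutRule and PutTargets APIs, with one AWS Batch target per job definition. Rule names, job definition names, ARNs, and the schedule times are placeholders, not ScanX’s actual settings.

```python
def build_schedule(rule_name, schedule_expression, job_queue_arn,
                   job_definitions, role_arn):
    """Build PutRule and PutTargets arguments for a scheduled rule
    that submits one AWS Batch job per job definition."""
    rule = {
        "Name": rule_name,
        "ScheduleExpression": schedule_expression,
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": f"{rule_name}-{index}",
                # For AWS Batch targets, the target ARN is the job queue.
                "Arn": job_queue_arn,
                # Role that EventBridge assumes to submit the job.
                "RoleArn": role_arn,
                "BatchParameters": {
                    "JobDefinition": job_definition,
                    "JobName": f"{rule_name}-{job_definition}",
                },
            }
            for index, job_definition in enumerate(job_definitions)
        ],
    }
    return rule, targets


# Placeholder ARNs and names.
QUEUE_ARN = "arn:aws:batch:us-east-1:111122223333:job-queue/recurring-jobs"
ROLE_ARN = "arn:aws:iam::111122223333:role/eventbridge-batch-submit"

hourly_rule, hourly_targets = build_schedule(
    "scanx-hourly", "rate(1 hour)", QUEUE_ARN, ["scanx-sync"], ROLE_ARN)

# cron(0 0 * * ? *) fires every day at 00:00 UTC.
daily_rule, daily_targets = build_schedule(
    "scanx-daily", "cron(0 0 * * ? *)", QUEUE_ARN,
    ["scanx-cleanup", "scanx-report"], ROLE_ARN)

print(hourly_rule["ScheduleExpression"], len(daily_targets["Targets"]))
```

With the AWS SDK for Python, these dictionaries could be passed to `put_rule(**rule)` and `put_targets(**targets)` on a `boto3.client("events")` client.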

Conclusion

AWS Batch offers scalability and is well aligned with cost optimization best practices. The adaptation of Laravel jobs for AWS Batch allowed ScanX to scale their data platform and achieve expansion for their growing business.

By combining AWS Batch with other AWS services, it was also possible to gain additional benefits and reduce the maintenance overhead. This lets the team focus more on new application features instead of worrying about scaling their infrastructure.


DNX Solutions – AWS Partner Spotlight

DNX Solutions is an AWS DevOps Competency Partner that works to bring a better cloud and application experience for digital-native companies in Australia.
