By Hakan Ilter, Cloud and Big Data Consultant at Kloia
By Prasad Rao, Partner Solutions Architect at AWS
Many enterprise organizations have legacy .NET Framework applications they need to modernize and convert to cloud-native so they can reap the cloud benefits of agility, scalability, and cost savings.
One of the largest product feed automation and optimization platforms, GoDataFeed has a back-end platform that ingests millions of items of catalog data from multiple sources.
GoDataFeed applies complex data transformations to the data, and feeds it to target systems. It is more than a simple extract, transform, load (ETL) platform, though, as it continuously runs tens of thousands of micro-ETL jobs.
To address the business needs of customers, GoDataFeed relies heavily on flexibility and scalability. However, GoDataFeed was built as a monolithic .NET Framework application, including Windows Services and MSSQL, with heavy business logic in stored procedures.
GoDataFeed’s monolithic platform presented three main challenges to their business:
- Lack of agility – Developing new features, debugging issues, and tracking changes is difficult in monolithic applications, especially when so much business logic is implemented inside stored procedures.
- Limited scalability – Scaling with the increased volume in traffic involves a lot of manual work like installing a new database instance, restoring the data, and manually sharding some clients into different database instances.
- Increased licensing cost – With the increased volume, each new database instance means higher license costs.
Based on their growth projections, the current architecture would have become a limiting factor for their business in the near to mid term. Hence, they engaged Kloia, an AWS Partner Network (APN) Advanced Consulting Partner with AWS Competencies in DevOps and Microsoft Workloads. Kloia agreed to re-architect GoDataFeed’s platform and modernize their application within tight timelines.
In this post, we’ll describe how Kloia addressed GoDataFeed’s challenges by transforming their legacy .NET Framework monolithic application into a .NET Core-based decoupled architecture.
How Kloia Partnered with GoDataFeed
When Kloia initially engaged with GoDataFeed for their DevOps-related requirements, it became apparent the primary need was scalability through application modernization. In the ensuing discussions, Kloia implemented a small proof-of-concept (PoC) and prepared a draft architecture.
The discovery sessions and PoC led to a professional services consultancy. Over five days, Kloia conducted a deep-dive workshop with the team who designed, built, and currently manage the platform at GoDataFeed.
With their knowledge of, and experience building, cloud-native applications on AWS, Kloia proposed and implemented a cloud-native architecture within a six-month timeframe.
One of the most critical decisions was choosing the right database technology. Kloia made the architectural decision to retain the smaller portion of GoDataFeed’s relational data (business information and application settings) in a SQL database, moving the largest part of their data (product catalog data) to a NoSQL database.
Amazon Aurora Serverless (MySQL-compatible edition) was an ideal choice for the retained relational data, as it’s easy to scale and requires no maintenance. Kloia selected Amazon DocumentDB to store the large amounts of product catalog data. Since the customer was ingesting diverse data from different e-commerce platforms and marketplaces, each with its own schema, the schemaless nature of Amazon DocumentDB was a good fit.
During discussions with the customer, Kloia also suggested using Amazon DynamoDB, but GoDataFeed didn’t have any experience with DynamoDB. They did have experience with MongoDB, however, which is compatible with Amazon DocumentDB, so that made the transition easier.
Amazon DocumentDB has flexible query capabilities for different types of data, specifically complex nested structures, making it efficient for integrating legacy services into the new NoSQL database.
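To make the schema point concrete, here is a minimal sketch in plain Python (no database connection) of how items with different shapes can be normalized through per-source mappings. The item shapes, the source names `shop_a` and `ftp_b`, and the `MAPPINGS` table are all hypothetical and not taken from GoDataFeed's platform. The dotted paths resemble the dot notation that MongoDB-compatible queries use to reach nested fields.

```python
from typing import Any


def get_path(doc: dict, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'variants.0.sku' through nested dicts and lists."""
    current: Any = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return default
    return current


# Two hypothetical catalog items from different sources, each with its own shape.
shop_item = {
    "source": "shop_a",
    "title": "Trail Shoe",
    "variants": [{"sku": "TS-1", "price": {"amount": 89.0, "currency": "USD"}}],
}
ftp_item = {"source": "ftp_b", "name": "Trail Shoe", "sku": "TS-1", "price_usd": 89.0}

# Hypothetical per-source mappings: target field -> dotted path in the raw document.
MAPPINGS = {
    "shop_a": {"title": "title", "sku": "variants.0.sku", "price": "variants.0.price.amount"},
    "ftp_b": {"title": "name", "sku": "sku", "price": "price_usd"},
}


def normalize(doc: dict) -> dict:
    """Project a raw item onto the common target fields for its source."""
    mapping = MAPPINGS[doc["source"]]
    return {target: get_path(doc, path) for target, path in mapping.items()}
```

Both raw items can be stored as they arrive, and only the mapping table needs to change when a new source with a new shape is added.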
GoDataFeed wanted to keep all historical data, both raw ingested and transformed exported data. Kloia chose Amazon Simple Storage Service (Amazon S3) as the storage solution for their historical data. In future phases, this historical data will be used to build a data lake using Amazon Athena for reporting purposes.
For the application layer, because an ETL platform needs to perform multiple series of tasks, Kloia chose AWS Step Functions to implement workflows, which come with built-in features like error handling and retry mechanisms.
Kloia took the business logic within MSSQL stored procedures and .NET Framework modules and re-implemented it in .NET Core. The team split the business logic into multiple small applications (tasks) so they could run them as workflows. Breaking business logic into small tasks improves development speed and quality, in a way that’s similar to a microservices architecture.
Kloia then chose AWS Fargate for Amazon Elastic Container Service (Amazon ECS), since it can scale to run hundreds of thousands of concurrent tasks without the overhead of managing the underlying infrastructure.
To control the on-demand initiation and termination of the workflow, the team crafted a Serverless API from Amazon API Gateway and AWS Lambda functions. They complemented the Serverless API with Amazon CloudWatch Events to trigger scheduled tasks and keep some Lambda functions warm to avoid cold-start problems.
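The keep-warm pattern can be sketched as a Lambda handler that returns early on a warm-up ping. This is an illustration, not Kloia's code: it assumes the scheduled rule is configured to send a constant JSON input of `{"warmup": true}`, and `start_workflow` is a hypothetical placeholder for the call that would start the Step Functions execution.

```python
import json


def start_workflow(payload: dict) -> str:
    """Hypothetical placeholder: a real controller would start a Step Functions execution here."""
    return f"import-{payload['client_id']}"


def handler(event: dict, context) -> dict:
    # Assumption: the warm-up rule sends the constant input {"warmup": true}.
    # Returning immediately keeps the container warm without doing any work.
    if event.get("warmup") is True:
        return {"statusCode": 200, "body": json.dumps({"warmed": True})}

    # Otherwise treat the event as an API Gateway proxy request with a JSON body.
    payload = json.loads(event.get("body") or "{}")
    execution_name = start_workflow(payload)
    return {"statusCode": 202, "body": json.dumps({"execution": execution_name})}
```

Keeping the warm-up check as the first statement means a ping never reaches the business logic, so warm-up invocations stay short.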
Kloia made the key decision to use Apache Spark with .NET for the transformation part. Instead of implementing a parser for each data source, they used the Spark DataFrame APIs to load and transform the data.
They chose HashiCorp Terraform to manage the infrastructure, and used LocalStack and Docker Compose to aid local development.
Building the Solution
The team at Kloia designed the architecture to provide scalability, high reliability, and performance efficiency. This serverless architecture reduces the operational overhead, as well as the overall cost.
Figure 1 represents the high-level application workflow design.
Figure 1 – High-level architecture of Kloia’s solution for GoDataFeed.
These are the details of the high-level application workflow. The numbers in the graphic match the numbers in the descriptions below.
- A user triggers the import workflow from the user interface, or a scheduled task triggers the import for the user.
- The import request invokes the controller Lambda function, which retrieves the necessary configuration from the data service layer (DSL), passes it to the Step Functions state machine, and starts the execution.
- An AWS Step Function runs parallel download Amazon ECS tasks, if necessary, to retrieve data from the given client (Shopify or FTP, for example). Data is saved as is, without applying any transformation, into the imports Amazon S3 bucket inside the source folder.
- After downloading and storing the data on S3, the AWS Step Function triggers a schema validation Amazon ECS task. This task uses Apache Spark to extract and transform the data into the required form. The output is again saved on S3 as a JSON file.
- Next, an import task is triggered by the AWS Step Function that retrieves the mapping and filtering rules from the DSL. It also applies the rules on the data and saves the results in the products Amazon DocumentDB collection.
- If any exception occurs during the executions, the AWS Step Function automatically retries that step based on retry count settings. If the error exceeds the retry count, AWS Step Function triggers a workflow failure Lambda function.
- If everything works successfully, the AWS Step Function triggers the workflow complete Lambda function as the last step of the import task. This function checks the configuration and triggers multiple parallel export workflows, if required (for example, export to eBay, Google Merchant Center).
- The export workflow starts with the compile step. It retrieves the data from the products collection, and the configuration from the DSL. It applies the rules, filters, validations, etc., and saves the data in the compiled products collection.
- The AWS Step Function then runs the writer Amazon ECS task. This task creates necessary files (for example, CSV, TSV, JSON) for the target system and UI. The files are saved into the export Amazon S3 bucket.
- Next, the submit Amazon ECS task is triggered, and uploads the files to the target system.
- Finally, similarly to the import workflow, workflow complete or workflow failure Lambda functions are triggered by the AWS Step Function as the last step of the workflow.
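The import steps above could be expressed roughly as the following Amazon States Language definition, built here as a Python dictionary. This is a simplified sketch under stated assumptions: the state names, task definition names, cluster name, and retry values are hypothetical; the download step is shown as a single task rather than a parallel set; and the ECS parameters are trimmed (a real Fargate task also needs network configuration).

```python
import json

# Assumed policy: retry any error up to 3 times with exponential backoff,
# then hand over to the failure handler once retries are exhausted.
RETRY = [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 5, "MaxAttempts": 3, "BackoffRate": 2.0}]
CATCH = [{"ErrorEquals": ["States.ALL"], "Next": "WorkflowFailure"}]


def ecs_task(task_definition: str, next_state: str) -> dict:
    """Task state that runs an ECS task on Fargate and waits for it to finish."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": "etl-cluster",           # hypothetical cluster name
            "TaskDefinition": task_definition,  # hypothetical task definition name
        },
        "Retry": RETRY,
        "Catch": CATCH,
        "Next": next_state,
    }


def lambda_task(function_name: str) -> dict:
    """Terminal Task state that invokes a Lambda function with the current state input."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "End": True,
    }


IMPORT_WORKFLOW = {
    "Comment": "Sketch of the import workflow",
    "StartAt": "Download",
    "States": {
        "Download": ecs_task("download", "ValidateSchema"),
        "ValidateSchema": ecs_task("schema-validation", "Import"),
        "Import": ecs_task("import", "WorkflowComplete"),
        "WorkflowComplete": lambda_task("workflow-complete"),
        "WorkflowFailure": lambda_task("workflow-failure"),
    },
}


def undefined_targets(definition: dict) -> set:
    """Return transition targets (StartAt, Next, Catch) that name no defined state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        for catcher in state.get("Catch", []):
            targets.add(catcher["Next"])
    return targets - set(states)


if __name__ == "__main__":
    print(json.dumps(IMPORT_WORKFLOW, indent=2))
```

The `Retry` block corresponds to the retry count settings mentioned above, and the `Catch` block routes to the workflow failure function once retries are exhausted. The `undefined_targets` helper is only a local sanity check and not part of Step Functions.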
Designing and Optimizing Workflows
To handle the ETL process, Kloia designed one finite state machine (FSM) to represent import workflows, and another to represent export workflows.
Figure 2 – Finite state machines representing both import and export workflows.
The combination of AWS Step Functions and Amazon ECS tasks described previously worked well for implementing the workflows accurately.
One of the challenges Kloia faced during the execution of the workflow was a small delay between the tasks. Across the entire workflow, this delay added 1-2 minutes. Since customers with fewer than 10,000 products in the existing platform were already experiencing an overall processing time of 1-2 minutes, this additional delay was not acceptable.
Troubleshooting revealed the delay was caused by AWS Fargate provisioning (for example, downloading Docker images and allocating resources). The team at Kloia noticed this delay specifically in two tasks: import and compile.
To reduce this delay, they created versions of these tasks written as Lambda functions. Based on the application configuration, the workflow decides whether to use the Amazon ECS or the Lambda function version of the task.
Figure 3 – Workflow decides whether to use the Amazon ECS or Lambda version of the task.
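One way to express this decision is a Choice state in Amazon States Language. The sketch below builds such a state as a Python dictionary and evaluates it locally for illustration. The flag path `$.config.useLambdaForImport` and the state names `ImportLambda` and `ImportEcs` are assumptions, not names from the actual workflow.

```python
# Hypothetical Choice state: route to the Lambda version when the flag is true,
# otherwise fall back to the ECS version of the import task.
IMPORT_CHOICE = {
    "Type": "Choice",
    "Choices": [
        {
            "Variable": "$.config.useLambdaForImport",
            "BooleanEquals": True,
            "Next": "ImportLambda",
        }
    ],
    "Default": "ImportEcs",
}


def resolve(path: str, data: dict):
    """Resolve a simple '$.a.b' style path; raises KeyError if a segment is missing."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    current = data
    for part in path[2:].split("."):
        current = current[part]
    return current


def choose_next(choice_state: dict, execution_input: dict) -> str:
    """Tiny local evaluator that supports only BooleanEquals rules, for illustration."""
    for rule in choice_state["Choices"]:
        if resolve(rule["Variable"], execution_input) is rule["BooleanEquals"]:
            return rule["Next"]
    return choice_state["Default"]
```

With a flag like this in the execution input, small catalogs could take the Lambda path and avoid the Fargate provisioning delay, while larger ones stay on Amazon ECS.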
Integrating with the Legacy Platform
Kloia migrated some parts of GoDataFeed’s legacy platform to .NET Core without refactoring them as part of the new architecture. These parts included configuration and user preferences, transformation rules, and so on.
To integrate these legacy parts into the new architecture, the team created a new DSL. Using an API layer between the legacy and cloud platforms ensured decoupling, and helped reduce development time and integration complexity. This also allowed Kloia to iteratively modernize the components of the legacy platform.
The team used Amazon Simple Notification Service (Amazon SNS) topics to post updates of workflow progress to the legacy platform. All applications send messages and update their status or progress, and give detailed information about any errors. The legacy platform subscribes to the topic and updates the legacy tables and logs based on the progress.
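A progress update of this kind might look like the sketch below. The message fields (`workflowId`, `task`, `status`, `error`) and the topic ARN used in the usage note are hypothetical. The `publish` argument is any callable that accepts the keyword arguments of the SNS Publish operation, so the sketch runs without AWS access.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional


def build_progress_message(workflow_id: str, task: str, status: str,
                           error: Optional[str] = None) -> dict:
    """Build a progress payload; the field names are illustrative assumptions."""
    message = {
        "workflowId": workflow_id,
        "task": task,
        "status": status,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if error is not None:
        message["error"] = error
    return message


def post_progress(publish: Callable[..., object], topic_arn: str, message: dict) -> None:
    """Send the payload as JSON; the status attribute lets subscribers filter messages."""
    publish(
        TopicArn=topic_arn,
        Message=json.dumps(message),
        MessageAttributes={
            "status": {"DataType": "String", "StringValue": message["status"]},
        },
    )
```

In a real task, `publish` could be the `publish` method of a boto3 SNS client, for example `post_progress(boto3.client("sns").publish, topic_arn, message)`.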
By converting monolithic legacy applications from the .NET Framework into cloud-native, you can reap the agility, scalability, and cost savings of the cloud.
Modernizing your legacy .NET applications to cloud-native can be transformative. Choosing trusted providers with niche expertise in this space is critical to achieving the desired value-add and results.
Kloia used an AWS serverless approach to re-architect GoDataFeed’s legacy platform into a cloud-native one. Doing so helped address GoDataFeed’s current business challenges, and left them with a future-ready platform that took their business to the next level.
“We have worked with Kloia to help build out the next generation of our software. Their vast knowledge and ability to quickly learn our business has been a tremendous benefit to our organization.” ~ Sheldon Cohen, CTO at GoDataFeed
We are interested in hearing about your application modernization approach, as well as the challenges you face and the solutions that work for you.
Kloia – APN Partner Spotlight
Kloia is an AWS Competency Partner and DevOps consultancy company. Their focus is on the digital transformation of legacy infrastructures and practices.