By Michael Segal, AVP Strategic Alliances at NETSCOUT
By Ray Krug, Solutions Architect at NETSCOUT
By Roy Rodan, Partner Solutions Architect at AWS
In this post, we demonstrate how NETSCOUT’s visibility helped Forte Data Solutions quickly resolve latency issues in its web-based statistical applications.
By using NETSCOUT’s Application Management Solution available in AWS Marketplace, Forte was able to gain visibility into load, latency, errors, and dependencies of their application, quickly identify the root cause of latency issues, and fix them.
NETSCOUT is an AWS Partner Network (APN) Advanced Technology Partner that helps customers monitor and analyze network traffic in order to gain critically-needed insights that reveal the root cause of performance issues.
Forte Data Solutions specializes in database and application migrations, and offers web-based statistical applications that generate sales reports for given time periods, products, and other customizable criteria.
When Forte customers began experiencing issues with their web-based statistical applications, IT teams faced the challenge of addressing the problem quickly before it could impact the bottom line and each company’s quality of service.
Users reported that saving their work took several seconds, which delayed reports by anywhere from several milliseconds to several seconds and had a cascading impact on other applications relying on these reports.
Ultimately, more than 50 percent of customers’ reports were severely delayed, degrading user experience while costing Forte revenue and damaging its brand reputation.
Auto Scaling Group Regularly Exceeded Limits
Forte’s web-based statistical application runs on two web servers behind Elastic Load Balancing (ELB), which functions as both a Network Load Balancer and Application Load Balancer. As the server load grew, customers experienced slowdowns when running queries, as well as instability, intermittent freezes within the application, and timeouts while creating reports.
Because the overall load on the application was continuously growing, the Oracle database that stores its data, running on a multi-node Real Application Cluster (RAC) installed on Amazon Elastic Compute Cloud (Amazon EC2) RHEL instances, was placed in an Auto Scaling Group.
This enabled activity peaks to be easily satisfied by automatically scaling out new RAC nodes once CPU and RAM usage thresholds were exceeded. The Auto Scaling Group was set up for a minimum of two nodes and maximum of six.
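As a rough illustration of the scaling rule described above, the following Python sketch computes a desired node count clamped to the two-node minimum and six-node maximum. The threshold percentages, the `ScalingPolicy` and `desired_nodes` names, and the choice to scale out when either CPU or RAM is high are assumptions made for this example; the post does not state Forte’s actual thresholds or policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    min_nodes: int = 2       # minimum from the post
    max_nodes: int = 6       # maximum from the post
    cpu_high: float = 80.0   # assumed scale-out threshold (percent)
    mem_high: float = 80.0   # assumed scale-out threshold (percent)
    cpu_low: float = 30.0    # assumed scale-in threshold (percent)
    mem_low: float = 30.0    # assumed scale-in threshold (percent)


def desired_nodes(current: int, cpu_pct: float, mem_pct: float,
                  policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Return the node count the group should move toward."""
    if cpu_pct > policy.cpu_high or mem_pct > policy.mem_high:
        target = current + 1   # scale out by one node
    elif cpu_pct < policy.cpu_low and mem_pct < policy.mem_low:
        target = current - 1   # scale in by one node
    else:
        target = current       # within thresholds, no change
    # Never leave the configured [min_nodes, max_nodes] range.
    return max(policy.min_nodes, min(policy.max_nodes, target))
```

With these assumed thresholds, `desired_nodes(2, 90.0, 50.0)` asks for a third node, while a group already at six nodes stays at six no matter how high the load is.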
Figure 1 below illustrates Forte’s service delivery architecture:
- Amazon Route 53 routes users to the Forte web-applications reporting tier, which is responsible for producing statistical reporting data such as sales reports per given time periods.
- The reporting tier server queries the database tier, which consists of Oracle RAC and retrieves and stores data as required in the AWS NFS-enabled storage tier.
- NETSCOUT vSTREAM (not shown on this diagram) monitors traffic between each tier, analyzes it in real-time, and converts it to service contextual metadata used by NETSCOUT to deliver application performance insights to Forte.
- ELB provides the flexibility of accessing RAC nodes transparently with a round-robin rule. This helps streamline the connectivity between the RAC nodes and the application.
- ELB acts on TCP port 1521, and the traffic switchover occurs if any of the RAC nodes become unresponsive, or in case of scaling in.
Figure 1 – Forte Data Solutions architecture.
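The round-robin rule and switchover behavior described in the list above can be sketched in a few lines of Python. This is only a conceptual model, not how ELB is implemented: the `RoundRobinBalancer` class, its method names, and the node names in the usage note are hypothetical, and only TCP port 1521 comes from the post.

```python
ORACLE_LISTENER_PORT = 1521  # TCP port named in the post


class RoundRobinBalancer:
    """Toy round-robin selector that skips nodes marked unhealthy."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._index = 0

    def mark_down(self, node):
        # Node became unresponsive or was removed by a scale-in.
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def next_endpoint(self):
        # Try each node at most once, starting from the rotation cursor.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._index]
            self._index = (self._index + 1) % len(self.nodes)
            if node in self.healthy:
                return (node, ORACLE_LISTENER_PORT)
        raise RuntimeError("no healthy RAC nodes available")
```

For example, with nodes `rac-1`, `rac-2`, and `rac-3`, calling `mark_down("rac-2")` causes subsequent calls to `next_endpoint()` to alternate between the two remaining nodes.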
Forte’s AWS administrators noticed the Auto Scaling Group was regularly scaling the number of active RAC nodes in and out beyond the established limit. The additional Amazon EC2 instances were constantly starting up and shutting down, causing frequent Amazon CloudWatch Alarms to be raised.
In addition, database backups were delayed, while some backups failed due to the database performance issues.
Forte’s IT team attempted to address these issues by testing the web tiers and adjusting some Apache parameters, but the problem persisted.
To address the scaling behavior, they made several adjustments, such as utilizing new Amazon EC2 instance types with more memory and CPU processing power. Database administrators aligned the database settings with changed Amazon EC2 settings, and these efforts resulted in a reduced number of instances scaling out.
Unfortunately, the changes did not reduce the frequency of the Auto Scaling Group scaling in and out.
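One way to quantify this kind of oscillation is to count how often the scaling direction reverses within a recent time window. The Python sketch below assumes a hypothetical list of `(timestamp, delta)` scaling events, where `delta` is `+1` for a scale-out and `-1` for a scale-in. The one-hour window and the reversal limit are illustrative values, not settings taken from Forte’s environment or from any AWS API.

```python
from datetime import datetime, timedelta


def count_direction_changes(events):
    """Count reversals (out->in or in->out) in time-ordered scaling events."""
    ordered = sorted(events, key=lambda e: e[0])
    changes = 0
    for (_, prev), (_, cur) in zip(ordered, ordered[1:]):
        if prev != cur:
            changes += 1
    return changes


def is_flapping(events, window=timedelta(hours=1), max_changes=4):
    """True if the most recent window holds more reversals than allowed."""
    if not events:
        return False
    latest = max(t for t, _ in events)
    recent = [e for e in events if latest - e[0] <= window]
    return count_direction_changes(recent) > max_changes


# Hypothetical data: the group alternates direction every 6 minutes.
start = datetime(2024, 1, 1, 12, 0)
sample = [(start + timedelta(minutes=6 * i), 1 if i % 2 == 0 else -1)
          for i in range(10)]
print(is_flapping(sample))
```

A group that only scales out during a genuine load peak produces no reversals and would not be flagged by this check.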
NETSCOUT’s Value Add
NETSCOUT was approached by Forte to help solve the infrastructure and application slowdowns, instability, and intermittent freezes that plagued mission-critical functions relied upon by the business.
NETSCOUT worked with Forte to deploy the NETSCOUT Application Management Solution from AWS Marketplace in the corresponding AWS infrastructure. This included vSTREAM agents with virtual nGeniusONE, and monitoring was configured to analyze network traffic on Apache, Oracle, and Java application ports.
The nGeniusONE dashboard workflows provided insights into database, web, and application details and their dependencies.
Figure 2 – NETSCOUT nGeniusONE dashboard.
Database monitoring revealed evidence of persistent scaling in and out, while web monitoring uncovered persistent latencies on both web servers, thus eliminating the RAC cluster as the root cause.
Application session analysis showed multiple Java and embedded SQL-related errors. The Java errors retrieved from packets indicated version-related issues that began occurring after recent Java upgrades.
Figure 3 – nGeniusONE dashboard with 5 minutes resolution for granular view.
Armed with these insights, Forte’s IT team was able to revert to a previous Java version. This was accomplished by using a previous Amazon EC2 snapshot. The Java config and libraries were successfully restored and downgraded on both web tier Amazon EC2 machines.
Once this fix was applied, the errors disappeared and the RAC Auto Scaling Group returned to normal threshold usage of two machines.
NETSCOUT’s monitoring solution has allowed Forte to address the slowdowns, instability, intermittent freezes, and timeouts plaguing customers. Their IT team is now able to proactively monitor and troubleshoot application performance in their AWS environment.
The nGeniusONE dashboard workflows empower IT teams to quickly identify the root cause of issues, reducing Mean-Time-To-Knowledge (MTTK) by over 70 percent and leading to rapid resolution of problems such as timeouts while creating reports.
By using the NETSCOUT solution, Forte achieved tangible benefits, including:
- Reducing the latency of the web-tier from seconds to milliseconds.
- Stopping the unnecessary workload-driven node scaling at the database tier that was happening every 5-7 minutes.
- Eliminating hundreds of redundant daily CloudWatch Alarms.
This post illustrates how NETSCOUT visibility helped Forte Data Solutions improve service performance and user experience for web-based statistical applications.
Forte’s challenges included high latency at the web tier, continuous scaling in and out of nodes beyond the established limit at the database tier, and frequent CloudWatch Alarms. Traditional approaches to fixing these issues, such as adjusting some Apache parameters, utilizing new Amazon EC2 instance types, and adjusting database settings, failed to yield desirable results.
By using NETSCOUT’s Application Management Solution, Forte gained visibility into load, latency, errors, and dependencies, and identified the root cause, which was associated with Java and embedded SQL-related errors.
The benefits to Forte included reducing latency at the web tier from seconds to milliseconds, stopping the unnecessary workload-driven node scaling at the database tier, and eliminating redundant CloudWatch Alarms.
NETSCOUT – APN Partner Spotlight
NETSCOUT is an APN Advanced Technology Partner. They provide service assurance, security, and business analytics solutions that deliver consistent, high-resolution, real-time visibility into on-premises and cloud environments.
from AWS Partner Network (APN) Blog: https://aws.amazon.com/blogs/apn/identifying-and-resolving-application-performance-in-hybrid-environments-with-netscout/