By Ujwal Bukka, Partner Solutions Architect – AWS SaaS Factory


Operational excellence is a key challenge for software-as-a-service (SaaS) providers. It deals with the ability to run and monitor workloads effectively and to continuously improve supporting processes and procedures to deliver business value.

Striving for operational excellence helps ensure frictionless operation of the SaaS solution and, in turn, an optimal customer experience.

When operating a multi-tenant environment, SaaS providers should be aware of the current status of each tenant, make sure automated processes work consistently and as expected for new and existing tenants, and continuously improve supporting processes and procedures.

Operational excellence design principles and best practices associated with multi-tenant environments include the ability to effectively monitor and manage the operational health of the solution. This includes knowing how effectively your system is scaling the resources for each tenant and/or tier, capturing the right metric data in order to generate alerts and insights, managing operational processes such as tenant onboarding, and supporting the different needs of tenants in the multi-tenant environment.

In this post, I will review these design principles and best practices.

Implementing these best practices will enable you to handle and respond to continually shifting tenant workloads and usage patterns, and use data-driven insights to achieve desired business and technical outcomes. The goal here is to help you design operational mechanisms and processes which can support ever-changing business priorities, provide business context, reflect customer needs, and more.

SaaS Applications Operational Excellence Considerations

SaaS providers need to effectively support customers, cope with dynamic business needs, and incorporate any lessons learned. There are several key points to consider from an operational perspective in a SaaS delivery model which can help you achieve operational excellence.

There are four main areas that we recommend examining when building a SaaS solution:

  • Manage and monitor multi-tenant environment health
  • Tenant onboarding
  • Tenant-specific customizations
  • Capturing and analyzing metric data

Manage and Monitor Multi-Tenant Environment Health

In multi-tenant environments, you may deploy tenants in a shared infrastructure model. Any outage or issue might have a negative impact on all tenants, even to the extent of causing service downtime for all customers.

To prevent this, SaaS providers must place great importance on building operational mechanisms that allow them to monitor tenant health and trends, such as whether resources are scaling appropriately for each tenant and/or tier, and how each tenant’s consumption of resources varies over time.

In addition, SaaS providers should monitor tenant activity, identify and analyze current or potential issues, and fix them proactively.

As part of implementing operational mechanisms, you can build dashboards or views which provide insights into tenants’ health and activity. These dashboards should provide a global view of all tenants’ health, and also tenant and/or tier-specific views that provide operational data.

For example, to understand how much data is stored in Amazon Simple Storage Service (Amazon S3), there should be dashboards that show the overall amount of data stored in S3 by all tenants as well as the amount stored by individual tenants. Other dashboards, such as tenant resource consumption on a tenant-by-tenant basis or tenant activity by application service, can supplement these operational mechanisms.
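
As a rough illustration, one set of tenant-tagged storage records can feed both the global view and the per-tenant view. This is only a sketch: the StorageRecord shape and the summarize_storage function are hypothetical names, and the sample numbers are made up. It does not model how the measurements would actually be collected from S3.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRecord:
    """One hypothetical storage measurement, tagged with tenant context."""
    tenant_id: str
    tier: str
    bytes_stored: int


def summarize_storage(records):
    """Return (global_total, per_tenant, per_tier) byte totals."""
    per_tenant = defaultdict(int)
    per_tier = defaultdict(int)
    for record in records:
        per_tenant[record.tenant_id] += record.bytes_stored
        per_tier[record.tier] += record.bytes_stored
    global_total = sum(per_tenant.values())
    return global_total, dict(per_tenant), dict(per_tier)


# Made-up sample data for illustration only.
records = [
    StorageRecord("tenant-a", "basic", 1_000),
    StorageRecord("tenant-b", "premium", 5_000),
    StorageRecord("tenant-a", "basic", 2_500),
]
global_total, per_tenant, per_tier = summarize_storage(records)
```

Because every record carries its tenant and tier, the same data can be rolled up at whichever level a given dashboard needs.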

In order to build these dashboards, you need to have operational data like database query duration, search latency, amount of data stored, or result count with tenant and tiering context. As part of your architecture design, you’ll need to consider how and where you will capture operational data along with tenant and tier context.
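
One possible way to capture such data is to wrap an operation in a small helper that records its duration together with tenant and tier context. The sketch below assumes a hypothetical record_operation helper and an in-memory list as the event sink; a real system would send these events to its own logging or metrics pipeline.

```python
import json
import time
from contextlib import contextmanager

emitted_events = []  # stand-in sink; a real system would ship events elsewhere


@contextmanager
def record_operation(name, tenant_id, tier, sink):
    """Time a block of work and emit one event tagged with tenant and tier."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        sink.append(json.dumps({
            "operation": name,
            "tenant_id": tenant_id,
            "tier": tier,
            "duration_ms": round(duration_ms, 3),
        }))


with record_operation("db_query", "tenant-a", "premium", emitted_events):
    sum(range(10_000))  # placeholder for the real database call

event = json.loads(emitted_events[0])
```

The event is emitted in the finally clause, so a duration is recorded even when the wrapped operation raises an exception.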

Refer to the “Instrumenting Your Application” section of this AWS SaaS Factory blog post, which explains how and where you need to capture operational data along with tenant context.

Tenant Onboarding

A good SaaS solution provides a frictionless experience for its customers. Tenant onboarding is the first piece of functionality that tenants interact with. Your SaaS solution should onboard new tenants seamlessly, through a process that can be run repeatedly with consistent results.

Depending on your solution’s requirements, the onboarding process can be triggered in a self-service manner, or triggered by the SaaS provider when needed. Regardless of the type of onboarding process you use, the provisioning of all resources as part of onboarding a tenant should be automated.

During the onboarding process, resources can be created and configured in different ways. Some of the resources can be provisioned synchronously, while others can be provisioned asynchronously. Provisioning a billing account as a part of onboarding can happen asynchronously, for example. The goal is to provide better agility around the onboarding process.
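
The mix of synchronous and asynchronous provisioning can be sketched with Python’s asyncio. Everything here is hypothetical: the step names, the log list, and the short sleep standing in for a slow billing system. The point is only that the tenant becomes usable before the billing account finishes provisioning.

```python
import asyncio


async def create_tenant_record(tenant_id, log):
    log.append(f"tenant record created for {tenant_id}")


async def create_admin_user(tenant_id, log):
    log.append(f"admin user created for {tenant_id}")


async def provision_billing_account(tenant_id, log):
    await asyncio.sleep(0.01)  # simulates a slow external billing system
    log.append(f"billing account created for {tenant_id}")


async def onboard_tenant(tenant_id, log):
    """Run the blocking steps in order; start billing in the background."""
    billing_task = asyncio.create_task(provision_billing_account(tenant_id, log))
    await create_tenant_record(tenant_id, log)
    await create_admin_user(tenant_id, log)
    log.append(f"{tenant_id} ready to sign in")
    return billing_task


async def main():
    log = []
    billing_task = await onboard_tenant("tenant-a", log)
    ready_before_billing = "billing account created for tenant-a" not in log
    await billing_task  # awaited here only so the sketch exits cleanly
    return log, ready_before_billing


log, ready_before_billing = asyncio.run(main())
```

In a production system the background step would more likely be a message on a queue or a workflow step than an in-process task, so that it survives restarts and can be retried.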

Refer to the SaaS Lens documentation for a deep dive on the tenant onboarding process.

Tenant-Specific Customizations

As a SaaS provider, you must make sure all tenants are running the same version of your SaaS product. Supporting specific customer requirements through one-off versions of your product will undermine the overall agility of your solution by requiring specific maintenance actions and manual work. This also hinders operational excellence and innovation goals.

Instead of creating a separate version of your product for each one-off requirement, implement these features into your core platform so they are available to all customers. Using tenant-specific configurations, you’d be able to determine which tenants can use these features.

One common way to implement this is by using feature flags. Each feature flag correlates to a tenant configuration option, and will be evaluated at runtime. Based on their values, the outcome will result in different paths of execution in a common code base.

For example, a series of flags will be turned on/off for individual tenants, determining which capabilities are enabled for a tenant. Thus, feature flags enable or disable different features for different tenants based on tenant configuration. Be careful not to create a complex maze of feature flag-based options that results in an unmanageable environment.
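
A minimal sketch of runtime flag evaluation is shown below. The flag names, the TENANT_FLAGS mapping, and the build_report function are all hypothetical. A real system would load the flags from a tenant configuration store or a feature flag service rather than a hard-coded dictionary.

```python
# Hypothetical per-tenant configuration; a real system would load this
# from a tenant configuration store rather than hard-coding it.
TENANT_FLAGS = {
    "tenant-a": {"advanced_reports": True, "bulk_export": False},
    "tenant-b": {"advanced_reports": False},
}


def is_enabled(tenant_id, flag, default=False):
    """Evaluate a flag at runtime; unknown tenants or flags get the default."""
    return TENANT_FLAGS.get(tenant_id, {}).get(flag, default)


def build_report(tenant_id):
    """One shared code path whose behavior branches on tenant configuration."""
    sections = ["summary"]
    if is_enabled(tenant_id, "advanced_reports"):
        sections.append("trend_analysis")
    return sections
```

Every tenant runs the same build_report code. Only the configuration differs, and a missing flag falls back to a safe default instead of failing.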

You can refer to this article to understand feature flags in more detail.

Capturing and Analyzing Metric Data

To continuously grow and improve your SaaS solution, you need to collect various metrics that provide insight into usage and resource consumption in your environment. Then, you need to analyze this data to get meaningful information.

SaaS metrics are not just about capturing fundamental metrics like CPU and memory. They’re also about giving you a fundamental understanding of tenant activity in your environment. Analyzing this data should help you to continuously improve the reliability, scalability, cost efficiency, and overall SaaS agility.

For example, technical teams can use this data to troubleshoot or forecast issues, such as whether the current architecture can support a larger number of tenants in the near future, and to make capacity planning more accurate.

Business teams can use this data to shape the product roadmap and pricing models. For example, based on data about which features tenants actually use, business teams can prioritize the most-used features, enrich them, and/or make changes such as defining some features as available only for premium tiers.
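
To make this concrete, the sketch below counts feature usage overall and per tier from a list of tenant-tagged usage events. The event shape and the sample data are hypothetical; in practice the events would come from whatever metrics pipeline the application uses.

```python
from collections import Counter

# Hypothetical usage events: (tenant_id, tier, feature)
usage_events = [
    ("tenant-a", "basic", "search"),
    ("tenant-b", "premium", "search"),
    ("tenant-b", "premium", "bulk_export"),
    ("tenant-c", "basic", "search"),
    ("tenant-b", "premium", "bulk_export"),
]


def feature_usage(events):
    """Count feature usage overall and broken down by tier."""
    overall = Counter()
    by_tier = {}
    for _tenant_id, tier, feature in events:
        overall[feature] += 1
        by_tier.setdefault(tier, Counter())[feature] += 1
    return overall, by_tier


overall, by_tier = feature_usage(usage_events)
most_used = overall.most_common(1)[0]
```

In this made-up data, bulk_export is used only by premium tenants, which is the kind of signal a business team might weigh when deciding how to package features by tier.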

In order to enable this ability, you need to identify key areas in your application where you can capture metrics that give you useful operational insights, and then make them accessible and consumable. As a next step, build a metric infrastructure which allows ingestion, aggregation, and surfacing of this data similar to a sample metric infrastructure shown in Figure 1.

Finally, build dashboards or views that help you analyze this data. You can refer to this SaaS Factory blog post which explains how to ingest, aggregate, and visualize multi-tenant metrics of a SaaS application. Here’s a high-level diagram from the post that illustrates a metrics ingestion and analysis flow.


Figure 1 – Sample metric infrastructure.

You can also refer to this workshop to build dashboards which display cost, usage, and operational insights of your AWS Cloud usage.

SaaS Lens Operational Excellence Pillar Questions

In order to assess your current operational excellence posture and get prescriptive guidance on how to improve it, you can use the AWS Well-Architected SaaS Lens. This uses the questions below to evaluate your alignment with the operational excellence considerations mentioned above.

The SaaS Lens’ Operational Excellence pillar extends the existing Well-Architected principles. To align with the full range of best practices, be sure to include these foundational practices as part of your review. For more details, refer to the Well-Architected Framework’s Operational Excellence pillar description.

The questions are aligned with the general Well-Architected design principles, but address unique challenges in SaaS solutions. Each question is accompanied by a short summary of the recommended practices for its topic. Each summary includes a Required, Good, and Best set of practices, as well as a reference to content related to the topic.

Following is a high-level view of the scope and goals of each question. For more details, please refer to the Operational Excellence pillar section in the SaaS Lens Whitepaper and the AWS Well-Architected Tool. Guidance for improving your current posture can be found in the SaaS Lens improvement plan within the Well-Architected Tool.

SaaS OPS 1. How do you effectively monitor and manage the operational health of a multi-tenant environment?

Capturing tenant consumption, activity, and health trends using robust operational tools will help you assess the overall activity and health of tenants and tenant tiers.

  • Include tenant context into application logs: Operational tools aggregate log activity, enabling operations teams to inspect the health and activity of the system, individual tenants, and tenant tiers.
  • Collect detailed tenant insights: Instrumentation is added to the SaaS application, enabling it to emit a collection of detailed tenant insights that enable operational analysis of tenant activity, health, and consumption trends. Operations teams leverage business intelligence (BI) tools to analyze this tenant-infused data.
  • Use purpose-built, tenant-aware tools to enable proactive management of workloads: Use tools to provide detailed tenant operational data to analyze and evaluate activity, consumption, and health through the lens of tenants and tiers. These tools enable the implementation of proactive policies and alarms.
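
The first practice above, including tenant context in application logs, could be approached with Python’s standard logging module roughly as follows. The current_tenant context variable, the TenantContextFilter class, and the logger name are hypothetical. How the tenant is resolved for each request, for example from an authentication token, is outside this sketch.

```python
import contextvars
import io
import logging

# Holds the tenant for the request currently being handled.
current_tenant = contextvars.ContextVar("current_tenant", default="unknown")


class TenantContextFilter(logging.Filter):
    """Attach the current tenant ID to every log record."""

    def filter(self, record):
        record.tenant_id = current_tenant.get()
        return True


stream = io.StringIO()  # in-memory stream so the output is easy to inspect
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s tenant=%(tenant_id)s %(message)s")
)
handler.addFilter(TenantContextFilter())

logger = logging.getLogger("saas.orders")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

current_tenant.set("tenant-a")  # normally set once per request
logger.info("order created")
output = stream.getvalue()
```

Application code keeps calling the logger as usual. The filter adds the tenant field, so log aggregation tools can group and search by tenant without every call site passing it explicitly.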

SaaS OPS 2. How are you capturing and surfacing metric data that can be used to analyze the usage and consumption trends of individual tenants?

Different organizational functions in both technical and business-related teams can look at the captured tenant-aware metrics and use them to shape architecture and product strategies, pricing models, and operational excellence.

  • Capture low fidelity tenant activity metrics: Use packaged frameworks and tools that can capture and surface readily-available application and system insights with minimal instrumentation, injecting tenant context where possible.
  • Instrument high-value workflows of the system with tenant-aware metrics: Targeted, high-value areas of the system are instrumented with metrics that provide insights on workflows and use cases that are essential to understanding the customer experience and consumption patterns of these high-value targets. Use analytics tools to analyze and surface operationally significant data.
  • Create a complete view of tenant consumption: The SaaS application is fully instrumented with metrics that capture a range of tenant activity, feature usage, and resource consumption events. These metrics enable product managers, architects, and operations teams to build analytics views of this data to drive technical and business decisions.

SaaS OPS 3. How are new tenants onboarded to your system?

The tenant onboarding process includes creating a tenant, user identity, isolation policies, billing, tenant configuration, and the required infrastructure. It should happen through an automated, predictable process. This promotes better operational excellence and organizational agility.

  • Use manually triggered scripts to provision tenants: All of the steps required to onboard a new tenant are performed through one or more automated scripts that provision the elements of the tenant footprint (infrastructure, tenant, admin user).
  • Use a single automated process to onboard tenants: Onboarding of a new tenant is triggered and executed by a single automation process running end-to-end without manual intervention.
  • Provide a fully automated, self-service user experience that configures and executes tenant provisioning: Users (internal or customers) complete a registration form that collects all of their configuration data before launching the onboarding process. This executes the onboarding steps needed to introduce a new tenant into the system.

SaaS OPS 4. How do you support the need for tenant-specific customizations?

Managing separate environments or versions for each individual tenant will generate technical debt, impact the ability to manage the SaaS environment, and impede agility. All tenants must use a single version of the solution. Any customizations can be implemented in such a way that they are available to all tenants, for example as feature toggles. This way, customizations do not impact operational excellence and SaaS agility.

  • Use feature flags to manage tenant variations: Support feature and functional variations through the introduction of flags that are enabled and disabled on a tenant-by-tenant basis.
  • Support unique tenant requirements via shared application customizations: Address any need for variation through the introduction of generalized application customization constructs that are configured as part of the tenant configuration process.
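
Beyond on/off flags, the second practice above can be pictured as shared defaults with per-tenant overrides layered on top. The setting names and values in this sketch are hypothetical, and the override data would normally be captured during tenant configuration rather than defined in code.

```python
from collections import ChainMap

# Defaults shared by every tenant; all names and values are hypothetical.
DEFAULT_SETTINGS = {
    "date_format": "YYYY-MM-DD",
    "max_upload_mb": 50,
    "theme": "light",
}

# Only the values a tenant has customized are stored per tenant.
TENANT_OVERRIDES = {
    "tenant-a": {"theme": "dark"},
    "tenant-b": {"max_upload_mb": 200, "date_format": "DD/MM/YYYY"},
}


def settings_for(tenant_id):
    """Layer a tenant's overrides on top of the shared defaults."""
    return ChainMap(TENANT_OVERRIDES.get(tenant_id, {}), DEFAULT_SETTINGS)
```

Each customization is a generalized setting that any tenant could use, so no tenant needs its own version of the code.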

For more details, refer to the Operational Excellence pillar section in the SaaS Lens whitepaper.

Get Started with the Well-Architected SaaS Lens

The SaaS Lens is available in all regions where the AWS Well-Architected Tool is offered, as described in the AWS Regional Services List.

There is no cost for using the AWS Well-Architected Tool. The lens can be applied to existing workloads, or used for new workloads you define in the tool. You can use it to improve the application you are working on, or to get visibility into multiple workloads used by the department or area you are working with.

Learn more about the new SaaS Lens and get started today with the AWS Well-Architected Tool.

If you’re an AWS customer, find current AWS Partners that can conduct a review by visiting the AWS Well-Architected Partner Program or AWS SaaS Competency Partners page.


About AWS SaaS Factory

AWS SaaS Factory helps organizations at any stage of the SaaS journey. Whether looking to build new products, migrate existing applications, or optimize SaaS solutions on AWS, we can help. Visit the AWS SaaS Factory Insights Hub to discover more technical and business content and best practices.

SaaS builders are encouraged to reach out to their account representative to inquire about engagement models and to work with the AWS SaaS Factory team.

