This post is contributed by Jerome Van Der Linden, AWS Solutions Architect

Concurrency of an AWS Lambda function is the number of requests it is handling at any given time. This metric is the average number of requests per second multiplied by the average duration in seconds. For example, if a Lambda function takes an average of 500 ms to run and receives 100 requests per second, the concurrency is 50 (100 * 0.5 seconds).

When a Lambda function is invoked and no idle execution environment is available, the Lambda service provisions a new one in a process commonly called a cold start. This initialization phase can increase the total execution time of the function. Consequently, with the same concurrency, cold starts can reduce the overall throughput. For example, if the function takes 600 ms instead of 500 ms due to cold start latency, with a concurrency of 50, it handles 83 requests per second.
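
Both calculations are a direct application of Little's law (requests in flight = arrival rate × duration) and can be sanity-checked in a few lines of Python, using the numbers from the examples above:

```python
def concurrency(requests_per_second: float, avg_duration_s: float) -> float:
    """Average number of requests in flight (Little's law: L = lambda * W)."""
    return requests_per_second * avg_duration_s

def throughput(concurrency: float, avg_duration_s: float) -> float:
    """Requests per second that a given concurrency can sustain."""
    return concurrency / avg_duration_s

print(concurrency(100, 0.5))       # → 50.0 (100 req/s at 500 ms)
print(round(throughput(50, 0.6)))  # → 83 (same concurrency, 600 ms with cold start)
```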

As described in this blog post, Provisioned Concurrency helps to reduce the impact of cold starts. It keeps Lambda functions initialized and ready to respond to requests with double-digit millisecond latency. Functions can respond to requests with predictable latency. This is ideal for interactive services, such as web and mobile backends, latency-sensitive microservices, or synchronous APIs.

However, you may not need Provisioned Concurrency all the time. With a reasonable amount of traffic consistently throughout the day, the Lambda execution environments may be warmed already. Provisioned Concurrency incurs additional costs, so it is cost-efficient to use it only when necessary. For example, early in the morning when activity starts, or to handle recurring peak usage.

Example application

As an example, we use a serverless ecommerce platform with multiple Lambda functions. The entry point is a GraphQL API (using AWS AppSync) that lists products and accepts orders. The backend manages delivery and payments. This GitHub project provides an example of such an ecommerce platform and is based on the following architecture:

Post sample architecture

This company runs a “deal of the day” promotion every day at noon. The “deal of the day” reduces the price of a specific product for 60 minutes, starting at 12pm. The CreateOrder function handles around 20 requests per second during the day. At noon, a notification is sent to registered users who then connect to the website. Traffic increases immediately and can exceed 400 requests per second for the CreateOrder function. This recurring pattern is shown in Amazon CloudWatch:

Recurring invocations

The peak shows an immediate increase at noon, slowly decreasing until 1pm:

Peak invocations

Examining the response times of the function in AWS X-Ray highlights the difference. The first graph shows the latency distribution under normal traffic:

Normal performance distribution

While the second shows the peak:

Peak performance distribution

The median latency (p50) is higher during the peak with 535 ms versus 475 ms at normal load. The p90 and p95 values show that most of the invocations are under 800 ms in the first graph but around 900 ms in the second one. This difference is due to cold starts, when the Lambda service prepares new execution environments to absorb the load during the peak.
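
The p50/p90/p95 figures are percentiles over the observed latencies. The following sketch illustrates how a small fraction of cold starts shifts the tail percentiles; the latency values are simulated for illustration, not taken from X-Ray:

```python
import random

random.seed(1)
# Simulated latencies (ms): mostly warm invocations around 500 ms,
# plus a 5% tail of cold starts paying an extra initialization cost.
warm = [random.gauss(500, 80) for _ in range(950)]
cold = [random.gauss(500, 80) + 400 for _ in range(50)]
latencies = sorted(warm + cold)

def percentile(sorted_values, p):
    """Nearest-rank percentile of a pre-sorted list."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]

# p50 stays close to the warm median; p95 reflects the cold-start tail
for p in (50, 90, 95):
    print(f"p{p}: {percentile(latencies, p):.0f} ms")
```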

Using Provisioned Concurrency at noon can avoid this additional latency and provide a better experience to end users. Ideally, it should also be stopped at 1pm to avoid incurring unnecessary costs.

Application Auto Scaling

Application Auto Scaling allows you to configure automatic scaling for different resources, including Provisioned Concurrency for Lambda. You can scale resources based on a specific CloudWatch metric or at a specific date and time. There is no extra cost for Application Auto Scaling; you only pay for the resources that you use. In this example, I schedule Provisioned Concurrency for the CreateOrder Lambda function.

Scheduling Provisioned Concurrency for a Lambda function

  1. Create an alias for the Lambda function to identify the version of the function you want to scale:
    $ aws lambda create-alias --function-name CreateOrder --name prod --function-version 1.2 --description "Production alias"
  2. In this example, the average execution time is 500 ms and the workload must handle 450 requests per second. This equates to a concurrency of 225 (450 * 0.5 seconds); adding a 10% buffer and rounding up gives 250. Register the Lambda function as a scalable target in Application Auto Scaling with the RegisterScalableTarget API:
    $ aws application-autoscaling register-scalable-target \
    --service-namespace lambda \
    --resource-id function:CreateOrder:prod \
    --min-capacity 0 --max-capacity 250 \
    --scalable-dimension lambda:function:ProvisionedConcurrency

    Next, verify that the Lambda function is registered correctly:

    $ aws application-autoscaling describe-scalable-targets --service-namespace lambda

    The output shows:

    {
        "ScalableTargets": [
            {
                "ServiceNamespace": "lambda",
                "ResourceId": "function:CreateOrder:prod",
                "ScalableDimension": "lambda:function:ProvisionedConcurrency",
                "MinCapacity": 0,
                "MaxCapacity": 250,
                "RoleARN": "arn:aws:iam::012345678901:role/aws-service-role/lambda.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_LambdaConcurrency",
                "CreationTime": 1596319171.297,
                "SuspendedState": {
                    "DynamicScalingInSuspended": false,
                    "DynamicScalingOutSuspended": false,
                    "ScheduledScalingSuspended": false
                }
            }
        ]
    }
  3. Schedule the Provisioned Concurrency using the PutScheduledAction API of Application Auto Scaling:
    $ aws application-autoscaling put-scheduled-action \
    --service-namespace lambda \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --resource-id function:CreateOrder:prod \
    --scheduled-action-name scale-out \
    --schedule "cron(45 11 * * ? *)" \
    --scalable-target-action MinCapacity=250

    Note: Set the schedule a few minutes ahead of the expected peak to allow time for Lambda to prepare the execution environments. The time must be specified in UTC.

  4. Verify that the scaling action is correctly scheduled:
    $ aws application-autoscaling describe-scheduled-actions --service-namespace lambda

    The output shows:

    {
        "ScheduledActions": [
            {
                "ScheduledActionName": "scale-out",
                "ScheduledActionARN": "arn:aws:autoscaling:eu-west-1:012345678901:scheduledAction:643065ac-ba5c-45c3-cb46-bec621d49657:resource/lambda/function:CreateOrder:prod:scheduledActionName/scale-out",
                "ServiceNamespace": "lambda",
                "Schedule": "cron(45 11 * * ? *)",
                "ResourceId": "function:CreateOrder:prod",
                "ScalableDimension": "lambda:function:ProvisionedConcurrency",
                "ScalableTargetAction": {
                    "MinCapacity": 250
                },
                "CreationTime": 1596319455.951
            }
        ]
    }
  5. To stop the Provisioned Concurrency, schedule another action after the peak with the capacity set to 0:
    $ aws application-autoscaling put-scheduled-action \
    --service-namespace lambda \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --resource-id function:CreateOrder:prod \
    --scheduled-action-name scale-in \
    --schedule "cron(15 13 * * ? *)" \
    --scalable-target-action MinCapacity=0,MaxCapacity=0
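
The two scheduled actions can also be created programmatically, for example with boto3's application-autoscaling client and its put_scheduled_action method. The following sketch only builds the parameter dictionaries (so it runs without AWS credentials); passing each one to `client.put_scheduled_action(**params)` is left to the reader. The function name, alias, and UTC times mirror the CLI example:

```python
def scheduled_actions(function_name: str, alias: str, capacity: int,
                      start_utc: tuple, stop_utc: tuple) -> list:
    """Build PutScheduledAction parameter sets for a daily scale-out/scale-in
    pair. start_utc and stop_utc are (hour, minute) tuples in UTC."""
    common = {
        "ServiceNamespace": "lambda",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "ResourceId": f"function:{function_name}:{alias}",
    }
    return [
        {**common,
         "ScheduledActionName": "scale-out",
         "Schedule": f"cron({start_utc[1]} {start_utc[0]} * * ? *)",
         "ScalableTargetAction": {"MinCapacity": capacity}},
        {**common,
         "ScheduledActionName": "scale-in",
         "Schedule": f"cron({stop_utc[1]} {stop_utc[0]} * * ? *)",
         "ScalableTargetAction": {"MinCapacity": 0, "MaxCapacity": 0}},
    ]

actions = scheduled_actions("CreateOrder", "prod", 250, (11, 45), (13, 15))
print(actions[0]["Schedule"])  # → cron(45 11 * * ? *)
```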

With this configuration, Provisioned Concurrency starts at 11:45 and stops at 13:15. It uses a concurrency of 250 to handle the load during the peak and releases resources after. This also optimizes costs by limiting the use of this feature to 90 minutes per day.
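
To put a rough number on the cost side, the daily charge is concurrency × memory (GB) × warm seconds × the per-GB-second Provisioned Concurrency rate. The price below is an illustrative assumption, not the actual rate for any Region; check current Lambda pricing:

```python
# Rough daily cost of the scheduled window: 250 environments of 768 MB
# (the memory size used by the CreateOrder function) kept warm for 90 minutes.
PRICE_PER_GB_SECOND = 0.0000041667  # assumed price, for illustration only

concurrency = 250
memory_gb = 768 / 1024
window_seconds = 90 * 60

daily_cost = concurrency * memory_gb * window_seconds * PRICE_PER_GB_SECOND
print(f"~${daily_cost:.2f} per day")  # → ~$4.22 per day
```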

In this architecture, the CreateOrder function synchronously calls ValidateProduct, ValidatePayment, and ValidateDelivery. Provisioning concurrency only for the CreateOrder function would be insufficient, as cold starts in the three downstream functions would still add latency to each order. To be effective, configure Provisioned Concurrency for all four functions, using the same process.

Observing performance when Provisioned Concurrency is scheduled

Run the following command at any time outside the scheduled window to confirm that nothing has been provisioned yet:

$ aws lambda get-provisioned-concurrency-config \
--function-name CreateOrder --qualifier prod
No Provisioned Concurrency Config found for this function

When run during the defined time slot, the output shows that the concurrency is allocated and ready to use:

{
    "RequestedProvisionedConcurrentExecutions": 250,
    "AvailableProvisionedConcurrentExecutions": 250,
    "AllocatedProvisionedConcurrentExecutions": 250,
    "Status": "READY",
    "LastModified": "2020-08-02T11:45:18+0000"
}

Verify the different scaling operations using the following command:

$ aws application-autoscaling describe-scaling-activities \
--service-namespace lambda \
--scalable-dimension lambda:function:ProvisionedConcurrency \
--resource-id function:CreateOrder:prod

You can see the scale-out and scale-in activities at the times specified, and how Lambda fulfills the requests:

{
    "ScalingActivities": [
        {
            "ActivityId": "6d2bf4ed-6eb5-4218-b4bb-f81fe6d7446e",
            "ServiceNamespace": "lambda",
            "ResourceId": "function:CreateOrder:prod",
            "ScalableDimension": "lambda:function:ProvisionedConcurrency",
            "Description": "Setting desired concurrency to 0.",
            "Cause": "maximum capacity was set to 0",
            "StartTime": 1596374119.716,
            "EndTime": 1596374155.789,
            "StatusCode": "Successful",
            "StatusMessage": "Successfully set desired concurrency to 0. Change successfully fulfilled by lambda."
        },
        {
            "ActivityId": "2ea46a62-2dbe-4576-aa42-0675b6448f0a",
            "ServiceNamespace": "lambda",
            "ResourceId": "function:CreateOrder:prod",
            "ScalableDimension": "lambda:function:ProvisionedConcurrency",
            "Description": "Setting min capacity to 0 and max capacity to 0",
            "Cause": "scheduled action name scale-in was triggered",
            "StartTime": 1596374119.415,
            "EndTime": 1596374119.431,
            "StatusCode": "Successful",
            "StatusMessage": "Successfully set min capacity to 0 and max capacity to 0"
        },
        {
            "ActivityId": "272cfd75-8362-4e5c-82cf-5c8cf9eacd24",
            "ServiceNamespace": "lambda",
            "ResourceId": "function:CreateOrder:prod",
            "ScalableDimension": "lambda:function:ProvisionedConcurrency",
            "Description": "Setting desired concurrency to 250.",
            "Cause": "minimum capacity was set to 250",
            "StartTime": 1596368709.651,
            "EndTime": 1596368901.897,
            "StatusCode": "Successful",
            "StatusMessage": "Successfully set desired concurrency to 250. Change successfully fulfilled by lambda."
        },
        {
            "ActivityId": "38b968ff-951e-4033-a3db-b6a86d4d0204",
            "ServiceNamespace": "lambda",
            "ResourceId": "function:CreateOrder:prod",
            "ScalableDimension": "lambda:function:ProvisionedConcurrency",
            "Description": "Setting min capacity to 250",
            "Cause": "scheduled action name scale-out was triggered",
            "StartTime": 1596368709.354,
            "EndTime": 1596368709.37,
            "StatusCode": "Successful",
            "StatusMessage": "Successfully set min capacity to 250"
        }
    ]
}

In the following latency graph, most of the requests are now completed in less than 800 ms, in line with performance during the rest of the day. The traffic during the “deal of the day” at noon is comparable to any other time. This helps provide a consistent user experience, even under heavy load.

Performance distribution

Automate the scheduling of Lambda Provisioned Concurrency

You can use the AWS CLI to set up scheduled Provisioned Concurrency, which can be helpful for testing in a development environment. You can also define your infrastructure with code to automate deployments and avoid manual configuration that may lead to mistakes. This can be done with AWS CloudFormation, AWS SAM, AWS CDK, or third-party tools.

The following code shows how to schedule Provisioned Concurrency in an AWS SAM template:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  CreateOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/create_order/
      Handler: main.handler
      Runtime: python3.8
      MemorySize: 768
      Timeout: 30
      AutoPublishAlias: prod                            #1
      DeploymentPreference:
        Type: AllAtOnce

  CreateOrderConcurrency:
    Type: AWS::ApplicationAutoScaling::ScalableTarget   #2
    Properties:
      MaxCapacity: 250
      MinCapacity: 0
      ResourceId: !Sub function:${CreateOrderFunction}:prod   #3
      RoleARN: !Sub arn:aws:iam::${AWS::AccountId}:role/aws-service-role/lambda.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_LambdaConcurrency
      ScalableDimension: lambda:function:ProvisionedConcurrency
      ServiceNamespace: lambda
      ScheduledActions:                                 #4
        - ScalableTargetAction:
            MinCapacity: 250
          Schedule: 'cron(45 11 * * ? *)'
          ScheduledActionName: scale-out
        - ScalableTargetAction:
            MinCapacity: 0
            MaxCapacity: 0
          Schedule: 'cron(15 13 * * ? *)'
          ScheduledActionName: scale-in
    DependsOn: CreateOrderFunctionAliasprod             #5

In this template:

  1. As with the AWS CLI, you need an alias for the Lambda function. This automatically creates the alias “prod” and sets it to the latest version of the Lambda function.
  2. This creates an AWS::ApplicationAutoScaling::ScalableTarget resource to register the Lambda function as a scalable target.
  3. This references the correct version of the Lambda function by using the “prod” alias.
  4. This defines the actions to schedule as a property of the scalable target, using the same properties as in the CLI commands.
  5. You cannot define the scalable target until the alias of the function is published. The syntax is <FunctionResource>Alias<AliasName>.

Conclusion

Cold starts can impact the overall duration of your Lambda function, especially under heavy load. This can affect the end-user experience when the function is used as a backend for synchronous APIs.

Provisioned Concurrency helps reduce latency by creating execution environments ahead of invocations. Using the ecommerce platform example, I show how to combine this capability with Application Auto Scaling to schedule scaling-out and scaling-in. This combination helps to provide a consistent execution time even during special events that cause usage peaks. It also helps to optimize cost by limiting the time period.

To learn more about scheduled scaling, see the Application Auto Scaling documentation.