Amazon Web Services (AWS) customers are storing an unprecedented amount of data on AWS for a range of use cases, including data lakes and analytics, machine learning, and enterprise applications. Customers secure their data by implementing data security controls including identity and access management, network security, and encryption. For non-public, sensitive data, customers want to make sure that it’s only accessible by authorized users from known locations. Customers implement identity- and network-based data loss prevention controls to ensure that access to sensitive data is restricted to trusted identities, such as authorized corporate users, and expected network locations, such as a corporate network or an Amazon VPC. Some AWS services, however, need to access your resources using their own identities and from outside of your network locations.
In this post, we show how you can use aws:PrincipalIsAWSService, a new global AWS Identity and Access Management (IAM) condition key, to write policies that restrict access to your data from untrusted identities and unexpected network locations while safely granting access to AWS services. We discuss how to use the new condition key, provide sample policies that show its usage, and show how you can incorporate it into your organization’s data security strategy.
New global IAM condition key
aws:PrincipalIsAWSService is a global IAM condition key that simplifies resource-based policies (such as an Amazon S3 bucket policy) when granting access to AWS services. It gives you a shorthand for allowing AWS services to access your resources and can be used alongside other desired restrictions, such as restricting access to your networks. Since the purpose of this condition key is to simplify how AWS services interact with your resources, the examples in this post primarily cover resource-based policies. We use AWS CloudTrail to illustrate how this condition key can be used. With CloudTrail, you can log, continuously monitor, and retain account activity related to actions across your AWS infrastructure. CloudTrail allows you to create a trail, enabling ongoing delivery of events as log files to an Amazon Simple Storage Service (Amazon S3) bucket that you specify. Consider a scenario where you want to allow CloudTrail to write data to your S3 bucket directly from its service account but also want to ensure that all other access from your identities is restricted to your network, such as your Amazon VPC, as illustrated in Figure 1.
The new aws:PrincipalIsAWSService condition key can be used to implement a bucket policy that restricts access to your data to requests from your VPC while safely granting access to an AWS service, such as CloudTrail. Replace <my-logs-bucket>, <AccountNumber>, and <vpc-111bbb22> with your information in the following example.
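As a minimal sketch, a statement of this shape can be built as a Python dictionary and printed as JSON for use as a bucket policy. The Sid and the AWSLogs/ object prefix are assumptions for illustration; adjust them to match your trail configuration.

```python
import json

# Placeholders from the post; replace them with your own values.
BUCKET = "<my-logs-bucket>"
ACCOUNT = "<AccountNumber>"
VPC_ID = "<vpc-111bbb22>"

# Illustrative sketch, not an official policy. The AWSLogs/ prefix is assumed
# from CloudTrail's default log layout, and the Sid is just a label.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "expected-network+service-principal",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/AWSLogs/{ACCOUNT}/*"],
            "Condition": {
                # The Deny applies only when BOTH conditions are true:
                # the request does not come from the VPC ...
                "StringNotEquals": {"aws:SourceVpc": VPC_ID},
                # ... and the caller is not an AWS service principal.
                "Bool": {"aws:PrincipalIsAWSService": "false"},
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```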
The above policy statement limits s3:PutObject and s3:GetObject actions to the VPC while exempting AWS services from this condition. The aws:PrincipalIsAWSService condition key works with Boolean condition operators that restrict access based on comparing a key to true or false. In the preceding statement, access to the bucket containing CloudTrail data is restricted unless the request originates from the specified VPC (<vpc-111bbb22>) or from an AWS service (CloudTrail in this case).
Note: the complete policy, including the necessary Allow statements for cross-account access, is covered later in this post.
aws:PrincipalIsAWSService as part of a data perimeter
You have just read about the new aws:PrincipalIsAWSService global condition key and its basic usage. Keep reading to learn how you can use the new condition key to establish a data perimeter. Simply put, a data perimeter is a set of preventive guardrails that ensures that only customers’ identities are accessing their own data in the cloud from expected network locations. One of the key data perimeter controls is to configure your resources to be accessible only by trusted identities and from expected network locations. Let’s assume you have sensitive data stored in an S3 bucket. As part of establishing a data perimeter, you want to restrict access to that data to network locations—such as your VPC or on-premises network. Let’s see how you can restrict access based on four common data access patterns while securely exempting AWS services that also need to access your resources from outside of your network. Knowing the different data access patterns will help you understand when and how to use this new condition key, along with other IAM condition keys, to securely exempt AWS services.
1. Direct access of your identities to data
The most basic data access pattern is when one of your IAM principals (roles or users) from your AWS account directly accesses data within an S3 bucket. For example, a user logs in to the AWS Management Console to upload an object to an S3 bucket as shown in Figure 2.
With this access pattern, your bucket policy can constrain the permissions you have already granted as follows. Replace <my-data-bucket> and <203.0.113.0/24> with your information.
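A sketch of such a policy, again built in Python and printed as JSON. The Sid is a hypothetical label, and the action list follows the list, put, and get operations discussed in this section.

```python
import json

BUCKET = "<my-data-bucket>"          # placeholder from the post
CORPORATE_CIDR = "<203.0.113.0/24>"  # placeholder from the post

# Illustrative sketch; the Sid is a hypothetical label.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "expected-network",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            # s3:ListBucket targets the bucket ARN; the object actions
            # target the /* object ARN.
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            # Deny unless the request comes from the corporate IP range.
            "Condition": {"NotIpAddress": {"aws:SourceIp": CORPORATE_CIDR}},
        }
    ],
}

print(json.dumps(policy, indent=2))
```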
With the aws:SourceIp condition in the preceding policy, users are denied access to list, put, and get objects in or out of the S3 bucket unless the API call originates from within their corporate network.
Note: This and subsequent examples use a Deny statement to constrain the permissions you have already granted to help illustrate an effective data perimeter policy. The principals also require an identity-based policy with the appropriate Allow permissions to write to this bucket, which isn’t depicted in the examples. Similarly, for cross-account access, appropriate Allow statements must be added to the bucket policy for authorized principals.
The policies include a subset of S3 data access actions instead of s3:* to prevent unintentional lockout from your bucket, which could occur if you edit bucket policies from outside of your specified network locations. You might consider expanding the action list to include other actions such as s3:DeleteObject, s3:RestoreObject, or s3:PutBucketPolicy, depending on your requirements, or even s3:* as long as you can perform all actions from within your defined network perimeter.
2. Direct access to data by way of an AWS service
This second access pattern applies when you access the data via an AWS service and that service takes subsequent actions on behalf of the IAM principal. A common example of this access pattern is Amazon Athena. Athena is an interactive query service that lets you use standard SQL to query data in Amazon S3. To continue the preceding example, let’s assume you want to restrict users from accessing the S3 bucket unless the API call originates from within their corporate network but also allow them to query the data via Athena as shown in Figure 3.
The bucket policy in the previous example prevents the ability of users to query the data with Athena, since the calls to Amazon S3 are made from the Athena service and not from your corporate network. To account for this access pattern, you can update your bucket policy by adding the condition key aws:CalledViaFirst to StringNotEquals, as shown in the following example:
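A sketch of the updated policy, under the same assumptions as before (hypothetical Sid, placeholder bucket and CIDR values). The only change from the previous pattern is the added StringNotEquals block.

```python
import json

BUCKET = "<my-data-bucket>"          # placeholder from the post
CORPORATE_CIDR = "<203.0.113.0/24>"  # placeholder from the post

# Illustrative sketch; the Sid is a hypothetical label.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "expected-network",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {
                "NotIpAddress": {"aws:SourceIp": CORPORATE_CIDR},
                # New: requests whose first calling service is Athena are
                # exempt from the Deny.
                "StringNotEquals": {"aws:CalledViaFirst": "athena.amazonaws.com"},
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```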
We now have a Deny statement with two negated condition keys. This means that both conditions must resolve to true to trigger the Deny effect. The condition statement in the preceding policy now reads as follows: deny the three S3 actions unless they originate from your corporate network (NotIpAddress with aws:SourceIp) or via the Athena service (StringNotEquals with aws:CalledViaFirst). We are using aws:CalledViaFirst (a single value key) instead of aws:CalledVia (a multivalued key), because a single value key is easier to reason about when used with a StringNotEquals condition. For more information on how to use aws:CalledViaFirst (and aws:CalledVia), see How to define least-privileged permissions for actions called by AWS services. See also Creating a condition with multiple keys or values for more details on the evaluation logic.
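The AND behavior of the two negated conditions can be illustrated with a small Python model. This is a deliberately simplified sketch and not the IAM policy engine: it handles only the two operators used here, treats a missing key as satisfying a negated operator, and uses made-up request contexts with documentation IP ranges.

```python
import ipaddress

# Condition block from the Athena example, with a concrete CIDR for the demo.
condition = {
    "NotIpAddress": {"aws:SourceIp": "203.0.113.0/24"},
    "StringNotEquals": {"aws:CalledViaFirst": "athena.amazonaws.com"},
}

def not_ip_address(actual, cidr):
    # A negated operator is satisfied when the key is absent from the request.
    if actual is None:
        return True
    return ipaddress.ip_address(actual) not in ipaddress.ip_network(cidr)

def string_not_equals(actual, expected):
    return actual is None or actual != expected

OPERATORS = {"NotIpAddress": not_ip_address, "StringNotEquals": string_not_equals}

def deny_applies(condition, request):
    """Simplified model: every operator/key pair must be true (logical AND)."""
    return all(
        OPERATORS[operator](request.get(key), value)
        for operator, keys in condition.items()
        for key, value in keys.items()
    )

# Hypothetical request contexts for illustration only.
requests = {
    "corporate network": {"aws:SourceIp": "203.0.113.10"},
    "via Athena": {
        "aws:SourceIp": "198.51.100.7",
        "aws:CalledViaFirst": "athena.amazonaws.com",
    },
    "elsewhere": {"aws:SourceIp": "198.51.100.7"},
}

for name, context in requests.items():
    print(f"{name}: deny={deny_applies(condition, context)}")
```

Only the last request satisfies both negated conditions, so it is the only one the Deny statement would match in this model.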
3. Intermediate IAM roles for data access
A third common pattern is to use an AWS service role. In this scenario, a given AWS service assumes a service role that you created to perform actions on your behalf. Since the AWS service is using a service role rather than making a request on the principal’s behalf, you cannot use the aws:CalledViaFirst condition key from the previous example. This access pattern has two variations which will determine how we grant AWS services access to your resources.
3a. API call originates from your VPC
The first variation is when the API call originates from within your expected network, such as your VPC. A good example is AWS Glue, a serverless data integration service that makes it easier to discover, prepare, and combine data for analytics, machine learning, and application development. To continue the preceding example, let’s assume you want to restrict users from accessing the data in your S3 bucket unless the API call originates from within your corporate network but also allow them to use AWS Glue to crawl the data to update the schema in the AWS Glue Data Catalog. With AWS Glue, you can configure the crawler with a network connection to Amazon S3 that uses your own network by specifying the desired VPC ID, subnet IDs, and security group, as shown in Figure 4.
With this pattern, you simply extend your data perimeter to include your VPC network by adding the aws:SourceVpc condition to StringNotEquals, as shown in the following example. Replace each <placeholder> with your values.
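A sketch of the extended policy under the same assumptions as the earlier examples (hypothetical Sid, placeholder values). The aws:SourceVpc key sits in the same StringNotEquals block as aws:CalledViaFirst.

```python
import json

BUCKET = "<my-data-bucket>"          # placeholder from the post
CORPORATE_CIDR = "<203.0.113.0/24>"  # placeholder from the post
VPC_ID = "<vpc-111bbb22>"            # placeholder from the post

# Illustrative sketch; the Sid is a hypothetical label.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "expected-network",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {
                "NotIpAddress": {"aws:SourceIp": CORPORATE_CIDR},
                "StringNotEquals": {
                    "aws:CalledViaFirst": "athena.amazonaws.com",
                    # New: requests arriving from this VPC are exempt.
                    "aws:SourceVpc": VPC_ID,
                },
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```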
The preceding policy adds the aws:SourceVpc condition in the same block as the aws:CalledViaFirst condition that was added earlier. We now have a Deny statement with three negated condition keys. This means that all three conditions must resolve to true to trigger the Deny effect. Therefore, this policy denies a call to Amazon S3 only if the call does not originate from your on-premises network, is not made via the Athena service, and does not originate from your VPC network.
3b. API call originates from outside of your VPC
Some AWS services that use a service role, such as AWS Glue, access your resources directly from your VPC. In the second variation of this access pattern, a service role needs to access the data in an S3 bucket from outside of your VPC. A good example is Amazon Translate asynchronous batch processing. Amazon Translate is a text translation service that uses advanced machine learning technologies to provide high-quality translation on demand. To translate large collections of documents, you can use the Amazon Translate asynchronous batch processing operation to translate documents stored in an S3 bucket. If you apply the preceding bucket policy, the translation job fails with a no read access error because Amazon Translate makes its calls to Amazon S3 from outside of the VPC, as shown in Figure 5.
To account for this access pattern, you can update your bucket policy to include the aws:PrincipalArn condition key as part of the StringNotEquals statement as shown in the following example. Replace each <placeholder> with your values.
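A sketch of the updated policy. The role ARN format, including the service-role/ path, is an assumption for illustration; use the ARN of the role attached to your translation job. The Sid is a hypothetical label.

```python
import json

BUCKET = "<my-data-bucket>"          # placeholder from the post
CORPORATE_CIDR = "<203.0.113.0/24>"  # placeholder from the post
VPC_ID = "<vpc-111bbb22>"            # placeholder from the post
# Assumed ARN layout; the service-role/ path may differ for your role.
TRANSLATE_ROLE_ARN = (
    "arn:aws:iam::<AccountNumber>:role/service-role/"
    "<AmazonTranslateServiceRole-myjobs>"
)

# Illustrative sketch; the Sid is a hypothetical label.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "expected-network",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {
                "NotIpAddress": {"aws:SourceIp": CORPORATE_CIDR},
                "StringNotEquals": {
                    "aws:CalledViaFirst": "athena.amazonaws.com",
                    "aws:SourceVpc": VPC_ID,
                    # New: the Amazon Translate service role is exempt from
                    # the network restrictions.
                    "aws:PrincipalArn": TRANSLATE_ROLE_ARN,
                },
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```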
With the preceding policy, you’re effectively excluding the Amazon Translate service role—<AmazonTranslateServiceRole-myjobs>—associated with the translation job from the SourceIp and SourceVpc restrictions. You can make this exception because the Amazon Translate service role is configured with a trust policy that allows only the Amazon Translate service to assume it—only translate.amazonaws.com can assume the role. It is a best practice to apply the principle of least privilege to ensure that only authorized users are allowed to modify the trust policy of the role and to pass the role as part of translation job configuration.
4. AWS services with direct access to your resources
In the previous three examples, data in the bucket is accessed directly by your trusted identity, directly via an AWS service (Athena), or by a trusted intermediary service role (AWS Glue and Amazon Translate). However, there’s one final access pattern where the AWS service uses its own identity—a service principal—to perform an action on behalf of the customer. A good example of this access pattern is the CloudTrail use case we introduced at the start of this blog post, shown in Figure 6.
Let’s assume your data perimeter objective is to restrict access to the logs in your S3 bucket so that only requests from your VPC or from the CloudTrail service are allowed. If you craft a bucket policy that restricts access to only your VPC by using the aws:SourceVpc condition alone, you effectively prevent CloudTrail from writing data to your bucket. You cannot use aws:CalledViaFirst to exclude CloudTrail as shown in data access pattern #2 because CloudTrail uses its own service principal to write data to your bucket. Although a CalledVia condition also names a service principal, such as athena.amazonaws.com, it only applies when the service makes a request on behalf of the calling principal; CloudTrail instead writes data to your bucket directly. You also cannot use aws:PrincipalArn as shown in data access pattern #3b because CloudTrail uses a service principal and not an ARN. By adding the new aws:PrincipalIsAWSService condition to your bucket policy, you can achieve your data perimeter objective as follows. Replace each <placeholder> with your values.
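A sketch of such a bucket policy, built in Python and printed as JSON. The two Allow statements follow the general shape of the bucket policy that CloudTrail documents for log delivery; their Sids and the AWSLogs/ prefix are assumptions here, so compare them against your trail's actual configuration.

```python
import json

BUCKET = "<my-logs-bucket>"   # placeholder from the post
ACCOUNT = "<AccountNumber>"   # placeholder from the post
VPC_ID = "<vpc-111bbb22>"     # placeholder from the post

LOG_OBJECTS = f"arn:aws:s3:::{BUCKET}/AWSLogs/{ACCOUNT}/*"  # assumed prefix
CLOUDTRAIL = {"Service": "cloudtrail.amazonaws.com"}

# Illustrative sketch, not an official policy; Sids are labels only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # CloudTrail checks the bucket ACL before delivering logs.
            "Sid": "AWSCloudTrailAclCheck",
            "Effect": "Allow",
            "Principal": CLOUDTRAIL,
            "Action": "s3:GetBucketAcl",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {   # CloudTrail writes log files with bucket-owner-full-control.
            "Sid": "AWSCloudTrailWrite",
            "Effect": "Allow",
            "Principal": CLOUDTRAIL,
            "Action": "s3:PutObject",
            "Resource": LOG_OBJECTS,
            "Condition": {
                "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
            },
        },
        {   # Deny unless the call comes from the VPC or an AWS service principal.
            "Sid": "expected-network+service-principal",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": [LOG_OBJECTS],
            "Condition": {
                "StringNotEquals": {"aws:SourceVpc": VPC_ID},
                "Bool": {"aws:PrincipalIsAWSService": "false"},
            },
        },
    ],
}

print(json.dumps(policy, indent=2))
```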
The first two Allow statements in the preceding bucket policy are part of the standard cross-account bucket policy configuration for CloudTrail. The last statement, expected-network+service-principal, combines the aws:SourceVpc and the newly launched aws:PrincipalIsAWSService conditions to deny access unless the call originates from your VPC network or is made by an AWS service principal, such as CloudTrail.
Data perimeter policy for common data access patterns
Now that you have reviewed the common data access patterns and various IAM condition keys, including the new aws:PrincipalIsAWSService, let’s look at a data perimeter policy. This sample policy can be appended to the policies of all of your buckets or to other resource-based policies. Replace each <placeholder> with your values.
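A sketch of a two-statement data perimeter policy, assembled from the condition keys covered in the four access patterns. The organization ID placeholder, the role ARN layout, and the shared action and resource lists are assumptions for illustration.

```python
import json

BUCKET = "<my-data-bucket>"          # placeholder from the post
CORPORATE_CIDR = "<203.0.113.0/24>"  # placeholder from the post
VPC_ID = "<vpc-111bbb22>"            # placeholder from the post
ORG_ID = "<o-xxxxxxxxxx>"            # hypothetical placeholder for your org ID
# Assumed ARN layout for the Amazon Translate service role.
TRANSLATE_ROLE_ARN = (
    "arn:aws:iam::<AccountNumber>:role/service-role/"
    "<AmazonTranslateServiceRole-myjobs>"
)

def deny(sid, condition):
    """Build a Deny statement over the same actions and resources."""
    return {
        "Sid": sid,
        "Effect": "Deny",
        "Principal": "*",
        "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        "Condition": condition,
    }

# Illustrative sketch, not an official policy.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        deny("network-data-perimeter", {
            "NotIpAddress": {"aws:SourceIp": CORPORATE_CIDR},
            "StringNotEquals": {
                "aws:SourceVpc": VPC_ID,
                "aws:PrincipalArn": TRANSLATE_ROLE_ARN,
            },
            "Bool": {
                # Exempt AWS service principals (pattern 4) ...
                "aws:PrincipalIsAWSService": "false",
                # ... and services acting on behalf of a principal (pattern 2).
                "aws:ViaAWSService": "false",
            },
        }),
        deny("identity-data-perimeter", {
            "StringNotEquals": {"aws:PrincipalOrgID": ORG_ID},
            "Bool": {"aws:PrincipalIsAWSService": "false"},
        }),
    ],
}

print(json.dumps(policy, indent=2))
```

Consider trying a policy like this on a non-production bucket first, because a Deny statement that is too broad can lock out legitimate access.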
Note: appending policies to existing resources may cause an unintended disruption to your application. Consider testing your policies in lower environments before applying them to production resources.
The preceding policy consists of two statements. The first statement—network-data-perimeter—sets the expected network data perimeter. Let’s examine all of the condition elements in this statement:
|Condition key||Usage||Example data access pattern|
|aws:SourceIp||Use to restrict access to public IP ranges of your expected network when the request doesn’t originate over a VPC endpoint.||Console access from an on-premises corporate network as discussed in data access pattern #1.|
|aws:SourceVpc||Use to restrict access to specific VPC IDs of your expected network if the request originates over a VPC endpoint.||An application running on Amazon Elastic Compute Cloud (Amazon EC2) instance using an instance profile, a Lambda function deployed within a VPC, or an AWS Glue crawler configured with VPC network connection as discussed previously in data access pattern #3a.|
|aws:PrincipalArn||Allows you to exclude a principal, such as a service role for an AWS service when the request doesn’t originate from your network.||Amazon Translate in data access pattern #3b.|
|aws:PrincipalIsAWSService||Provides a straightforward way to allow access to an AWS service when the service uses its own service principal to access your bucket from its own network. Cannot be used when the AWS service makes a request on behalf of the IAM principal (such as in data access pattern #2, in which case you have to use aws:CalledVia instead).||CloudTrail in data access pattern #4.|
|aws:ViaAWSService||Similar to aws:CalledVia and aws:CalledViaFirst, but instead of being limited to a specific AWS service (for example, Athena), it can be used to allow or deny access to any AWS service that makes a request on behalf of the IAM principal to access your resources, as discussed in data access pattern #2; hence it is set to either true or false. You typically wouldn’t use aws:CalledVia and aws:ViaAWSService in the same bucket policy. Instead, use aws:CalledVia for policies scoped to a specific AWS service and aws:ViaAWSService when you want to allow or deny any AWS service that makes a request on behalf of the IAM principal.||Amazon Athena in data access pattern #2.|
The second statement—identity-data-perimeter—in the preceding policy sets the trusted identity data perimeter. Let’s examine the two conditions in this statement:
|Condition key||Usage||Data access pattern|
|aws:PrincipalOrgID||Restricts access to trusted principals that belong to your organization in AWS Organizations. See the blog post An easier way to control access to AWS resources by using the AWS organization of IAM principals for additional use cases for this powerful condition key.||Used in resource-based policies such as bucket policies and VPC endpoint policies.|
|aws:PrincipalIsAWSService||As in the first statement of the preceding policy, you can use this condition key to allow an AWS service to access your bucket from its network using its own service principal.||CloudTrail in data access pattern #4.|
The newly launched aws:PrincipalIsAWSService condition key simplifies resource-based policies by providing a straightforward way to limit access to trusted identities and expected networks while at the same time granting access to AWS services that use their own service principal from outside of your network locations. You can also use this condition key as part of a broad data perimeter strategy across the common data access patterns we discussed in this blog post. If you have any questions, comments, or concerns, contact AWS Support or start a new thread on the AWS Identity and Access Management forum.