One of the use cases we hear from customers is that they want to provide very limited access to Amazon Workspaces users (for example contractors, consultants) in an AWS account. At the same time they want to allow them to query Amazon Simple Storage Service (Amazon S3) data in another account using Amazon Athena over a JDBC connection.

For example, marketing companies might provide private access to the first party data to media agencies through this mechanism.

The restrictions they want to put in place are:

  • For security reasons these Amazon WorkSpaces should not have internet connectivity. So the access to Amazon Athena must be over AWS PrivateLink.
  • Public access to Athena is not allowed using the credentials used for the JDBC connection. This is to prevent the users from leveraging the credentials to query the data from anywhere else.

In this post, we show how to use Amazon Virtual Private Cloud (Amazon VPC) endpoints for Athena, along with AWS Identity and Access Management (AWS IAM) policies. This provides private access to query the Amazon S3 data while preventing users from querying the data from outside their Amazon WorkSpaces or using the Athena public endpoint.

Let’s review the steps to achieve this:

  • Initial setup of two AWS accounts (AccountA and AccountB)
  • Set up Amazon S3 bucket with sample data in AccountA
  • Set up an IAM user with Amazon S3 and Athena access in AccountA
  • Create an Amazon VPC endpoint for Athena in AccountA
  • Set up Amazon WorkSpaces for a user in AccountB
  • Install a SQL client tool (we will use DbVisualizer Free) and Athena JDBC driver in Amazon WorkSpaces in AccountB
  • Use DbVisualizer to the query the Amazon S3 data in AccountA using the Athena public endpoint
  • Update IAM policy for user in AccountA to restrict private only access

 Prerequisites

To follow the steps in this post, you need two AWS Accounts. The Amazon VPC and subnet requirements are specified in the detailed steps.

Note: The AWS CloudFormation template used in this blog post is for US-EAST-1 (N. Virginia) Region so ensure the Region setting for both the accounts are set to US-EAST-1 (N. Virginia).

Walkthrough

The two AWS accounts are:

AccountA – Contains the Amazon S3 bucket where the data is stored. For AccountA you can create a new Amazon VPC or use the default Amazon VPC.

AccountB – Amazon WorkSpaces account. Use the following AWS CloudFormation template for AccountB:

  • The AWS CloudFormation template will create a new Amazon VPC in AccountB with CIDR 10.10.0.0/16 and set up one public subnet and two private subnets.
  • It will also create a NAT Gateway in the public subnet and create both public and private route tables.
  • Since we will be launching Amazon WorkSpaces in these private subnets and not all Availability Zones (AZ) are supported by Amazon WorkSpaces, it is important to choose the right AZ when creating them.

Review the documentation to learn which AWS Regions/AZ are supported.

We have provided two parameters in the AWS CloudFormation template:

  • AZName1
  • AZName2

Step 1

Before launching the CloudFormation stack:

  • Log in to AccountB
  • Search for AWS Resource Access Manager
  • On the right-hand side, you will notice the AZ ID to AZ Name mapping. Note down the AZ Name corresponding to AZ ID use1-az2 and use1-az4
  • Now launch the CloudFormation template and remember to choose the AZ names you noted down earlier
    • https://athena-workspaces-blogpost.s3.amazonaws.com/vpc.yaml
  • Enter the CloudFormation Stack Name as – ‘AthenaWorkspaces’ and leave everything default.
  • Once the CloudFormation stack creation is complete, create a peering connection from AccountB to AccountA.
  • Update the associated route tables for the private subnets with the new peering connection.

For information on how to create a VPC peering connection, refer to AWS documentation on VPC Peering.

AccountB VPC Route Table:

AccountB VPC Route Table:

AccountA VPC Route Table:

AccountA VPC Route Table

Step 2

  • Create a new Amazon S3 bucket in AccountA with a bucket name that starts with ‘athena-’.
  • Next, you can download a sample file and upload it to the Amazon S3 bucket you just created.
  • Use the following statements to create AWS Glue database. Use an external table for the data in the Amazon S3 bucket so that you can query it from Athena.
  • Go to Athena console and define a new database:

CREATE DATABASE IF NOT EXISTS sampledb

Once the database is created, create a new table in sampledb (by selecting sampledb from the “Database” drop down menu). Replace the <<your bucket name>> with the bucket you just created:

CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.amazon_reviews_tsv( marketplace string, customer_id string, review_id string, product_id string, product_parent string, product_title string, product_category string, star_rating int, helpful_votes int, total_votes int, vine string, verified_purchase string, review_headline string, review_body string, review_date string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\\' LINES TERMINATED BY '\n'
LOCATION 's3://<<your bucket name>>/'
TBLPROPERTIES ("skip.header.line.count"="1")

 

Step 3

  • In AccountA, create a new IAM user with programmatic access.
  • Save the access key and secret access key.
  • For the same user add an Inline Policy which allows the following actions:

IAM summary

{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowAthenaReadActions", "Effect": "Allow", "Action": [ "athena:ListWorkGroups", "athena:ListDataCatalogs", "athena:GetExecutionEngine", "athena:GetExecutionEngines", "athena:GetNamespace", "athena:GetCatalogs", "athena:GetNamespaces", "athena:GetTables", "athena:GetTable" ], "Resource": "*" }, { "Sid": "AllowAthenaWorkgroupActions", "Effect": "Allow", "Action": [ "athena:StartQueryExecution", "athena:GetQueryResults", "athena:DeleteNamedQuery", "athena:GetNamedQuery", "athena:ListQueryExecutions", "athena:StopQueryExecution", "athena:GetQueryResultsStream", "athena:ListNamedQueries", "athena:CreateNamedQuery", "athena:GetQueryExecution", "athena:BatchGetNamedQuery", "athena:BatchGetQueryExecution", "athena:GetWorkGroup" ], "Resource": "*" }, { "Sid": "AllowGlueActionsViaVPCE", "Effect": "Allow", "Action": [ "glue:GetDatabase", "glue:GetDatabases", "glue:CreateDatabase", "glue:GetTables", "glue:GetTable" ], "Resource": "*" }, { "Sid": "AllowGlueActionsViaAthena", "Effect": "Allow", "Action": [ "glue:GetDatabase", "glue:GetDatabases", "glue:CreateDatabase", "glue:GetTables", "glue:GetTable" ], "Resource": "*" }, { "Sid": "AllowS3ActionsViaAthena", "Effect": "Allow", "Action": [ "s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:ListMultipartUploadParts", "s3:AbortMultipartUpload", "s3:CreateBucket", "s3:PutObject" ], "Resource": [ "arn:aws:s3:::athena-*" ] } ]
}

 

Step 4

  • In this step, we create an Interface VPC endpoint (AWS PrivateLink) for Athena in AccountA. When you use an interface VPC endpoint, communication between your Amazon VPC and Athena is conducted entirely within the AWS network.
  • Each VPC endpoint is represented by one or more Elastic Network Interfaces (ENIs) with private IP addresses in your VPC subnets.
  • To create an Interface VPC endpoint follow the instructions and select Athena in the AWS Services list. Do not select the checkbox for Enable Private DNS Name.
  • Ensure the security group that is attached to the Amazon VPC endpoint is open to inbound traffic on port 443 and 444 for source AccountB VPC CIDR 10.10.0.0/16. Port 444 is used by Athena to stream query results.
  • Once you create the VPC endpoint, you will get a DNS endpoint name which is in the following format. We are going to use this in JDBC connection from the SQL client.

      VPC_Endpoint_ID.athena.Region.vpce.amazonaws.com

Step 5

  • In this step we set up Amazon WorkSpaces in AccountB.
  • Each Amazon WorkSpace is associated with the specific Amazon VPC and AWS Directory Service construct that you used to create it. All Directory Service constructs (Simple AD, AD Connector, and Microsoft AD) require two subnets to operate, each in different Availability Zones. This is why we created 2 private subnets at the beginning.
  • For this blog post I have used Simple AD as the directory service for the Amazon WorkSpaces.
  • By default, IAM users don’t have permissions for Amazon WorkSpaces resources and operations.
  • To allow IAM users to manage Amazon WorkSpaces resources, you must create an IAM policy that explicitly grants them permissions
  • Then attach the policy to the IAM users or groups that require those permissions.
  • To start, go to the Amazon WorkSpaces console and select Advanced Setup.
    • Set up a new directory using the SimpleAD option.
    • Use the “small” directory size and choose the Amazon VPC and private subnets you created in Step 1 for AccountB.
    • Once you create the directory, register the directory with Amazon WorkSpaces by selecting “Register” from the “Action” menu.
    • Select private subnets you created in Step 1 for AccountB.

Directory info

  • Next, launch Amazon WorkSpaces by following the Launch WorkSpaces button.
  • Select the directory you created and create a new user.
  • For the bundle, choose Standard with Windows 10 (PCoIP).
  • After the Amazon WorkSpaces is created, you can log in to the Amazon WorkSpaces using a client software. You can download it from https://clients.amazonworkspaces.com/
  • Login to your Amazon WorkSpace, install a SQL Client of your choice. At this point your Amazon WorkSpace still has Internet access via the NAT Gateway
  • I have used DbVisualizer (the free version) as the SQL client. Once you have that installed, install the JDBC driver for Athena following the instructions
  • Now you can set up the JDBC connections to Athena using the access key and secret key you set up for an IAM user in AccountA.

Step 6

To test out both the Athena public endpoint and the Athena VPC endpoint, create two connections using the same credentials.

For the Athena public endpoint, you need to use athena.us-east-1.amazonaws.com service endpoint. (jdbc:awsathena://athena.us-east-1.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)

Athena public

For the VPC Endpoint Connection, use the VPC Endpoint you created in Step 4 (jdbc:awsathena://vpce-<>.athena.us-east-1.vpce.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)

Database connection Athena

Now run a simple query to select records from the amazon_reviews_tsv table using both the connections.

SELECT * FROM sampledb.amazon_reviews_tsv limit 10

You should be able to see results using both the connections. Since the private subnets are still connected to the internet via the NAT Gateway, you can query using the Athena public endpoint.

Run the AWS Command Line Interface (AWS CLI) command using the credentials used for the JDBC connection from your workstation. You should be able to access the Amazon S3 bucket objects and the Athena query run list using the following commands.

aws s3 ls s3://athena-workspaces-blogpost

aws athena list-query-executions

Step 7

  • Now we lock down the access as described in the beginning of this blog post by taking the following actions:
  • Update the route table for the private subnets by removing the route for the internet so access to the Athena public endpoint is restricted from the Amazon WorkSpaces. The only access will be allowed through the Athena VPC Endpoint.
  • Add conditional checks to the IAM user access policy that will restrict access to the Amazon S3 buckets and Athena only if:
    • The request came in through the VPC endpoint. For this we use the “aws:SourceVpce” check and provide the VPC Endpoint ID value.
    • The request for Amazon S3 data is through Athena. For this we use the condition “aws:CalledVia” and provide a value of “athena.amazonaws.com”.
  • In the IAM access policy below replace <<your vpce id>> with your VPC endpoint id and update the previous inline policy which was added to the IAM user in Step 3.
{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowAthenaReadActions", "Effect": "Allow", "Action": [ "athena:ListWorkGroups", "athena:ListDataCatalogs", "athena:GetExecutionEngine", "athena:GetExecutionEngines", "athena:GetNamespace", "athena:GetCatalogs", "athena:GetNamespaces", "athena:GetTables", "athena:GetTable" ], "Resource": "*", "Condition":{ "StringEquals":{ "aws:SourceVpce":[ "<<your vpce id>>" ] } } }, { "Sid": "AllowAthenaWorkgroupActions", "Effect": "Allow", "Action": [ "athena:StartQueryExecution", "athena:GetQueryResults", "athena:DeleteNamedQuery", "athena:GetNamedQuery", "athena:ListQueryExecutions", "athena:StopQueryExecution", "athena:GetQueryResultsStream", "athena:ListNamedQueries", "athena:CreateNamedQuery", "athena:GetQueryExecution", "athena:BatchGetNamedQuery", "athena:BatchGetQueryExecution", "athena:GetWorkGroup" ], "Resource": "*", "Condition":{ "StringEquals":{ "aws:SourceVpce":[ "<<your vpce id>>" ] } } }, { "Sid": "AllowGlueActionsViaVPCE", "Effect": "Allow", "Action": [ "glue:GetDatabase", "glue:GetDatabases", "glue:CreateDatabase", "glue:GetTables", "glue:GetTable" ], "Resource": "*", "Condition":{ "StringEquals":{ "aws:SourceVpce":[ "<<your vpce id>>" ] } } }, { "Sid": "AllowGlueActionsViaAthena", "Effect": "Allow", "Action": [ "glue:GetDatabase", "glue:GetDatabases", "glue:CreateDatabase", "glue:GetTables", "glue:GetTable" ], "Resource": "*", "Condition":{ "ForAnyValue:StringEquals":{ "aws:CalledVia":[ "athena.amazonaws.com" ] } } }, { "Sid": "AllowS3ActionsViaAthena", "Effect": "Allow", "Action": [ "s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:ListMultipartUploadParts", "s3:AbortMultipartUpload", "s3:CreateBucket", "s3:PutObject" ], "Resource": [ "arn:aws:s3:::athena-*" ], "Condition":{ "ForAnyValue:StringEquals":{ "aws:CalledVia":[ "athena.amazonaws.com" ] } } } ]
}

Once you applied the changes, try to reconnect using both the Athena VPC endpoint as well Athena public endpoint connections. The Athena VPC endpoint connection should work but the public endpoint connection will time out. Also try the same Amazon S3 and Athena AWS CLI commands. You should get access denied for both the operations.

Clean Up

To avoid incurring costs, remember to delete the resources that you created.

For AWS AccountA:

  • Delete the S3 buckets
  • Delete the database you created in AWS Glue
  • Delete the Amazon VPC endpoint you created for Amazon Athena

For AccountB:

  • Delete the Amazon Workspace you created along with the Simple AD directory. You can review more information on how to delete your Workspaces.

Conclusion

In this blog post, I showed how to leverage Amazon VPC endpoints and IAM policies to privately connect to Amazon Athena from Amazon Workspaces that don’t have internet connectivity.

Give this solution a try and share your feedback in the comments!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.