AWS Config is a service that enables you to audit your AWS resources for compliance with a desired configuration state. You are billed based on the number of Configuration Items (a point-in-time snapshot of an AWS resource) recorded and the number of AWS Config rules (a function that reports resource compliance) evaluated per resource per month.
In this post, I’ll show you how to use Amazon Athena to query AWS CloudTrail logs for a detailed billing breakdown of AWS Config rule evaluations. This breakdown can help you determine which rules contribute the most to monthly costs so you can optimize accordingly (for example, by reducing how often certain rules run). I don’t dive into configuration items in this post. For more information about that, see the Identifying resources with the most configuration changes using AWS Config blog post.
Step 1: Create an Athena table to query CloudTrail logs
Create an Athena table and configure it with the location and schema of the CloudTrail logs. The easiest way to do this is to use the auto-generated statement available in the CloudTrail console.
From the left navigation pane, choose Event history, and then choose Create Athena table.
Figure 1: Event history page in the CloudTrail console
On the Create a table in Amazon Athena page, choose your CloudTrail log. For Storage location, choose the S3 bucket that contains the CloudTrail log files. The table name and table comment placeholders (indicated by the square brackets) will be auto-populated for you.
Figure 2: Create a table in Amazon Athena
You can either choose Create table or, if you need to modify the query, copy the statement and run it in the Athena console.
The CREATE TABLE statement should look similar to the following, which has a modified table name (LINE 1) and table comment (LINE 46):
Step 2: Create a simple query of CloudTrail data
Use the following query to ensure that the Athena table was created correctly. Replace [your_cloudtrail_table] (LINE 2) with the chosen table name from the previous step. In the example above, the table name was
If the table was created correctly, you will see 10 results. This is the beginning of the full query, which you will expand on later in the post. If you read through the query, you can see it’s filtering on two attribute values: config.amazonaws.com (events that originated from AWS Config) and PutEvaluations (events that record the compliance state of a resource).
Notice two columns retrieved from the query, including additionaleventdata. We will extract data from these JSON objects in the next steps. For a detailed explanation of the properties in these objects, see the PutEvaluations API and ConfigRule API documentation, respectively.
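As a rough illustration of the check described above, the following Python sketch builds a query with the same two filters and a 10-row limit. It is not the statement from this post. The helper name `build_sanity_check_query` is hypothetical, and the selected columns (`eventsource`, `eventname`, `requestparameters`, `additionaleventdata`) are assumptions based on the standard CloudTrail table schema.

```python
import re

# Only letters, digits, and underscores are accepted here so that the table
# name can be interpolated into the statement safely.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_sanity_check_query(table_name: str, limit: int = 10) -> str:
    """Return a query that samples AWS Config PutEvaluations events.

    Column names are assumptions taken from the standard CloudTrail schema.
    """
    if not _TABLE_NAME.match(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT eventsource, eventname, requestparameters, additionaleventdata\n"
        f"FROM {table_name}\n"
        "WHERE eventsource = 'config.amazonaws.com'\n"
        "  AND eventname = 'PutEvaluations'\n"
        f"LIMIT {limit}"
    )


# "my_cloudtrail_table" is a placeholder, not a name from this post.
print(build_sanity_check_query("my_cloudtrail_table"))
```

You can paste the printed statement into the Athena console, or pass it to whichever client you use to run queries.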
Step 3: Query the number of AWS Config rule evaluations by rule name
Use the following query to retrieve the number of AWS Config rule evaluations, grouped by rule name and sorted from the highest to the lowest number of evaluations.
Replace the table name placeholder (LINE 6) with the name of your Athena table.
If successfully executed, you will receive results similar to the following:
Let’s break down the query:
LINE 3: Uses the json_extract function to extract the rule name from additionaleventdata using a JSONPath expression.
LINE 4: Similar to LINE 3, extracts evaluations using json_extract and casts the result as an array.
LINE 16: Uses the UNNEST operator to expand the evaluations array, extracted in LINE 4, into individual elements.
LINE 18: Filters out evaluations that are NOT_APPLICABLE, because they do not contribute to costs.
LINE 22: Counts the number of evaluations of each rule.
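To make the breakdown above more concrete, here is a small Python sketch that applies the same steps (extract, expand, filter, count) to a few in-memory rows. It only mirrors the logic of the query and does not call Athena. The JSON property names (`configRuleName`, `evaluations`, `complianceType`), the `requestparameters` column, and the rule names in the sample data are assumptions made for illustration.

```python
import json
from collections import Counter


def count_evaluations_by_rule(events):
    """Count billable evaluations per rule name from CloudTrail-like rows."""
    counts = Counter()
    for event in events:
        # Same filters as the query: AWS Config PutEvaluations events only.
        if event["eventsource"] != "config.amazonaws.com":
            continue
        if event["eventname"] != "PutEvaluations":
            continue
        # Both columns hold JSON text, so parse them before reading fields.
        rule_name = json.loads(event["additionaleventdata"])["configRuleName"]
        evaluations = json.loads(event["requestparameters"])["evaluations"]
        # Equivalent of UNNEST: visit each element of the evaluations array.
        for evaluation in evaluations:
            if evaluation["complianceType"] == "NOT_APPLICABLE":
                continue  # skipped, mirroring the NOT_APPLICABLE filter
            counts[rule_name] += 1
    # most_common() returns (rule, count) pairs from highest to lowest count.
    return counts.most_common()


def _row(event_name, rule_name, compliance_types):
    """Build one hypothetical CloudTrail-like row for the example."""
    return {
        "eventsource": "config.amazonaws.com",
        "eventname": event_name,
        "additionaleventdata": json.dumps({"configRuleName": rule_name}),
        "requestparameters": json.dumps(
            {"evaluations": [{"complianceType": c} for c in compliance_types]}
        ),
    }


sample_events = [
    _row("PutEvaluations", "example-rule-a",
         ["COMPLIANT", "NON_COMPLIANT", "NOT_APPLICABLE"]),
    _row("PutEvaluations", "example-rule-b", ["COMPLIANT"]),
    _row("DescribeConfigRules", "example-rule-b", []),
]

print(count_evaluations_by_rule(sample_events))
```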
The query also contains a LIMIT clause (LINE 9) because, by default, Athena will query all CloudTrail log files in the specified S3 bucket. Depending on how large the bucket is, it might take a long time to execute this query. Feel free to experiment by removing the LIMIT clause to see if the query times are acceptable to you. (It still might take more than one minute to execute the query.)
It is fine to scan all files in an S3 bucket when you are experimenting with Athena queries. However, for long-term use, it’s more performant and cost-effective to partition your data. When you use Amazon Athena, you are charged for the amount of data that you scan. For more information, see the Amazon Athena pricing page.
Steps 4-6 describe how to create and query partitions. If you don’t want to partition your data, you can skip ahead to Step 7.
Step 4: Create a new table with partition keys
Partitioning Athena tables means creating logical groups that restrict the amount of data scanned by each query. A common best practice is to partition by date or time. If you look at the CREATE TABLE statement in Step 1 of this post, the LOCATION clause looks like the following:
If you browse the objects in that S3 bucket, you’ll see they are organized into AWS Regions, years, months, and days. They are already physically partitioned. See Figure 3. The Regions in your account might be different.
However, because these physical partition locations are not registered in Athena, by default every query will go through all folders.
Figure 3: CloudTrail S3 bucket Region folders
To partition your table, you will first need to define partition keys using an additional clause in the create table statement. Use the same CREATE TABLE statement from Step 1, but add a PARTITIONED BY clause as seen in the example below (LINE 47). You’ll see that I also added a _partitioned suffix to the table name to differentiate it from the unpartitioned table (LINE 1).
LINE 47: There are three partition keys, including month. You can create more partition keys (for example, account ID and day) as appropriate for your use case.
Note: The PARTITIONED BY clause must be entered after COMMENT. An incorrect clause order might cause a syntax error.
Once you’ve entered your partition keys, you can run the statement within the Athena console to create your partitioned table.
Step 5: Register partition locations
After creating the partitioned Athena table, you can register each partition and its respective S3 bucket location using an ALTER TABLE statement.
The following example statement will register a partition. Replace the placeholders with your values.
In this statement, I logically registered a set of partition values, including '01', to a physical S3 location. Using this statement as a template, you can register as many partitions as you need.
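If you need to register many partitions, generating the statements can save some typing. The following Python sketch produces one ALTER TABLE ... ADD PARTITION statement per Region and month. It assumes partition keys named `region`, `year`, and `month`, and the usual CloudTrail prefix layout of `AWSLogs/<account ID>/CloudTrail/<Region>/<year>/<month>/`. The function name, bucket, account ID, and table name are hypothetical placeholders.

```python
from itertools import product


def build_add_partition_statements(table_name, bucket, account_id,
                                   regions, year, months):
    """Return one ALTER TABLE ... ADD PARTITION statement per Region and month.

    Assumes partition keys named region, year, and month (an assumption,
    adjust to match your own PARTITIONED BY clause).
    """
    statements = []
    for region, month in product(regions, months):
        # Physical S3 prefix that holds the log files for this partition.
        location = (
            f"s3://{bucket}/AWSLogs/{account_id}/CloudTrail/"
            f"{region}/{year}/{month:02d}/"
        )
        statements.append(
            f"ALTER TABLE {table_name} ADD IF NOT EXISTS "
            f"PARTITION (region='{region}', year='{year}', month='{month:02d}') "
            f"LOCATION '{location}'"
        )
    return statements


# All values below are placeholders for illustration.
for statement in build_add_partition_statements(
    table_name="cloudtrail_logs_partitioned",
    bucket="example-bucket",
    account_id="123456789012",
    regions=["us-east-1", "eu-west-1"],
    year=2019,
    months=[1, 2],
):
    print(statement)
```

The IF NOT EXISTS option lets you rerun the same statements without an error when a partition is already registered.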
Step 6: Query using partitions
Now you can modify the query in Step 3, updating the WHERE clause to limit the search to specific partitions. Your query will look like the following example. Replace the table name (LINE 6) and partition values (LINES 9 to 11) with your own values.
When you execute this query, you’ll see that it runs much faster and scans less data overall.
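For readers who run Athena queries from code, here is a hedged Python sketch that assembles a partition-restricted count query. It is not the query from this post, and its line layout does not correspond to the LINE numbers cited here. The partition key names (`region`, `year`, `month`), the JSON paths, the `requestparameters` column, and the use of a cast to ARRAY(JSON) are assumptions, so treat the statement as a starting point to test against your own table.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_REGION = re.compile(r"^[a-z0-9-]+$")


def build_partitioned_count_query(table_name, region, year, month):
    """Return a per-rule evaluation count query limited to one partition."""
    if not _IDENTIFIER.match(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    if not _REGION.match(region):
        raise ValueError(f"unexpected Region: {region!r}")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return "\n".join([
        "SELECT rule_name, count(compliance_type) AS evaluations",
        "FROM (",
        "  SELECT",
        "    json_extract_scalar(additionaleventdata, '$.configRuleName') AS rule_name,",
        "    json_extract_scalar(evaluation, '$.complianceType') AS compliance_type",
        f"  FROM {table_name}",
        "  CROSS JOIN UNNEST(",
        "    CAST(json_extract(requestparameters, '$.evaluations') AS ARRAY(JSON))",
        "  ) AS u(evaluation)",
        "  WHERE eventsource = 'config.amazonaws.com'",
        "    AND eventname = 'PutEvaluations'",
        # Partition predicates: these let Athena skip unrelated S3 prefixes.
        f"    AND region = '{region}'",
        f"    AND year = '{year:04d}'",
        f"    AND month = '{month:02d}'",
        ")",
        "WHERE compliance_type <> 'NOT_APPLICABLE'",
        "GROUP BY rule_name",
        "ORDER BY evaluations DESC",
    ])


# Placeholder values for illustration only.
print(build_partitioned_count_query("cloudtrail_logs_partitioned",
                                    "us-east-1", 2019, 1))
```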
Step 7: Estimate cost
With the query successfully returning the number of rule evaluations per AWS Config rule, you can expand the SELECT statement to include a cost column:
count(compliance_type)* 0.001 as cost
0.001 ($0.001 per evaluation) is based on the lowest pricing tier from current AWS Config pricing. Your total number of evaluations may put you on a different pricing tier. You should update this value accordingly. The entire SELECT statement (including cost) will look like this:
Note: Due to factors such as tiered pricing, these are estimated costs. Also remember that custom rules run as AWS Lambda functions in your account, so standard Lambda execution costs also apply. These additional costs are not considered within the query above.
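If your evaluation count spans more than one pricing tier, a flat multiplier will overestimate or underestimate the cost. The following Python sketch shows one way to apply tiered rates to a total evaluation count. Only the $0.001 rate comes from this post. The tier sizes and the other two rates in `PRICING_TIERS` are illustrative assumptions that you should replace with the values on the AWS Config pricing page.

```python
from decimal import Decimal

# (tier size in evaluations, price per evaluation in USD).
# A size of None means the tier has no upper bound.
# ASSUMPTION: only the 0.001 rate is taken from this post; the tier sizes
# and remaining rates are placeholders to be checked against current pricing.
PRICING_TIERS = [
    (100_000, Decimal("0.001")),
    (400_000, Decimal("0.0008")),
    (None, Decimal("0.0005")),
]


def estimate_evaluation_cost(total_evaluations, tiers=PRICING_TIERS):
    """Estimate the cost of a total number of rule evaluations."""
    if total_evaluations < 0:
        raise ValueError("total_evaluations must not be negative")
    remaining = total_evaluations
    cost = Decimal("0")
    for size, price in tiers:
        if remaining <= 0:
            break
        # Charge this tier's rate for the evaluations that fall inside it.
        in_tier = remaining if size is None else min(remaining, size)
        cost += in_tier * price
        remaining -= in_tier
    return cost


for total in (50_000, 150_000, 600_000):
    print(total, estimate_evaluation_cost(total))
```

Decimal is used instead of float so that small per-evaluation rates add up without rounding drift.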
Enhancements and next steps
Here are some ideas to get started:
Visualize the queries using Amazon QuickSight
Using the same query, you can create an Amazon QuickSight dashboard to visualize this data over time.
In this example, I visualized the month-over-month change in rule evaluations, the estimated cost for the last three months, and the number of evaluations per day in the last month.
Figure 4: QuickSight dashboard graphs
Build queries for recorded configuration items
You can build similar queries to determine the number of recorded Configuration Items (and to estimate their cost). Configuration items are stored in a different S3 bucket. For more information, see the Identifying resources with the most configuration changes using AWS Config blog post.
In this example, I visualized the number of recorded Configuration Items, grouped by resource type, in the last month.
Figure 5: QuickSight dashboard pie chart
Automate partition creation
In this post, you registered partitions manually (Step 5). You can automate the registration of partitions by creating a custom Lambda function and scheduling it to execute the ALTER TABLE statement every month.
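As a starting point, such a function might look like the following Python sketch. It starts one ALTER TABLE ... ADD PARTITION query per Region for the current month through the Athena `start_query_execution` API. The partition key names, the database, table, bucket, account ID, Regions, and output location are all hypothetical placeholders, and the sketch does not wait for the queries to finish or handle failures.

```python
import datetime


def register_month_partitions(athena_client, *, database, table_name, bucket,
                              account_id, regions, output_location, today=None):
    """Start one ADD PARTITION query per Region for the current month."""
    today = today or datetime.date.today()
    year, month = f"{today.year:04d}", f"{today.month:02d}"
    query_ids = []
    for region in regions:
        # Assumes partition keys named region, year, and month.
        statement = (
            f"ALTER TABLE {table_name} ADD IF NOT EXISTS "
            f"PARTITION (region='{region}', year='{year}', month='{month}') "
            f"LOCATION 's3://{bucket}/AWSLogs/{account_id}/CloudTrail/"
            f"{region}/{year}/{month}/'"
        )
        response = athena_client.start_query_execution(
            QueryString=statement,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )
        query_ids.append(response["QueryExecutionId"])
    return query_ids


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without boto3.
    import boto3

    # Every value below is a placeholder; replace with your own settings.
    return register_month_partitions(
        boto3.client("athena"),
        database="default",
        table_name="cloudtrail_logs_partitioned",
        bucket="example-bucket",
        account_id="123456789012",
        regions=["us-east-1"],
        output_location="s3://example-athena-results/",
    )
```

Passing the client in as an argument keeps the statement-building logic easy to test with a stand-in object before you deploy the function.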
In this blog post, I’ve demonstrated how you can use Amazon Athena to query AWS CloudTrail logs and obtain a count of AWS Config rule evaluations by rule name. Now go ahead and experiment! Athena is a powerful tool that can help you gather all sorts of insights on your data without the need to transform it.