Given the sheer volume and variety of data available, finding the right data set to gain business insights can be a challenge. You can improve your business by having access to a central repository of various data sets to query, visualize, and forecast. With AWS Data Exchange, finding the right data set has become much simpler. As an example, you can use data sets on web session visitation and demographics to help you understand which demographic groups visit your website most frequently. You can then improve your business through machine learning (ML) models and visitation forecasts.

AWS Data Exchange makes it easy to find, subscribe to, and use third-party data in the cloud. After you subscribe to a data product within AWS Data Exchange, you can use the AWS Data Exchange API, AWS CLI, or the AWS Management Console to load data into Amazon S3 directly. You can then analyze the imported data with a wide variety of AWS services, ranging from analytics to machine learning.

This post showcases TruFactor Intelligence-as-a-Service data on AWS Data Exchange. TruFactor’s anonymization platform and proprietary AI ingest, filter, and transform more than 85 billion high-quality raw signals daily from wireless carriers, OEMs, and mobile apps into a unified phygital consumer graph across physical and digital dimensions. TruFactor intelligence is application-ready for use within any AWS analytics or ML service to power your models and applications running on AWS, with no additional processing required. Common use cases include the following:

  • Consumer segmentation – Web intelligence on internet browsing behavior in the US provides a complete view of the consumer, including interests, opinions, values, digital behavior, and sentiment, to inform segmentation of your customers and those of your competitors.
  • Customer acquisition or churn campaigns – Internet browsing behavior can identify affinity properties for new prospects as well as switching to competitors’ websites.

This walkthrough uses TruFactor’s Daily Mobile Web Session Index and Daily Demographics by Mobile Web Sessions data sets, which are both available for free subscription through the AWS Data Exchange console. While there are commercial data sets available for purchase in AWS Data Exchange, this post uses trial data sets to showcase the breadth and depth of analytics possible with TruFactor’s intelligence.

This TruFactor intelligence is aggregated from over 3 billion records per day from telco carrier networks and mobile apps, originating from approximately 30 million consistent users, and distilled into session-level information that provides a complete view of user digital interests. The accuracy, breadth of data provided, and persistence of the panel deliver a unified view of consumers that can inform insights or power analytic models or applications on AWS.

These two data sets have applications across verticals such as retail, financial services, and advertising. Common use cases include creating detailed customer segmentation (for example, full DNA maps of consumers based on visits to specific web HTTP hosts), identifying affinity properties, and estimating demand for apps or services. This intelligence is also ideal for identifying trends and changes over time.

Solution overview

The following diagram illustrates the architecture of the solution.

The workflow consists of the following steps:

  1. Subscribe to a data set from AWS Data Exchange and export to Amazon S3
  2. Run an AWS Glue crawler to load product data
  3. Perform queries with Amazon Athena
  4. Visualize the queries and tables with Amazon QuickSight
  5. Run an ETL job with AWS Glue
  6. Create a time series forecast with Amazon Forecast
  7. Visualize the forecasted data with Amazon QuickSight

This post looks at the demographic distributions across various websites and how to use ML to forecast website visitation.

Walkthrough overview

The walkthrough includes the following steps:

  1. Subscribe to a TruFactor data set from the AWS Data Exchange console and export the data set to Amazon S3
  2. Use an AWS Glue crawler to load the product data into an AWS Glue Data Catalog
  3. Use Amazon Athena for SQL querying
  4. Visualize the query views and tables with Amazon QuickSight
  5. Use AWS Glue jobs to extract, transform, and load your data for forecasting with Amazon Forecast
  6. Use Amazon Forecast to create a time series forecast of the transformed data
  7. Visualize the forecasted web visitation data with Amazon QuickSight

You do not have to perform additional processing or manipulation of the TruFactor intelligence for this walkthrough.

The data sets

The TruFactor data sets this post uses are in Parquet format with Snappy compression. The following section provides additional details and schema for each data set.

TruFactor Daily Mobile Web Session Index (US – Nationwide) — Trial

The TruFactor Daily Mobile Web Session Index (US – Nationwide) — Trial data set provides aggregate information per HTTP host as a view of the internet browsing behavior in the US. TruFactor generates the data from high-quality packet layer data sourced from mobile carriers that includes the mobile internet traffic originating from a user’s device. TruFactor derives the projected counts from observed counts that are filtered for exclusion and anonymized to make sure users cannot be re-identified. It extrapolates values from US Census data using a proprietary algorithm. For the avoidance of doubt, this data set does not include user-level data.

The following screenshot shows the schema for the mobile web session data set by HTTP host, session time, MB transferred, number of events, sessions, users, and dates.

TruFactor Daily Demographics by Mobile Web Session (US) — Trial

The TruFactor Daily Demographics by Mobile Web Session (US) — Trial data set includes aggregate demographics: a projected distribution of users per HTTP host as a view of the internet browsing behavior in the US. TruFactor generates the data from high-quality packet layer data sourced from mobile carriers that includes the mobile internet traffic originating from a user’s device. TruFactor derives the distribution from observed counts that are filtered for exclusion and anonymized to make sure users cannot be re-identified. It extrapolates values from US Census data using a proprietary algorithm. Demographics include gender, age range, ethnicity, and income range.

The following screenshot shows the partial schema for the demographics by web session data set. The full schema includes the following attributes: HTTP host, age ranges, genders, ethnicity, income ranges, and date.

Prerequisites

To complete this walkthrough successfully, you must have the following resources:

  • An AWS account.
  • Familiarity with AWS core services and concepts.
  • The ability to launch new resources in your account. Some resources may not be eligible for Free Tier usage and might incur costs.
  • Subscription to TruFactor’s Daily Mobile Web Session Index (US – Nationwide) – Trial and Daily Demographics by Mobile Web Session (US) – Trial data sets. For instructions on subscribing to a data set on AWS Data Exchange, see AWS Data Exchange – Find, Subscribe To, and Use Data Products.

Using AWS Data Exchange, Amazon S3, AWS Glue, Amazon Athena, and Amazon QuickSight

This section examines the key demographics of visitors to the top seven e-commerce websites. This information can help you understand which demographic groups are visiting your website most frequently, and can also help you target ads and cater to specific demographic groups. You use AWS Glue crawlers to crawl your data sets in Amazon S3 and populate your AWS Glue Data Catalog, query the Data Catalog using Amazon Athena, and use Amazon QuickSight to visualize the queries.

Step 1: Exporting the data from AWS Data Exchange to Amazon S3

To export your TruFactor data set subscriptions into an Amazon S3 bucket, complete the following steps:

  1. Create an Amazon S3 bucket in your working account. For the purposes of our demo, we have named our S3 bucket trufactor-data-exchange-bucket.
  2. Create two folders within the S3 bucket: web_sess and demo_by_web_sess.

This post uses a trial data set with a sample of 14 days. A paid subscription to TruFactor’s Web Sessions data on AWS Data Exchange includes 6 months of historical data, which refreshes daily.

The following screenshot shows the two folders within the S3 bucket. You are now ready to export the data sets.

  1. On the AWS Data Exchange console, under Subscriptions, locate TruFactor Daily Mobile Web Sessions Index (US – Nationwide) – Trial.
  2. Under Revisions, choose the most recent Revision ID.
  3. Choose all assets except the manifest.json files.
  4. Choose Export to Amazon S3.
  5. In the window that opens, choose the S3 bucket and folder to export the product data into.
    • Export all the assets into the S3 bucket’s web_sess folder.
  6. Repeat the previous steps for the TruFactor Daily Demographics by Mobile Web Sessions (US) – Trial data set, with the following change:
    • Export the assets into the demo_by_web_sess folder in your S3 bucket.
  7. Check the Overview section to make sure you successfully exported the TruFactor data sets. The following screenshot shows that the data sets are partitioned into folders by date. Each folder contains Parquet files of web session data for each day.
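If you prefer to script the export instead of clicking through the console, the same revision export can be started with the AWS Data Exchange API via boto3. The sketch below is illustrative only: the data set and revision IDs are placeholders you would copy from the Subscriptions page, and `start_export` is defined but not called.

```python
def build_export_details(data_set_id, revision_id, bucket, prefix):
    """Build the Details payload for an EXPORT_REVISIONS_TO_S3 job.

    KeyPattern writes each asset under the given folder while keeping the
    asset's own file name (the ${Asset.Name} token is expanded by the service).
    """
    return {
        "ExportRevisionsToS3": {
            "DataSetId": data_set_id,
            "RevisionDestinations": [{
                "RevisionId": revision_id,
                "Bucket": bucket,
                "KeyPattern": prefix + "/${Asset.Name}",
            }],
        }
    }


def start_export(data_set_id, revision_id, bucket, prefix):
    """Create and start the export job (uncalled sketch; needs AWS credentials)."""
    import boto3  # deferred so the builder above works without the SDK installed
    dx = boto3.client("dataexchange")
    details = build_export_details(data_set_id, revision_id, bucket, prefix)
    job = dx.create_job(Type="EXPORT_REVISIONS_TO_S3", Details=details)
    dx.start_job(JobId=job["Id"])
```

You would call `start_export` once per data set, targeting the `web_sess` and `demo_by_web_sess` prefixes respectively.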

Step 2: Populating your AWS Glue Data Catalog with the TruFactor data sets

Now that you have successfully exported the TruFactor data sets into an Amazon S3 bucket, you create and run an AWS Glue crawler to crawl your Amazon S3 bucket and populate the AWS Glue Data Catalog. Complete the following steps:

  1. On the AWS Glue console, under Data Catalog, choose Crawlers.
  2. Choose Add crawler.
  3. For Crawler name, enter a name; for example, trufactor-data-exchange-crawler.
  4. For Crawler source type, choose Data stores.
  5. Choose Next.
  6. For Choose a data store, choose S3.
  7. For Crawl data in, select Specified path in my account.
  8. For Include path, enter the path for the web_sess data set folder. The crawler points to the following path: s3://<trufactor-data-exchange-bucket>/web_sess.
  9. Choose Next.
  10. Select Yes to Add another data store.
  11. Choose Next.
  12. For Include path, enter the path for the demo_by_web_sess data set folder. The crawler points to the following path: s3://<trufactor-data-exchange-bucket>/demo_by_web_sess.
  13. Choose Next.
  14. In the Choose an IAM role section, select Create an IAM role. This is the role that the AWS Glue crawler and AWS Glue jobs use to access the Amazon S3 bucket and its content.
  15. For IAM role, enter the suffix demo-data-exchange.
  16. Choose Next.
  17. In the schedule section, leave the Frequency with the default Run on Demand.
  18. Choose Next.
  19. In the Output section, choose Add database.
  20. Enter a name for the database; for example, trufactor-db.
  21. Choose Next. This database contains the tables that the crawler discovers and populates. With these data sets separated into different tables, you can join and relationalize the data.
  1. In the Review all steps section, review the crawler settings and choose Finish.
  2. Under Data Catalog, choose Crawlers.
  3. Select the crawler you just created.
  4. Choose Run crawler. The AWS Glue crawler crawls the data sources and populates your AWS Glue Data Catalog. This process can take up to a few minutes. When the crawler is finished, you can see two tables added to your crawler details. See the following screenshot. You can now view your new tables.
  1. Under Databases, choose Tables.
  2. Choose your database.
  3. Choose View the tables. The table names correspond to the Amazon S3 folder directory you used to point your AWS Glue crawler. See the following screenshot.
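The same crawler can be created and run with the AWS Glue API. The following sketch assembles the `create_crawler` parameters for both data set folders; the names and the IAM role ARN are the ones assumed throughout this walkthrough, and `create_and_run_crawler` is defined but not invoked.

```python
def crawler_config(name, role_arn, database, bucket):
    """Assemble create_crawler parameters covering both data set folders."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [
            {"Path": f"s3://{bucket}/web_sess"},
            {"Path": f"s3://{bucket}/demo_by_web_sess"},
        ]},
    }


def create_and_run_crawler(config):
    """Create the crawler and start an on-demand run (uncalled sketch)."""
    import boto3
    glue = boto3.client("glue")
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])
```

As in the console flow, the crawler writes one table per top-level S3 folder into the target database.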

Step 3: Querying the data using Amazon Athena

After you populate the AWS Glue Data Catalog with TruFactor’s Mobile Web Session and Demographics data, you can use Amazon Athena to run SQL queries and create views for visualization. Complete the following steps:

  1. On the Amazon Athena console, choose Query Editor.
  2. On the Database drop-down menu, choose the database you created.
  3. To preview one of the tables in Amazon Athena, choose Preview table.
    In the Results section, you should see 10 records from the web_sess table. See the following screenshot. In this next step, you run a query that creates a view of the Web Session Index and Demographics data across a group of e-commerce HTTP hosts. This is broken down by the percentage of users categorized by age and gender, number of users, MB transferred, and number of sessions, ordered by date.
  4. Run the following SQL query in Amazon Athena:
    CREATE OR REPLACE VIEW e_commerce_web_sess_data AS
    SELECT
      "date_parse"("a"."partition_0", '%Y%m%d') "date",
      "a"."http_host", "a"."users", "a"."mb_transferred", "a"."number_of_sessions",
      "b"."18_to_25", "b"."26_to_35", "b"."36_to_45", "b"."46_to_55",
      "b"."56_to_65", "b"."66_to_75", "b"."76_plus", "b"."male", "b"."female"
    FROM ((
      SELECT "partition_0", "http_host", "users", "mb_transferred", "number_of_sessions"
      FROM "trufactor-db"."web_sess"
      WHERE ("http_host" IN ('www.amazon.com', 'www.walmart.com', 'www.ebay.com', 'www.aliexpress.com', 'www.etsy.com', 'www.rakuten.com', 'www.craigslist.com'))
    ) a
    LEFT JOIN (
      SELECT "http_host" "http_host_2", "partition_0" "partition_2",
        "age_ranges"."18_to_25", "age_ranges"."26_to_35", "age_ranges"."36_to_45",
        "age_ranges"."46_to_55", "age_ranges"."56_to_65", "age_ranges"."66_to_75",
        "age_ranges"."76_plus", "genders"."male", "genders"."female"
      FROM "trufactor-db"."demo_by_web_sess"
      WHERE ("http_host" IN ('www.amazon.com', 'www.walmart.com', 'www.ebay.com', 'www.aliexpress.com', 'www.etsy.com', 'www.rakuten.com', 'www.craigslist.com'))
    ) b ON (("a"."http_host" = "b"."http_host_2") AND ("a"."partition_0" = "b"."partition_2")))
    ORDER BY "date" ASC

  5. After you create the view, you can preview it by repeating the above steps for previewing a table. The following screenshot shows the results, which include the number of users, user percentages by age group and gender, and a list of e-commerce hosts listed by date.
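Athena queries can also be submitted programmatically. The sketch below builds the repeated e-commerce host filter used in the view and shows how a query would be submitted with boto3; the output location is a hypothetical staging prefix, and `run_athena_query` is defined but not called.

```python
# The seven e-commerce hosts filtered on in the view above.
ECOMMERCE_HOSTS = ["www.amazon.com", "www.walmart.com", "www.ebay.com",
                   "www.aliexpress.com", "www.etsy.com", "www.rakuten.com",
                   "www.craigslist.com"]


def in_clause(hosts):
    """Render the SQL IN (...) list that appears twice in the view definition."""
    return "(" + ", ".join(f"'{h}'" for h in hosts) + ")"


def run_athena_query(sql, database, output_s3):
    """Submit a query and return its execution ID (uncalled sketch)."""
    import boto3
    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return resp["QueryExecutionId"]
```

Building the host list once and rendering it with `in_clause` avoids keeping two hand-edited copies of the filter in sync.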

Step 4: Visualizing with Amazon QuickSight

After you query your data sets in Amazon Athena, you can use Amazon QuickSight to visualize your results. You must first grant Amazon QuickSight access to the Amazon S3 bucket that holds the TruFactor data sets, which you can do through the Manage QuickSight setting on the Amazon QuickSight console. After you grant access to the Amazon S3 bucket, you visualize the tables and queries with Amazon QuickSight. Complete the following steps:

  1. In the Amazon QuickSight console, choose New Analysis.
  2. Choose New data set.
  3. Choose Athena as the data source.
  4. For Data source name, enter trufactor-data-exchange-source.
  5. From the drop-down menu, choose the database and view you created.
  6. Choose Directly query your data.
  7. Choose Visualize. Because TruFactor intelligence is application-ready, you can gain immediate insights by using Amazon Athena to query and Amazon QuickSight to visualize. This post includes visualizations of the data set for the first two weeks of October 2019. The following graph visualizes the number of users on different HTTP hosts. The following pie charts further filter the HTTP hosts by age range. The following bar chart offers another visualization of users by age range. You could add other fields such as income range, ethnicity, and gender.

Running AWS Glue Jobs and Amazon Forecast

This section discusses how to use AWS Glue jobs to query and export your data set for forecasting with Amazon Forecast. This walkthrough examines user visitation over 14 days across the top 50 HTTP hosts, ranked by number of visiting users. From there, you forecast user visitation for these HTTP hosts for the next three days.

Step 1: Creating and running an AWS Glue job

To create and run your AWS Glue job, complete the following steps:

  1. On the AWS Glue console, under ETL, choose Jobs.
  2. Choose Add job.
  3. For Name, enter a name for the AWS Glue job; for example, demo-glue-job.
  4. For Type and Glue version, keep the default values.
  5. For This job runs, select A new script to be authored by you.
  6. In the Security configuration, script libraries, and job parameters (optional) section, set the Maximum capacity cluster size to 2. This reduces the cost of running the AWS Glue job. By default, the cluster size is set to 10 Data Processing Units (DPU).
  7. Choose Next.
  8. In the Connections section, keep the default values.
  9. Choose Save job and edit script.
  10. Enter the following code in the script section, and replace YOUR_BUCKET_NAME in the save path near the end of the script with the name of your bucket.
    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.job import Job
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME'])

    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

    db_name = "trufactor-db"
    tbl_name = "web_sess"

    web_sess_dyf = glueContext.create_dynamic_frame.from_catalog(database = db_name, table_name = tbl_name, transformation_ctx = "web_sess_dyf")
    web_sess_df = web_sess_dyf.toDF()
    web_sess_df.createOrReplaceTempView("webSessionTable")

    web_sess_sql_df = spark.sql("""
    SELECT to_date(partition_0, 'yyyyMMdd') AS date, http_host, users
    FROM (SELECT partition_0, http_host, users,
                 row_number() OVER (PARTITION BY partition_0 ORDER BY users DESC) AS rn
          FROM webSessionTable)
    WHERE rn <= 50
    ORDER BY date""")

    web_sess_sql_df.coalesce(1).write.format("csv").option("header", "false").save("s3://YOUR_BUCKET_NAME/amazon_forecast_demo/dataset/sampleset")
    job.commit()

    This code queries the top 50 HTTP hosts, ranked by users’ visitation during the first half of October, and returns the date, HTTP host, and users columns. The query results upload to your Amazon S3 bucket in CSV format (Amazon Forecast requires CSV input).

  11. Choose Save and close the AWS Glue job screen. Before you can run the AWS Glue job, you need to modify the IAM role associated with AWS Glue. Currently, the IAM role only has permission to get and put objects in the directories you specified earlier. You need to update the IAM policy to allow permission to get and put objects in all subdirectories of the Amazon S3 bucket.
  12. On the IAM console, choose the role you used for this walkthrough: AWSGlueServiceRole-demo-data-exchange.
  13. In the Summary section for the IAM role, on the Permissions tab, choose the IAM policy associated with the Managed policy.
  14. Choose Edit policy.
  15. Change the view from Visual editor to JSON.
  16. Within this JSON object, under Resource, add another resource into the list of values. The following code is the updated IAM policy:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject"
                ],
                "Resource": [
                    "arn:aws:s3:::trufactor-data-exchange-bucket/web_sess*",
                    "arn:aws:s3:::trufactor-data-exchange-bucket/demo_by_web_sess*",
                    "arn:aws:s3:::trufactor-data-exchange-bucket/*"
                ]
            }
        ]
    }

  17. Choose Review policy and Save changes.
  18. On the AWS Glue console, under ETL, choose Jobs. Select the job you created earlier.
  19. From the Action drop-down menu, choose Run job. This job can take 15–20 minutes to complete. On the History tab, you can see when the status changes to Succeeded. See the following screenshot.
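Instead of watching the History tab, you can start the job and poll its run state with the AWS Glue API. The polling helper below is a sketch; `run_glue_job` is defined but not invoked, and the job name is the one assumed earlier in this walkthrough.

```python
# Terminal JobRunState values reported by AWS Glue.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def is_finished(run_state):
    """True once a job run has reached a terminal state."""
    return run_state in TERMINAL_STATES


def run_glue_job(job_name, poll_seconds=30):
    """Start the job and poll until it finishes (uncalled sketch)."""
    import time
    import boto3
    glue = boto3.client("glue")
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        state = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
        if is_finished(state):
            return state
        time.sleep(poll_seconds)
```

A call such as `run_glue_job("demo-glue-job")` would block until the run succeeds or fails and return the final state.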

Step 2: Creating a dataset group, training a predictor, and creating forecasts in Amazon Forecast

To create your dataset group, train a predictor, and create forecasts, complete the following steps:

  1. On the Amazon Forecast console, choose View dataset groups.
  2. Choose Create dataset group.
  3. For Dataset group name, enter a name; for example, users_visitation_sample_dataset_group.
  4. For Forecasting domain, choose Web traffic.
  5. Choose Next.
  6. On the Create target time series dataset page, for Dataset name, enter the name of your dataset; for example, users_visitation_sample_dataset.
  7. For Frequency of your data, choose 1 day.
  8. For Data schema, update the data schema JSON object with the following code:
    {
        "Attributes": [
            {"AttributeName": "timestamp", "AttributeType": "timestamp"},
            {"AttributeName": "item_id", "AttributeType": "string"},
            {"AttributeName": "value", "AttributeType": "float"}
        ]
    }

  9. Choose Next.
  10. On the Import target time series data page, for Dataset import name, enter your dataset name; for example, users_visitation_sample_dataset_import.
  11. For Timestamp format, enter yyyy-MM-dd.
  12. For IAM Role, create a new role and grant Amazon Forecast access to the S3 bucket that you are using for this demo.
  13. For Data Location, use the S3 path that you exported your CSV file to after the AWS Glue job: s3://<trufactor-data-exchange-bucket>/amazon_forecast_demo/dataset/sampleset.
  14. Review the settings for import target time series data and choose Start import. The process of importing the data can take approximately 10 minutes. When the status changes to Active, you can begin training a predictor.
  1. On the Dashboard page, choose Start next to Predictor training.
  2. On the Train predictor page, for Predictor name, enter a name for the predictor; for example, users_visitation_sample_dataset_predictor.
  3. For Forecast horizon, choose 3.
  4. For Forecast frequency, choose day.
  5. For Algorithm selection, select Manual. If you use the other algorithm option, AutoML, you allow Amazon Forecast to choose the right algorithm based on a pre-defined objective function, which is not necessary for this walkthrough.
  6. For Algorithm, choose Deep_AR_Plus (you use deep learning to forecast users’ visitation across 50 HTTP hosts).
  7. Leave all other options with the default values.
  8. Review the settings and choose Train predictor. The predictor training process can take 20–30 minutes. When the training completes, the status changes to Active. To evaluate the predictor’s (ML model) accuracy, Amazon Forecast splits the input time series data into two data sets: training and test. This process of testing a predictive model on historical data is called backtesting. When it splits the input time series data, it maintains the data’s order, which is crucial for time series data. After training, Amazon Forecast calculates the root mean square error (RMSE) and weighted quantile losses to determine how well the predictor performed. For more detailed information about backtesting and predictor metrics, see Evaluating Predictor Accuracy. When the predictor is finished training, you can create a forecast.
  9. On the Dashboard page, under Generate forecasts, choose Start.
  10. For Forecast name, enter a forecast name; for example, users_visitation_sample_forecast.
  11. For Predictor, choose your trained predictor.
  12. For Forecast types, you can enter any quantile values between 0.01 and 0.99, as well as the mean. Each quantile is the probability that the actual value falls at or below the forecasted value. This post enters .50, .90, .99, mean.
  13. Choose Create a forecast. The forecast creation process can take 15–20 minutes.
  14. When the forecast is complete, choose Forecasts.
    You should see a single forecast. See the following screenshot.
    You can now export the generated forecast to a new folder within your existing Amazon S3 bucket for visualization with Amazon QuickSight.
  1. Choose the newly generated forecast.
  2. Under Exports, choose Create forecast export.
  3. For Export name, enter a name for the export; for example, users_visitation_sample_forecast_export.
  4. For Generated forecast, choose users_visitation_sample_forecast.
  5. For IAM Role, choose the role you created earlier.
  6. For S3 forecast export location, enter the S3 path to store the forecasts: s3://<trufactor-data-exchange-bucket>/amazon_forecast_demo/forecasts/sampleset.
  7. Choose Create forecast export. The exporting process can take up to 5 minutes. Alternatively, you can visualize the user visitation forecasts for the 50 HTTP hosts directly through the Amazon Forecast console or Query API.
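The forecast and export steps above can also be scripted with the Amazon Forecast API. The sketch below reuses the schema and forecast types from the console steps; the predictor and role ARNs are placeholders from the earlier setup, and `create_forecast_and_export` is defined but not called.

```python
# The quantiles and mean entered in the console steps above.
FORECAST_TYPES = ["0.50", "0.90", "0.99", "mean"]


def target_schema():
    """The three-column TARGET_TIME_SERIES schema used in this walkthrough."""
    return {"Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "value", "AttributeType": "float"},
    ]}


def create_forecast_and_export(predictor_arn, role_arn, bucket):
    """Generate the forecast and export it to S3 (uncalled sketch)."""
    import boto3
    fc = boto3.client("forecast")
    forecast_arn = fc.create_forecast(
        ForecastName="users_visitation_sample_forecast",
        PredictorArn=predictor_arn,
        ForecastTypes=FORECAST_TYPES,
    )["ForecastArn"]
    fc.create_forecast_export_job(
        ForecastExportJobName="users_visitation_sample_forecast_export",
        ForecastArn=forecast_arn,
        Destination={"S3Config": {
            "Path": f"s3://{bucket}/amazon_forecast_demo/forecasts/sampleset",
            "RoleArn": role_arn,
        }},
    )
```

Both calls are asynchronous; in practice you would poll the returned ARNs until their status becomes ACTIVE, as the console does for you.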

Step 3: Querying a view using Amazon Athena and downloading the forecast file

Before you visualize users’ visitation forecast data, create a view in Amazon Athena for the top 50 HTTP hosts ranked by users’ visitation over 14 days. Complete the following steps:

  1. Run the following query in Amazon Athena:
    CREATE OR REPLACE VIEW "top_50_users" AS
    SELECT date_format(date_parse(partition_0, '%Y%m%d'),'%Y-%m-%d') AS "date", http_host, users
    FROM (SELECT partition_0, http_host, users, row_number() OVER (PARTITION By partition_0 ORDER BY users DESC ) AS rn FROM "trufactor-db"."web_sess")
    WHERE rn<=50
    ORDER BY date

    The code queries the top 50 HTTP hosts ranked by users’ visitation sorted by date.

  2. In the Amazon S3 console, navigate to the S3 bucket and directory holding the files: s3://<trufactor-data-exchange-bucket>/amazon_forecast_demo/forecasts/sampleset. The following screenshot shows three different files inside the folder.
  3. Download the CSV file.
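Downloading the export can also be scripted. The sketch below lists the objects under the export prefix and keeps only the CSV part files, assuming (as the screenshot suggests) that the export writes several files of which only the `.csv` ones carry data; `download_forecast_csvs` is defined but not invoked.

```python
import os


def pick_csv_keys(keys):
    """Keep only CSV part files from a forecast export prefix listing."""
    return [k for k in keys if k.endswith(".csv")]


def download_forecast_csvs(bucket, prefix, dest_dir="."):
    """Download every CSV under the export prefix (uncalled sketch)."""
    import boto3
    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for key in pick_csv_keys(obj["Key"] for obj in listing.get("Contents", [])):
        s3.download_file(bucket, key, os.path.join(dest_dir, os.path.basename(key)))
```

For example, `download_forecast_csvs("trufactor-data-exchange-bucket", "amazon_forecast_demo/forecasts/sampleset")` would fetch the files you upload to Amazon QuickSight in the next step.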

Step 4: Visualizing in Amazon QuickSight

To visualize the data in Amazon QuickSight, complete the following steps:

  1. On the Amazon QuickSight console, choose Manage data.
  2. Choose New data set.
  3. Choose Upload a file.
  4. Upload the CSV file that you downloaded.
  5. On the Confirm file upload settings page, choose Next.
  6. Choose Visualize.
  7. Return to the Amazon QuickSight console and choose Manage data.
  8. Choose New data set for the top 50 HTTP hosts view you queried earlier.
  9. On the Create a Data set page, find the data source you created earlier: trufactor-data-exchange-source.
  10. From the drop-down list, choose the database and view you created.
  11. Choose Directly query your data.
  12. Choose Visualize.
  13. On the new Amazon QuickSight analysis page, choose the pencil icon next to Data set.
  14. Choose Add data set.
  15. Choose the CSV file you uploaded.

You now have a single Amazon QuickSight analysis with multiple data sets to visualize.

The following graphs highlight the historical data for the users’ visitation across 50 HTTP hosts for the first two weeks of October and the mean forecast for users’ visitation for the next three days.

The following graphs highlight the historical data and forecasted P50, P90, and P99 quantile values for www.google.com.

Amazon Forecast makes it easier to get started with machine learning without having to create your own ML models from scratch. You can use this information to anticipate the web traffic for the upcoming week, which can aid in scaling your resources and applications accordingly.

Cleaning up

To avoid incurring future charges, delete the following resources that you created in this walkthrough:

  • The Amazon S3 bucket trufactor-data-exchange-bucket
  • The AWS Glue crawler trufactor-data-exchange-crawler
  • The AWS Glue job demo-glue-job
  • The AWS IAM role AWSGlueServiceRole-demo-data-exchange
  • The AWS Glue database trufactor-db
  • The Amazon QuickSight demo data sets and analysis
  • The following Amazon Forecast resources (in this order) for users_visitation_sample_dataset_group via the console:
    • Existing forecasts under Forecasts
    • Existing predictors under Predictors
    • Existing datasets under Datasets
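Cleanup can likewise be scripted. Amazon Forecast resources must be deleted child-first, mirroring the ordering in the list above; the sketch below records that ordering and shows the Glue and S3 deletions, using the resource names assumed in this walkthrough. `cleanup_glue_and_s3` is defined but not invoked.

```python
# Amazon Forecast resources must be removed child-first, in this order.
FORECAST_DELETE_ORDER = ["forecast_export_jobs", "forecasts", "predictors",
                         "dataset_import_jobs", "datasets", "dataset_group"]


def cleanup_glue_and_s3(bucket, crawler, job, database):
    """Delete the demo Glue resources and the S3 bucket (uncalled sketch)."""
    import boto3
    glue = boto3.client("glue")
    glue.delete_crawler(Name=crawler)
    glue.delete_job(JobName=job)
    glue.delete_database(Name=database)
    s3 = boto3.resource("s3")
    # A bucket must be emptied before it can be deleted.
    s3.Bucket(bucket).objects.all().delete()
    s3.Bucket(bucket).delete()
```

A call such as `cleanup_glue_and_s3("trufactor-data-exchange-bucket", "trufactor-data-exchange-crawler", "demo-glue-job", "trufactor-db")` removes everything except the IAM roles and QuickSight assets, which are quickest to delete from their consoles.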

Conclusion

This walkthrough detailed how to import a data set to Amazon S3 from AWS Data Exchange, use AWS Glue to run crawlers and an ETL job on the data, run SQL queries with Amazon Athena, create a time series forecast of the queried data with Amazon Forecast, and visualize the queried and forecasted data with Amazon QuickSight.

This post used TruFactor Intelligence-as-a-Service, one of the AWS Data Exchange launch partners, to power this walkthrough. TruFactor intelligence on AWS Data Exchange highlighted the ease of loading directly into Amazon S3 and layering advanced AWS services.

For more information about TruFactor and the AWS Data Exchange, see TruFactor on AWS Data Exchange on the TruFactor website. You can subscribe to TruFactor Intelligence directly on AWS Data Exchange or engage with TruFactor directly to identify the right offering from the larger product portfolio of anonymized consumer intelligence.


About the Authors

Jay Park is a solutions architect at AWS.
Ariana Rahgozar is a solutions architect at AWS.