Data & Analytics

• Amazon Athena: serverless query service to analyze data stored in Amazon S3

• Uses standard SQL to query the files (built on Presto)

• Supports CSV, JSON, ORC, Avro, and Parquet

• Pricing: $5.00 per TB of data scanned

• Commonly used with Amazon QuickSight for reporting/dashboards

• Use cases: business intelligence, analytics, and reporting; analyzing and querying VPC Flow Logs, ELB logs, CloudTrail trails, etc.

• Exam Tip: to analyze data in S3 using serverless SQL, use Athena
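
As a rough illustration of the per-TB pricing above, the sketch below estimates the cost of a query from the number of bytes it scans. The function name `estimate_query_cost` and the decimal definition of a terabyte (10^12 bytes) are assumptions made for this example; only the $5.00 per TB rate comes from the notes, and any rounding or minimum charges are ignored.

```python
# Rate taken from the notes above: $5.00 per TB of data scanned.
PRICE_PER_TB_USD = 5.00
# Assumption for this sketch: 1 TB = 10**12 bytes (decimal terabyte).
BYTES_PER_TB = 10**12


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return an approximate query cost in USD for the given bytes scanned."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    full_scan = estimate_query_cost(2 * 10**12)      # scanning 2 TB of raw data
    reduced_scan = estimate_query_cost(200 * 10**9)  # hypothetical 200 GB scan
    print(f"2 TB scanned:   ${full_scan:.2f}")
    print(f"200 GB scanned: ${reduced_scan:.2f}")
```

Since the charge scales with bytes scanned, reducing the data a query reads lowers its cost proportionally.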

Amazon Athena – Performance Improvement

• Use columnar data formats for cost savings (less data scanned)

• Apache Parquet or ORC is recommended

• Huge performance improvement

• Use Glue to convert your data to Parquet or ORC

• Compress data for smaller retrievals (bzip2, gzip, lz4, snappy, zlib, zstd…)

• Partition datasets in S3 for easy querying on virtual columns

• s3://yourBucket/pathToTable/<PARTITION_COLUMN_NAME>=<VALUE>/<PARTITION_COLUMN_NAME>=<VALUE>/<PARTITION_COLUMN_NAME>=<VALUE>/etc…

• Example: s3://athena-examples/flight/parquet/year=1991/month=1/day=1/

• Use larger files (> 128 MB) to minimize overhead
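
The partition layout above can be sketched as a small helper that builds a `column=value` prefix for each partition level. The helper name `partition_prefix` is hypothetical and only illustrates the naming scheme; the bucket and path are taken from the example in the notes.

```python
def partition_prefix(table_path: str, partitions: dict) -> str:
    """Build an S3 prefix of the form <table_path>/<col>=<value>/.../.

    Partition columns appear in the order they are given in `partitions`.
    """
    parts = [f"{column}={value}" for column, value in partitions.items()]
    return "/".join([table_path.rstrip("/"), *parts]) + "/"


if __name__ == "__main__":
    prefix = partition_prefix(
        "s3://athena-examples/flight/parquet",
        {"year": 1991, "month": 1, "day": 1},
    )
    print(prefix)
```

A query that filters on year, month, or day only needs to read objects under the matching prefixes, which is how partitioning reduces the data scanned.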
