• Serverless query service to analyze data stored in Amazon S3
• Uses standard SQL to query the files (built on Presto)
• Supports CSV, JSON, ORC, Avro, and Parquet
• Pricing: $5.00 per TB of data scanned
• Commonly used with Amazon QuickSight for reporting/dashboards
• Use cases: business intelligence / analytics / reporting, analyzing and querying VPC Flow Logs, ELB logs, CloudTrail trails, etc.
• Exam Tip: to analyze data in S3 using serverless SQL, use Athena
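Because Athena bills by data scanned, the $5.00 per TB rate turns into a simple per-query estimate. This is a minimal sketch, not AWS's billing logic. It assumes the documented rounding up to the nearest MB with a 10 MB minimum per query, and it treats 1 TB as 1024**4 bytes. Prices vary by region, so the rate is a parameter. The function name `athena_query_cost` is hypothetical.

```python
MB = 1024 ** 2
TB = 1024 ** 4  # assumption: 1 TB = 1024**4 bytes


def athena_query_cost(bytes_scanned: int, price_per_tb: float = 5.00) -> float:
    """Estimate the charge in USD for one Athena query.

    Assumes bytes are rounded up to the nearest MB with a 10 MB minimum
    per query. Regional prices differ; pass your own price_per_tb.
    """
    billed_mb = max(10, (bytes_scanned + MB - 1) // MB)  # integer ceil division
    return billed_mb * MB / TB * price_per_tb


# Hypothetical query that scans 500 GB.
print(f"${athena_query_cost(500 * 1024 ** 3):.2f}")  # $2.44
```

Even a query that scans a few KB is billed as 10 MB, which is negligible. The cost is driven almost entirely by how many bytes each query reads.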
• Use columnar data formats for cost savings (less data scanned)
• Apache Parquet or ORC is recommended
• Huge performance improvement
• Use Glue to convert your data to Parquet or ORC
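Columnar formats save money because a query only reads the columns it references. A row format such as CSV has to be read in full. The sketch below is a deliberately simplified model. It ignores file metadata, differences in compression, and predicate pushdown. The table, column names, and sizes are hypothetical.

```python
GB = 1024 ** 3


def bytes_scanned(column_sizes: dict[str, int], selected: list[str], columnar: bool) -> int:
    """Rough model of how much data a query reads.

    Row formats (CSV, JSON) are read in full; columnar formats
    (Parquet, ORC) only need the columns the query references.
    """
    if not columnar:
        return sum(column_sizes.values())
    return sum(column_sizes[name] for name in selected)


# Hypothetical 10-column table, 100 GB per column (1,000 GB total).
sizes = {f"col{i}": 100 * GB for i in range(10)}
query_cols = ["col0", "col3"]  # e.g. SELECT col0, col3 FROM ...

row_bytes = bytes_scanned(sizes, query_cols, columnar=False)
col_bytes = bytes_scanned(sizes, query_cols, columnar=True)
print(row_bytes // GB, col_bytes // GB)  # 1000 200
```

Under this model, selecting 2 of 10 columns scans one fifth of the data, and at a per-TB price the cost falls by the same factor.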
• Compress data for smaller retrievals (bzip2, gzip, lz4, snappy, zlib, zstd…)
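To see the effect of compression on the bytes stored (and therefore scanned), you can compress sample data with three of the listed codecs. Only gzip, bzip2, and zlib ship with the Python standard library. lz4, snappy, and zstd would need third-party packages, so they are left out. The log lines are hypothetical and highly repetitive, so real data will usually compress less dramatically.

```python
import bz2
import gzip
import zlib

# Hypothetical, highly repetitive CSV-style log lines.
rows = [f"2024-01-01,10.0.0.{i % 256},ACCEPT,443\n" for i in range(10_000)]
raw = "".join(rows).encode("utf-8")

compressed = {
    "gzip": gzip.compress(raw),
    "bzip2": bz2.compress(raw),
    "zlib": zlib.compress(raw),
}

for name, blob in compressed.items():
    print(f"{name}: {len(raw):,} -> {len(blob):,} bytes")
```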
• Partition datasets in S3 so queries can filter on virtual (partition) columns and scan less data
• s3://yourBucket/pathToTable
/<PARTITION_COLUMN_NAME>=<VALUE>
/<PARTITION_COLUMN_NAME>=<VALUE>
/<PARTITION_COLUMN_NAME>=<VALUE>
/etc…
• Example: s3://athena-examples/flight/parquet/year=1991/month=1/day=1/
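The `<column>=<value>` layout above can be built and read back with a few lines of string handling. The `prune` helper imitates what partition pruning achieves. A filter such as `WHERE year = '1991'` only needs the prefixes whose `year` segment matches, so the other partitions are never scanned. This is a sketch of the idea, not how Athena implements it. All three function names are hypothetical.

```python
def partition_prefix(table_root: str, partitions: dict[str, object]) -> str:
    """Build a Hive-style key prefix: <root>/<col>=<value>/.../"""
    parts = [f"{col}={val}" for col, val in partitions.items()]
    return "/".join([table_root.rstrip("/"), *parts]) + "/"


def parse_partitions(path: str) -> dict[str, str]:
    """Recover the virtual partition columns from a Hive-style path."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


def prune(paths: list[str], **filters: str) -> list[str]:
    """Keep only prefixes whose partition values match every filter."""
    return [
        p for p in paths
        if all(parse_partitions(p).get(col) == val for col, val in filters.items())
    ]


root = "s3://athena-examples/flight/parquet"
paths = [partition_prefix(root, {"year": y, "month": m, "day": 1})
         for y in (1991, 1992) for m in (1, 2)]
print(paths[0])  # s3://athena-examples/flight/parquet/year=1991/month=1/day=1/
print(prune(paths, year="1991"))  # only the two 1991 prefixes
```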
• Use larger files (> 128 MB) to minimize overhead
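When a dataset consists of many small objects, one option is to plan a compaction that groups them into outputs of at least about 128 MB. The greedy sketch below only plans the grouping from a list of file sizes. It does not perform the merge. The function name and the input sizes are hypothetical.

```python
MB = 1024 ** 2


def plan_compaction(file_sizes: list[int], target: int = 128 * MB) -> list[list[int]]:
    """Greedily group small files so each merged output reaches `target` bytes.

    Files are taken in order; a batch is closed once its total reaches the
    target. The final batch may stay below the target if input runs out.
    """
    batches: list[list[int]] = []
    current: list[int] = []
    total = 0
    for size in file_sizes:
        current.append(size)
        total += size
        if total >= target:
            batches.append(current)
            current, total = [], 0
    if current:
        batches.append(current)
    return batches


# Hypothetical: 1,000 objects of 1 MB each.
plan = plan_compaction([1 * MB] * 1000)
print(len(plan), [sum(b) // MB for b in plan])  # 8 [128, 128, 128, 128, 128, 128, 128, 104]
```

Greedy in-order grouping keeps the plan simple and streaming-friendly. A bin-packing approach could balance the batch sizes more evenly at the cost of more work.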