Advanced S3

S3 MFA-Delete

• MFA (multi factor authentication) forces users to generate a code on a device (usually a mobile phone or hardware token) before doing important operations on S3

• To use MFA-Delete, enable Versioning on the S3 bucket

• You will need MFA to

• permanently delete an object version

• suspend versioning on the bucket

• You won’t need MFA for

• enabling versioning

• listing deleted versions

• Only the bucket owner (root account) can enable/disable MFA-Delete

• MFA-Delete can currently only be enabled/disabled using the CLI or the API (not the console)
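A minimal boto3 sketch of enabling MFA-Delete (bucket name and MFA device ARN are hypothetical; the call must be made with the root account's credentials, and the MFA argument is the device serial/ARN, a space, and the current code):

import boto3

s3 = boto3.client("s3")

# Enable versioning and MFA-Delete in one PutBucketVersioning call.
# The MFA value is "<device serial or ARN> <current 6-digit code>".
s3.put_bucket_versioning(
    Bucket="my-bucket",
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)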

S3 Default Encryption vs Bucket Policies

• One way to “force encryption” is to use a bucket policy and refuse any API call to PUT an S3 object without encryption headers (a policy sketch is shown below)

• Another way is to use the “default encryption” option in S3

• Note: Bucket Policies are evaluated before “default encryption”
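As a sketch, a deny-based bucket policy like the following (hypothetical bucket name) rejects any PutObject call that does not carry the x-amz-server-side-encryption header:

import json
import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        # Deny when the encryption header is absent (Null condition is true)
        "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
    }],
}

s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))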

S3 Access Logs

• For audit purpose, you may want to log all access to S3 buckets

• Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket

• That data can be analyzed using data analysis tools…

• Or Amazon Athena as we’ll see later in this section!

• The log format is at:

https://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html

• To enable access logs, use a separate target bucket: logs from the monitored bucket are delivered into another bucket (do not log a bucket into itself, or you create a logging loop)
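A boto3 sketch of turning on server access logging (bucket names are hypothetical; the target bucket must already allow the S3 log delivery service to write into it):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="my-app-bucket",                    # bucket being monitored
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-logs-bucket",  # separate bucket receiving the logs
            "TargetPrefix": "access-logs/my-app-bucket/",
        }
    },
)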

S3 Replication (CRR & SRR)

• Must enable versioning in source and destination

• Cross Region Replication (CRR)

• Same Region Replication (SRR)

• Buckets can be in different accounts

• Copying is asynchronous

• Must give proper IAM permissions to S3

• CRR - Use cases: compliance, lower latency access, replication across accounts

• SRR – Use cases: log aggregation, live replication between production and test accounts

S3 Replication – Notes

• After activating, only new objects are replicated

• Optionally, you can replicate existing objects using S3 Batch Replication

• Replicates existing objects and objects that failed replication

• For DELETE operations:

• Can replicate delete markers from source to target (optional setting)

• Deletions with a version ID are not replicated (to avoid malicious deletes)

• There is no “chaining” of replication

• If bucket 1 has replication into bucket 2, which has replication into bucket 3

• Then objects created in bucket 1 are not replicated to bucket 3

• To set it up, you create two buckets (a source and a destination), enable versioning on both, and add a replication rule on the source bucket (under the Management tab in the console); a boto3 sketch follows
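A minimal boto3 sketch of a replication rule (bucket names and the IAM role ARN are hypothetical; versioning must already be enabled on both buckets, and the role must let S3 read the source and write the destination):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [{
            "ID": "replicate-all",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},                                   # empty filter = whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
        }],
    },
)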

S3 Pre-Signed URLs

• Can generate pre-signed URLs using SDK or CLI

• For downloads (easy, can use the CLI)

• For uploads (harder, must use the SDK)

• Valid for a default of 3600 seconds, can change timeout with the --expires-in [TIME_IN_SECONDS] argument

• Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT

• Examples:

• Allow only logged-in users to download a premium video on your S3 bucket

• Allow an ever changing list of users to download files by generating URLs dynamically

• Allow temporarily a user to upload a file to a precise location in our bucket
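A boto3 sketch of both cases (bucket and keys are hypothetical); for downloads, the CLI equivalent is aws s3 presign s3://my-bucket/key --expires-in 3600:

import boto3

s3 = boto3.client("s3")

# Download URL (GET), valid for 1 hour
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "premium/video.mp4"},
    ExpiresIn=3600,
)

# Upload URL (PUT) to a precise key, valid for 5 minutes
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-bucket", "Key": "uploads/user-123/photo.jpg"},
    ExpiresIn=300,
)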

S3 Storage Classes

• Amazon S3 Standard - General Purpose

• Amazon S3 Standard-Infrequent Access (IA)

• Amazon S3 One Zone-Infrequent Access

• Amazon S3 Glacier Instant Retrieval

• Amazon S3 Glacier Flexible Retrieval

• Amazon S3 Glacier Deep Archive

• Amazon S3 Intelligent Tiering

• Can move between classes manually or using S3 Lifecycle configurations
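As a sketch, a manual class change is simply a copy of the object onto itself with a new StorageClass (bucket and key are hypothetical):

import boto3

s3 = boto3.client("s3")

s3.copy_object(
    Bucket="my-bucket",
    Key="reports/2023/summary.csv",
    CopySource={"Bucket": "my-bucket", "Key": "reports/2023/summary.csv"},
    StorageClass="STANDARD_IA",   # e.g. STANDARD, STANDARD_IA, GLACIER, DEEP_ARCHIVE...
)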

S3 Durability and Availability

• Durability:

• High durability (99.999999999%, 11 9’s) of objects across multiple AZ

• If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years

• Same for all storage classes

• Availability:

• Measures how readily available a service is

• Varies depending on storage class

• Example: S3 standard has 99.99% availability = not available 53 minutes a year
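The 53-minute figure comes directly from the availability percentage; a quick back-of-the-envelope check in Python:

minutes_per_year = 365 * 24 * 60              # 525,600 minutes
downtime = (1 - 0.9999) * minutes_per_year    # 99.99% availability
print(round(downtime, 1))                     # ~52.6 minutes (~53 min/year)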

S3 Standard – General Purpose

• 99.99% Availability

• Used for frequently accessed data

• Low latency and high throughput

• Sustain 2 concurrent facility failures

• Use Cases: Big Data analytics, mobile & gaming applications, content distribution…

S3 Storage Classes – Infrequent Access

• For data that is less frequently accessed, but requires rapid access when needed

• Lower cost than S3 Standard

• Amazon S3 Standard-Infrequent Access (S3 Standard-IA)

• 99.9% Availability

• Use cases: Disaster Recovery, backups

• Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)

• High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed

• 99.5% Availability

• Use Cases: Storing secondary backup copies of on-premises data, or data you can recreate

Amazon S3 Glacier Storage Classes

• Low-cost object storage meant for archiving / backup

• Pricing: price for storage + object retrieval cost

• Amazon S3 Glacier Instant Retrieval

• Millisecond retrieval, great for data accessed once a quarter

• Minimum storage duration of 90 days

• Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier):

• Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) – free

• Minimum storage duration of 90 days

• Amazon S3 Glacier Deep Archive – for long term storage:

• Standard (12 hours), Bulk (48 hours)

• Minimum storage duration of 180 days

S3 Intelligent-Tiering

• Small monthly monitoring and auto-tiering fee

• Moves objects automatically between Access Tiers based on usage

• There are no retrieval charges in S3 Intelligent-Tiering

• Frequent Access tier (automatic): default tier

• Infrequent Access tier (automatic): objects not accessed for 30 days

• Archive Instant Access tier (automatic): objects not accessed for 90 days

• Archive Access tier (optional): configurable from 90 days to 700+ days

• Deep Archive Access tier (optional): config. from 180 days to 700+ days

S3 Lifecycle Rules

• Transition actions: define when objects are transitioned to another storage class

• Move objects to Standard IA class 60 days after creation

• Move to Glacier for archiving after 6 months

• Expiration actions: configure objects to expire (delete) after some time

• Access log files can be set to delete after 365 days

• Can be used to delete old versions of files (if versioning is enabled)

• Can be used to delete incomplete multi-part uploads

• Rules can be created for a certain prefix (ex - s3://mybucket/mp3/*)

• Rules can be created for certain object tags (ex - Department: Finance)
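A boto3 sketch combining the ideas above into one rule on a hypothetical mp3/ prefix (transition to Standard-IA at 60 days, Glacier at 180 days, expire at 365 days, and clean up incomplete multi-part uploads):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "mp3-archive-and-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "mp3/"},
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    },
)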

S3 Lifecycle Rules – Scenario 1

• Your application on EC2 creates image thumbnails after profile photos are uploaded to Amazon S3. These thumbnails can be easily recreated, and only need to be kept for 45 days. The source images should be immediately retrievable for these 45 days, and afterwards, the user can wait up to 6 hours. How would you design this?

• S3 source images can be on STANDARD, with a lifecycle configuration to transition them to GLACIER after 45 days.

• S3 thumbnails can be on ONEZONE_IA, with a lifecycle configuration to expire them (delete them) after 45 days.

S3 Lifecycle Rules – Scenario 2

• A rule in your company states that you should be able to recover your deleted S3 objects immediately for 15 days, although this may happen rarely. After this time, and for up to 365 days, deleted objects should be recoverable within 48 hours.

• You need to enable S3 versioning in order to have object versions, so that “deleted objects” are in fact hidden by a “delete marker” and can be recovered

• You can transition these “noncurrent versions” of the object to STANDARD_IA

• You can transition afterwards these “noncurrent versions” to DEEP_ARCHIVE

S3 Analytics – Storage Class Analysis

• You can setup S3 Analytics to help determine when to transition objects from Standard to Standard_IA

• Does not work for ONEZONE_IA or GLACIER

• Report is updated daily

• Takes about 24 to 48 hours to first start

• Good first step to put together Lifecycle Rules (or improve them)!
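A boto3 sketch of enabling Storage Class Analysis with a daily CSV export (all names are hypothetical):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_analytics_configuration(
    Bucket="my-bucket",
    Id="standard-to-ia-analysis",
    AnalyticsConfiguration={
        "Id": "standard-to-ia-analysis",
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::my-analytics-reports",
                        "Prefix": "storage-class-analysis/",
                    }
                },
            }
        },
    },
)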

S3 – Baseline Performance

• Amazon S3 automatically scales to high request rates, latency 100-200 ms

• Your application can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.

• There are no limits to the number of prefixes in a bucket.

• Example (object path => prefix):

• bucket/folder1/sub1/file => /folder1/sub1/

• bucket/folder1/sub2/file => /folder1/sub2/

• bucket/1/file => /1/

• bucket/2/file => /2/

• If you spread reads across all four prefixes evenly, you can achieve 22,000 requests per second for GET and HEAD

S3 – KMS Limitation

• If you use SSE-KMS, you may be impacted by the KMS limits

• When you upload, it calls the GenerateDataKey KMS API

• When you download, it calls the Decrypt KMS API

• Count towards the KMS quota per second (5500, 10000, 30000 req/s based on region)

• You can request a quota increase using the Service Quotas Console

S3 Select & Glacier Select

• Retrieve less data using SQL by performing server side filtering

• Can filter by rows & columns (simple SQL statements)

• Less network transfer, less CPU cost client-side
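A boto3 sketch of S3 Select against a hypothetical CSV object; only the matching rows and columns travel over the network:

import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-bucket",
    Key="data/sales.csv",
    ExpressionType="SQL",
    Expression="SELECT s.city, s.amount FROM S3Object s WHERE CAST(s.amount AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the filtered bytes
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())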

S3 Event Notifications

• S3:ObjectCreated, S3:ObjectRemoved, S3:ObjectRestore, S3:Replication…

• Object name filtering possible (*.jpg)

• Use case: generate thumbnails of images uploaded to S3

• Can create as many “S3 events” as desired

• S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer
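A boto3 sketch wiring ObjectCreated events for .jpg keys to an SQS queue (the queue ARN is hypothetical, and the queue's access policy must allow S3 to send messages to it):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": "arn:aws:sqs:us-east-1:123456789012:thumbnail-queue",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}},
        }]
    },
)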

S3 Event Notifications with Amazon EventBridge

• Advanced filtering options with JSON rules (metadata, object size, name...)

• Multiple Destinations – ex Step Functions, Kinesis Streams / Firehose…

• EventBridge Capabilities – Archive, Replay Events, Reliable delivery
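Sending every bucket event to EventBridge is a one-flag change in the same API (a sketch; filtering then happens in EventBridge rules rather than on the bucket):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-bucket",
    NotificationConfiguration={"EventBridgeConfiguration": {}},
)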

S3 – Requester Pays

• In general, bucket owners pay for all Amazon S3 storage and data transfer costs associated with their bucket

• With Requester Pays buckets, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket

• Helpful when you want to share large datasets with other accounts

• The requester must be authenticated in AWS (cannot be anonymous)
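A boto3 sketch (names are hypothetical): the owner flips the bucket to Requester Pays, and the authenticated requester must acknowledge the charge on each request:

import boto3

s3 = boto3.client("s3")

# Bucket owner enables Requester Pays
s3.put_bucket_request_payment(
    Bucket="my-dataset-bucket",
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# Requester downloads and accepts the data transfer charges
obj = s3.get_object(
    Bucket="my-dataset-bucket",
    Key="datasets/large-file.parquet",
    RequestPayer="requester",
)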

Amazon Athena

• Serverless query service to perform analytics against S3 objects

• Uses standard SQL language to query the files

• Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)

• Pricing: $5.00 per TB of data scanned

• Use compressed or columnar data for cost-savings (less scan)

• Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc...

• Exam Tip: to analyze data in S3 using serverless SQL, use Athena
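A boto3 sketch of running a query (the database, table, and result bucket are hypothetical; Athena writes results to the given S3 output location):

import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "my_logs_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])   # poll get_query_execution / get_query_results with this ID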

Glacier Vault Lock

• Adopt a WORM (Write Once Read Many) model

• Lock the policy for future edits (can no longer be changed)

• Helpful for compliance and data retention
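A boto3 sketch of the two-step lock (vault name and policy are hypothetical): initiate_vault_lock returns a lock ID that must be confirmed with complete_vault_lock within 24 hours, after which the policy can no longer be changed:

import json
import boto3

glacier = boto3.client("glacier")

resp = glacier.initiate_vault_lock(
    accountId="-",                      # "-" means the current account
    vaultName="my-archive-vault",
    policy={"Policy": json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyDeleteForOneYear",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "glacier:DeleteArchive",
            "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/my-archive-vault",
            "Condition": {"NumericLessThan": {"glacier:ArchiveAgeInDays": "365"}},
        }],
    })},
)

glacier.complete_vault_lock(
    accountId="-",
    vaultName="my-archive-vault",
    lockId=resp["lockId"],
)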

S3 Object Lock (versioning must be enabled)

• Adopt a WORM (Write Once Read Many) model

• Block an object version deletion for a specified amount of time

• Object retention:

• Retention Period: specifies a fixed period

• Legal Hold: same protection, no expiry date

• Modes:

• Governance mode: users can't overwrite or delete an object version or alter its lock settings unless they have special permissions

• Compliance mode: a protected object version can't be overwritten or deleted by any user, including the root user in your AWS account. When an object is locked in compliance mode, its retention mode can't be changed, and its retention period can't be shortened
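A boto3 sketch of both protections on a specific object version (names and version ID are hypothetical; Object Lock must have been enabled when the bucket was created):

from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")

# Governance-mode retention until a fixed date
s3.put_object_retention(
    Bucket="my-locked-bucket",
    Key="records/invoice.pdf",
    VersionId="3HL4kqtJvjVBH40Nrjfkd",
    Retention={"Mode": "GOVERNANCE",
               "RetainUntilDate": datetime(2026, 1, 1, tzinfo=timezone.utc)},
)

# Legal hold: same protection, but no expiry date (must be removed explicitly)
s3.put_object_legal_hold(
    Bucket="my-locked-bucket",
    Key="records/invoice.pdf",
    LegalHold={"Status": "ON"},
)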
