AWS Storage Extras
AWS Snow Family
• Highly secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS

Data Migrations with AWS Snow Family
Challenges:
• Limited connectivity
• Limited bandwidth
• High network cost
• Shared bandwidth (can’t maximize the line)
• Connection stability
AWS Snow Family: offline devices to perform data migrations
If it takes more than a week to transfer over the network, use Snowball devices!
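The one-week rule of thumb can be checked with simple arithmetic. The sketch below is only an illustration: the function names, the 80% default link utilization, and the use of decimal units (1 TB = 10^12 bytes) are assumptions, not part of any AWS tool.

```python
def transfer_days(data_tb: float, bandwidth_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to send data_tb terabytes over a network link.

    utilization is the assumed share of the line you can actually use
    (shared bandwidth rarely lets you maximize it).
    """
    if bandwidth_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("bandwidth must be positive and utilization in (0, 1]")
    bits = data_tb * 1e12 * 8                      # decimal terabytes -> bits
    effective_bps = bandwidth_mbps * 1e6 * utilization
    return bits / effective_bps / 86_400           # seconds -> days


def prefer_snow_device(data_tb: float, bandwidth_mbps: float,
                       utilization: float = 0.8, threshold_days: float = 7.0) -> bool:
    """Apply the rule of thumb: more than a week over the network -> go offline."""
    return transfer_days(data_tb, bandwidth_mbps, utilization) > threshold_days


if __name__ == "__main__":
    days = transfer_days(100, 100)                 # 100 TB over a 100 Mbps line
    print(f"{days:.1f} days, use a Snow device: {prefer_snow_device(100, 100)}")
```

With these assumptions, 100 TB over a 100 Mbps line takes months, while 1 TB over a 1 Gbps line takes a few hours.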


Snowball Edge (for data transfers)
• Physical data transport solution: move TBs or PBs of data in or out of AWS
• Alternative to moving data over the network (and paying network fees)
• Pay per data transfer job
• Provide block storage and Amazon S3-compatible object storage
• Snowball Edge Storage Optimized
• 80 TB of HDD capacity for block volume and S3-compatible object storage
• Snowball Edge Compute Optimized
• 42 TB of HDD capacity for block volume and S3-compatible object storage
• Use cases: large data cloud migrations, DC decommission, disaster recovery
AWS Snowcone
• Small, portable computing device: rugged & secure, withstands harsh environments
• Light (4.5 pounds, 2.1 kg)
• Device used for edge computing, storage, and data transfer
• 8 TBs of usable storage
• Use Snowcone where Snowball does not fit (space-constrained environment)
• Must provide your own battery / cables
• Can be sent back to AWS offline, or connect it to internet and use AWS DataSync to send data
AWS Snowmobile
• Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TBs)
• Each Snowmobile has 100 PB of capacity (use multiple in parallel)
• High security: temperature controlled, GPS, 24/7 video surveillance
• Better than Snowball if you transfer more than 10 PB
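As a rough sizing aid, the capacities quoted above can be turned into a device count. This is a minimal sketch: the capacity table copies the figures from these notes, and the selection thresholds (Snowcone up to 8 TB, Snowmobile above 10 PB) are a simplification of the guidance above, not an official AWS decision rule.

```python
import math

# Usable capacity per device in TB, taken from the figures in these notes.
CAPACITY_TB = {
    "Snowcone": 8,
    "Snowball Edge Storage Optimized": 80,
    "Snowmobile": 100_000,  # 100 PB
}


def devices_needed(data_tb: float, device: str) -> int:
    """Number of devices of the given type needed to hold data_tb terabytes."""
    return math.ceil(data_tb / CAPACITY_TB[device])


def suggest_device(data_tb: float) -> str:
    """Pick a device family by data size only (hypothetical heuristic)."""
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    if data_tb > 10_000:                 # more than 10 PB
        return "Snowmobile"
    if data_tb <= CAPACITY_TB["Snowcone"]:
        return "Snowcone"
    return "Snowball Edge Storage Optimized"


if __name__ == "__main__":
    one_exabyte_tb = 1_000_000           # 1 EB = 1,000 PB = 1,000,000 TB
    print(devices_needed(one_exabyte_tb, "Snowmobile"), "Snowmobiles for 1 EB")
    print(suggest_device(200), "x", devices_needed(200, suggest_device(200)))
```

Real choices also depend on compute needs, physical space, and regional availability, which this sketch ignores.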

Snow Family – Usage Process
Request Snowball devices from the AWS console for delivery
Install the Snowball client / AWS OpsHub on your servers
Connect the Snowball to your servers and copy files using the client
Ship back the device when you’re done (goes to the right AWS facility)
Data will be loaded into an S3 bucket
Snowball is completely wiped
What is Edge Computing?
• Process data while it’s being created on an edge location
• A truck on the road, a ship on the sea, a mining station underground.
• These locations may have
• Limited / no internet access
• Limited / no easy access to computing power
• We set up a Snowball Edge / Snowcone device to do edge computing
• Use cases of Edge Computing:
• Preprocess data
• Machine learning at the edge
• Transcoding media streams
• Eventually (if need be) we can ship the device back to AWS (to transfer data, for example)
Snow Family – Edge Computing
• Snowcone (smaller)
• 2 CPUs, 4 GB of memory, wired or wireless access
• USB-C power using a cord or the optional battery
• Snowball Edge – Compute Optimized
• 52 vCPUs, 208 GiB of RAM
• Optional GPU (useful for video processing or machine learning)
• 42 TB usable storage
• Snowball Edge – Storage Optimized
• Up to 40 vCPUs, 80 GiB of RAM
• Object storage clustering available
• All: Can run EC2 Instances & AWS Lambda functions (using AWS IoT Greengrass)
• Long-term deployment options: 1 and 3 years discounted pricing
AWS OpsHub
• Historically, to use Snow Family devices, you needed a CLI (Command Line Interface tool)
• Today, you can use AWS OpsHub (software you install on your computer / laptop) to manage your Snow Family Device
• Unlocking and configuring single or clustered devices
• Transferring files
• Launching and managing instances running on Snow Family Devices
• Monitor device metrics (storage capacity, active instances on your device)
• Launch compatible AWS services on your devices (ex: Amazon EC2 instances, AWS DataSync, Network File System (NFS))
https://aws.amazon.com/blogs/aws/aws-snowball-edge-update/

Solution Architecture: Snowball into Glacier
• Snowball cannot import to Glacier directly
• You must use Amazon S3 first, in combination with an S3 lifecycle policy
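The lifecycle policy mentioned above could look roughly like the configuration built below. This is a sketch under stated assumptions: the rule ID, the prefix, and the one-day delay are hypothetical values chosen for illustration. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; no AWS call is made here.

```python
import json


def glacier_lifecycle_config(prefix: str = "", days: int = 1) -> dict:
    """Build an S3 lifecycle configuration that moves objects to Glacier.

    prefix limits the rule to objects under that key prefix (empty = whole bucket);
    days is how long objects stay in S3 before transitioning.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": "snowball-import-to-glacier",   # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    # With boto3 this would be applied along the lines of:
    #   s3.put_bucket_lifecycle_configuration(
    #       Bucket="my-import-bucket", LifecycleConfiguration=config)
    # where "my-import-bucket" is a placeholder bucket name.
    config = glacier_lifecycle_config(prefix="snowball-import/")
    print(json.dumps(config, indent=2))
```

Data lands in S3 when the Snowball is imported, and the rule then transitions it to Glacier without any further action.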


Amazon FSx for Windows (File Server)
• EFS is a shared POSIX system for Linux systems.
• FSx for Windows is a fully managed Windows file system share drive
• Supports SMB protocol & Windows NTFS
• Microsoft Active Directory integration, ACLs, user quotas
• Can be mounted on Linux EC2 instances
• Scale up to 10s of GB/s, millions of IOPS, 100s of PB of data
• Storage Options:
• SSD – latency sensitive workloads (databases, media processing, data analytics, …)
• HDD – broad spectrum of workloads (home directory, CMS, …)
• Can be accessed from your on-premises infrastructure (VPN or Direct Connect)
• Can be configured to be Multi-AZ (high availability)
• Data is backed up daily to S3
Amazon FSx for Lustre
• Lustre is a type of parallel distributed file system, for large-scale computing
• The name Lustre is derived from “Linux” and “cluster”
• Machine Learning, High Performance Computing (HPC)
• Video Processing, Financial Modeling, Electronic Design Automation
• Scales up to 100s of GB/s, millions of IOPS, sub-ms latencies
• Storage Options:
• SSD – low-latency, IOPS intensive workloads, small & random file operations
• HDD – throughput-intensive workloads, large & sequential file operations
• Seamless integration with S3
• Can “read S3” as a file system (through FSx)
• Can write the output of the computations back to S3 (through FSx)
• Can be used from on-premises servers (VPN or Direct Connect)
FSx File System Deployment Options
• Scratch File System
• Temporary storage
• Data is not replicated (doesn’t persist if file server fails)
• High burst (6x faster, 200 MB/s per TiB)
• Usage: short-term processing, optimize costs
• Persistent File System
• Long-term storage
• Data is replicated within same AZ
• Replace failed files within minutes
• Usage: long-term processing, sensitive data

Hybrid Cloud for Storage
• AWS is pushing for “hybrid cloud”
• Part of your infrastructure is on the cloud
• Part of your infrastructure is on-premises
• This can be due to
• Long cloud migrations
• Security requirements
• Compliance requirements
• IT strategy
• S3 is a proprietary storage technology (unlike EFS / NFS), so how do you expose the S3 data on-premises?
• AWS Storage Gateway

AWS Storage Gateway
• Bridge between on-premises data and cloud data in S3
• Use cases: disaster recovery, backup & restore, tiered storage
• 3 types of Storage Gateway:
• File Gateway
• Volume Gateway
• Tape Gateway
• Exam Tip: You need to know the differences between all 3!

File Gateway
• Configured S3 buckets are accessible using the NFS and SMB protocols
• Supports S3 standard, S3 IA, S3 One Zone IA
• Bucket access using IAM roles for each File Gateway
• Most recently used data is cached in the file gateway
• Can be mounted on many servers
• Integrated with Active Directory (AD) for user authentication

Volume Gateway
• Block storage using iSCSI protocol backed by S3
• Backed by EBS snapshots which can help restore on-premises volumes!
• Cached volumes: low latency access to most recent data
• Stored volumes: entire dataset is on-premises, scheduled backups to S3

Tape Gateway
• Some companies have backup processes using physical tapes (!)
• With Tape Gateway, companies use the same processes, but in the cloud
• Virtual Tape Library (VTL) backed by Amazon S3 and Glacier
• Back up data using existing tape-based processes (and iSCSI interface)
• Works with leading backup software vendors

Storage Gateway – Hardware Appliance
• Using Storage Gateway means you need on-premises virtualization
• Otherwise, you can use a Storage Gateway Hardware Appliance
• You can buy it on amazon.com
• Works with File Gateway, Volume Gateway, Tape Gateway
• Has the required CPU, memory, network, SSD cache resources
• Helpful for daily NFS backups in small data centers

AWS Storage Gateway Summary
• Exam tip: Read the question carefully; it will hint at which gateway to use
• On-premises data to the cloud => Storage Gateway
• File access / NFS – user auth with Active Directory => File Gateway (backed by S3)
• Volumes / Block Storage / iSCSI => Volume gateway (backed by S3 with EBS snapshots)
• VTL Tape solution / Backup with iSCSI => Tape Gateway (backed by S3 and Glacier)
• No on-premises virtualization => Hardware Appliance
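The summary above amounts to a small lookup, sketched below. The function names and the "file" / "block" / "tape" keywords are hypothetical labels invented for this example; only the mapping itself comes from the notes.

```python
# Mapping from the kind of access you need to the gateway type in the summary.
GATEWAY_BY_ACCESS = {
    "file": "File Gateway",      # NFS / SMB file access, Active Directory auth, backed by S3
    "block": "Volume Gateway",   # iSCSI volumes, backed by S3 with EBS snapshots
    "tape": "Tape Gateway",      # VTL over iSCSI, backed by S3 and Glacier
}


def choose_gateway(access: str) -> str:
    """Return the Storage Gateway type for 'file', 'block', or 'tape' access."""
    try:
        return GATEWAY_BY_ACCESS[access.lower()]
    except KeyError:
        raise ValueError(f"unknown access type: {access!r}") from None


def choose_deployment(has_virtualization: bool) -> str:
    """Without on-premises virtualization, fall back to the Hardware Appliance."""
    return "Virtual machine" if has_virtualization else "Hardware Appliance"


if __name__ == "__main__":
    print(choose_gateway("block"), "on", choose_deployment(has_virtualization=False))
```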
Amazon FSx File Gateway
• Native access to Amazon FSx for Windows File Server
• Local cache for frequently accessed data
• Windows native compatibility (SMB, NTFS, Active Directory...)
• Useful for group file shares and home directories

AWS Transfer Family
• A fully managed service for file transfers into and out of Amazon S3 or Amazon EFS using FTP, FTPS, or SFTP
• Supported Protocols
• AWS Transfer for FTP (File Transfer Protocol (FTP))
• AWS Transfer for FTPS (File Transfer Protocol over SSL (FTPS))
• AWS Transfer for SFTP (SSH File Transfer Protocol (SFTP))
• Managed infrastructure, Scalable, Reliable, Highly Available (multi-AZ)
• Pay per provisioned endpoint per hour + data transfers in GB
• Store and manage users’ credentials within the service
• Integrate with existing authentication systems (Microsoft Active Directory, LDAP, Okta, Amazon Cognito, custom)
• Usage: sharing files, public datasets, CRM, ERP, …
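The pricing model above (endpoint hours plus data transferred) can be written as a simple formula. The sketch below is illustrative only: the function name is hypothetical and the rates in the example are placeholders, not actual AWS prices, which vary by region and over time.

```python
def transfer_family_cost(endpoint_hours: float, gb_transferred: float,
                         hourly_rate: float, per_gb_rate: float) -> float:
    """Cost = provisioned endpoint hours * hourly rate + GB transferred * per-GB rate."""
    if min(endpoint_hours, gb_transferred, hourly_rate, per_gb_rate) < 0:
        raise ValueError("all inputs must be non-negative")
    return endpoint_hours * hourly_rate + gb_transferred * per_gb_rate


if __name__ == "__main__":
    # Placeholder rates for illustration only; check the AWS pricing page.
    cost = transfer_family_cost(endpoint_hours=720, gb_transferred=500,
                                hourly_rate=0.30, per_gb_rate=0.04)
    print(f"Estimated monthly cost: ${cost:.2f}")
```

Because the endpoint is billed while it is provisioned, the hourly term applies even in months with no transfers.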

Storage Comparison
• S3: Object Storage
• Glacier: Object Archival
• EFS: Network File System for Linux instances, POSIX filesystem
• FSx for Windows: Network File System for Windows servers
• FSx for Lustre: High Performance Computing Linux file system
• EBS volumes: Network storage for one EC2 instance at a time
• Instance Storage: Physical storage for your EC2 instance (high IOPS)
• Storage Gateway: File Gateway, Volume Gateway (cache & stored), Tape Gateway
• Snowball / Snowmobile: to move large amounts of data to the cloud, physically
• Database: for specific workloads, usually with indexing and querying