AWS Batch Data Collection


Today’s post is my compiled set of notes on AWS Batch Data Collection. This will be a part of a series of my prep notes for the AWS Certified Data Analytics – Specialty certification.

I wanted to share my learning journey and contribute to the wider community interested in learning more about data, analytics and cloud. Read more about why I decided to take an AWS certification as a Data Product Manager in my previous post: AWS Specialty Certification as a Product Manager

The infographic below shows the different AWS products that can be used to orchestrate, collect and process large amounts of data in AWS.

BATCH DATA TRANSFER
Online
AWS DataSync
Features
Transfer data over public internet;
Supports: NFS shares, SMB, HDFS, S3, Snowcone (online), Amazon EFS and more.
Use Cases
Moving data between AWS services/locations;
Moving <10 TB of data.
Hybrid
AWS Direct Connect
Features
Direct, dedicated on-premises link: 1/10 Gbps;
Pay for lease time and for data transferred out of AWS;
Minimal latency; increased security.
Use Cases
Moving on-premises data into AWS;
Moving between 10 TB and 100 TB of data.
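
To get a feel for why link speed matters at these data volumes, here is a rough back-of-the-envelope estimate of transfer time. It is only a sketch: the function name and the `utilization` parameter are my own, it uses decimal units (1 TB = 10^12 bytes), and it ignores protocol overhead beyond that single utilization factor.

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move data_tb terabytes over a link_gbps link.

    utilization is an assumed fraction of the nominal bandwidth that is
    actually available for the transfer (hypothetical default of 80%).
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("data_tb must be >= 0, link_gbps > 0, 0 < utilization <= 1")
    bits = data_tb * 1e12 * 8                      # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400                        # seconds -> days


if __name__ == "__main__":
    for gbps in (1, 10):
        days = transfer_days(100, gbps)
        print(f"100 TB over {gbps} Gbps at 80% utilization: {days:.1f} days")
```

Under these assumptions, 100 TB takes roughly 11.6 days on a 1 Gbps link and a little over a day on a 10 Gbps link, which is one way to read the 100 TB upper bound in the notes above.
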
Offline
AWS Snow Family
Features
Offline transfer of physical data containers;
Snowcone – 8 TB; data transfer, edge computing, edge storage;
Snowball – 80 TB; data transfer, edge storage;
Snowball Edge – 42 TB; edge computing, data transfer;
Snowmobile – 100 PB; data transfer.
Use Cases
Offline/limited connectivity scenarios;
>100 TB of data.
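
The size thresholds in these notes can be turned into a small decision helper. This is a study aid rather than official AWS guidance: the function names are hypothetical, the cut-offs (10 TB and 100 TB) and device capacities are simply the ones listed above, and real choices also depend on cost, deadlines and available bandwidth.

```python
import math

# Capacities in TB, as listed in the notes above (Snowmobile: 100 PB).
CAPACITY_TB = {
    "Snowcone": 8,
    "Snowball": 80,
    "Snowball Edge": 42,
    "Snowmobile": 100_000,
}


def suggest_transfer_option(data_tb: float, has_connectivity: bool = True) -> str:
    """Map a data volume to the service category suggested by the notes."""
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    if not has_connectivity or data_tb > 100:
        return "AWS Snow Family"        # offline / limited connectivity, or >100 TB
    if data_tb < 10:
        return "AWS DataSync"           # online, <10 TB
    return "AWS Direct Connect"         # hybrid, 10 TB to 100 TB


def devices_needed(data_tb: float, device: str) -> int:
    """Number of devices of one type needed to hold data_tb terabytes."""
    return math.ceil(data_tb / CAPACITY_TB[device])


if __name__ == "__main__":
    print(suggest_transfer_option(5))
    print(suggest_transfer_option(50))
    print(suggest_transfer_option(250))
    print(devices_needed(250, "Snowball"))
```
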
ORCHESTRATION
Database Migration Service
Features
Data Migration – migrate commercial and open-source DBs and DWs into AWS;
DB/DW Replication;
No DB downtime during migration.
Use Cases
Migrate Applications;
Archive Old Data;
Upgrade;
Migrate Datastores;
Keep data in sync;
Replication.
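
As an illustration of how a "migrate and keep in sync" task might be described, the sketch below builds the table-mapping JSON and a partial parameter set for a DMS replication task. The helper names, the task identifier and the schema names are made up for the example, and the code deliberately does not call AWS: a real call (for example through boto3's `dms` client and `create_replication_task`) would also need the source endpoint, target endpoint and replication instance ARNs.

```python
import json
from typing import Dict, List

ALLOWED_MIGRATION_TYPES = {"full-load", "cdc", "full-load-and-cdc"}


def selection_rule(rule_id: int, schema: str, table: str = "%") -> Dict:
    """One DMS selection rule that includes the matching tables ('%' is a wildcard)."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": str(rule_id),
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": "include",
    }


def build_task_params(task_id: str, schemas: List[str],
                      migration_type: str = "full-load-and-cdc") -> Dict:
    """Partial keyword arguments for a replication task (endpoint ARNs omitted)."""
    if migration_type not in ALLOWED_MIGRATION_TYPES:
        raise ValueError(f"unknown migration type: {migration_type}")
    rules = [selection_rule(i, s) for i, s in enumerate(schemas, start=1)]
    return {
        "ReplicationTaskIdentifier": task_id,
        # full load copies existing data; cdc then replicates ongoing changes
        "MigrationType": migration_type,
        "TableMappings": json.dumps({"rules": rules}),
    }


if __name__ == "__main__":
    params = build_task_params("example-task", ["sales", "hr"])  # hypothetical names
    print(json.dumps(json.loads(params["TableMappings"]), indent=2))
```

The `full-load-and-cdc` migration type is what corresponds to the "keep data in sync" use case: the initial copy is followed by ongoing change replication while the source stays online.
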
Data Pipeline
Features
Wizard for easy ETL creation;
Evaluates preconditions and runs activities over several possible data sources on a given schedule;
Has a lot of crossover with Lambda/Step Functions.
Use Cases
Move tables for analytics/BI tools;
Remotely execute stored procedures;
Not great for complex transformations.
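
The "evaluate preconditions, then run activities" idea can be pictured with a few lines of plain Python. To be clear, this is a conceptual toy and not the Data Pipeline API: `Activity`, `run_scheduled` and the example checks are all invented for illustration, and the scheduling itself is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Activity:
    """A unit of work that only runs if all of its preconditions hold."""
    name: str
    action: Callable[[], str]
    preconditions: List[Callable[[], bool]] = field(default_factory=list)


def run_scheduled(activities: List[Activity]) -> Dict[str, str]:
    """Run each activity whose preconditions all pass; mark the rest as skipped."""
    results = {}
    for activity in activities:
        if all(check() for check in activity.preconditions):
            results[activity.name] = activity.action()
        else:
            results[activity.name] = "SKIPPED"
    return results


if __name__ == "__main__":
    source_file_exists = lambda: True    # stand-in for "input data is present"
    target_table_ready = lambda: False   # stand-in for "target table exists"

    pipeline = [
        Activity("copy_orders", lambda: "COPIED", [source_file_exists]),
        Activity("load_report", lambda: "LOADED", [source_file_exists, target_table_ready]),
    ]
    print(run_scheduled(pipeline))
```

In this run, `copy_orders` executes because its single check passes, while `load_report` is skipped because one of its preconditions fails.
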
Lambda/Step Functions
Features
Flexible orchestration for serverless applications
Visual Workflow configuration for complex ETL jobs
Use Cases
Complex data transformation
Reuse of ETL transformations
Complex data workflow management
Not great for: long transformations (per invocation step), big invocation payloads and non-AWS service sources (see Lambda limits)
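
To make the workflow and the limits more concrete, the sketch below does two things: it checks a step against the Lambda limits I am assuming here (15-minute maximum timeout and about 6 MB for a synchronous invocation payload; check the current quotas before relying on them), and it builds a simple sequential state machine definition in Amazon States Language. The function names, the retry settings and the ARNs are placeholders for illustration.

```python
import json
from typing import Dict

LAMBDA_MAX_TIMEOUT_S = 900                    # assumed 15-minute limit per invocation
LAMBDA_SYNC_PAYLOAD_BYTES = 6 * 1024 * 1024   # assumed ~6 MB synchronous payload limit


def fits_lambda(expected_seconds: float, payload_bytes: int) -> bool:
    """True if a single step stays within the assumed Lambda limits."""
    return (expected_seconds <= LAMBDA_MAX_TIMEOUT_S
            and payload_bytes <= LAMBDA_SYNC_PAYLOAD_BYTES)


def etl_state_machine(step_arns: Dict[str, str]) -> Dict:
    """Chain the given steps, in insertion order, into a sequential workflow."""
    names = list(step_arns)
    if not names:
        raise ValueError("at least one step is required")
    states = {}
    for i, name in enumerate(names):
        state = {
            "Type": "Task",
            "Resource": step_arns[name],
            # retry a failed task twice with exponential backoff (example values)
            "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                       "IntervalSeconds": 5, "MaxAttempts": 2, "BackoffRate": 2.0}],
        }
        if i + 1 < len(names):
            state["Next"] = names[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"Comment": "Example ETL workflow", "StartAt": names[0], "States": states}


if __name__ == "__main__":
    placeholder = "arn:aws:lambda:REGION:ACCOUNT_ID:function:"  # not a real ARN
    steps = {name: placeholder + name.lower() for name in ("Extract", "Transform", "Load")}
    print(json.dumps(etl_state_machine(steps), indent=2))
    print(fits_lambda(expected_seconds=1200, payload_bytes=1024))
```

Because each transformation is its own state, the same Lambda function can be reused across several workflows, which is the reuse point in the list above. A step that fails the `fits_lambda` check is a hint that it should be split up or moved to another service.
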