Data Stack Checklist (and AWS Analytics Specialty Certification domain overview)

We created this page as a short summary of each domain covered in the AWS Analytics Specialty Certification. While writing it, we realized that the list of exam domains also works well as a checklist of items to review in any data stack architecture, so much so that we used it in the page name. Read more about the AWS Analytics Specialty Certification here.

Data Stack Building Blocks
Collection

Overview of the technologies used across all aspects of landing data. Architectural and operational requirements and features of the collection system:

  • Data loss tolerance
  • Data transfer throughput, latency tolerance and cost
  • Data persistence and recovery
  • Flow characteristics of incoming data: streaming, transactional, batch
  • Scalability
  • Connectivity
  • Order, format, encryption and compression
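
As a small illustration of the last item, the sketch below packs a batch of records as newline-delimited JSON (format), keeps them in their original sequence (order), and gzips the result (compression). The function names and sample events are hypothetical, and encryption is deliberately left out; this is one possible arrangement, not a recommended collection design.

```python
import gzip
import json


def pack_batch(records):
    """Serialize records as newline-delimited JSON, in order, then gzip them."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(payload.encode("utf-8"))


def unpack_batch(blob):
    """Reverse pack_batch: gunzip, then parse one JSON document per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line]


# Hypothetical sample events, used only to exercise the round trip.
events = [{"id": 1, "type": "click"}, {"id": 2, "type": "view"}]
blob = pack_batch(events)
restored = unpack_batch(blob)
```

The round trip returns the records in the order they were written, which is the property a downstream consumer would rely on if ordering matters.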

Storage and Data Management

Overview of the technologies used across all aspects of storing data. Architectural and operational requirements and features of the storage system:

  • Performance
  • Cost
  • Durability
  • Reliability
  • Consistency
  • Read/write latency
  • Data access and retrieval patterns
  • Database Structure: Schema, Layout and Partitioning
  • Retention Policy
  • Data Cataloging and Metadata Management
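
To make the layout and partitioning item more concrete, here is a minimal sketch of a date-based key prefix using the common Hive-style `key=value` naming convention. The dataset name and the year/month/day scheme are assumptions chosen for illustration; the right partition keys depend on the access and retrieval patterns listed above.

```python
from datetime import date


def partition_prefix(dataset: str, day: date) -> str:
    """Build a Hive-style partition prefix such as events/year=2024/month=01/day=05/."""
    return (
        f"{dataset}/year={day.year:04d}/"
        f"month={day.month:02d}/day={day.day:02d}/"
    )


# Hypothetical dataset name; zero-padding keeps prefixes sortable as strings.
prefix = partition_prefix("events", date(2024, 1, 5))
```

Queries that filter on the partition keys can then skip every prefix outside the requested date range.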

Processing

Overview of the technologies used across all aspects of processing data. Architectural and operational requirements and features of the systems that transform, aggregate and load data:

  • Automation: Workflow Creation and Orchestration
  • Cost
  • Scalability
  • Availability
  • Replication
  • Processing Concurrency
  • Aggregation Performance
  • Data and Workflow Recovery
  • Logging and Monitoring
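
The orchestration item can be sketched as a dependency graph whose steps run in topological order. The step names below are hypothetical, and the example uses Python's standard `graphlib` module (Python 3.9+) only to show the idea; a real orchestrator would add scheduling, retries, recovery and logging on top.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "validate": {"extract"},
    "aggregate": {"clean", "validate"},
    "load": {"aggregate"},
}


def run_order(dag):
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(workflow)
```

Steps with no dependency between them (here `clean` and `validate`) may appear in either order, which is also where processing concurrency becomes possible.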

Analysis and Visualization

Overview of the technologies used across all aspects of analyzing and visualizing data. Architectural and operational requirements and features of the systems that let you retrieve meaningful, actionable insights from data:

  • Performance: consume static data vs. consume dynamic data; in-memory vs. direct access
  • Analysis Requirements:
    • Availability
    • Scalability
    • Cost
    • Failover Recovery
    • Fault Tolerance
    • Delivery: Streaming vs. Interactive vs. Collaborative vs. Operational
  • Visualization Requirements:
    • Dynamic vs. Static
    • Output: Metrics, KPIs, Graphical, Tabular, API
    • Delivery: Web, Mobile, Email, Collaborative
    • Refresh Schedule

Security

Requirements and features for securing data access and data in transit:

  • Authentication: Federated Access, SSO and IAM
  • Authorization Methods: Policies, ACL, Table/Column Level Permissions
  • Access Control Mechanisms: Security Groups, Role-Based Control
  • Data Masking and Obfuscation
  • Encryption Approaches (at-rest and in-transit): Server-Side Encryption, Client-Side Encryption, AWS KMS, AWS CloudHSM
  • Key Rotation and Secrets Management
  • Data Governance and Compliance Controls
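
As an illustration of the masking and obfuscation item, the sketch below shows two common approaches: partial masking for display, and keyed hashing (HMAC-SHA256) for pseudonymization, where the same input always maps to the same token. The function names and the hard-coded key are hypothetical; in practice the key would come from a secrets manager rather than from source code.

```python
import hashlib
import hmac


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def pseudonymize(value: str, secret: bytes) -> str:
    """Return a stable keyed hash so records can still be joined on the token."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for demonstration only; never hard-code real secrets.
demo_key = b"demo-key-not-for-production"
masked = mask_email("alice@example.com")
token = pseudonymize("alice@example.com", demo_key)
```

Masking is lossy and meant for display, while the keyed hash keeps records joinable across tables without exposing the original value.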