AWS Certified Machine Learning Engineer — 72 Hrs Study Guide
Complete 72-Hour Intensive Study Guide with Visual Diagrams & Cheat Sheet
⚡ LEARN IN 72 HOURS: This guide covers the key details you need to know for the AWS Certified Machine Learning Engineer exam. Follow the intensive study plan, memorize the mnemonics, use the visual diagrams for quick reference, and review the cheat sheet before the exam. You've got this!
📊 Domain Breakdown & Weightings
| Domain | Weight | Focus Areas |
|---|---|---|
| Domain 1: Data Preparation | 28% | Storage, ETL, Feature Engineering, Data Quality |
| Domain 2: Model Development | 26% | Algorithms, Training, Tuning, Evaluation |
| Domain 3: Deployment | 22% | Endpoints, MLOps, Pipelines, Orchestration |
| Domain 4: Monitoring/Security | 24% | Model Monitor, CloudWatch, IAM, KMS, VPC |
🗓️ 72-Hour Intensive Study Schedule
Day 1 (24 Hours)
Hours 1-6: Domain 1 — Data Preparation (Morning Session)
⏰ Hours 1-2: Storage & Data Formats
- Memorize: S-KEFS-R (Storage), PAJRC (Data Formats)
- Learn: When to use S3 vs Kinesis vs Redshift
- Focus: Parquet vs RecordIO vs CSV - exam loves this!
- Practice: Create S3 bucket, upload data in different formats
⏰ Hours 3-4: ETL & Processing Services
- Memorize: GEEKS-DW (ETL Services)
- Learn: Glue vs EMR vs Data Wrangler decision tree
- Hands-on: Create Glue job, use Data Wrangler in SageMaker
- Key concept: When serverless (Glue) vs managed clusters (EMR)
⏰ Hours 5-6: Feature Engineering
- Memorize: BITES-NO (Feature Engineering), MOUD (Data Quality)
- Learn: SMOTE, imputation methods (mean/KNN/MICE)
- Critical: Handling imbalanced classes - VERY common on exam!
- Practice: Apply transformations in Data Wrangler
Hours 7-12: Domain 2 — Model Development (Afternoon Session)
⏰ Hours 7-9: SageMaker Algorithms
- Memorize: BLIINKS-FPXDR (ALL algorithms)
- CRITICAL: XGBoost, Linear Learner, DeepAR, RCF - most tested!
- Learn: Algorithm selection decision tree
- Practice: Train XGBoost model on sample data
⏰ Hours 10-11: Hyperparameter Tuning
- Memorize: BRAGS (Tuning strategies)
- Focus: Bayesian vs Random vs Grid - when to use each
- XGBoost params: num_round, eta, max_depth (memorize these!)
- Practice: Run hyperparameter tuning job
⏰ Hour 12: Evaluation Metrics
- Memorize: FARM-CAR (Classification + Regression metrics)
- Learn: F1 vs Accuracy vs ROC-AUC - when to use which
- Key: Imbalanced classes → F1 score, not accuracy!
Hours 13-18: Training & Practice (Evening Session)
⏰ Hours 13-14: Training Modes & Infrastructure
- Memorize: PFIS (Training modes)
- CRITICAL: Pipe Mode vs File Mode - VERY common question!
- Learn: Spot training, instance types (ml.p3, ml.c5)
- Cost optimization: Pipe + Spot = huge savings
⏰ Hours 15-18: Practice Questions
- Take practice exam: 30-40 questions on Domains 1-2
- Review wrong answers: Understand WHY you got them wrong
- Flashcards: Create for concepts you struggle with
- Write out mnemonics: 10 times each from memory
Hours 19-24: Review & Sleep (Night Session)
- Recite all mnemonics learned today
- Create mind map connecting Domain 1 → Domain 2
- Review visual diagrams (print them out!)
- SLEEP (4-5 hours minimum) - Your brain needs sleep to consolidate learning
Day 2 (24 Hours)
Hours 25-30: Domain 3 — Deployment (Morning Session)
⏰ Hours 25-27: Deployment Options
- Memorize: REBAS (Deployment types)
- SUPER CRITICAL: Real-time vs Serverless vs Batch vs Async
- Learn: Deployment decision tree - memorize this cold!
- Practice: Deploy model as real-time endpoint
- Key trap: "Real-time" doesn't always mean Real-time Endpoint!
⏰ Hours 28-30: MLOps & Orchestration
- Memorize: SPEC-SAM (MLOps services)
- Focus: SageMaker Pipelines (preferred solution!)
- Learn: Step Functions vs Airflow vs Pipelines
- Practice: Create simple SageMaker Pipeline
Hours 31-36: Domain 3 Continued & Domain 4 Start (Afternoon Session)
⏰ Hours 31-32: Endpoint Optimization
- Memorize: MASS-EI (Endpoint optimization)
- Learn: Multi-Model Endpoints, Auto-scaling, Shadow testing
- Cost focus: Inferentia, Elastic Inference savings
- Practice: Configure auto-scaling for endpoint
⏰ Hours 33-36: Domain 4 — Monitoring
- Memorize: CM-TAXI (Monitoring services)
- CRITICAL: Model Monitor - 4 drift types (memorize all!)
- Learn: CloudWatch metrics for SageMaker
- Practice: Set up Model Monitor for drift detection
Hours 37-45: Security & Practice (Evening Session)
⏰ Hours 37-39: Security
- Memorize: I-MAKE-VOWS (Security services), VINE (SageMaker security)
- Focus: IAM roles, VPC mode, KMS encryption
- Learn: Encryption at rest vs in transit
- Common scenario: "Most secure option" = VPC + KMS + IAM
⏰ Hours 40-45: Full Practice Exam
- Take full practice exam: 65 questions, timed (130 min, like the real exam)
- Simulate real conditions: No breaks, no phone
- Target score: 75%+ to feel confident
- Review ALL answers: Right and wrong, understand concepts
Hours 46-53: Intensive Review & Sleep
- Go through ALL mnemonics - write them out 10x each
- Review all visual diagrams and decision trees
- Identify weak areas from practice exam
- SLEEP (4-5 hours) - Critical for memory consolidation
Day 3 (24 Hours) — Exam Day
Hours 54-62: Final Review & Preparation
⏰ Hours 54-56: Cheat Sheet Review
- Print cheat sheet: Review the entire cheat sheet section
- Memorize: All quick decision rules
- Focus: Common traps section - don't fall for these!
- Write down: Master mnemonics on paper/whiteboard
⏰ Hours 57-59: Final Practice Test
- Take another full practice exam (65 questions)
- Time yourself strictly
- Target: 80%+ correct
- Quick review of wrong answers only
⏰ Hours 60-62: Light Review & Pre-Exam
- NO new information - just review
- Go through all visual diagrams one more time
- Recite all mnemonics out loud
- Relax, breathe, hydrate
🎮 The Master Framework: "ML-PIPE-DDMS"
"Build your ML-PIPE and remember DDMS!"
ML-PIPE = The ML Engineering Workflow:
- Model Development
- Load & Prepare Data (Data Preparation)
- Push to Production (Deployment)
- Inspect & Protect (Monitoring & Security)
- Pipelines (Orchestration)
- Evaluate Performance
DDMS = The 4 Critical Focus Areas:
- Data (28% of exam)
- Development (26% of exam)
- Monitoring (24% of exam)
- Shipping Code (Deployment 22% of exam)
🗄️ DOMAIN 1: Data Preparation for ML (28%)
⚡ HIGHEST WEIGHT: This domain is 28% of your exam - master it!
📦 Storage Services: "S-KEFS-R"
Remember all AWS storage options for ML data
Think: "Safeguard Key Engineering Features on Secure Resources"
- S3 - Object storage for data lakes
- Kinesis - Real-time streaming data
- EBS - Block storage for EC2/EMR
- FSx - High-performance file systems (Lustre for ML)
- SageMaker Feature Store - Feature management
- Redshift - Data warehouse for analytics
💡 Memory Anchor
"My ML project needs S3 buckets, Kinesis streams, EBS volumes, FSx for HPC, SageMaker Feature Store, and Redshift for queries!"
📊 Data Formats: "PAJRC"
The 5 essential data formats for ML
Think: "Please Always Jot Record Correctly"
- Parquet - Columnar format (best for analytics)
- Avro - Binary format with schema
- JSON - Semi-structured text
- RecordIO-Protobuf - SageMaker's preferred format
- CSV - Simple tabular data
🎯 Exam Tip
Parquet = Analytics (columnar, compressed)
RecordIO = SageMaker training (pipe mode)
Avro = Streaming with schema evolution
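The row-versus-column distinction is the whole reason Parquet wins for analytics. A minimal pure-Python sketch of the layout idea (not a real Parquet writer — just an illustration of why columnar scans are cheaper):

```python
# Conceptual sketch: row-oriented storage (like CSV) vs column-oriented
# storage (like Parquet). Pure Python, toy data.

rows = [  # row-oriented: each record stored together, as in one CSV line
    {"user_id": 1, "age": 34, "spend": 120.0},
    {"user_id": 2, "age": 28, "spend": 75.5},
    {"user_id": 3, "age": 41, "spend": 210.0},
]

# Column-oriented: each column stored contiguously, as in a Parquet column chunk.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An analytics query like AVG(spend) now touches only one column,
# instead of scanning every field of every row.
avg_spend = sum(columns["spend"]) / len(columns["spend"])
print(avg_spend)  # 135.16666666666666
```

Columnar chunks also compress far better, since values of one type sit next to each other — which is why Parquet is the default answer for Athena/Glue analytics questions.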
⚙️ ETL & Processing: "GEEKS-DW"
Remember all data transformation services
Think: "GEEKS use Data Wrangling"
- Glue - Serverless ETL service
- EMR - Managed Hadoop/Spark clusters
- EMR Serverless - Auto-scaling Spark/Hive
- Kinesis Data Firehose - Stream ETL
- SageMaker Processing - ML-specific processing
- Data Wrangler - Visual data prep
🔥 Hot Exam Topic
Glue = Serverless, cost-effective
EMR = Custom code, complex processing
Data Wrangler = Visual, no code, 300+ transforms
🔧 Feature Engineering: "BITES-NO"
Master all feature engineering techniques
Think: "Feature engineering BITES, Need Optimization"
- Binning - Group continuous values into buckets
- Imputation - Handle missing data (mean/median/KNN/MICE)
- Transforming - Log, sqrt, polynomial transforms
- Encoding - One-hot, label, target encoding
- Scaling - Normalization, standardization (MinMaxScaler, StandardScaler)
- Normalization - Make features comparable
- Outlier handling - Remove or cap extreme values
⚡ Quick Reference
Imputation Methods:
• Mean/Median = Simple, fast
• KNN = Better accuracy, slower
• MICE = Most advanced, iterative
Unbalanced Data:
• SMOTE = Synthetic minority oversampling
• Random Oversampling = Duplicate minority
• Undersampling = Remove majority
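The imputation and resampling ideas above can be sketched in a few lines of plain Python (a toy illustration; in practice you would reach for scikit-learn's `SimpleImputer` and imbalanced-learn's `SMOTE`):

```python
import random
from statistics import mean

# Mean imputation: fill missing values (None) with the column mean.
ages = [25, None, 31, None, 40]
observed = [a for a in ages if a is not None]
imputed = [a if a is not None else mean(observed) for a in ages]
# mean(observed) = 32, so imputed == [25, 32, 31, 32, 40]

# Random oversampling: duplicate minority-class samples until classes balance.
# (SMOTE goes further: it synthesizes NEW points by interpolating neighbors.)
majority = [(x, 0) for x in range(90)]   # 90 samples of class 0
minority = [(x, 1) for x in range(10)]   # 10 samples of class 1
random.seed(42)
extra = random.choices(minority, k=len(majority) - len(minority))
balanced = majority + minority + extra
print(len(balanced))  # 180 — now 90 samples per class
```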
🎯 Data Quality Issues: "MOUD"
Remember the 4 main data quality challenges
Think: "Get the MOUD (mood) of your data right!"
- Missing values - Impute or drop
- Outliers - Detect & handle (>3σ from mean)
- Unbalanced classes - SMOTE, over/undersampling
- Duplicate records - Remove or aggregate
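Two of the MOUD checks, in stdlib Python (toy data; the outlier threshold follows the >3σ rule above):

```python
from statistics import mean, stdev

# Outliers: flag points more than 3 standard deviations from the mean.
values = [10, 11, 12, 13] * 5 + [95]   # 95 is the injected outlier
mu, sigma = mean(values), stdev(values)
outliers = [v for v in values if abs(v - mu) > 3 * sigma]
print(outliers)  # [95]

# Duplicates: drop exact duplicate records while preserving order.
records = [("u1", 34), ("u2", 28), ("u1", 34), ("u3", 41)]
deduped = list(dict.fromkeys(records))
print(deduped)  # [('u1', 34), ('u2', 28), ('u3', 41)]
```

Note the classic trap: a single extreme outlier inflates σ itself, so with very small samples the 3σ rule can miss it — robust alternatives (IQR, Random Cut Forest) come up on the exam for exactly that reason.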
🤖 DOMAIN 2: ML Model Development (26%)
🧠 SageMaker Algorithms: "BLIINKS-FPXDR"
Master the 12 most important SageMaker algorithms
Think: "BLIINKS before making FP (false positive) XDR (extreme detection rate)"
Supervised Learning:
- BlazingText - Text classification (its word2vec mode is unsupervised)
- Linear Learner - Classification/regression
- Image Classification - Computer vision
- KNN - Classification/regression
- Sequence2Sequence (Seq2Seq) - Translation
- Factorization Machines - Recommendation (sparse data)
- XGBoost - Gradient boosting (most popular!)
- DeepAR - Time series forecasting
Unsupervised Learning:
- IP Insights - Anomaly detection for IP addresses
- Neural Topic Model (NTM) - Topic discovery
- PCA - Dimensionality reduction
- Random Cut Forest (RCF) - Anomaly detection
🎯 Algorithm Selection Guide
Classification/Regression: Linear Learner, XGBoost, KNN
Image Tasks: Image Classification, Object Detection
Text Tasks: BlazingText, Seq2Seq, NTM
Anomaly Detection: Random Cut Forest, IP Insights
Time Series: DeepAR
Recommendations: Factorization Machines
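As a drill aid, the selection guide above can be folded into a tiny lookup helper (the task labels are informal names chosen here, not official AWS categories):

```python
# Drill helper encoding the algorithm selection guide above.
ALGORITHM_GUIDE = {
    "classification": ["Linear Learner", "XGBoost", "KNN"],
    "regression": ["Linear Learner", "XGBoost", "KNN"],
    "image": ["Image Classification", "Object Detection"],
    "text": ["BlazingText", "Seq2Seq", "NTM"],
    "anomaly_detection": ["Random Cut Forest", "IP Insights"],
    "time_series": ["DeepAR"],
    "recommendation": ["Factorization Machines"],
}

def suggest(task: str) -> list[str]:
    """Return the SageMaker built-ins the guide maps to a task type."""
    return ALGORITHM_GUIDE.get(task.lower(), [])

print(suggest("time_series"))  # ['DeepAR']
```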
⚙️ Hyperparameter Tuning: "BRAGS"
Remember tuning strategies
Think: "Good tuning BRAGS about results"
- Bayesian Optimization - Smart search (SageMaker default)
- Random Search - Random combinations
- Automatic Model Tuning (AMT) - SageMaker's service
- Grid Search - Exhaustive search
- Stochastic (Early Stopping) - Stop poor performers
💡 Exam Tip
Bayesian = Most efficient (SageMaker recommended)
Random = Better than grid, less expensive
Grid = Exhaustive, expensive, thorough
Early Stopping = Save cost, stop bad runs early
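Why random search often beats an equally sized grid is easy to see on a toy 1-D objective (a sketch only; real tuning runs through SageMaker Automatic Model Tuning over actual training jobs):

```python
import random

# Toy objective: pretend the validation score peaks at lr = 0.137.
def objective(lr: float) -> float:
    return -abs(lr - 0.137)

# Grid search: 10 trials locked to a coarse 0.1 spacing.
grid = [i / 10 for i in range(1, 11)]            # 0.1, 0.2, ..., 1.0
best_grid = max(grid, key=objective)

# Random search: the same 10 trials, drawn uniformly.
random.seed(1)
samples = [random.uniform(0.0, 1.0) for _ in range(10)]
best_random = max(samples, key=objective)

# The random draw can land much closer to the optimum than the grid,
# which can never get nearer than its fixed spacing allows.
print(best_grid, round(best_random, 3))  # 0.1 0.134
```

Bayesian optimization goes one step further: it uses earlier trial results to pick where to sample next, which is why it is SageMaker's default strategy.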
📊 Evaluation Metrics: "FARM-CAR"
Remember classification and regression metrics
Think: "Evaluate models on a FARM using a CAR"
FARM = Classification Metrics:
- F1 Score - Harmonic mean of precision & recall
- Accuracy - Correct predictions / Total predictions
- ROC-AUC - Area under ROC curve
- Matrix (Confusion) - TP, TN, FP, FN breakdown
CAR = Regression Metrics:
- Coefficient of Determination (R²) - Variance explained
- Absolute Error (MAE) - Mean Absolute Error
- RMSE - Root Mean Squared Error
🎯 When to Use Which Metric
F1 Score: Imbalanced classes, need balance of precision/recall
ROC-AUC: Binary classification, threshold-independent
RMSE: Regression, penalizes large errors more
MAE: Regression, robust to outliers
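These metrics are easy to recompute from first principles, which is exactly what the exam expects you to do mentally (pure Python, mirroring the standard definitions):

```python
import math

# Classification metrics from a confusion matrix.
TP, FP, FN, TN = 8, 2, 4, 6
accuracy = (TP + TN) / (TP + FP + FN + TN)          # 0.70
precision = TP / (TP + FP)                          # 0.80
recall = TP / (TP + FN)                             # ~0.667
f1 = 2 * precision * recall / (precision + recall)  # ~0.727
# On imbalanced data, accuracy can look fine while F1 exposes weak recall.

# Regression errors on a toy prediction set.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.5, 4.0]
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
# RMSE (~1.22) > MAE (1.0) because squaring punishes the single large error.
print(round(f1, 3), mae, round(rmse, 3))  # 0.727 1.0 1.225
```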
💪 Training Modes: "PFIS"
SageMaker training optimization
Think: "PFISh for the best training mode"
- Pipe Mode - Stream data from S3 (fast, efficient)
- File Mode - Download entire dataset first (simple)
- Instance Types - ml.p3 (GPU), ml.c5 (compute-optimized CPU), ml.m5 (general purpose)
- Spot Training - Save up to 90% on training costs
💰 Cost Optimization
Pipe Mode: Faster, no EBS needed, preferred for large datasets
Spot Training: Use with checkpointing for interruptible workloads
GPU Instances: p3 for training, g4 for inference
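Combining Pipe mode and Spot looks like this as a CreateTrainingJob request body (a sketch: the image URI, role ARN, and bucket names are placeholders):

```python
# Sketch of a CreateTrainingJob request combining Pipe mode and managed Spot
# training. With Spot, MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds,
# and CheckpointConfig lets an interrupted job resume.
training_job_request = {
    "TrainingJobName": "xgboost-pipe-spot-demo",
    "AlgorithmSpecification": {
        "TrainingImage": "<xgboost-image-uri>",   # placeholder
        "TrainingInputMode": "Pipe",              # stream from S3, no full download
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,                     # Pipe mode needs far less EBS
    },
    "EnableManagedSpotTraining": True,            # up to ~90% training discount
    "CheckpointConfig": {"S3Uri": "s3://my-bucket/checkpoints/"},
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,
        "MaxWaitTimeInSeconds": 7200,             # runtime + time waiting for Spot
    },
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
}
# A real call would be:
# boto3.client("sagemaker").create_training_job(**training_job_request)
```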
🚀 DOMAIN 3: Deployment and Orchestration (22%)
🎯 Deployment Options: "REBAS"
Remember all SageMaker deployment types
Think: "REBASe your model for production"
- Real-time Endpoints - Low latency, persistent
- Edge (Neo) - Deploy to edge devices (IoT)
- Batch Transform - Process large datasets offline
- Asynchronous Inference - Long-running requests
- Serverless Inference - Auto-scaling, pay per use
🎯 Deployment Selection Guide
Real-time: <1 sec latency, always-on, high cost
Serverless: Intermittent traffic, cold start OK, low cost
Batch Transform: Large batches, no real-time need
Async: Long processing (>60s), queue-based
Edge (Neo): IoT devices, no internet dependency
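The selection guide condenses into a small drill function (the argument names and thresholds mirror the guide's rules of thumb, not any AWS API):

```python
# Drill function encoding the deployment selection guide above.
def choose_deployment(realtime: bool, latency_under_1s: bool = False,
                      steady_traffic: bool = False, long_running: bool = False,
                      edge: bool = False) -> str:
    if edge:
        return "SageMaker Neo (edge)"
    if realtime and latency_under_1s:
        # Sub-second latency: pick by traffic pattern.
        return "Real-time Endpoint" if steady_traffic else "Serverless Inference"
    if long_running:
        return "Asynchronous Inference"   # >60s processing, queue-based
    return "Batch Transform"              # bulk offline scoring

print(choose_deployment(realtime=True, latency_under_1s=True))
# prints "Serverless Inference" — the trap from above: "real-time" need
# with intermittent traffic does NOT mean a Real-time Endpoint
```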
⚙️ MLOps & Orchestration: "SPEC-SAM"
Remember CI/CD and orchestration services
Think: "Write detailed SPECs for SAM (software)"
- SageMaker Pipelines - Native ML pipelines
- Projects - MLOps templates (CI/CD)
- EventBridge - Event-driven automation
- Code* Services - CodePipeline, CodeBuild, CodeDeploy
- Step Functions - Workflow orchestration
- Airflow (MWAA) - Apache Airflow managed service
- Model Registry - Version control for models
🔥 Hot Exam Topic
SageMaker Pipelines: Native, integrated, preferred
Step Functions: AWS-native, visual workflow
Airflow (MWAA): Complex DAGs, existing Airflow code
Model Registry: Track lineage, approve models
⚡ Endpoint Optimization: "MASS-EI"
Endpoint scaling and optimization
Think: "The MASS of data needs EI (elastic inference)"
- Multi-Model Endpoints - Host multiple models on one endpoint
- Auto Scaling - Scale based on invocations or metrics
- Shadow Testing - Test new models with production traffic
- Serial Inference Pipeline - Chain multiple models
- Elastic Inference (EI) - Attach partial GPU acceleration (deprecated; AWS steers new workloads to Inferentia)
- Inferentia - AWS-designed ML chips (cost-effective)
💡 Performance Tips
Multi-Model: Many models, low traffic each
Auto Scaling: Target tracking on InvocationsPerInstance
Shadow Testing: 0% production impact
Inferentia: Up to 70% cost reduction
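Target tracking on InvocationsPerInstance is configured through Application Auto Scaling, not SageMaker directly. A sketch of the two request bodies (endpoint and variant names are placeholders):

```python
# Sketch: the two Application Auto Scaling requests that put target-tracking
# scaling on an endpoint variant. "my-endpoint"/"AllTraffic" are placeholders.
resource_id = "endpoint/my-endpoint/variant/AllTraffic"

register_request = {                # register the variant as scalable
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}

policy_request = {                  # attach the target-tracking policy
    "PolicyName": "invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,       # target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
}
# Real calls:
# client = boto3.client("application-autoscaling")
# client.register_scalable_target(**register_request)
# client.put_scaling_policy(**policy_request)
```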
🛡️ DOMAIN 4: Monitoring, Maintenance & Security (24%)
📊 Monitoring Services: "CM-TAXI"
Remember all monitoring and logging services
Think: "Call a CM (CloudWatch Metrics) TAXI"
- CloudWatch - Metrics, logs, alarms
- Model Monitor - Detect drift & quality issues
- Trusted Advisor - Best practice checks
- Athena - Query S3 logs with SQL
- X-Ray - Distributed tracing
- Inferences (Logs) - Capture prediction data
🎯 Model Monitor Drift Types
Data Quality Drift: Statistical properties change
Model Quality Drift: Accuracy metrics degrade
Bias Drift: Fairness metrics change
Feature Attribution Drift: Feature importance changes
🔒 Security Services: "I-MAKE-VOWS"
Remember all AWS security services
Think: "I MAKE VOWS to secure my ML models"
- IAM - Identity and Access Management
- Macie - Discover PII in S3
- AWS Shield - DDoS protection
- KMS - Key Management Service (encryption)
- Encryption (at rest & in transit) - S3, EBS, SageMaker
- VPC - Virtual Private Cloud (network isolation)
- Organizations - Multi-account management
- WAF - Web Application Firewall
- Secrets Manager - Manage credentials
🔐 Encryption Best Practices
At Rest: S3-SSE, EBS encryption, SageMaker notebook encryption
In Transit: TLS/HTTPS for all data transfer
KMS Keys: Customer-managed keys for compliance
VPC: Use PrivateLink for SageMaker in VPC
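The "most secure option" pattern (VPC + KMS + IAM) maps to concrete CreateTrainingJob fields. A sketch with placeholder ARNs and IDs:

```python
# Security-related fields of a CreateTrainingJob request. All subnet,
# security group, KMS key, and role identifiers below are placeholders.
secure_settings = {
    "RoleArn": "arn:aws:iam::123456789012:role/LeastPrivilegeSageMakerRole",
    "VpcConfig": {                                  # run inside your VPC
        "Subnets": ["subnet-0123456789abcdef0"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
    "EnableNetworkIsolation": True,                 # container gets no internet
    "EnableInterContainerTrafficEncryption": True,  # encrypt in transit
    "OutputDataConfig": {
        "S3OutputPath": "s3://my-secure-bucket/output/",
        "KmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/<key-id>",  # at rest
    },
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 30,
        "VolumeKmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/<key-id>",
    },
}
```

The same trio recurs across services: IAM scopes *who* may act, VpcConfig plus network isolation controls *where* traffic flows, and KMS keys cover *what* is stored.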
🛡️ SageMaker Security: "VINE"
SageMaker-specific security features
Think: "Secure your ML like a VINE protects grapes"
- VPC Mode - Network isolation
- IAM Roles - Execution roles for notebooks/jobs
- Network Isolation - No internet access
- Encryption Everywhere - KMS for notebooks, training, endpoints
📊 Visual Diagrams & Decision Trees
🔄 Complete ML Workflow Diagram
┌───────────────────────────────────────────────────┐
│ DATA PREPARATION (28%) │
│ S-KEFS-R │
└───────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ S3 → Kinesis → Glue/EMR → Data Wrangler → │
│ Feature Store │
│ (GEEKS-DW) (BITES-NO) │
└───────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ MODEL DEVELOPMENT (26%) │
│ BLIINKS-FPXDR │
└───────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ Train (PFIS) → Tune (BRAGS) → Evaluate │
│ (FARM-CAR) │
│ XGBoost, Linear Learner, DeepAR, BlazingText │
└───────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ DEPLOYMENT & ORCHESTRATION (22%) │
│ REBAS + SPEC-SAM │
└───────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ Real-time/Batch/Serverless → Pipelines → │
│ Auto-scaling │
│ (REBAS) (SPEC-SAM) (MASS-EI) │
└───────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ MONITORING, MAINTENANCE & SECURITY (24%) │
│ CM-TAXI + I-MAKE-VOWS │
└───────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────┐
│ CloudWatch → Model Monitor → Drift Detection → │
│ IAM/KMS │
│ (CM-TAXI) (I-MAKE-VOWS) │
└───────────────────────────────────────────────────┘
🎯 Deployment Option Decision Tree
START: Need to deploy a model?
│
├─→ Real-time predictions needed?
│   │
│   ├─→ YES → Latency < 1 second?
│   │   │
│   │   ├─→ YES → Traffic pattern?
│   │   │   │
│   │   │   ├─→ Constant/Predictable
│   │   │   │   → REAL-TIME ENDPOINT
│   │   │   │     • Always-on
│   │   │   │     • Auto-scaling
│   │   │   │     • ml.m5/c5/p3 instances
│   │   │   │
│   │   │   └─→ Intermittent/Unpredictable
│   │   │       → SERVERLESS INFERENCE
│   │   │         • Auto-scales to zero
│   │   │         • Cold start acceptable
│   │   │         • Pay per invoke
│   │   │
│   │   └─→ NO → Processing time > 60 sec?
│   │       │
│   │       └─→ YES → ASYNCHRONOUS INFERENCE
│   │             • Queue-based
│   │             • S3 trigger
│   │             • Long-running tasks
│   │
│   └─→ NO → Large batch of data?
│       │
│       └─→ YES → BATCH TRANSFORM
│             • Process entire datasets
│             • No endpoint needed
│             • Cost-effective for bulk
│
└─→ Deploy to edge devices?
    │
    └─→ YES → SAGEMAKER NEO + EDGE
          • Compile for IoT
          • No internet required
          • Optimized inference
📋 Final Review Cheat Sheet (Print Before Exam)
🎯 THE ULTIMATE MASTER SENTENCE
"Use GEEKS-DW to prepare PAJRC data, train with BLIINKS, deploy via REBAS, orchestrate with SPEC-SAM, and monitor using CM-TAXI!"
🔑 All Mnemonics At A Glance
| Mnemonic | Full Expansion | Category |
|---|---|---|
| S-KEFS-R | S3, Kinesis, EBS, FSx, SageMaker Feature Store, Redshift | Storage Services |
| PAJRC | Parquet, Avro, JSON, RecordIO, CSV | Data Formats |
| GEEKS-DW | Glue, EMR, EMR Serverless, Kinesis Firehose, SageMaker Processing, Data Wrangler | ETL Services |
| BITES-NO | Binning, Imputation, Transforming, Encoding, Scaling, Normalization, Outliers | Feature Engineering |
| MOUD | Missing, Outliers, Unbalanced, Duplicates | Data Quality |
| BLIINKS-FPXDR | BlazingText, Linear, Image, IP Insights, NTM, KNN, Seq2Seq, Factorization, PCA, XGBoost, DeepAR, RCF | SageMaker Algorithms |
| BRAGS | Bayesian, Random, AMT, Grid, Stochastic/Early Stop | Hyperparameter Tuning |
| FARM-CAR | F1, Accuracy, ROC-AUC, Confusion Matrix; R², MAE, RMSE | Evaluation Metrics |
| PFIS | Pipe, File, Instance Types, Spot | Training Modes |
| REBAS | Real-time, Edge, Batch, Async, Serverless | Deployment Options |
| SPEC-SAM | SageMaker Pipelines, Projects, EventBridge, Code*, Step Functions, Airflow, Model Registry | MLOps Services |
| MASS-EI | Multi-Model, Auto Scaling, Shadow, Serial, Elastic Inference, Inferentia | Endpoint Optimization |
| CM-TAXI | CloudWatch, Model Monitor, Trusted Advisor, Athena, X-Ray, Inference logs | Monitoring Services |
| I-MAKE-VOWS | IAM, Macie, Shield, KMS, Encryption, VPC, Organizations, WAF, Secrets Manager | Security Services |
| VINE | VPC Mode, IAM Roles, Network Isolation, Encryption Everywhere | SageMaker Security |
⚠️ Common Exam Traps
❌ TRAP: "Real-time" doesn't always mean Real-time Endpoint
→ Could be Serverless (for intermittent) or Async (for long-running)
❌ TRAP: "Cost-effective" usually means Serverless/Spot/Pipe Mode
→ Not always-on Real-time Endpoints
❌ TRAP: File Mode is NOT always wrong
→ Required for custom code needing random access to data
❌ TRAP: Grid Search is NOT always best for tuning
→ Bayesian is better for complex parameter spaces
❌ TRAP: CSV is NOT best for analytics
→ Parquet is columnar, compressed, and optimized
❌ TRAP: Model Monitor is NOT just CloudWatch
→ It's specifically for drift detection (data, model, bias, feature)
❌ TRAP: XGBoost is NOT for everything
→ DeepAR for time series, RCF for anomalies, BlazingText for text
✍️ Write on Whiteboard/Paper FIRST (During Exam)
GEEKS-DW | PAJRC | BLIINKS-FPXDR
REBAS | SPEC-SAM | CM-TAXI
BRAGS | FARM-CAR | I-MAKE-VOWS
📖 Recommended Study Resources
Official AWS Resources:
- AWS ML Engineer Exam Guide - Primary source
- AWS Training and Certification portal
- AWS Documentation for SageMaker
- AWS Whitepapers on ML best practices
Practice Tests (Prioritized):
- Stephane Maarek's Udemy practice tests - High-quality questions
- AWS Skill Builder - Official practice
