Amazon SageMaker Pocket Book — Uplatz

Build • Train • Tune • Deploy • Monitor — end-to-end ML on AWS with practical snippets and ops tips

Section 1 — Fundamentals

1) What is Amazon SageMaker?

Amazon SageMaker is a managed platform for the full ML lifecycle: data prep, training, hyperparameter tuning, deployment, batch/real-time inference, and monitoring. It includes purpose-built tools like Studio, Processing jobs, Training jobs, HPO, Pipelines, Endpoints, Batch Transform, Model Registry, and Clarify/Profiler/Monitor.

# Quick sanity: list domains (Studio)
aws sagemaker list-domains --query 'Domains[].DomainName'

2) Why SageMaker? Strengths & Tradeoffs

Strengths: Managed infra, built-in Docker images/algorithms, autoscaling endpoints, Pipelines for CI/CD of ML, Model Registry, lineage/experiments.

Tradeoffs: AWS lock-in; cost governance is needed; container, image, and IAM role concepts to learn.

3) Core Building Blocks

  • Studio/Notebooks: IDE & notebooks with lifecycle configs.
  • Processing: run ETL/feature jobs in containers on demand (see the sketch after this list).
  • Training: managed distributed training (Spot supported).
  • Tuning (HPO): Bayesian/Random/Hyperband search over params.
  • Endpoints: real-time inference with autoscaling/multi-model.
  • Batch Transform: offline scoring for large datasets.
  • Pipelines: DAGs for repeatable ML workflows.
  • Model Registry: version, approve, and promote models.
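
For the Processing bullet above, a minimal sketch of a scikit-learn Processing job; the bucket names, script name, and role ARN are placeholders:

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

role = "arn:aws:iam::123456789012:role/sagemaker-exec-role"
processor = SKLearnProcessor(framework_version="1.2-1", role=role,
                             instance_type="ml.m5.xlarge", instance_count=1)
processor.run(code="preprocess.py",  # your ETL/feature script
              inputs=[ProcessingInput(source="s3://my-bucket/raw/",
                                      destination="/opt/ml/processing/input")],
              outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                                        destination="s3://my-bucket/features/")])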

4) Data & Storage

Use S3 for datasets and artifacts; attach EFS or FSx for Lustre when shared POSIX file storage is needed. Control access via IAM roles + KMS. Keep data and compute in the same region.
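
For example, staging a local file to S3 with the SDK (bucket and key prefix are placeholders):

import sagemaker

sess = sagemaker.Session()
# Uploads data/train.csv and returns the resulting s3:// URI
train_uri = sess.upload_data(path="data/train.csv", bucket="my-bucket", key_prefix="xgb/train")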

5) Security & Roles

Use least-privilege execution roles for Processing/Training/Endpoints. Encrypt data at rest (S3 KMS, EBS KMS) and in transit (TLS). For private endpoints, use VPC mode with SGs/subnets.
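
On the SDK side, these controls map to estimator parameters; a sketch in which the region, subnet, security group, and KMS IDs are placeholders:

import sagemaker
from sagemaker.estimator import Estimator

role = "arn:aws:iam::123456789012:role/sagemaker-exec-role"
image = sagemaker.image_uris.retrieve("xgboost", "eu-west-1", version="1.7-1")
secure_est = Estimator(
    image_uri=image, role=role,
    instance_type="ml.m5.xlarge", instance_count=1,
    subnets=["subnet-0abc1234"], security_group_ids=["sg-0abc1234"],  # VPC mode
    volume_kms_key="arn:aws:kms:eu-west-1:123456789012:key/example",  # encrypt attached EBS volumes
    output_kms_key="arn:aws:kms:eu-west-1:123456789012:key/example",  # encrypt artifacts written to S3
    encrypt_inter_container_traffic=True,
    enable_network_isolation=True,
    output_path="s3://my-bucket/secure/output")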

Section 2 — Architecture & Workflow

6) Typical Project Blueprint

S3 (raw) → Processing (clean/features) → S3 (features) → Training (Spot) → Tuning (optional) → Model Registry → Endpoint (real-time) or Batch Transform → Model Monitor/Clarify.

# Create an execution role (trust policy shown below)
aws iam create-role --role-name sagemaker-exec-role --assume-role-policy-document file://trust.json
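
The referenced trust.json is the standard service trust policy that lets SageMaker assume the role:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "sagemaker.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}

After creating the role, attach a scoped permissions policy (for example with aws iam put-role-policy) rather than a broad managed policy.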

7) Training with the Python SDK (XGBoost)

# pip install sagemaker==2.*
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

sess = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/sagemaker-exec-role"
# Built-in XGBoost algorithm image (no training script required)
image = sagemaker.image_uris.retrieve("xgboost", sess.boto_region_name, version="1.7-1")
xgb = Estimator(image_uri=image, role=role,
                instance_type="ml.m5.xlarge", instance_count=1,
                output_path="s3://my-bucket/xgb/output", sagemaker_session=sess)
xgb.set_hyperparameters(objective="binary:logistic", num_round=200)
xgb.fit({"train": TrainingInput("s3://my-bucket/xgb/train/", content_type="text/csv"),
         "validation": TrainingInput("s3://my-bucket/xgb/val/", content_type="text/csv")})

8) Hyperparameter Tuning (HPO)

from sagemaker.tuner import HyperparameterTuner, IntegerParameter, ContinuousParameter

# eval_metric="auc" makes the job emit validation:auc for the tuner objective
xgb.set_hyperparameters(objective="binary:logistic", eval_metric="auc", num_round=200)
tuner = HyperparameterTuner(
  estimator=xgb,
  objective_metric_name="validation:auc",
  hyperparameter_ranges={
    "max_depth": IntegerParameter(3, 10),
    "eta": ContinuousParameter(0.01, 0.3)
  },
  objective_type="Maximize",
  max_jobs=8, max_parallel_jobs=2
)
tuner.fit({"train": TrainingInput("s3://my-bucket/xgb/train/", content_type="text/csv"),
           "validation": TrainingInput("s3://my-bucket/xgb/val/", content_type="text/csv")})
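
After tuning completes, the SDK can surface the results and the winning job:

# Best training job name and a per-job metrics table
best_job = tuner.best_training_job()
results_df = tuner.analytics().dataframe()
# Deploy the best model straight from the tuner
best_predictor = tuner.deploy(initial_instance_count=1, instance_type="ml.m5.large")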

9) Deploy to Real-Time Endpoint

from sagemaker.serializers import CSVSerializer

# The built-in XGBoost container expects CSV (or libsvm), not JSON
predictor = xgb.deploy(instance_type="ml.m5.large", initial_instance_count=2,
                       serializer=CSVSerializer())
result = predictor.predict([[1.2, 0.3, 7.9]])  # one record: features only, no label column

Enable autoscaling on the endpoint via Application Auto Scaling (typically targeting invocations per instance) and set min/max capacity.
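
A minimal autoscaling sketch with boto3; the endpoint and variant names are assumed:

import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # assumed names
aas.register_scalable_target(
    ServiceNamespace="sagemaker", ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1, MaxCapacity=4)
aas.put_scaling_policy(
    PolicyName="invocations-target", ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # target invocations per instance
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"}})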

10) Batch Transform (Offline Inference)

transformer = xgb.transformer(instance_count=2, instance_type="ml.m5.xlarge",
                              output_path="s3://my-bucket/xgb/preds/")
transformer.transform(data="s3://my-bucket/xgb/test/", content_type="text/csv", split_type="Line")
transformer.wait()

Section 3 — MLOps, Monitoring & Cost

11) Pipelines & Model Registry

Define steps (Processing → Training → Evaluate → RegisterModel → Deploy). Approve models in the Model Registry and promote across dev/stage/prod with CI/CD.

# Skeleton (Python SDK)
from sagemaker.workflow.pipeline import Pipeline
pipe = Pipeline(name="churn-pipeline", steps=[...])
pipe.upsert(role_arn=role); pipe.start()
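
Expanded a bit, one possible shape for the elided steps; a sketch assuming the xgb estimator from section 7 and a model package group named "churn-models":

from sagemaker.workflow.steps import TrainingStep
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.inputs import TrainingInput

train_step = TrainingStep(
    name="TrainXGB", estimator=xgb,
    inputs={"train": TrainingInput("s3://my-bucket/xgb/train/", content_type="text/csv"),
            "validation": TrainingInput("s3://my-bucket/xgb/val/", content_type="text/csv")})
register_step = RegisterModel(
    name="RegisterXGB", estimator=xgb,
    model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"], response_types=["text/csv"],
    inference_instances=["ml.m5.large"], transform_instances=["ml.m5.xlarge"],
    model_package_group_name="churn-models")
pipe = Pipeline(name="churn-pipeline", steps=[train_step, register_step])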

12) Monitoring & Explainability

Model Monitor watches data/quality drift and violations; Clarify explains predictions and checks bias; Debugger/Profiler inspects training bottlenecks. Ship logs/metrics to CloudWatch for alarms.
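
A minimal Model Monitor sketch, reusing the xgb estimator and role from section 7 (S3 paths are placeholders): capture traffic at deploy time, baseline the training data, then schedule checks.

from sagemaker.model_monitor import DataCaptureConfig, DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

capture = DataCaptureConfig(enable_capture=True, sampling_percentage=100,
                            destination_s3_uri="s3://my-bucket/capture/")
predictor = xgb.deploy(instance_type="ml.m5.large", initial_instance_count=1,
                       data_capture_config=capture)

monitor = DefaultModelMonitor(role=role, instance_count=1, instance_type="ml.m5.xlarge")
monitor.suggest_baseline(baseline_dataset="s3://my-bucket/xgb/train/train.csv",
                         dataset_format=DatasetFormat.csv(header=False),
                         output_s3_uri="s3://my-bucket/monitor/baseline/")
monitor.create_monitoring_schedule(endpoint_input=predictor.endpoint_name,
                                   output_s3_uri="s3://my-bucket/monitor/reports/",
                                   statistics=monitor.baseline_statistics(),
                                   constraints=monitor.suggested_constraints(),
                                   schedule_cron_expression=CronExpressionGenerator.hourly())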

13) GenAI & Managed Inference

Use JumpStart for pre-built models/notebooks, or host custom LLMs on inference endpoints. To serve many models behind one endpoint, use multi-model endpoints (MME); for intermittent or spiky traffic, consider Serverless Inference.
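
A hedged JumpStart sketch; the model_id is only an example, so check the JumpStart catalog for current IDs and payload formats:

from sagemaker.jumpstart.model import JumpStartModel

# Deploy a pre-built JumpStart model to a real-time endpoint
llm = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")
predictor = llm.deploy()
response = predictor.predict({"inputs": "Summarize SageMaker in one sentence."})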

14) Cost Controls

  • Use Spot Training where interruption is tolerable; checkpoint to S3 (sketch after this list).
  • Scale endpoints to zero with Serverless for spiky/low-traffic use cases.
  • Right-size instances; turn off idle Studio/Notebooks.
  • Compress/quantize models; prefer multi-model endpoints for many variants.
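
A Spot training sketch on the estimator pattern from section 7; the checkpoint path is a placeholder:

from sagemaker.estimator import Estimator

spot_est = Estimator(
    image_uri=image, role=role,  # image and role as defined in section 7
    instance_type="ml.m5.xlarge", instance_count=1,
    use_spot_instances=True,
    max_run=3600,   # max billable training seconds
    max_wait=7200,  # total wait including Spot interruptions (must be >= max_run)
    checkpoint_s3_uri="s3://my-bucket/xgb/checkpoints/",
    output_path="s3://my-bucket/xgb/output")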

15) Common Pitfalls

  • Over-permissive execution roles; restrict S3 prefixes and KMS keys.
  • Endpoints left running with low traffic → unnecessary cost.
  • Mismatched SDK/container versions; pin images and SDK versions.
  • No drift monitoring after deploy; add Model Monitor + alarms.

Section 4 — Quick Q&A

16) Interview Q&A — 8 Quick Ones

1) When to use Batch Transform vs Endpoint? Batch for offline/large jobs, Endpoint for low-latency real-time.

2) How to cut training cost? Spot training + smaller instances + profiling + better data sampling.

3) Multi-model endpoints (MME)? Host many models behind one endpoint; load/unload on demand.

4) CI/CD for ML? SageMaker Pipelines + Model Registry + CodePipeline/CodeBuild.

5) Secure private inference? VPC-only endpoints, KMS, SG allowlists, least-privilege roles.

6) Explainability? Use Clarify for SHAP-based insights and bias detection.

7) Drift detection? Model Monitor with baseline stats + CloudWatch alarms.

8) Choose instance type? CPU for classic ML/low TPS; GPU for deep learning; AWS Inferentia for cost-efficient DL inference.