DynamoDB Pocket Book — Uplatz

50 deep-dive flashcards • 20+ Interview Q&A • Readable code examples

Section 1 — Fundamentals

1) What is DynamoDB?

Amazon DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond latency at any scale. Key–value & document data models, automatic scaling, and built-in security. Ideal for high-traffic APIs, gaming, IoT, sessions, leaderboards. Pair with analytics engines for ad-hoc queries.

# AWS CLI: list tables
aws dynamodb list-tables

2) Why DynamoDB? Strengths & Tradeoffs

Strengths: serverless, global scale, predictable performance, fine-grained IAM, point-in-time recovery. Tradeoffs: query patterns must be known up front; joins/aggregations require modeling or ETL. Use GSIs/LSIs and streams to extend patterns.

# Create a PAY_PER_REQUEST table
aws dynamodb create-table \
  --table-name Users \
  --attribute-definitions AttributeName=UserId,AttributeType=S \
  --key-schema AttributeName=UserId,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

3) Core Concepts

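Table → Items → Attributes. A primary key is required: a partition (HASH) key and an optional sort (RANGE) key. Beyond the key attributes, each item can carry its own set of attributes (no fixed schema). Throughput is either provisioned in RCUs/WCUs or billed per request (on-demand).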

# Put an item
aws dynamodb put-item --table-name Users \
  --item '{"UserId":{"S":"u1"},"Name":{"S":"Aroha"}}'

4) Consistency Models

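By default, reads are eventually consistent and cost half the capacity of strongly consistent reads. Strongly consistent reads (ConsistentRead=true) return the latest committed write in the same Region; they are not available on GSIs. Replication between Regions in a global table is eventually consistent by default.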

aws dynamodb get-item \
  --table-name Users \
  --key '{"UserId":{"S":"u1"}}' \
  --consistent-read

5) DynamoDB vs RDBMS

No joins; multi-item ACID is available only through the transaction APIs, within their item-count and size limits. You denormalize and precompute access paths. Use cases with variable schemas and massive scale benefit most.

-- SQL join vs. Single-table design
-- RDBMS: SELECT u.name, o.total FROM users u JOIN orders o ON ...
-- Dynamo: PK = USER#<id>, SK prefixes: PROFILE#, ORDER#<id>

6) Partition Keys & Hotspots

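The partition key is hashed to choose the physical partition. Avoid skew (e.g., one key taking most of the traffic): a single partition serves up to about 3,000 RCUs and 1,000 WCUs per second. Use high-cardinality keys, write sharding (suffixes), or time buckets to distribute load.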

# Example sharded PK
PK = USER#u1#shard_07, SK = ORDER#2025-08-23#0001
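A minimal sketch of write sharding, assuming a fixed shard count of 10 and a hash of the order id as the shard selector (both are illustrative choices, not something the card prescribes). Writers derive one shard; readers have to query every shard and merge the results.

```python
import hashlib
from typing import List

SHARD_COUNT = 10  # assumed number of write shards; tune to the expected write rate


def sharded_pk(user_id: str, order_id: str, shard_count: int = SHARD_COUNT) -> str:
    """Derive a stable shard suffix from the order id so writes spread across partitions."""
    digest = hashlib.sha256(order_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"USER#{user_id}#shard_{shard:02d}"


def all_shard_pks(user_id: str, shard_count: int = SHARD_COUNT) -> List[str]:
    """List every shard key for a user; a reader queries each one and merges."""
    return [f"USER#{user_id}#shard_{n:02d}" for n in range(shard_count)]
```

The trade-off is read amplification: one logical read becomes shard_count queries, so shard only the keys that are actually hot.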

7) Secondary Indexes

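GSI: its own partition/sort keys and its own capacity; can be added or removed at any time; eventually consistent reads only. LSI: same partition key, different sort key; must be defined at table creation; supports strongly consistent reads but caps each item collection at 10 GB. Every index costs extra write capacity and storage.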

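# Add a GSI to an existing table; the index definition lives in gsi.json
# (Email is an example key attribute and must be declared here)
aws dynamodb update-table --table-name Users \
  --attribute-definitions AttributeName=Email,AttributeType=S \
  --global-secondary-index-updates file://gsi.json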

8) Billing Modes

On-demand: pay per request, auto scales. Provisioned: set RCUs/WCUs, cheaper when steady; add Auto Scaling. Consider reserved capacity for predictable fleets.

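# Switch to provisioned capacity (an on-demand table needs --billing-mode PROVISIONED)
aws dynamodb update-table --table-name Users \
  --billing-mode PROVISIONED \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5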
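Sizing provisioned capacity is simple arithmetic. The sketch below assumes the standard unit definitions: one RCU covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one WCU covers one standard write per second of an item up to 1 KB. The function names are illustrative.

```python
import math


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = False) -> float:
    """RCUs needed: item size is rounded up to 4 KB blocks; eventual reads cost half."""
    units_per_read = math.ceil(item_size_bytes / 4096)
    if not strongly_consistent:
        units_per_read = units_per_read / 2
    return units_per_read * reads_per_second


def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """WCUs needed: item size is rounded up to 1 KB blocks per write."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second
```

For example, 100 strongly consistent reads per second of 6,000-byte items need 200 RCUs, because each read spans two 4 KB blocks.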

9) Streams (CDC)

DynamoDB Streams capture item changes; invoke Lambda for event-driven flows, projections, search indexing, or CQRS read models.

aws dynamodbstreams list-streams --table-name Users

10) Q&A — “Why is schema design critical?”

Answer: Because queries must be key-driven. A good design maps every access pattern to a PK/SK (or GSI) lookup, so each read touches only the items it needs. Poor design leads to table scans and throttling.

Section 2 — Development & APIs

11) SDKs

Use AWS SDKs (boto3, v3 JS SDK, Java, Go). Implement retries with exponential backoff and jitter (the SDKs retry by default; tune the limits). Respect service limits, including the 400 KB maximum item size.

# Python (boto3)
import boto3
db = boto3.client("dynamodb")
db.get_item(TableName="Users", Key={"UserId":{"S":"u1"}})

12) Document Client (Node.js)

Document clients simplify marshalling JSON to AttributeValues. Prefer v3 modular packages in Node to reduce bundle size.

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { GetCommand, DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";
const d = DynamoDBDocumentClient.from(new DynamoDBClient({}));
await d.send(new GetCommand({ TableName:"Users", Key:{ UserId:"u1" }}));
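To make the marshalling step concrete, here is a rough Python sketch of what a document client does when it converts plain values into typed AttributeValues. It is a simplified illustration covering only common types (no sets or binary), not the SDK's implementation; in boto3 this job is done by the resource layer and `TypeSerializer`.

```python
from typing import Any, Dict


def marshal(value: Any) -> Dict[str, Any]:
    """Convert a plain Python value into DynamoDB's typed AttributeValue form."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel over the wire as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [marshal(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: marshal(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def marshal_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Marshal every top-level attribute of an item."""
    return {name: marshal(v) for name, v in item.items()}
```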

13) Batch APIs

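BatchGetItem reads up to 100 items and BatchWriteItem sends up to 25 put/delete requests per call (16 MB each). Batches are not atomic: retry whatever comes back in UnprocessedKeys/UnprocessedItems, with backoff. Respect per-partition limits to avoid throttling.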

aws dynamodb batch-write-item --request-items file://batch.json
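A sketch of the chunk-and-retry loop for batch writes. It assumes `client` behaves like a boto3 DynamoDB client (`batch_write_item(RequestItems=...)` returning `UnprocessedItems`); the helper names, attempt count, and delays are illustrative.

```python
import time
from typing import Any, Dict, Iterator, List

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def chunks(requests: List[Dict[str, Any]], size: int = MAX_BATCH) -> Iterator[List[Dict[str, Any]]]:
    """Split the request list into slices of at most `size` entries."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


def batch_write_all(client: Any, table: str, requests: List[Dict[str, Any]],
                    max_attempts: int = 5, base_delay: float = 0.05) -> None:
    """Send all requests, resubmitting whatever the service reports as unprocessed."""
    for chunk in chunks(requests):
        pending = {table: chunk}
        for attempt in range(max_attempts):
            response = client.batch_write_item(RequestItems=pending)
            pending = response.get("UnprocessedItems") or {}
            if not pending:
                break
            time.sleep(base_delay * (2 ** attempt))  # back off before retrying leftovers
        else:
            raise RuntimeError("unprocessed items remain after retries")
```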

14) Transactions

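ACID across multiple items/tables with TransactWriteItems/TransactGetItems. Limits: 100 actions and 4 MB per transaction, all within one Region and account. Transactional reads and writes consume twice the capacity of standard ones; add condition checks to guard invariants.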

aws dynamodb transact-write-items --transact-items file://tx.json

15) Query vs Scan

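Query requires a partition key value (plus an optional sort key condition) and reads only the matching items; Scan reads the whole table (costly). Model to use Query. Use filters sparingly: they are applied after items are read, so they do not reduce consumed capacity.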

aws dynamodb query --table-name Users \
  --key-condition-expression "UserId=:u" \
  --expression-attribute-values '{":u":{"S":"u1"}}'

16) Pagination

Use LastEvaluatedKey and ExclusiveStartKey to iterate. Present cursors to clients; avoid offset-based pagination.

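import { QueryCommand } from "@aws-sdk/lib-dynamodb";
// Example parameters; d is the document client created in card 12
const params = { TableName: "Users", KeyConditionExpression: "UserId = :u", ExpressionAttributeValues: { ":u": "u1" } };
let startKey;
do {
  const r = await d.send(new QueryCommand({ ...params, ExclusiveStartKey: startKey }));
  // process r.Items here
  startKey = r.LastEvaluatedKey;
} while (startKey);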
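One way to present cursors to clients is to wrap LastEvaluatedKey in an opaque token. The sketch below uses base64-encoded JSON; the function names are illustrative, and a production version would likely sign or encrypt the token so clients cannot forge keys.

```python
import base64
import json
from typing import Any, Dict, Optional


def encode_cursor(last_evaluated_key: Optional[Dict[str, Any]]) -> Optional[str]:
    """Turn LastEvaluatedKey into an opaque, URL-safe token; None means no more pages."""
    if not last_evaluated_key:
        return None
    raw = json.dumps(last_evaluated_key, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: Optional[str]) -> Optional[Dict[str, Any]]:
    """Recover the ExclusiveStartKey for the next Query from a client-supplied token."""
    if not cursor:
        return None
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
```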

17) Expressions

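Alias reserved words (e.g., Name, Status) with ExpressionAttributeNames, and pass values through ExpressionAttributeValues rather than concatenating them into the expression. UpdateExpression supports SET, REMOVE, ADD, and DELETE clauses.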

UpdateExpression: "SET #n = :v",
ExpressionAttributeNames: {"#n":"Name"},
ExpressionAttributeValues: {":v":"Hemi"}
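A small helper can generate the placeholders so that attribute names never collide with reserved words. This is a sketch with an assumed placeholder scheme (`#n0`, `:v0`, ...) and plain values as a document-style client would accept; it handles SET only.

```python
from typing import Any, Dict, Tuple


def build_set_expression(changes: Dict[str, Any]) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Build a SET UpdateExpression with placeholders for every attribute name and value."""
    if not changes:
        raise ValueError("changes must not be empty")
    names: Dict[str, str] = {}
    values: Dict[str, Any] = {}
    clauses = []
    # Sort by attribute name so the generated expression is deterministic.
    for index, (attribute, value) in enumerate(sorted(changes.items())):
        name_key, value_key = f"#n{index}", f":v{index}"
        names[name_key] = attribute
        values[value_key] = value
        clauses.append(f"{name_key} = {value_key}")
    return "SET " + ", ".join(clauses), names, values
```

The three return values map directly onto UpdateExpression, ExpressionAttributeNames, and ExpressionAttributeValues.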

18) TTL (Time-to-Live)

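Auto-expire items via a Number attribute holding a Unix epoch timestamp in seconds. Deletion is asynchronous and typically happens within a few days of expiry, so filter out expired items on read and do not rely on TTL for hard deadlines.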

aws dynamodb update-time-to-live \
  --table-name Sessions \
  --time-to-live-specification "Enabled=true,AttributeName=expiresAt"
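Because expired items can linger until the background delete runs, readers usually check the timestamp themselves. A minimal sketch, assuming the `expiresAt` attribute from the command above and items already unmarshalled into plain dicts:

```python
import time
from typing import Any, Dict, Optional


def expires_at(ttl_seconds: int, now: Optional[float] = None) -> int:
    """Return the TTL attribute value: a Unix epoch timestamp in whole seconds."""
    current = time.time() if now is None else now
    return int(current) + ttl_seconds


def is_live(item: Dict[str, Any], now: Optional[float] = None) -> bool:
    """Treat an item as expired once its timestamp passes, even if it is not deleted yet."""
    current = time.time() if now is None else now
    return "expiresAt" not in item or item["expiresAt"] > current
```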

19) DAX (DynamoDB Accelerator)

Managed, in-memory cache fronting DynamoDB. Microsecond read latency. Drop-in SDKs exist. Best for read-heavy, repeatedly accessed keys.

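# --iam-role-arn is required; the ARN below is a placeholder for a role DAX can assume
aws dax create-cluster --cluster-name MyDAX --node-type dax.t3.small --replication-factor 3 \
  --iam-role-arn arn:aws:iam::<account-id>:role/<dax-service-role>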

20) Q&A — “Query or Scan for analytics?”

Answer: Neither. Export to S3 (native table export, or Streams/Glue pipelines) and analyze with Athena/EMR/Redshift. DynamoDB is for OLTP, not ad-hoc analytics.

Section 3 — Data Modeling, Patterns & Concurrency

21) Single-Table Design

Store multiple entity types in one table; encode entity type in PK/SK prefixes (e.g., USER#, ORDER#). Enables cross-entity queries with a single request.

PK = USER#<id>, SK = PROFILE#
PK = USER#<id>, SK = ORDER#<orderId>

22) Composite Keys

Use sort keys for time-ordering and ranges (BETWEEN, begins_with). Prefer ISO timestamps or yyyymmdd for lexicographic order.

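-- Get last 30 days of orders (BETWEEN bounds are inclusive)
-- If the SK continues past the date (e.g., ORDER#20250823#0001), raise the upper bound to "ORDER#20250824"
SK BETWEEN "ORDER#20250725" AND "ORDER#20250823"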
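The bounds can be computed rather than hard-coded. A sketch, assuming sort keys of the form ORDER#yyyymmdd and attributes literally named PK and SK (both are naming assumptions):

```python
from datetime import date, timedelta
from typing import Any, Dict, Tuple


def order_range(end: date, days: int) -> Tuple[str, str]:
    """Return inclusive (low, high) sort-key bounds covering `days` days ending on `end`."""
    start = end - timedelta(days=days - 1)
    return f"ORDER#{start:%Y%m%d}", f"ORDER#{end:%Y%m%d}"


def query_params(user_id: str, end: date, days: int) -> Dict[str, Any]:
    """Build the key condition for a Query; values are plain, as a document client expects."""
    low, high = order_range(end, days)
    return {
        "KeyConditionExpression": "PK = :pk AND SK BETWEEN :low AND :high",
        "ExpressionAttributeValues": {":pk": f"USER#{user_id}", ":low": low, ":high": high},
    }
```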

23) GSIs for Alternate Access

When you need to query by email or status, project attributes to a GSI keyed on those attributes. Choose ALL, KEYS_ONLY, or INCLUDE projection types.

GSI1PK = EMAIL#<addr>, GSI1SK = USER#<id>

24) Sparse Indexes

Populate GSI PK/SK only on items requiring that access. Index remains “sparse” and efficient for targeted queries (e.g., STATUS#PENDING).

-- Only pending orders set GSI2PK = STATUS#PENDING

25) Adjacency Lists

Model relationships by writing multiple items per relation: PK=USER#A, SK=FRIEND#B and reciprocal if needed. Query neighbors quickly by PK.

PK = USER#u1, SK = FRIEND#u2

26) Materialized Aggregates

Precompute counts/summaries in items updated via Streams/Lambda (CQRS). Reads become O(1); ensure idempotency and conflict handling.

PK=USER#u1, SK=STATS#orders_count
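Stream records can be delivered more than once, so the aggregator has to be idempotent. The sketch below keeps counts and seen event ids in memory as stand-ins for the STATS item and a deduplication store; the record shape follows DynamoDB Streams (`eventID`, `eventName`, `dynamodb.Keys`), while the PK/SK attribute names are assumptions.

```python
from typing import Any, Dict, Iterable, Set


def apply_order_events(records: Iterable[Dict[str, Any]],
                       counts: Dict[str, int], seen: Set[str]) -> None:
    """Increment per-user order counts once per stream record, skipping redeliveries."""
    for record in records:
        event_id = record["eventID"]
        if event_id in seen:  # duplicate delivery: already applied
            continue
        keys = record["dynamodb"]["Keys"]
        is_order = keys["SK"]["S"].startswith("ORDER#")
        if record["eventName"] == "INSERT" and is_order:
            pk = keys["PK"]["S"]
            counts[pk] = counts.get(pk, 0) + 1
        seen.add(event_id)
```

Against a real table, the increment and the dedupe marker would typically be written together (for example in one transaction) so a crash between them cannot double count.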

27) Conditional Writes

Use ConditionExpression to enforce invariants (optimistic concurrency). Prevent overwrites or ensure attribute existence/nonexistence.

ConditionExpression: "attribute_not_exists(#v)",
ExpressionAttributeNames: {"#v":"Version"}

28) Versioning & OCC

Store a numeric version. Update with SET version = version + 1 only if current version matches. Retry on conflict.

ConditionExpression: "version = :cur"
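The read-modify-write loop can be sketched without AWS at all. `FakeTable` below is a hypothetical in-memory stand-in for a single item that enforces the `version = :cur` condition; `ConditionFailed` plays the role of ConditionalCheckFailedException.

```python
from typing import Any, Callable, Dict


class ConditionFailed(Exception):
    """Stands in for DynamoDB's ConditionalCheckFailedException."""


class FakeTable:
    """In-memory stand-in for one item, enforcing 'version = :cur' on update."""

    def __init__(self, item: Dict[str, Any]) -> None:
        self.item = dict(item)

    def get(self) -> Dict[str, Any]:
        return dict(self.item)

    def update_if_version(self, new_item: Dict[str, Any], expected_version: int) -> None:
        if self.item["version"] != expected_version:
            raise ConditionFailed()
        self.item = dict(new_item, version=expected_version + 1)


def update_with_occ(table: FakeTable,
                    mutate: Callable[[Dict[str, Any]], Dict[str, Any]],
                    max_attempts: int = 3) -> Dict[str, Any]:
    """Read, modify, and write back; on a version conflict re-read and try again."""
    for _ in range(max_attempts):
        current = table.get()
        updated = mutate(current)
        try:
            table.update_if_version(updated, current["version"])
            return table.get()
        except ConditionFailed:
            continue
    raise RuntimeError("gave up after repeated version conflicts")
```

The key point is that the retry re-reads the item before reapplying the change, so the mutation always runs against the latest version.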

29) Idempotency Keys

Write operations with client-generated idempotency keys stored in a dedicated item prevent duplicates on retries.

PK = IDEMPOTENCY#<key>, SK=REQUEST#, status=APPLIED
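The guard-item idea in miniature, with a dict standing in for the table. The `result` attribute and the `apply_once` name are illustrative assumptions. In DynamoDB itself, the guard write would carry an `attribute_not_exists` condition and would normally be committed in the same transaction as the business write; this sketch does not model that atomicity.

```python
from typing import Any, Callable, Dict


def apply_once(guards: Dict[str, Any], key: str, operation: Callable[[], Any]) -> Any:
    """Run `operation` only if no guard item exists for `key`; else return the stored result."""
    pk = f"IDEMPOTENCY#{key}"
    if pk in guards:  # mirrors a failed attribute_not_exists(PK) condition
        return guards[pk]["result"]
    result = operation()
    guards[pk] = {"SK": "REQUEST#", "status": "APPLIED", "result": result}
    return result
```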

30) Q&A — “When add a GSI vs new table?”

Answer: Add a GSI if the data is the same logical dataset and you just need another access path. Create a new table if lifecycle, throughput isolation, or ownership boundaries differ materially.

Section 4 — Integrations, Analytics & Global

31) Global Tables

Multi-Region active–active replication. Low-latency local reads/writes; conflicts resolved by last-writer-wins (timestamp). Beware write conflicts; design for commutativity.

aws dynamodb update-table --table-name Users \
  --replica-updates '[{"Create":{"RegionName":"ap-southeast-2"}}]'

32) S3 Export/Import

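Export a table, or any point in time within the PITR window, to S3 as DynamoDB JSON or Amazon Ion without consuming read capacity; PITR must be enabled on the table. Import from S3 (into a new table) to seed data or migrate.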

aws dynamodb export-table-to-point-in-time \
  --table-arn arn:aws:dynamodb:...:table/Users --s3-bucket my-bucket

33) Kinesis & Event-Driven

Fan out streams to Kinesis for near-real-time analytics or to Lambda for projections. Use DLQs and retries with backoff.

# Lambda trigger from DynamoDB Stream (Console/IaC)

34) Search with OpenSearch

Project selected fields to OpenSearch/Elasticsearch via Streams+Lambda for full-text queries. Use the DynamoDB key as the document ID so each search hit maps back to exactly one item and re-indexing overwrites rather than duplicates.

_id = "USER#u1", fields: name, bio, tags

35) Caching Layers

Layer caches: an in-memory LRU in the app, a shared Redis cache, or DAX for DynamoDB-specific caching. Cache hot keys, set TTLs, and invalidate or update entries on writes.

# Pseudocode: read-through cache with a 60-second TTL
if (cache.has(key)) return cache.get(key)
val = dynamo.get(key); cache.set(key, val, 60)
return val

36) Step Functions & Sagas

Use Step Functions to coordinate multi-step writes/compensation across tables/services. Persist saga state in Dynamo with idempotent tasks.

State = { sagaId, step, status, updatedAt }

37) GraphQL with AppSync

AppSync integrates resolvers directly with DynamoDB. Use $util VTL or JS resolvers; add pipeline resolvers for auth/validation and batch operations.

# Key resolver: Query getUser(UserId: ID!): User

38) Time-Series Data

Model time-series with composite keys and time buckets (daily/monthly). Periodically archive old buckets to S3 via TTL + Streams.

PK = SENSOR#<id>#2025-08, SK = TS#2025-08-23T12:00:00Z
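A sketch of a key builder that puts a monthly bucket in the partition key and the full UTC timestamp in the sort key. The bucket granularity and key layout are assumptions; pick the bucket size so that no single partition grows or heats up too much.

```python
from datetime import datetime, timezone
from typing import Tuple


def sensor_keys(sensor_id: str, ts: datetime) -> Tuple[str, str]:
    """Return (PK, SK): the PK carries a monthly bucket, the SK the full UTC timestamp."""
    utc = ts.astimezone(timezone.utc)  # pass timezone-aware datetimes; naive ones are read as local time
    pk = f"SENSOR#{sensor_id}#{utc:%Y-%m}"
    sk = f"TS#{utc:%Y-%m-%dT%H:%M:%SZ}"
    return pk, sk
```

A range query that spans a month boundary then becomes one Query per bucket, merged by the caller.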

39) Multi-Tenancy

Tenant-aware PK prefix (e.g., TENANT#id) enforces logical isolation. Combine with IAM condition keys (dynamodb:LeadingKeys) to restrict access per tenant.

PK = TENANT#t1#USER#u1

40) Q&A — “How to do aggregations?”

Answer: Precompute and store aggregates (materialized views) updated by Streams, or export to analytics systems. DynamoDB doesn’t natively aggregate across items.

Section 5 — Security, Ops, Cost, Testing, Interview Q&A

41) Security Fundamentals

Use IAM least privilege with condition keys (partition prefix scoping). Encrypt at rest (KMS) and in transit (TLS). Deny by default; audit with CloudTrail.

Condition:
  "ForAllValues:StringLike": { "dynamodb:LeadingKeys": ["TENANT#t1*"] }

42) Backup & PITR

Enable Point-In-Time Recovery for continuous backups (last 35 days). Use on-demand backups for long-term retention & compliance.

aws dynamodb update-continuous-backups \
  --table-name Users --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true

43) Cost Optimization

Prefer keys over scans, compress large attributes, project only needed fields to GSIs, consolidate hot entities, and leverage on-demand for spiky workloads. Monitor using Cost Explorer and CloudWatch.

# Avoid KEYS_ONLY if you always need attributes (extra fetches)

44) Throttling & Timeouts

Handle ProvisionedThroughputExceededException with retries and backoff. Bound client timeouts; surface metrics. Consider adaptive retry strategies.

maxRetries=8, backoff=exponential+jitter
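A sketch of exponential backoff with full jitter. `ThrottledError` is a hypothetical stand-in for the SDK's throttling exception, and the base delay and cap are illustrative. The AWS SDKs already ship configurable retry logic, so a wrapper like this mainly suits application-level operations that sit above the SDK.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Stand-in for a throttling error such as ProvisionedThroughputExceededException."""


def backoff_delay(attempt: int, base: float = 0.05, cap: float = 5.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    ceiling = min(cap, base * (2 ** attempt))
    return (rng or random).uniform(0, ceiling)


def call_with_retries(operation: Callable[[], T], max_retries: int = 8,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `operation` on throttling, sleeping with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```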

45) Observability

Emit structured logs with request IDs, track ConsumedCapacity, monitor ThrottledRequests, Read/WriteThrottleEvents, and p99 latencies. Add app-level metrics per access pattern.

ReturnConsumedCapacity: "TOTAL"

46) Testing Strategy

Use local emulators (DynamoDB Local) for fast tests, then integration tests in a sandbox AWS account. Seed data with fixtures; test conditional writes and retries.

docker run -p 8000:8000 amazon/dynamodb-local

47) Migrations & Data Changes

No schema migrations like SQL; evolve via backfills, dual-writes, or on-read transforms. Keep write paths idempotent during transitions.

phase1: write v1+v2; phase2: backfill; phase3: read v2; phase4: stop v1

48) Prod Checklist

  • Access patterns mapped to PK/SK and GSIs
  • TTL for ephemeral data & exports to S3
  • PITR/backups enabled; restore tested
  • Retries/backoff + idempotency keys
  • Dashboards for capacity, throttles, latency
  • Least-privilege IAM with tenant scoping

49) Common Pitfalls

Relying on scans, low-cardinality PKs, oversized items (>400KB), over-projecting GSIs, ignoring hot partitions, and skipping conditional writes. Prevent with reviews, load tests, and observability.

50) Interview Q&A — 20 Practical Questions (Expanded)

1) Why DynamoDB over RDS for a social feed? Predictable low-latency, scale, and single-table access patterns map well to timelines and denormalized aggregates.

2) Define partition & sort keys. Partition key routes data to a shard; sort key orders items within a partition, enabling range queries.

3) GSI vs LSI? GSI uses alternate partition/sort keys and independent capacity; LSI shares partition key, created with table only.

4) Prevent lost updates? Conditional writes with a version attribute (OCC) and retries.

5) Hot partition — symptoms & fixes? Throttling on specific PK; shard keys, add random suffixes, or redesign access.

6) Why single-table design? Minimizes joins and network calls, enabling O(1) access for related entities via key prefixes.

7) When use transactions? Cross-item invariants (deduct balance + write order) that must succeed or fail together.

8) Query filtering gotcha? FilterExpression still reads all matched keys; costs capacity. Prefer key conditions.

9) TTL caveat? Deletion is eventual; don’t rely for precise expiry moments.

10) Item size limits? 400KB including attribute names. Store blobs in S3 and reference via URL.

11) Strong reads across Regions? Not by default: global table replication is eventually consistent, so a strongly consistent read only reflects writes made in the Region that serves it.

12) Write idempotency? Use idempotency keys stored in a guard item and conditional writes.

13) Cost drivers? RCUs/WCUs consumed, GSI projections, data transfer, storage. Design to minimize reads/writes.

14) Modeling many-to-many? Adjacency items per relation + GSIs for reverse lookups.

15) Fan-out reads? Store timeline per user partition; write-time fan-out via Streams/Lambda.

16) How to paginate? Use LastEvaluatedKey cursors; never offset/limit.

17) Backups vs PITR? PITR for rolling window recovery; on-demand backups for long-term archival/compliance.

18) IAM tenant isolation? dynamodb:LeadingKeys conditions to restrict PK prefixes per tenant identity.

19) Handling spikes? Use on-demand or Auto Scaling, pre-warm partitions with diverse keys, DAX for read bursts.

20) Observability must-haves? ConsumedCapacity, throttle counts, latency histograms, and per-pattern dashboards + alarms.