Amazon RDS Pocket Book — Uplatz
Engine choices • HA & scaling • Backups & security • CLI/SQL snippets • Ops tips & interview Q&A
1) What is Amazon RDS?
Amazon Relational Database Service (RDS) is a managed service for relational databases. It handles provisioning, patching, backups, and basic monitoring so you can focus on schema and queries. Supported engines include MySQL, PostgreSQL, MariaDB, SQL Server, and Oracle. For fully compatible PostgreSQL/MySQL with advanced features, see Amazon Aurora.
# List DB instances
aws rds describe-db-instances --query "DBInstances[].DBInstanceIdentifier"
2) Why RDS? Strengths & Tradeoffs
Strengths: Managed backups/patching, Multi-AZ HA, read replicas, encryption, IAM auth, easy scaling.
Tradeoffs: Limited superuser access; some extensions/features restricted; cost vs self-managed on EC2.
3) Core Building Blocks
- DB Instance: compute + storage for your engine
- Subnet Group: VPC subnets where RDS resides
- Parameter Group: engine config (tunable settings)
- Option Group: engine add-ons (e.g., SQL Server features)
- Security: SGs, KMS encryption, Secrets Manager
4) Engine Selection
PostgreSQL for advanced SQL/extensibility; MySQL/MariaDB for ubiquity; SQL Server/Oracle for enterprise workloads. Aurora for high throughput, fast failover, and storage autoscaling.
5) Storage & Performance
Choose gp3 (balanced) or io2 (high IOPS). Right-size IOPS and throughput to match workload. Use connection pooling and proper indexing to avoid CPU/IO bottlenecks.
6) Create a DB Instance (CLI)
aws rds create-db-instance \
--db-instance-identifier app-pg \
--engine postgres --engine-version 16 \
--master-username admin --master-user-password 'Strong#Passw0rd' \
--db-instance-class db.t4g.medium \
--allocated-storage 50 \
--storage-type gp3 \
--db-subnet-group-name app-db-subnets \
--vpc-security-group-ids sg-0123456789abcdef0 \
--publicly-accessible false \
--backup-retention-period 7 \
--storage-encrypted
Tip: prefer private subnets + NAT for app egress, avoid public exposure.
7) High Availability
Multi-AZ creates a synchronous standby in a different AZ; automatic failover on outage/patching. Read Replicas (async) offload reads and can be promoted for DR.
# Modify to Multi-AZ
aws rds modify-db-instance --db-instance-identifier app-pg --multi-az --apply-immediately
8) Connectivity & IAM
Control access via security groups and subnet groups. For PostgreSQL/MySQL, you can use IAM database authentication (token-based) instead of static passwords.
# Generate IAM auth token (example)
aws rds generate-db-auth-token --hostname db.xxxxxx.us-east-1.rds.amazonaws.com \
--region us-east-1 --port 5432 --username app_user
9) Backups & Snapshots
Automated backups (point-in-time restore) + manual snapshots for release checkpoints. Copy snapshots across regions for DR.
# Create a manual snapshot
aws rds create-db-snapshot --db-instance-identifier app-pg --db-snapshot-identifier app-pg-pre-release
10) Observability
Use CloudWatch metrics (CPU, Read/Write IOPS, FreeStorageSpace), Enhanced Monitoring for OS-level metrics, and Performance Insights to find slow SQL. Enable error logs and slow query logs.
11) Maintenance & Patching
Set a maintenance window. Minor version updates are generally safe; test in staging. Major version upgrades require more planning and snapshot/restore or blue/green.
12) Scaling Strategies
Scale vertically (instance class/storage/IOPS) or horizontally with read replicas. Use connection pooling (e.g., PG Bouncer/Aurora Serverless v2 for Aurora) and optimize queries/indexes first.
13) Security Best Practices
- Encrypt at rest (KMS) and in transit (TLS).
- Rotate credentials or use IAM auth.
- Restrict SG ingress to app subnets only.
- Separate app and admin roles; principle of least privilege.
14) Backup & Restore Drill
Regularly test PITR and snapshot restore into staging. Validate app boot, migrations, and read/write paths post-restore.
# Restore from snapshot to a new instance
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier app-pg-restore \
--db-snapshot-identifier app-pg-pre-release
15) Common Pitfalls
- Publicly accessible DBs without strict SGs.
- No automatic backups / too-short retention.
- Ignoring slow queries → CPU/IO spikes.
- Under-provisioned storage/IOPS → throttling.
16) Interview Q&A — 8 Quick Ones
1) RDS vs Aurora? Aurora is cloud-native with distributed storage, faster failover, and better scaling; RDS is managed single-instance engines.
2) Multi-AZ vs Read Replica? Multi-AZ = HA (sync standby); Read Replica = scale reads/DR (async).
3) How to secure RDS? Private subnets, SG allowlists, TLS, KMS encryption, IAM auth/rotation, least-privilege roles.
4) When to scale? CPU/IO bottlenecks, connection saturation, storage pressure; fix queries/indexes first.
5) PITR vs Snapshot restore? PITR uses automated backups/logs to a specific time; snapshot is to a snapshot point.
6) Blue/Green upgrades? Create a parallel environment, migrate traffic after validation; minimizes downtime/risk.
7) Monitoring must-haves? CPU, FreeableMemory, Read/Write IOPS, DiskQueueDepth, connections, replication lag.
8) Cost controls? Right-size instance/storage/IOPS, stop non-prod at night, use reserved instances where stable.