Amazon S3 Pocket Book — Uplatz
Buckets & objects • Storage classes • Security & policies • Lifecycle & replication • CLI recipes
1) What is Amazon S3?
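Amazon Simple Storage Service (S3) is object storage for any amount of data. Data is organised into buckets (names are globally unique) and objects (a key, the data, and metadata). S3 is designed for 99.999999999% (11 nines) durability by storing objects redundantly across multiple Availability Zones in a Region (One Zone classes excepted). Use it for backups, static sites, data lakes, and app assets.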
# Create a bucket (new buckets are private by default; change region/name). In us-east-1, omit --create-bucket-configuration.
aws s3api create-bucket --bucket mycompany-app-assets --region ap-south-1 \
--create-bucket-configuration LocationConstraint=ap-south-1
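Because bucket names live in one global namespace, a bad or taken name is the most common reason create-bucket fails. A minimal pre-flight check, covering only the core naming rules for general-purpose buckets (3 to 63 characters of lowercase letters, digits, dots and hyphens; start and end with a letter or digit; no adjacent dots; not an IP address; no `xn--` prefix or `-s3alias` suffix). The helper name is hypothetical and the rule list is deliberately partial.

```python
import ipaddress
import re

# Core rule: 3-63 chars of [a-z0-9.-], starting and ending with a letter or digit.
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def bucket_name_problems(name: str) -> list[str]:
    """Return reasons a general-purpose bucket name is invalid (partial rule set)."""
    problems = []
    if not _NAME_RE.match(name):
        problems.append("must be 3-63 chars of a-z, 0-9, '.', '-' and start/end with a letter or digit")
    if ".." in name:
        problems.append("must not contain adjacent periods")
    try:
        ipaddress.IPv4Address(name)  # raises ValueError unless name looks like an IPv4 address
        problems.append("must not be formatted as an IP address")
    except ValueError:
        pass
    if name.startswith("xn--"):
        problems.append("must not start with 'xn--'")
    if name.endswith("-s3alias"):
        problems.append("must not end with '-s3alias'")
    return problems


if __name__ == "__main__":
    print(bucket_name_problems("mycompany-app-assets"))  # []
    print(bucket_name_problems("MyCompany_Assets"))
```

Passing this check does not mean the name is free; only create-bucket can tell you that.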
2) Storage Classes (When to use what?)
- Standard: hot data, frequent access.
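- Intelligent-Tiering: moves objects between access tiers automatically based on usage, with no retrieval fees and a small per-object monitoring charge.
- Standard-IA / One Zone-IA: infrequent access at a lower storage price, plus per-GB retrieval fees and a 30-day minimum storage duration (One Zone = single AZ, so only for re-creatable data).
- Glacier Instant Retrieval (millisecond access), Glacier Flexible Retrieval (restores in minutes to hours), Glacier Deep Archive (restores within 12 to 48 hours): archival tiers.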
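The bullets above boil down to a small decision table. A sketch of it, assuming three hypothetical inputs (access pattern, acceptable restore time, whether single-AZ storage is acceptable); the returned strings are the storage-class IDs that the API and `aws s3 cp --storage-class` accept. Real choices also depend on object size, minimum storage durations and retrieval fees, which this ignores.

```python
# Storage-class IDs as used by the S3 API/CLI. Input labels are hypothetical.
ARCHIVE_BY_RESTORE_TIME = {
    "milliseconds": "GLACIER_IR",   # Glacier Instant Retrieval
    "minutes-hours": "GLACIER",     # Glacier Flexible Retrieval
    "hours-days": "DEEP_ARCHIVE",   # Glacier Deep Archive
}


def suggest_storage_class(access: str, restore_time: str = "milliseconds",
                          single_az_ok: bool = False) -> str:
    """Map the rules of thumb above to a storage-class ID (heuristic only)."""
    if access == "frequent":
        return "STANDARD"
    if access == "unknown":
        return "INTELLIGENT_TIERING"
    if access == "infrequent":
        # One Zone-IA keeps data in a single AZ: only for data you can re-create.
        return "ONEZONE_IA" if single_az_ok else "STANDARD_IA"
    if access == "archive":
        try:
            return ARCHIVE_BY_RESTORE_TIME[restore_time]
        except KeyError:
            raise ValueError(f"unknown restore_time: {restore_time!r}") from None
    raise ValueError(f"unknown access pattern: {access!r}")


if __name__ == "__main__":
    print(suggest_storage_class("archive", "hours-days"))  # DEEP_ARCHIVE
```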
3) Core Features
- Versioning & MFA Delete for protection against accidental deletes/ransomware.
- Lifecycle rules: automatically transition objects to cheaper classes or expire them.
- Replication (CRR/SRR): compliance, locality, DR.
- S3 Select (closed to new customers since July 2024) & Object Lambda for filtered or transformed reads.
- Access Points & Bucket Policies for fine-grained access.
4) Security Basics
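Keep buckets private (the default for new buckets). Grant least-privilege access with IAM policies and bucket policies. Leave Block Public Access on unless you deliberately serve objects publicly straight from S3; CloudFront with an Origin Access Control (OAC) works with it on. Encrypt at rest with SSE-S3 (applied by default to new objects) or SSE-KMS, and enforce TLS with a bucket policy.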
# Block public access at bucket level
aws s3api put-public-access-block --bucket mycompany-app-assets \
--public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
5) Common CLI Ops
# Upload single file
aws s3 cp ./logo.png s3://mycompany-app-assets/img/logo.png
# Sync a folder one way; --delete removes destination objects that no longer exist locally
aws s3 sync ./public s3://mycompany-app-assets/public --delete
# Generate a pre-signed URL (temporary access)
aws s3 presign s3://mycompany-app-assets/reports/q3.pdf --expires-in 3600
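`aws s3 presign` builds a Signature Version 4 (SigV4) query-string URL locally; no request is sent to S3. To show what that URL contains, here is a stdlib-only sketch of the same signing for a GET, assuming long-term IAM user keys (no session token, so no `X-Amz-Security-Token`) and the virtual-hosted-style regional endpoint. `presign_get` and the example keys are hypothetical; in practice use the CLI or an SDK.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get(bucket: str, key: str, region: str, access_key: str,
                secret_key: str, expires: int = 3600,
                now: Optional[datetime.datetime] = None) -> str:
    """Build a SigV4 query-string presigned GET URL (long-term keys, no session token)."""
    if not 1 <= expires <= 604800:  # SigV4 presigned URLs last at most 7 days
        raise ValueError("expires must be between 1 and 604800 seconds")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"   # virtual-hosted-style endpoint
    canonical_uri = "/" + quote(key, safe="/")      # encode the key, keep '/' separators
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: parameters sorted by name, names and values URI-encoded.
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    canonical_request = "\n".join([
        "GET", canonical_uri, query,
        f"host:{host}\n",    # canonical headers block ends with its own newline
        "host",              # signed headers
        "UNSIGNED-PAYLOAD",  # presigned URLs do not sign the body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Signing key chain: secret -> date -> region -> service -> "aws4_request".
    signing_key = _hmac(("AWS4" + secret_key).encode("utf-8"), datestamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{query}&X-Amz-Signature={signature}"


if __name__ == "__main__":
    # Hypothetical credentials; a working URL needs real keys allowed to s3:GetObject.
    print(presign_get("mycompany-app-assets", "reports/q3.pdf", "ap-south-1",
                      "AKIAEXAMPLE", "secretEXAMPLE"))
```

Anyone holding the URL gets the signer's access to that one object until it expires, so keep expiry short.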
6) Bucket Policy — Private with TLS-only
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "HttpsOnly",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::mycompany-app-assets",
      "arn:aws:s3:::mycompany-app-assets/*"
    ],
    "Condition": { "Bool": { "aws:SecureTransport": "false" } }
  }]
}
Attach via aws s3api put-bucket-policy --bucket ... --policy file://policy.json
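To reuse the TLS-only policy for other buckets, it can be generated rather than hand-edited. A minimal sketch (the function name is hypothetical) that emits the same statement for any bucket name, keeping both ARNs: the bucket ARN covers bucket-level actions such as ListBucket, and the `/*` ARN covers object actions such as GetObject. Forgetting one of them leaves those requests unprotected.

```python
import json


def tls_only_policy(bucket: str) -> dict:
    """Deny every S3 action on the bucket and its objects when not using HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "HttpsOnly",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",    # bucket-level actions
                f"arn:aws:s3:::{bucket}/*",  # object-level actions
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


if __name__ == "__main__":
    # e.g. python tls_policy.py > policy.json, then attach with put-bucket-policy
    print(json.dumps(tls_only_policy("mycompany-app-assets"), indent=2))
```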
7) Enable Versioning & Default Encryption
aws s3api put-bucket-versioning --bucket mycompany-app-assets --versioning-configuration Status=Enabled
aws s3api put-bucket-encryption --bucket mycompany-app-assets --server-side-encryption-configuration '{
"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"alias/s3-kms-key"}}]
}'
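Prefer a customer managed KMS key (formerly called a CMK) for key-policy control, CloudTrail auditing of key use, and rotation. SSE-KMS adds KMS request charges; S3 Bucket Keys reduce them.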
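Versioning is what makes deletes recoverable: a plain delete on a versioned bucket only adds a delete marker, and removing that marker makes the previous version current again. A sketch that reads the JSON printed by `aws s3api list-object-versions --bucket ... --prefix ...` and emits the `delete-object --version-id` commands that undo the latest deletes. The helper name is hypothetical; review the commands before running them.

```python
import shlex


def undelete_commands(listing: dict, bucket: str) -> list[str]:
    """CLI commands that remove each key's *current* delete marker.

    `listing` is the parsed JSON printed by `aws s3api list-object-versions`.
    Deleting the latest delete marker makes the previous version current again.
    """
    commands = []
    for marker in listing.get("DeleteMarkers", []):
        if marker.get("IsLatest"):  # only markers that currently hide the object
            commands.append(
                "aws s3api delete-object"
                f" --bucket {shlex.quote(bucket)}"
                f" --key {shlex.quote(marker['Key'])}"
                f" --version-id {shlex.quote(marker['VersionId'])}"
            )
    return commands
```

Deleting a specific version ID is permanent, so target only delete markers, never the data versions you want back.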
8) Lifecycle Policy — Transition & Expire
{
  "Rules": [{
    "ID": "logs-retention",
    "Filter": { "Prefix": "logs/" },
    "Status": "Enabled",
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" }
    ],
    "Expiration": { "Days": 365 }
  }]
}
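Save as lifecycle.json and apply with aws s3api put-bucket-lifecycle-configuration --bucket mycompany-app-assets --lifecycle-configuration file://lifecycle.json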
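A quick way to sanity-check the rule before applying it is to ask where an object of a given age would sit. A sketch, assuming objects are uploaded as STANDARD and the rule uses only a prefix filter; S3 runs lifecycle asynchronously, so real transitions can lag the day counts. `class_at_age` is a hypothetical helper.

```python
# The lifecycle rule above, as a Python dict.
RULE = {
    "ID": "logs-retention",
    "Filter": {"Prefix": "logs/"},
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": 365},
}


def class_at_age(rule: dict, key: str, age_days: int) -> str:
    """Approximate an object's storage class `age_days` after creation under `rule`."""
    prefix = rule.get("Filter", {}).get("Prefix", "")
    if rule["Status"] != "Enabled" or not key.startswith(prefix):
        return "STANDARD"  # rule does not apply; assumes upload as STANDARD
    if "Expiration" in rule and age_days >= rule["Expiration"]["Days"]:
        return "EXPIRED"
    current = "STANDARD"
    for t in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= t["Days"]:
            current = t["StorageClass"]  # latest transition reached so far
    return current


if __name__ == "__main__":
    for age in (10, 45, 200, 400):
        print(age, class_at_age(RULE, "logs/app.log", age))
```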
9) Cross-Region Replication (CRR)
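Replication requires versioning on both the source and destination buckets, plus an IAM role that S3 can assume with permission to read from the source and replicate into the destination. Rules apply to objects written after they are enabled; use S3 Batch Replication for existing objects. Use CRR for compliance, proximity, or multi-region DR.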
# (High-level) Create replication configuration JSON and apply:
aws s3api put-bucket-replication --bucket mycompany-app-assets --replication-configuration file://replication.json
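A sketch of what replication.json can contain, using the current rule schema (Filter, Priority, DeleteMarkerReplication, Destination). The role ARN, account ID and destination bucket are hypothetical placeholders; the role must already trust s3.amazonaws.com and carry the replication permissions.

```python
import json


def replication_config(role_arn: str, dest_bucket: str, prefix: str = "",
                       storage_class: str = "STANDARD") -> dict:
    """Build a one-rule configuration for put-bucket-replication."""
    return {
        "Role": role_arn,
        "Rules": [{
            "ID": f"replicate-{prefix.strip('/') or 'all'}",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},  # "" = whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": f"arn:aws:s3:::{dest_bucket}",
                "StorageClass": storage_class,
            },
        }],
    }


if __name__ == "__main__":
    cfg = replication_config(
        "arn:aws:iam::111122223333:role/s3-replication-role",  # hypothetical role
        "mycompany-app-assets-dr",                              # hypothetical destination
    )
    print(json.dumps(cfg, indent=2))  # redirect into replication.json
```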
10) Static Website + CDN
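For static sites, keep the bucket private and serve it through CloudFront with an Origin Access Control (OAC) for HTTPS, caching, and edge performance; point CloudFront at the bucket's REST endpoint and set a default root object. OAC does not work with the S3 website endpoint, which serves HTTP only and needs public read access, so enable website hosting only when you deliberately want a public S3-hosted site.
# Only for a public S3 website endpoint (not used with OAC)
aws s3 website s3://mycompany-site/ --index-document index.html --error-document 404.html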
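With OAC, the bucket policy grants s3:GetObject to the CloudFront service principal, scoped to one distribution ARN, so Block Public Access can stay on. A sketch of that policy shape, assuming the bucket `mycompany-site` and hypothetical account and distribution IDs; attach it with put-bucket-policy.

```python
import json


def oac_read_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Allow only one CloudFront distribution (via OAC) to read objects."""
    dist_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontServicePrincipalReadOnly",
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            # Without this condition any distribution could read the bucket.
            "Condition": {"StringEquals": {"AWS:SourceArn": dist_arn}},
        }],
    }


if __name__ == "__main__":
    # Hypothetical IDs; use your account ID and distribution ID.
    print(json.dumps(oac_read_policy("mycompany-site", "111122223333", "EDFDVBD6EXAMPLE"), indent=2))
```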
11) Cost Controls
- Use Intelligent-Tiering for unknown access patterns.
- Add lifecycle rules that move data to IA/Glacier, expire noncurrent versions, and abort incomplete multipart uploads.
- Compress objects (gzip/brotli) before upload when appropriate.
- Use S3 Storage Lens & Cost Explorer to track trends.
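Old versions and abandoned multipart uploads are easy to forget because they do not show up in a normal listing, yet both are billed. A sketch of a bucket-wide cleanup rule; the rule ID and day counts are assumptions to adjust.

```python
import json


def cleanup_rule(noncurrent_days: int = 30, abort_mpu_days: int = 7) -> dict:
    """Lifecycle rule: expire old versions and abort stalled multipart uploads."""
    return {
        "ID": "cleanup-versions-and-mpu",   # hypothetical rule name
        "Filter": {"Prefix": ""},            # whole bucket
        "Status": "Enabled",
        # Permanently delete versions this many days after they stop being current.
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
        # Parts of incomplete uploads are billed until the upload is aborted.
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": abort_mpu_days},
    }


if __name__ == "__main__":
    print(json.dumps({"Rules": [cleanup_rule()]}, indent=2))
```

put-bucket-lifecycle-configuration replaces the whole existing configuration, so merge this rule with any existing rules (such as logs-retention) into one Rules list before applying.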
12) Access Patterns & Performance
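S3 scales automatically, supporting at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix, so randomising prefix names is no longer needed; spread very hot workloads across several prefixes to go higher. Use multipart upload for large objects (recommended from about 100 MB; required above 5 GB, the single-PUT limit). Use range GETs for partial reads and parallel downloads. Keep compute in the same Region to reduce latency and data transfer costs.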
# Multipart upload (CLI auto when file > 8MB with cp/sync)
aws s3 cp big.bin s3://mycompany-app-assets/arch/big.bin
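The CLI sizes parts for you; the sketch below shows the limits that sizing has to respect (at least 5 MiB per part except the last, at most 5 GiB per part, at most 10,000 parts) and the Range header values for parallel byte-range GETs. The 8 MiB starting size mirrors the CLI's default chunk size; the doubling strategy is an assumption for illustration, not how the CLI chooses sizes.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
MIN_PART = 5 * MiB    # minimum size of every part except the last
MAX_PART = 5 * GiB    # maximum part size
MAX_PARTS = 10_000    # maximum number of parts per upload


def plan_parts(size: int, preferred: int = 8 * MiB) -> tuple[int, int]:
    """Pick (part_size, part_count) for a multipart upload of `size` bytes."""
    part = max(preferred, MIN_PART)
    while -(-size // part) > MAX_PARTS:  # ceil(size / part) parts needed
        part *= 2                        # grow until the object fits in 10,000 parts
    if part > MAX_PART:
        raise ValueError("object too large for multipart limits")
    return part, max(1, -(-size // part))


def byte_ranges(size: int, chunk: int) -> list[str]:
    """HTTP Range header values covering `size` bytes in `chunk`-sized pieces."""
    return [f"bytes={start}-{min(start + chunk, size) - 1}"
            for start in range(0, size, chunk)]


if __name__ == "__main__":
    print(plan_parts(100 * GiB))  # (16777216, 6400): 16 MiB parts
    print(byte_ranges(10, 4))     # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```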
13) Data Lake & Analytics
Use S3 as the lake: partition data by date/source, store it in open formats such as Parquet, and register tables in the AWS Glue Data Catalog. Query with Athena; stream in with Kinesis/Firehose; run ETL with AWS Glue or EMR/Spark.
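Partitioning is just a key layout. Athena and Glue can treat Hive-style `name=value` path segments as partition columns, so a query filtered on `dt` only needs the matching prefixes. A sketch of a key builder and of the prefixes a date-range query touches; the `lake/` root, table name and partition columns are hypothetical.

```python
from datetime import date, timedelta


def partition_prefix(table: str, source: str, day: date, root: str = "lake") -> str:
    """Hive-style prefix, e.g. lake/events/source=web/dt=2024-05-01/."""
    return f"{root}/{table}/source={source}/dt={day.isoformat()}/"


def prefixes_for_range(table: str, source: str, start: date, end: date,
                       root: str = "lake") -> list[str]:
    """Daily prefixes a query for [start, end] must read (partition pruning)."""
    days = (end - start).days
    return [partition_prefix(table, source, start + timedelta(days=d), root)
            for d in range(days + 1)]


if __name__ == "__main__":
    # Writers put files under the prefix, e.g. .../dt=2024-05-01/part-0000.parquet
    print(partition_prefix("events", "web", date(2024, 5, 1)) + "part-0000.parquet")
```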
14) Logging, Inventory, Events
Enable Server Access Logging or use CloudTrail data events for audit. S3 Inventory delivers object lists and metadata on a daily or weekly schedule. Event Notifications → SQS/SNS/Lambda (or EventBridge) for workflows (e.g., thumbnails, AV scanning).
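A notification-driven Lambda starts by pulling bucket and key out of the event's `Records[].s3` entries. A sketch; note that object keys arrive URL-encoded (spaces become `+`), so they must be decoded before calling S3. `handler` is a hypothetical entry point and the print stands in for real work.

```python
from urllib.parse import unquote_plus


def objects_from_event(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 Event Notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys are URL-encoded in the event (e.g. spaces as '+'); decode first.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def handler(event, context):
    """Hypothetical Lambda entry point, e.g. for a thumbnail job."""
    for bucket, key in objects_from_event(event):
        print(f"new object s3://{bucket}/{key}")
```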
15) Common Pitfalls
- Public buckets unintentionally exposed (always enable Block Public Access).
- No versioning → cannot recover from deletes/overwrites.
- Overly broad lifecycle rules (no prefix or tag filter) causing unexpected deletes; test on a prefix first.
- Missing KMS permissions (e.g., kms:Decrypt, kms:GenerateDataKey on the key) → AccessDenied even when IAM allows the S3 action.
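Several of these pitfalls can be caught from read-only CLI output. A sketch that takes the parsed JSON of `aws s3api get-public-access-block`, `get-bucket-versioning` and `get-bucket-lifecycle-configuration` (pass `{}` where a command returns nothing or an error) and lists findings. The helper name is hypothetical; it checks bucket-level settings only (account-level Block Public Access may still apply) and treats only Prefix, Tag and And filters as scoping.

```python
BPA_FLAGS = ("BlockPublicAcls", "IgnorePublicAcls",
             "BlockPublicPolicy", "RestrictPublicBuckets")


def audit_bucket(bpa: dict, versioning: dict, lifecycle: dict) -> list[str]:
    """Flag common pitfalls from parsed `aws s3api get-*` outputs."""
    findings = []
    cfg = bpa.get("PublicAccessBlockConfiguration", {})
    off = [flag for flag in BPA_FLAGS if not cfg.get(flag)]
    if off:
        findings.append("Block Public Access not fully on: " + ", ".join(off))
    if versioning.get("Status") != "Enabled":  # {} means never enabled
        findings.append("versioning is not enabled")
    for rule in lifecycle.get("Rules", []):
        flt = rule.get("Filter", {})
        scoped = rule.get("Prefix") or flt.get("Prefix") or flt.get("Tag") or flt.get("And")
        if rule.get("Status") == "Enabled" and "Expiration" in rule and not scoped:
            findings.append(f"rule {rule.get('ID', '?')!r} expires objects bucket-wide")
    return findings
```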
16) Interview Q&A — 8 Quick Ones
1) S3 vs EBS vs EFS? S3 = object storage (HTTP, massive scale), EBS = block for single EC2, EFS = NFS for multi-EC2.
2) How to prevent public access? Block Public Access + tight bucket policies + OAC with CloudFront.
3) When to use Intelligent-Tiering? When you don’t know access patterns; it auto-moves objects to cheaper tiers.
4) Disaster recovery? Versioning + CRR to another region; test restores.
5) Serve private content via CDN? CloudFront with OAC; signed URLs/cookies for auth.
6) Partial reads? Use Range headers to fetch specific byte ranges, or S3 Select to filter rows/columns of CSV, JSON, or Parquet objects.
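7) Presigned URLs? Temporary access to one object operation (upload or download) with the signer's permissions; SigV4 URLs last at most 7 days and stop working when the signing credentials expire.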
8) MFA Delete? Requires an MFA code to permanently delete an object version or change the bucket's versioning state; only the root user can enable it. Use where strict governance is required.