AWS — Interview Questions Booklet (50 Q&A)
Accounts & IAM • VPC & Networking • EC2/Lambda/Containers • S3/EBS/EFS/DBs • Eventing & Integration • Observability & Ops • Security & Governance • Cost & Well-Architected • Real-world Scenarios
1) What are AWS Regions and Availability Zones, and why do they matter for resiliency?
Answer: Regions are separate geographic areas; Availability Zones (AZs) are isolated groups of one or more data centers within a Region. Spanning AZs protects against single-facility failures, while choosing Regions close to users reduces latency.
2) When should an organization use a multi-account strategy with AWS Organizations?
Answer: Use multiple accounts to isolate workloads, billing, and blast radius; apply guardrails with SCPs; enable cost allocation and delegated admin per domain (e.g., prod vs. dev, business units).
3) How do Service Control Policies (SCPs) differ from IAM policies?
Answer: SCPs set account/OU-level maximum permissions (guardrails). IAM policies grant permissions to principals inside an account but cannot exceed SCP boundaries.
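Illustration: the guardrail idea can be sketched as a set intersection. This is a toy model only (exact action strings, no wildcards, explicit denies, or conditions), and the function and variable names are hypothetical.

```python
def effective_permissions(scp_allowed: set[str], iam_granted: set[str]) -> set[str]:
    """Return the actions a principal can actually perform.

    Toy model: an action is usable only if the SCP permits it AND an
    IAM policy grants it. The SCP on its own grants nothing.
    """
    return scp_allowed & iam_granted


# Hypothetical guardrail: the OU only permits S3 reads and EC2 describes.
scp_allowed = {"s3:GetObject", "s3:ListBucket", "ec2:DescribeInstances"}
# Hypothetical IAM policy that tries to grant more than the guardrail allows.
iam_granted = {"s3:GetObject", "s3:DeleteObject", "ec2:DescribeInstances"}

allowed = effective_permissions(scp_allowed, iam_granted)
print(sorted(allowed))
```

In this model s3:DeleteObject is granted by IAM but never becomes usable, because the SCP does not permit it.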
4) What is the Shared Responsibility Model on AWS?
Answer: AWS secures the cloud (facilities, hardware, managed services); customers secure what they run/configure in the cloud (data, identity, network, OS, apps).
5) How do landing zones help large enterprises start on AWS?
Answer: They provide baseline multi-account structure, identity, network, logging, and guardrails so teams can onboard workloads consistently and compliantly.
6) How do IAM users, groups, and roles differ, and which should be preferred?
Answer: Users are long-lived identities; groups bundle permissions for users; roles are assumable identities that issue short-lived credentials. Prefer roles + SSO/MFA over static user keys.
7) What is the principle of least privilege, and how do you enforce it on AWS?
Answer: Grant only required actions/resources, use IAM conditions (tags, IPs), scoped roles, access advisor, and permission boundaries; review regularly.
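Illustration: a minimal sketch of a scoped identity-based policy, built as a Python dict. The bucket name, prefix, and CIDR range are placeholder assumptions; the policy allows a single action on a single prefix, and only from one network.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str, source_cidr: str) -> dict:
    """Build a policy allowing s3:GetObject on one prefix from one CIDR range."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadReportsFromOfficeNetwork",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
                # IAM condition: only requests from this IP range are allowed.
                "Condition": {"IpAddress": {"aws:SourceIp": source_cidr}},
            }
        ],
    }


# Placeholder values, not taken from any real environment.
policy = read_only_prefix_policy("example-reports-bucket", "reports/", "203.0.113.0/24")
print(json.dumps(policy, indent=2))
```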
8) When would you use resource-based policies instead of identity-based policies?
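Answer: Use resource policies on S3, KMS, SQS, etc., to allow cross-account or public access patterns by naming the allowed principals on the resource itself. For cross-account access, the caller's identity policy in its own account must usually allow the action as well.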
9) How do AWS SSO (IAM Identity Center) and federation improve access management?
Answer: They centralize identity via corporate IdPs, provide SSO into accounts/roles, enforce MFA, and issue short-lived credentials with auditability.
10) What is AWS KMS used for, and how do CMKs integrate with services?
Answer: KMS manages encryption keys; services like S3, EBS, RDS can use CMKs for server-side encryption, with fine-grained access via key policies and grants.
11) What core components make up a VPC, and how do they interact?
Answer: Subnets (public/private), route tables, Internet/NAT gateways, NACLs, security groups, and endpoints; routes + ACLs/SecGroups control reachability and filtering.
12) How do security groups and network ACLs differ?
Answer: Security groups are stateful, instance-level firewalls; NACLs are stateless, subnet-level rules. Use SGs for most filtering; NACLs for coarse subnet rules.
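Illustration: the stateful/stateless difference can be sketched with a toy rule model. It ignores protocols, CIDRs, deny rules, and rule ordering, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    direction: str  # "in" or "out"
    port_from: int
    port_to: int


def matches(rules: list[Rule], direction: str, port: int) -> bool:
    """True if any allow rule covers this direction and port."""
    return any(
        r.direction == direction and r.port_from <= port <= r.port_to for r in rules
    )


def stateful_allows_reply(rules: list[Rule], request_port: int, reply_port: int) -> bool:
    """Security-group style: a permitted inbound request implies its reply is permitted."""
    return matches(rules, "in", request_port)


def stateless_allows_reply(rules: list[Rule], request_port: int, reply_port: int) -> bool:
    """NACL style: the inbound request and the outbound reply are checked separately."""
    return matches(rules, "in", request_port) and matches(rules, "out", reply_port)


https_in_only = [Rule("in", 443, 443)]
with_ephemeral_out = https_in_only + [Rule("out", 1024, 65535)]

print(stateful_allows_reply(https_in_only, 443, 50000))
print(stateless_allows_reply(https_in_only, 443, 50000))
print(stateless_allows_reply(with_ephemeral_out, 443, 50000))
```

The stateless check only passes once an outbound rule for the ephemeral port range is added, which is why NACLs need rules in both directions.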
13) When do you choose VPC peering, Transit Gateway, or AWS PrivateLink?
Answer: Peering: simple 1-to-1 private routing. Transit Gateway: hub-and-spoke scalable routing across many VPCs/VPNs. PrivateLink: private access to specific services via endpoints.
14) What are interface and gateway endpoints, and why use them?
Answer: Interface endpoints (ENIs) privately connect to AWS services/PrivateLink; gateway endpoints route S3/DynamoDB traffic over private paths via route table entries. Both improve security and avoid NAT egress.
15) How do you connect on-prem networks to AWS securely and reliably?
Answer: Use Site-to-Site VPN for quick IPSec tunnels, Direct Connect for dedicated links/consistent throughput, or both for resilience (DX + VPN failover).
16) What EC2 purchase options exist, and when is each appropriate?
Answer: On-Demand (flexible, no commitment), Savings Plans/Reserved Instances (steady workloads with a 1- or 3-year commitment), and Spot (fault-tolerant workloads at steep discounts, with possible interruption). Mix for cost/perf.
17) How do Auto Scaling Groups maintain availability and cost efficiency?
Answer: They scale instances based on metrics/schedules, replace unhealthy nodes across AZs, and support mixed instance/Spot strategies.
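Illustration: metric-based scaling can be approximated with a proportional rule. This is a simplified sketch for intuition, not the exact algorithm AWS uses; the function name and the sample numbers are assumptions.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Scale the fleet so the per-instance metric moves toward the target.

    Simplified proportional model: if average CPU is 80% against a 50%
    target, the fleet grows by roughly 80/50, clamped to the group limits.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * metric / target)
    return max(minimum, min(maximum, raw))


print(desired_capacity(current=4, metric=80.0, target=50.0, minimum=2, maximum=10))
print(desired_capacity(current=4, metric=20.0, target=50.0, minimum=2, maximum=10))
```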
18) When would you choose Lambda over EC2?
Answer: Choose Lambda for event-driven, intermittent workloads that need near-zero administration and finish within the 15-minute execution limit; use EC2 for long-running workloads or custom OS/runtime needs.
19) What are the differences between ECS, EKS, and Fargate?
Answer: ECS is AWS-managed orchestration; EKS is managed Kubernetes; Fargate runs containers serverlessly for either ECS/EKS without managing EC2.
20) How do you implement blue/green or canary deployments for compute workloads?
Answer: Use load balancers/weighted routing, CodeDeploy for EC2/ECS, or service mesh/gateway for EKS; shift traffic gradually with health checks and rollbacks.
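Illustration: gradual traffic shifting with rollback can be sketched as a weight schedule plus a health gate. The step size and the health-check callback are hypothetical; a real rollout would apply each weight pair to a load balancer or routing policy and wait between steps.

```python
from collections.abc import Callable


def canary_steps(step_percent: int) -> list[tuple[int, int]]:
    """Return (old_weight, new_weight) pairs that move traffic in equal steps."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be between 1 and 100")
    steps = []
    new = 0
    while new < 100:
        new = min(100, new + step_percent)
        steps.append((100 - new, new))
    return steps


def rollout(step_percent: int, healthy: Callable[[int], bool]) -> tuple[int, int]:
    """Advance through the steps; return to (100, 0) on the first failed check."""
    current = (100, 0)
    for weights in canary_steps(step_percent):
        if not healthy(weights[1]):
            return (100, 0)  # rollback: all traffic to the old version
        current = weights
    return current


print(canary_steps(25))
print(rollout(25, lambda new_pct: True))
print(rollout(25, lambda new_pct: new_pct < 75))
```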
21) What are key S3 storage classes, and how do lifecycle policies reduce cost?
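Answer: Standard, Intelligent-Tiering, Standard-IA/One Zone-IA, Glacier Instant/Flexible/Deep Archive. Lifecycle rules transition objects to cheaper classes or expire them based on age; Intelligent-Tiering moves objects between tiers based on access patterns.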
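Illustration: a minimal sketch of a lifecycle configuration as a Python dict. The rule ID, prefix, and day counts are placeholder assumptions; the commented call shows how it might be applied with boto3, assuming the library is installed and credentials are configured.

```python
def build_lifecycle_rules(prefix: str) -> dict:
    """Return a lifecycle configuration that tiers objects down, then expires them."""
    return {
        "Rules": [
            {
                "ID": "tier-then-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    }


lifecycle = build_lifecycle_rules("logs/")

# Possible application (not executed here; bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle
# )
print(lifecycle["Rules"][0]["Transitions"])
```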
22) How do S3 encryption options differ (SSE-S3, SSE-KMS, SSE-C, client-side)?
Answer: SSE-S3 uses AWS-managed keys, SSE-KMS uses CMKs with auditing, SSE-C lets you supply keys, and client-side encrypts before upload for full client control.
23) When do you use EBS, EFS, or FSx for workloads?
Answer: EBS: block storage per instance; EFS: shared POSIX file storage across instances; FSx: managed file systems like Windows/ONTAP/Lustre for specific protocols/perf.
24) How do RDS Multi-AZ and read replicas differ?
Answer: Multi-AZ provides synchronous standby for HA/failover; read replicas are asynchronous copies for read scaling and offloading analytics.
25) What DynamoDB design choices affect scale and cost?
Answer: Choosing partition keys for even workload, using on-demand vs. provisioned + autoscaling, GSIs/LSIs judiciously, TTL to prune data, and DAX for caching.
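Illustration: one common way to even out a hot partition key is write sharding, sketched below. The key format, shard count, and helper names are assumptions; readers then have to query every shard key and merge the results.

```python
import hashlib


def sharded_partition_key(base_key: str, item_id: str, shard_count: int = 10) -> str:
    """Append a deterministic shard suffix so one hot key spreads over several partitions."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key: str, shard_count: int = 10) -> list[str]:
    """Keys a reader must query (and merge) to see every item for base_key."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


key = sharded_partition_key("orders-2024-06-01", "item-42")
print(key)
print(all_shard_keys("orders-2024-06-01", shard_count=4))
```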
26) How do API Gateway and Application Load Balancer differ for HTTP APIs?
Answer: API Gateway is feature-rich for serverless APIs (auth, throttling, usage plans). ALB suits L7 load-balancing to containers/EC2/Lambda with simpler routing.
27) When do you choose SQS vs. SNS vs. EventBridge?
Answer: SQS for durable queues and decoupling, SNS for pub/sub fan-out, EventBridge for event bus with routing by rules across SaaS/AWS sources.
28) What are Step Functions Standard vs. Express workflows used for?
Answer: Standard for long-running, exactly-once orchestrations with history; Express for high-throughput, short-lived orchestrations at lower cost.
29) How do you handle Lambda cold starts and concurrency limits?
Answer: Use provisioned concurrency to avoid cold starts, keep packages slim, and reuse connections across invocations; set reserved concurrency per function and request an account-level concurrency quota increase when limits are hit.
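Illustration: connection reuse usually means creating clients outside the handler. The sketch below uses a stand-in class (ExpensiveClient is hypothetical) in place of a real database or SDK client so it runs without AWS access.

```python
import time


class ExpensiveClient:
    """Stand-in for a database or HTTP client that is slow to construct."""

    instances_created = 0

    def __init__(self) -> None:
        ExpensiveClient.instances_created += 1
        self.created_at = time.time()

    def query(self, key: str) -> str:
        return f"value-for-{key}"


# Created once per execution environment (during the cold start), then reused
# by every warm invocation that lands on the same environment.
CLIENT = ExpensiveClient()


def handler(event: dict, context: object) -> dict:
    """Lambda-style entry point; only per-request work happens here."""
    return {"result": CLIENT.query(event["key"])}


print(handler({"key": "a"}, None))
print(handler({"key": "b"}, None))
print(ExpensiveClient.instances_created)
```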
30) How does AWS Glue/Athena fit into a serverless data lake?
Answer: Glue provides the Data Catalog (metadata) and ETL jobs; Athena queries S3 with SQL on demand; together they enable schema-on-read analytics without managing clusters.
31) What are CloudWatch Metrics, Logs, and Alarms used for?
Answer: Metrics track numeric telemetry, Logs capture application/system output, and Alarms notify/act when thresholds or anomaly detections trigger.
32) How do CloudTrail and AWS Config complement each other?
Answer: CloudTrail records API activity for audit; Config tracks resource state/drift and evaluates compliance against rules over time.
33) What is AWS X-Ray, and when is it valuable?
Answer: X-Ray provides distributed tracing across services/functions, helping find latency bottlenecks and error hotspots in microservices/serverless apps.
34) How does Systems Manager (SSM) help with fleet operations?
Answer: SSM enables patching, state management, secure remote commands/Session Manager, parameter/secret storage, and runbooks for automation.
35) What options exist for IaC on AWS, and how do they differ?
Answer: CloudFormation (declarative templates), CDK (code that synthesizes CloudFormation templates), and Terraform (multi-cloud). Choose based on ecosystem and governance.
36) How do you protect internet-facing applications on AWS?
Answer: Use WAF for L7 filtering, Shield for DDoS protection, ALB/CloudFront for edge defense, TLS everywhere, and least-privilege backends.
37) What are GuardDuty, Inspector, and Macie used for?
Answer: GuardDuty: threat detection from logs; Inspector: automated vulnerability scanning; Macie: sensitive data discovery in S3.
38) How do you enforce baseline governance across many accounts?
Answer: Use Organizations with SCPs, Control Tower blueprints/guardrails, centralized logging/CloudTrail, and Config conformance packs.
39) What S3 features reduce data exfiltration risk?
Answer: Block Public Access, bucket policies with aws:PrincipalOrgID, VPC endpoints, object ownership, and SSE-KMS with strict key policies.
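Illustration: a minimal sketch of a bucket policy that uses aws:PrincipalOrgID as a deny guardrail. The bucket name and organization ID are placeholders. Note the assumption that no AWS service principal needs to write to this bucket; a blanket deny like this one would block such deliveries too.

```python
import json


def org_only_bucket_policy(bucket: str, org_id: str) -> dict:
    """Deny every S3 action on the bucket to principals outside the given organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideOrganization",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }


# Placeholder identifiers for illustration only.
org_policy = org_only_bucket_policy("example-data-bucket", "o-exampleorgid")
print(json.dumps(org_policy, indent=2))
```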
40) How do you design key management for compliance?
Answer: Use CMKs with rotation, separate admin vs. usage roles, key grants for services, detailed CloudTrail/CloudWatch logging, and periodic access reviews.
41) How do you track and allocate AWS costs effectively?
Answer: Use cost allocation tags, Cost Explorer, Budgets/alerts, and split by accounts/OU; enable CUR for detailed analytics and chargeback.
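Illustration: chargeback by tag boils down to grouping line items. The record shape below is a hypothetical simplification and not the actual CUR schema; the tag key and amounts are made up for the example.

```python
from collections import defaultdict


def allocate_by_tag(line_items: list[dict], tag_key: str,
                    untagged_label: str = "untagged") -> dict[str, float]:
    """Sum cost per value of tag_key; items missing the tag go to untagged_label."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, untagged_label)
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical line items for illustration.
items = [
    {"service": "EC2", "cost": 12.5, "tags": {"team": "payments"}},
    {"service": "S3", "cost": 7.5, "tags": {"team": "payments"}},
    {"service": "RDS", "cost": 30.0, "tags": {"team": "search"}},
    {"service": "EC2", "cost": 4.0, "tags": {}},
]

report = allocate_by_tag(items, "team")
print(report)
```

A large untagged bucket in such a report is usually the first sign that tagging policy is not being enforced.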
42) When should you choose Savings Plans over Reserved Instances?
Answer: Savings Plans provide flexible savings across instance families/regions/services; RIs can offer specific benefits but are less flexible. Choose based on predictability and flexibility needs.
43) How do the AWS Well-Architected pillars guide design decisions?
Answer: They cover Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability—use reviews to find risks and remediations.
44) How do you architect for high availability vs. disaster recovery?
Answer: HA uses multi-AZ with automatic failover; DR plans cross-region strategies (backup/restore, pilot-light, warm standby, active-active) based on RPO/RTO.
45) What techniques reduce data transfer and egress costs?
Answer: Keep traffic within regions/VPCs, use PrivateLink/endpoints, compress content, cache with CloudFront, and review inter-AZ/region flows.
46) An S3 bucket suddenly became public; how do you lock it down quickly?
Answer: Enable Block Public Access at account/bucket, remove public ACLs/policies, add aws:PrincipalOrgID restrictions, and verify via Access Analyzer.
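Illustration: the first two steps can be sketched as a Block Public Access configuration plus a rough scan of the bucket policy. The scan is a simple heuristic (it assumes Statement is a list and only looks for unconditioned wildcard principals) and is far less thorough than Access Analyzer; the sample policy is hypothetical.

```python
# All four settings enabled; could be passed to boto3's put_public_access_block
# as PublicAccessBlockConfiguration (call not executed here).
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def public_statements(policy: dict) -> list[dict]:
    """Return Allow statements with a wildcard principal and no condition."""
    flagged = []
    for stmt in policy.get("Statement", []):
        principal = stmt.get("Principal")
        is_wildcard = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") == "*"
        )
        if stmt.get("Effect") == "Allow" and is_wildcard and not stmt.get("Condition"):
            flagged.append(stmt)
    return flagged


# Hypothetical bucket policy containing one public statement.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "OrgRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*",
         "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}}},
    ],
}

print([s["Sid"] for s in public_statements(sample_policy)])
```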
47) An EC2 service is unreachable from the internet; what are your first checks?
Answer: Confirm public subnet + route to IGW, security group ingress, NACLs, instance health/port listening, and correct ALB/NLB target registration.
48) A Lambda function is timing out intermittently; how do you triage?
Answer: Check CloudWatch logs/X-Ray traces, increase timeout/memory (CPU), reuse connections, add retries/backoff, and examine downstream latency.
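Illustration: retries with backoff can be sketched as exponential delays with full jitter. The parameter defaults and the injectable sleep argument are assumptions made for the example; inside a Lambda function the total retry time also has to fit within the configured timeout.

```python
import random
import time


def call_with_backoff(func, max_attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call func(); on exception, wait with exponential backoff plus jitter and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay cap doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: pick a random delay between 0 and the cap.
            sleep(random.uniform(0, cap))


# Hypothetical flaky downstream call: fails twice, then succeeds.
state = {"calls": 0}


def flaky_downstream():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("slow downstream")
    return "ok"


recorded_delays = []
result = call_with_backoff(flaky_downstream, sleep=recorded_delays.append)
print(result, len(recorded_delays))
```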
49) Cross-account access to an S3 bucket fails; what might be wrong?
Answer: Missing bucket policy allow, missing identity-policy allow in the caller's account, wrong principal/role trust, KMS key policy denies, or Block Public Access interfering with intended access.
50) A database migration to RDS is behind schedule; how can you accelerate safely?
Answer: Use DMS with ongoing replication, pre-cutover validation, scale target IOPS/instance, enable Multi-AZ after load, and perform phased cutover during low traffic.