Interview Questions Booklet – GCP

Google Cloud Platform (GCP) — Interview Questions Booklet (50 Q&A)

Org & IAM • VPC & Load Balancing • Compute (GCE/GKE/Cloud Run) • Storage & Databases • BigQuery & Dataflow • Pub/Sub & Eventing • Security (KMS/VPC-SC) • Observability • DevOps & Governance • Real-World Scenarios

Section 1 — GCP Fundamentals & Resource Hierarchy

1) What is GCP, and when should an enterprise choose it over other clouds?

Answer: GCP is Google’s cloud for compute, storage, data, and ML. It’s compelling for global networking, analytics (BigQuery), Kubernetes leadership, and strong security primitives like VPC Service Controls.

2) How is the GCP resource hierarchy organized, and why does it matter?

Answer: Organization → Folders → Projects → Resources. Policies, billing, and IAM inherit down the tree, enabling guardrails and consistent access control.
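
To make the inheritance point concrete, the sketch below models the hierarchy as a parent map and computes a principal's effective roles on a node as the union of the bindings on that node and every ancestor above it. The node names, the principal, and the `effective_roles` helper are all hypothetical, and this is a simplified model (allow policies only, no deny policies or conditions), not the IAM API.

```python
# Simplified model of allow-policy inheritance; every name here is hypothetical.
PARENT = {
    "org": None,
    "folder-prod": "org",
    "project-web": "folder-prod",
}

# Role bindings attached directly at each node: node -> {principal: {roles}}
BINDINGS = {
    "org": {"alice@example.com": {"roles/viewer"}},
    "folder-prod": {},
    "project-web": {"alice@example.com": {"roles/storage.objectAdmin"}},
}


def effective_roles(node, principal):
    """Union the roles granted on the node and on every ancestor above it."""
    roles = set()
    while node is not None:
        roles |= BINDINGS.get(node, {}).get(principal, set())
        node = PARENT[node]
    return roles
```

In this model a role granted at the organization shows up on every project beneath it, which is why broad grants high in the tree deserve extra scrutiny.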

3) What is the difference between primitive, predefined, and custom IAM roles?

Answer: Basic roles (Owner/Editor/Viewer, formerly called primitive roles) are broad and span all services in a project; predefined roles are curated per service; custom roles let you pick individual permissions for least privilege.

4) How do service accounts work, and how should they be secured?

Answer: Service accounts are nonhuman identities for workloads. Grant minimal roles, rotate keys (or avoid keys), and use Workload Identity Federation where possible.

5) How are billing accounts linked to projects, and how do you control spend?

Answer: Each project links to one billing account. Use budgets and alerts, label/tag resources for cost attribution, enable Cloud Billing export to BigQuery for analysis, and apply quotas/org policies.

Section 2 — Networking & Connectivity

6) What makes GCP VPCs unique compared to other clouds?

Answer: VPCs are global constructs with regional subnets, so one VPC can span regions without peering. They support dynamic routing, hierarchical firewall policies, and private access to Google APIs.

7) When should you use Shared VPC versus VPC peering?

Answer: Use Shared VPC to centrally host networking for multiple projects with consistent policy; use VPC peering for simple, private connectivity between VPCs with separate admin domains.

8) Which Google Cloud load balancer should you pick for global web traffic?

Answer: Use the global external Application Load Balancer (formerly the Global HTTP(S) Load Balancer) for a single anycast IP, cross-region failover, and Cloud CDN integration; choose regional internal load balancers for east-west or private services.

9) What is Private Service Connect, and when is it useful?

Answer: PSC provides private, VPC-internal endpoints to Google or third-party services, avoiding public IPs and simplifying consumer/provider connectivity.

10) How do you connect on-prem networks to GCP reliably?

Answer: Use HA VPN for IPsec tunnels, Dedicated/Partner Interconnect for high throughput and low latency, or combine them (Interconnect with VPN as backup) for resilient hybrid links.

Section 3 — Compute: GCE, GKE, Cloud Run & App Engine

11) How do Compute Engine machine types and Spot VMs influence cost and performance?

Answer: Choose general-purpose, compute-optimized, or memory-optimized shapes; Spot VMs cut cost for fault-tolerant workloads with preemption trade-offs.

12) What are Managed Instance Groups (MIGs), and why use them?

Answer: MIGs provide autoscaling, autohealing, rolling updates, and regional distribution for VM fleets, improving availability and operations.

13) When should you choose GKE, Cloud Run, or App Engine for an application?

Answer: GKE for Kubernetes control and complex microservices, Cloud Run for serverless containers/scale-to-zero, App Engine for PaaS simplicity and autoscaling.

14) How do Cloud Run and Cloud Functions differ, and when is each preferable?

Answer: Cloud Run runs containerized HTTP services/events; Functions runs code snippets per event. Choose Run for custom runtimes and HTTP control; Functions for lightweight event handlers.

15) How can you implement blue/green or canary releases on GCP?

Answer: Use Cloud Load Balancing with weighted backends, Cloud Run traffic splits, or Cloud Deploy for progressive delivery and easy rollbacks.
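
The decision logic behind a canary rollout can be sketched as a small function: advance the new revision to the next traffic stage while it stays healthy, and drop it to 0% when it does not. The stage percentages, the error-rate threshold, and the `next_canary_percent` name are assumptions for illustration; this is not how Cloud Deploy or Cloud Run implement it internally.

```python
def next_canary_percent(current, error_rate, stages=(5, 25, 50, 100),
                        max_error_rate=0.01):
    """Return the next traffic percentage for the new revision.

    Rolls back to 0 when the observed error rate exceeds the threshold;
    otherwise advances to the first stage above the current percentage.
    """
    if error_rate > max_error_rate:
        return 0  # roll back: send all traffic to the stable revision
    for stage in stages:
        if stage > current:
            return stage
    return current  # already at the final stage
```

A blue/green cutover is the degenerate case with a single stage of 100, where the rollback path is the only safety net.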

Section 4 — Storage & Databases

16) Which Cloud Storage classes exist, and how do lifecycle rules reduce cost?

Answer: Standard, Nearline, Coldline, and Archive. Lifecycle policies transition or delete objects by age/conditions to optimize spend.
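
As one possible illustration, the helper below builds a lifecycle configuration that moves Standard objects to Nearline after a given age and deletes them later. The `lifecycle_config` function and the day counts are hypothetical; the rule layout follows the Cloud Storage lifecycle JSON format (`rule`, `action`, `condition`).

```python
import json


def lifecycle_config(nearline_after_days, delete_after_days):
    """Build a two-rule lifecycle configuration as a plain dict."""
    if delete_after_days <= nearline_after_days:
        raise ValueError("delete age must be greater than transition age")
    return {
        "rule": [
            {
                # Transition only objects that are still in Standard.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {
                    "age": nearline_after_days,
                    "matchesStorageClass": ["STANDARD"],
                },
            },
            {
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


if __name__ == "__main__":
    # Example values only: transition at 30 days, delete at 365 days.
    print(json.dumps(lifecycle_config(30, 365), indent=2))
```

Saved to a file, output like this could be applied with a command such as `gcloud storage buckets update gs://BUCKET_NAME --lifecycle-file=lifecycle.json`, where the bucket name is a placeholder.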

17) What consistency and performance characteristics does Cloud Storage provide?

Answer: GCS offers strong read-after-write/list consistency, high throughput with parallelism, and per-object atomic updates.

18) How do Persistent Disk, Local SSD, and Filestore differ?

Answer: PD is durable network block storage; Local SSD is ephemeral, ultra-fast; Filestore is a managed NFS for shared POSIX file workloads.

19) When should you choose Cloud SQL, Spanner, Bigtable, or Firestore?

Answer: Cloud SQL for managed relational; Spanner for global, horizontally scalable relational; Bigtable for wide-column, low-latency at scale; Firestore for serverless NoSQL with sync and offline.

20) How do CMEK and CSEK differ, and where are they applied?

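Answer: CMEK (customer-managed encryption keys) uses keys you control in Cloud KMS and is supported by many services (GCS, BigQuery, Spanner, disks). CSEK (customer-supplied encryption keys) means you provide the raw key with each API request and Google uses it server-side without storing it; it is limited to Cloud Storage and Compute Engine disks.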

Section 5 — Data Analytics: BigQuery & Pipelines

21) How does BigQuery separate storage and compute, and why is that useful?

Answer: Data storage is decoupled from query compute, enabling elastic scaling, independent cost control, and shared datasets across workloads.

22) How do partitioning and clustering improve BigQuery performance and cost?

Answer: Partitioning prunes data by date/ingestion; clustering organizes by columns to reduce scanned bytes and speed selective queries.
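
A rough back-of-the-envelope model shows why pruning matters for on-demand cost. The function names and the figures in the usage note are hypothetical, the price per TiB is a parameter rather than a quoted rate, and `bytes_per_day` stands for the bytes of the referenced columns written per day.

```python
GIB = 2 ** 30
TIB = 2 ** 40


def scanned_bytes(bytes_per_day, days_in_table, days_queried, partitioned):
    """Estimate bytes scanned by a query that filters on a date range.

    Without partitioning the date filter still reads every day of data;
    with daily partitions only the matching partitions are read.
    """
    days_queried = min(days_queried, days_in_table)
    days = days_queried if partitioned else days_in_table
    return bytes_per_day * days


def on_demand_cost(bytes_scanned, price_per_tib):
    """Convert scanned bytes to cost for an assumed per-TiB price."""
    return bytes_scanned / TIB * price_per_tib
```

For a table holding 365 days at 10 GiB per day, a 7-day query scans 3650 GiB unpartitioned versus 70 GiB partitioned in this model. Clustering reduces the scan further within each partition, which this sketch does not attempt to estimate.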

23) When should you choose Dataflow versus Dataproc for data processing?

Answer: Dataflow (Apache Beam) for serverless, autoscaling batch/stream pipelines; Dataproc for managed Hadoop/Spark when you need ecosystem control.

24) What are the trade-offs between BigQuery streaming inserts and batch loads?

Answer: Streaming gives low-latency availability at higher cost/quotas; batch loads are cheaper and better for large periodic ingests.

25) How do you govern data on GCP using Dataplex and Data Catalog?

Answer: Use Dataplex to organize lakes, apply policies, and track data lineage; use Data Catalog for metadata discovery and tagging; combine with IAM, policy tags, and row-level security.

Section 6 — Eventing, Messaging & Integration

26) What are Pub/Sub topics and subscriptions, and how do they enable decoupling?

Answer: Publishers send to topics; subscribers pull or push from subscriptions. Durable queues decouple producers and consumers with scaling and replay.

27) How do ordering keys and exactly-once delivery work in Pub/Sub?

Answer: Ordering keys preserve per-key order; exactly-once processing requires idempotent consumers and, where supported, exactly-once delivery settings.
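
The idempotent-consumer half of that answer can be sketched as a handler that remembers which message IDs it has already processed and skips redeliveries. The `IdempotentHandler` class is hypothetical and keeps its state in memory; a real consumer would need a durable store shared across instances.

```python
class IdempotentHandler:
    """Wrap a processing function so each message ID is handled once."""

    def __init__(self, process):
        self._process = process
        self._seen = set()  # assumption: in-memory only, for illustration

    def handle(self, message_id, payload):
        """Return True if the payload was processed, False if a duplicate."""
        if message_id in self._seen:
            return False
        self._process(payload)
        # Record the ID only after success so a failed attempt can be retried.
        self._seen.add(message_id)
        return True
```

With this in place, an at-least-once redelivery of the same message becomes a no-op rather than a double write.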

28) When should you use Eventarc or Workflows with Cloud Run?

Answer: Use Eventarc to route cloud/SaaS events to services; use Workflows to orchestrate multi-step calls with retries, timeouts, and compensation.

29) How do you securely call Google APIs from private workloads?

Answer: Use Private Google Access or PSC endpoints so traffic stays off the public internet, grant the workload's service account only the IAM roles it needs (and suitable access scopes on VMs), and restrict egress with firewall rules.

30) What patterns integrate SaaS apps with GCP services reliably?

Answer: Use Pub/Sub as a buffer, Cloud Run/Functions for adapters, Workflows for orchestration, and retries/backoff with dead-letter topics.
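
The retry, backoff, and dead-letter pattern can be modeled in a few lines. In Pub/Sub itself the retry policy and dead-letter topic are configured on the subscription; the `backoff_delays` and `deliver` helpers below are hypothetical, use a plain list as the dead-letter destination, and omit jitter to keep the behavior deterministic.

```python
import time


def backoff_delays(max_attempts, base=1.0, cap=60.0):
    """Delays between attempts: base * 2**n seconds, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(max_attempts - 1)]


def deliver(message, handler, max_attempts, dead_letter, sleep=time.sleep):
    """Try the handler up to max_attempts times, then dead-letter the message.

    Returns True on success and False if the message was dead-lettered.
    """
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception:
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts - 1:
                sleep(delays[attempt])
    dead_letter.append(message)
    return False
```

The `sleep` parameter exists so callers can inject a no-op or a recorder instead of actually waiting.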

Section 7 — Security & Compliance

31) How do Organization Policies help govern large GCP estates?

Answer: Org Policies enforce constraints (e.g., allowed regions, external IPs) at org/folder/project, providing central guardrails that inherit down.

32) What is VPC Service Controls, and what risks does it mitigate?

Answer: VPC-SC creates service perimeters around Google APIs to reduce data exfiltration, only allowing access from approved networks/contexts.

33) When do you use Cloud Armor versus Identity-Aware Proxy (IAP)?

Answer: Cloud Armor protects apps with WAF/DDoS rules at the edge; IAP enforces identity-based access to web/SSH/RDP apps without opening networks.

34) How should keys and secrets be managed on GCP?

Answer: Use Cloud KMS for encryption keys with CMEK, and Secret Manager for credentials/tokens. Avoid service account keys; prefer federation.

35) How do Binary Authorization and Artifact Analysis improve supply-chain security?

Answer: They enforce signed images and scan for vulnerabilities, only admitting attested artifacts to GKE/Cloud Run deployments.

Section 8 — Observability & Reliability

36) Which Cloud Operations tools cover logs, metrics, traces, and errors?

Answer: Cloud Logging, Cloud Monitoring, Cloud Trace, Cloud Profiler, and Error Reporting provide end-to-end observability for apps and infra.

37) How do you implement SLOs and actionable alerting in GCP?

Answer: Define SLIs/SLOs in Monitoring, use burn-rate alerts over multiple windows, attach runbooks, and route to on-call with suppressions.
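
Burn rate is the observed error rate divided by the error budget (1 minus the SLO target), and a multi-window alert fires only when both a long and a short window exceed the threshold. The sketch below assumes a request-based SLI; the function names are hypothetical, and the default threshold of 14.4 is a commonly cited example value for a 1-hour window on a 30-day SLO, not a required setting.

```python
def burn_rate(errors, total, slo):
    """Observed error rate divided by the error budget (1 - slo)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total == 0:
        return 0.0
    return (errors / total) / (1.0 - slo)


def should_page(long_window, short_window, slo, threshold=14.4):
    """Page only if both windows burn budget faster than the threshold.

    Each window is an (errors, total) tuple of request counts.
    """
    return (burn_rate(*long_window, slo) >= threshold
            and burn_rate(*short_window, slo) >= threshold)
```

Requiring the short window as well means the alert clears soon after the problem stops, instead of paging for the rest of the long window.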

38) What are uptime checks, and how do they tie to alerting?

Answer: Uptime checks monitor public/private endpoints from probes; failures trigger alerts and incident workflows tied to SLOs.

39) How do you collect Prometheus metrics for GKE and VMs?

Answer: Use Managed Service for Prometheus or Ops Agent to scrape/export metrics, then visualize/alert in Cloud Monitoring.

40) How do you route logs to external systems or sinks?

Answer: Configure Log Router sinks to Pub/Sub, BigQuery, or Cloud Storage; filter by resource/labels and apply CMEK if required.

Section 9 — DevOps, Governance & Cost

41) How do Artifact Registry and Container Registry differ, and which should be used?

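Answer: Artifact Registry is the modern, multi-format repository (container images plus language packages) with regional and multi-regional locations and repository-level IAM. It replaces the deprecated Container Registry, so use it for all new builds and migrate existing images.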

42) How do Cloud Build and Cloud Deploy support CI/CD on GCP?

Answer: Cloud Build builds/tests artifacts; Cloud Deploy orchestrates progressive releases to GKE/Cloud Run with approvals and rollbacks.

43) What options exist for Infrastructure as Code on GCP?

Answer: Terraform (multi-cloud), Google Cloud Deployment Manager (legacy), and Config Connector (K8s CRDs) for GitOps with Anthos/ACM.

44) How can you enforce policy as code across projects and clusters?

Answer: Use Organization Policies and Policy Controller (OPA Gatekeeper) with Anthos Config Management to validate configs pre-merge/pre-apply.

45) What cost-optimization levers are most effective on GCP?

Answer: Committed Use Discounts, Sustained Use Discounts, rightsizing, Spot VMs, autoscaling, storage lifecycle rules, and network egress reduction via Cloud CDN/Interconnect.

Section 10 — Real-World Scenarios & Troubleshooting

46) A service account receives 403 errors accessing a GCS bucket; what do you check first?

Answer: Confirm IAM roles on the bucket/project, uniform bucket-level access vs. ACL conflicts, and CMEK key permissions if the object is KMS-encrypted.

47) GKE pods cannot reach the internet from private subnets; how do you fix it?

Answer: Ensure routes/NAT are configured (Cloud NAT or NAT gateway), allow egress firewall rules, and verify no Org Policy blocks external IPs.

48) A Cloud Run service shows high cold-start latency; what mitigations help?

Answer: Set minimum instances, increase CPU/memory, optimize image size, raise concurrency thoughtfully, and avoid unnecessary VPC egress.

49) BigQuery costs have spiked unexpectedly; how do you control them quickly?

Answer: Require partition filters, add clustering, use materialized views, set a maximum bytes billed per query and custom daily quotas, consider slot reservations for predictable spend, and monitor bytes scanned per query.

50) VPC Service Controls are blocking access to a required API; how do you proceed safely?

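Answer: Check the VPC-SC violation in the audit logs, verify the service perimeter and access levels, test changes in dry-run mode, add narrowly scoped ingress/egress rules or a perimeter bridge if justified, and document exceptions via change control.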