Cassandra Pocket Book

Cassandra Pocket Book — Uplatz

50 deep-dive flashcards • Wide layout • Fewer scrolls • 20+ Interview Q&A • Readable code examples

Section 1 — Fundamentals

1) What is Cassandra?

Apache Cassandra is a distributed, wide-column database designed for high write throughput, linear horizontal scaling, and fault tolerance with no single point of failure. It uses a masterless peer-to-peer architecture where any node can accept reads/writes, replicating data across nodes and data centers. Cassandra excels at time‑series, event logging, IoT, user activity feeds, and workloads that favor write-heavy, append‑style operations with predictable low latency.

# Quick feel: run cqlsh in a test cluster
cqlsh -e "SHOW VERSION;"
# or with Docker (example)
docker run -d --name cass -p 9042:9042 cassandra:latest

2) Why Cassandra? Core Strengths & Tradeoffs

Strengths: Always‑on availability, tunable consistency per operation, massive write scalability, and data distribution via consistent hashing. Tradeoffs: Query patterns must be modeled up front (no ad‑hoc joins), secondary indexes are limited, and large partitions/tombstones can hurt performance. Design “query‑first,” keep partitions bounded, and pair RF ≥ 3 with quorum reads/writes for strong guarantees.

# Create a project keyspace quickly (example replication)
CREATE KEYSPACE app WITH replication = {
  'class':'NetworkTopologyStrategy','dc1':3,'dc2':3
} AND durable_writes = true;

3) Data Model: Mental Model

Think in partitions and clustering. The partition key decides which nodes store the data (via token ring), and clustering columns define on‑disk order inside a partition. Primary key = (partition_key[, clustering...]). Model tables by access pattern: one table per query shape, denormalize, and avoid unbounded partitions with time bucketing.

CREATE TABLE events (
  user_id uuid,
  day text,                 -- partition bucket (e.g., '2025-08-22')
  ts timeuuid,              -- clustering for ordering
  type text,
  payload text,
  PRIMARY KEY ((user_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
-- Query pattern: by user & day, latest first

Rule of thumb: bound partitions (e.g., by day/month) and pre‑compute query shapes.

4) Storage Engine & Compaction

Writes go to the commit log and an in‑memory memtable; when flushed, they become immutable SSTables. Background compaction merges SSTables, purges tombstones past gc_grace_seconds, and reduces read amplification. Pick a compaction strategy by workload: STCS (general), LCS (read‑heavy, small SSTables), TWCS (time‑series with TTL).

ALTER TABLE events WITH compaction = {
  'class':'TimeWindowCompactionStrategy',
  'compaction_window_unit':'DAYS',
  'compaction_window_size':'1'
};

5) Cassandra vs RDBMS

RDBMS favor normalized schemas, joins, and strong consistency; Cassandra favors denormalization, query‑specific tables, and horizontal scale with tunable consistency. You trade ad‑hoc flexibility for predictable, low‑latency operations at scale. Use analytics engines (Spark/Presto) for heavy joins over snapshots, not the OLTP cluster.

-- No joins; pre‑compute
CREATE TABLE user_by_email (email text PRIMARY KEY, user_id uuid);
CREATE TABLE user_profile (
  user_id uuid PRIMARY KEY, email text, name text
);

6) CQL, Drivers & Tooling

CQL is SQL‑like for DDL/DML; use official drivers (Java/Python/Node/Go). cqlsh for queries, nodetool for operations. Always use prepared statements and token‑aware load balancing.

# cqlsh helpers
DESCRIBE KEYSPACES;
CONSISTENCY QUORUM;
CONSISTENCY LOCAL_QUORUM;  -- multi‑DC best practice

7) Keyspaces & Replication

A keyspace groups tables and defines replication. Use NetworkTopologyStrategy in production to specify per‑DC replication factors. Quorum (⌊RF/2⌋+1) reads and writes overlap, so the latest write is still returned when one replica is down. Keep RF ≥ 3 per DC for resilience.

CREATE KEYSPACE auth WITH replication = {
  'class':'NetworkTopologyStrategy','dc1':3
};

8) Releases & Compatibility

Run a stable GA release for production, test upgrades thoroughly in staging, and roll out per rack/DC. Stick to supported drivers matching your server version. Pin configs in code and infrastructure for reproducible clusters.

# Pin the Docker image to an exact version in infra code, e.g.
cassandra:4.1
# Keep server & driver versions compatible

9) n‑node Clusters & Multi‑DC

Any node can serve requests. For multi‑DC, use LOCAL_QUORUM to avoid cross‑DC latency, and LOCAL_ONE for low‑latency reads when staleness is acceptable. Use rack‑aware snitches to spread replicas.

# Driver setting (pseudo)
loadBalancingPolicy: tokenAware(localDc="dc1")

10) Q&A — “If there’s no primary leader, how does Cassandra stay consistent?”

Answer: Replicas coordinate via quorum reads/writes and hinted handoff. Read paths can repair on read to fix divergence. With RF≥3 and QUORUM/LOCAL_QUORUM, overlapping replica sets ensure the latest write is seen even with node failures.

Section 2 — Core CQL & Architecture

11) Tables & Primary Keys

Primary key defines distribution and sort. Use composite keys to group related rows in the same partition and order them by clustering columns. Avoid large static columns unless needed for metadata.

CREATE TABLE orders (
  shop_id text,
  yyyymm text,
  created_at timeuuid,
  order_id uuid,
  amount decimal,
  PRIMARY KEY ((shop_id, yyyymm), created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);

12) Read/Write Path

Write: coordinator → replicas (chosen by partitioner tokens) → commit log + memtable on each replica → flushed to SSTables. Read: the coordinator queries replicas per CL; each replica uses bloom filters, the partition index, and summaries to locate data across SSTables, and the coordinator reconciles the responses by timestamp, returning the latest value.

CONSISTENCY LOCAL_QUORUM;
INSERT INTO orders (shop_id, yyyymm, created_at, order_id, amount)
VALUES ('s1','202508', now(), uuid(), 19.99);

13) Consistency Levels

Per operation you pick CL: ALL, QUORUM, LOCAL_QUORUM, ONE, etc. For multi‑DC apps, use LOCAL_QUORUM. Mix LOCAL_QUORUM writes with LOCAL_QUORUM reads for strong guarantees; use ONE for latency‑critical but possibly stale reads.

CONSISTENCY ONE;           -- fastest
CONSISTENCY QUORUM;        -- stronger
CONSISTENCY LOCAL_QUORUM;  -- multi‑DC safe default

14) Compaction Strategies

STCS merges similarly‑sized SSTables; LCS organizes levels for read‑heavy workloads; TWCS uses time windows for TTL’d data. Choose per table; compaction is CPU/IO heavy — monitor and tune.

ALTER TABLE orders WITH compaction = {
  'class':'LeveledCompactionStrategy',
  'sstable_size_in_mb':'160'
};

15) Tombstones & GC Grace

Deletes create tombstones, which compaction purges only after gc_grace_seconds. If a replica misses a delete and isn’t repaired within that window, the deleted data can resurrect; too many tombstones slow reads. Prefer TTL for expiry and avoid wide‑range deletes.

ALTER TABLE events WITH gc_grace_seconds = 86400; -- 1 day
DELETE FROM events WHERE user_id=? AND day=? AND ts=?;

16) Partitioner & Token Ring

Consistent hashing assigns token ranges to nodes (virtual nodes split ownership). The partition key’s hash decides placement; hot partitions overload specific nodes — prevent by adding a bucketing field or random salt.

-- Bucketing to avoid hot partition
PRIMARY KEY ((tenant_id, day_bucket), ts)

17) Gossip, Snitches & Topology

Gossip shares node liveness; snitches tell Cassandra about racks/DCs to place replicas. Use a rack‑aware snitch and consistent rack labels in config/infra. Topology drives RF and request routing.

# cassandra.yaml (snippet)
endpoint_snitch: GossipingPropertyFileSnitch

18) Hinted Handoff & Read Repair

Hints store missed writes when a replica is down, replayed on recovery. Read repair (coordinator‑driven) fixes divergent replicas during reads. Run anti‑entropy repair regularly for full consistency.

nodetool repair keyspace_name --full
nodetool status

19) RF, Quorum Math & Locality

With RF=3 per DC: QUORUM=2, so overlapping quorums guarantee the latest write is returned. Prefer LOCAL_* CLs to keep traffic inside a DC; cross‑DC is for replication, not read paths.

-- Example per‑DC RF
{'class':'NetworkTopologyStrategy','dc1':3,'dc2':3}
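
A quick arithmetic sketch (plain Python, purely illustrative) of why overlapping quorums return the latest write:

# quorum per DC = floor(RF/2) + 1
def quorum(rf: int) -> int:
    return rf // 2 + 1

rf = 3
w, r = quorum(rf), quorum(rf)   # 2 and 2
assert w + r > rf               # 4 > 3: every read set shares a replica with every write set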

20) Q&A — “How do I choose a primary key?”

Answer: Start from queries. Group rows that you read together into the same partition (partition key) and order them via clustering columns that match your sort/filters. Keep partitions bounded (time buckets) so a single partition never grows into millions of rows that amplify reads and repairs.

Section 3 — Consistency, Patterns & Operations

21) Query‑First Modeling

List each API screen/report and design tables for those exact access patterns. Denormalize across tables to serve different queries. Embrace duplication; storage is cheap, cross‑partition joins are not.

-- Feed by user
CREATE TABLE feed_by_user (
  user_id uuid, day text, ts timeuuid, item text,
  PRIMARY KEY ((user_id, day), ts)
);
-- Global feed (sharded)
CREATE TABLE feed_global (
  shard int, day text, ts timeuuid, item text,
  PRIMARY KEY ((shard, day), ts)
);

22) Materialized Views & Alternatives

Materialized Views can mirror data with a different primary key, but come with operational caveats. Many teams prefer manual fan‑out to multiple tables using batches for atomicity per partition, or CDC + stream processors to maintain derived views.

-- Alternative to MV: write to multiple tables in one go
BEGIN BATCH
  INSERT INTO feed_by_user ...;
  INSERT INTO feed_global ...;
APPLY BATCH;

23) Lightweight Transactions (LWT)

LWT (Paxos) provide compare‑and‑set semantics for rare conditional updates (idempotent upserts, uniqueness). They are slower than normal writes; use sparingly and keep partitions small.

-- Ensure unique username
INSERT INTO users (username, id) VALUES ('alice', uuid()) IF NOT EXISTS;
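
The driver reports whether the conditional write won. A minimal Python sketch (assuming the session and users table above; was_applied comes from the DataStax Python driver) checks that flag before proceeding:

# Hypothetical helper; returns False when the username already exists
ps = session.prepare("INSERT INTO users (username, id) VALUES (?, ?) IF NOT EXISTS")

def register_username(username, user_id):
    rs = session.execute(ps, (username, user_id))
    return rs.was_applied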

24) Batches: Logged vs Unlogged

Logged batches ensure atomicity across partitions at higher cost; unlogged batches are just a network optimization for same‑partition writes. Prefer small batches; don’t use as a bulk‑loader.

BEGIN UNLOGGED BATCH
  INSERT INTO events ...;
  INSERT INTO events ...;
APPLY BATCH;

25) Repairs

Run incremental repairs regularly to reconcile replicas; run full repairs after topology changes. Stagger by keyspace/table and monitor throughput. Repairing within gc_grace_seconds prevents deleted data from resurrecting and keeps replicas consistent and durable.

nodetool repair ks_name        # incremental repair is the default; use --full after topology changes
nodetool compactionstats

26) Scaling: Bootstrap & Decommission

Add nodes by bootstrapping, which streams data for token ranges; remove with decommission. In Kubernetes, use StatefulSets with proper seeds and rack labels; in VMs, automate with config management.

# New nodes stream their token ranges automatically on first startup
nodetool bootstrap resume      # resume an interrupted bootstrap
nodetool decommission
nodetool rebuild -- dc2        # run on new-DC nodes, streaming from an existing DC

27) TTL & Expiration

Use TTL to auto‑expire rows (great for time‑series). Combine with TWCS for efficient compaction. Beware that expiring data creates tombstones until purged.

INSERT INTO sessions (user_id, id, created_at)
VALUES (uuid(), uuid(), toTimestamp(now())) USING TTL 86400;

28) Denormalization & Bucketing

Denormalize by writing to multiple tables and bucket by time to bound partitions. For ultra‑hot keys, add a shard or salt to the partition key and keep routing logic in the application.

PRIMARY KEY ((user_id, shard, yyyymmdd), ts)
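
A small illustrative Python sketch of the application-side routing (the hash-based shard choice is an assumption, not a Cassandra feature):

import hashlib
from datetime import datetime, timezone

NUM_SHARDS = 8  # illustrative; size to the hottest key's write rate

def partition_fields(user_id: str, ts: datetime):
    # Deterministic shard so readers can fan out over a known, small set
    shard = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_SHARDS
    yyyymmdd = ts.astimezone(timezone.utc).strftime("%Y%m%d")
    return user_id, shard, yyyymmdd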

29) Timeouts, Retries & Idempotency

Set driver timeouts, use exponential backoff with jitter, and ensure operations are idempotent. For writes, use idempotency keys (e.g., natural keys or timestamps) to avoid duplicates during retries.

# Pseudocode driver config
requestTimeout: 2000
retryPolicy: DefaultRetry (idempotent=true)
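
A hedged Python sketch of an application-side wrapper with exponential backoff and full jitter (the driver has its own retry policies; the names here are illustrative):

import random, time

def with_retries(do_write, attempts=4, base=0.1):
    # Only safe when do_write is idempotent (same key and values on every attempt)
    for attempt in range(attempts):
        try:
            return do_write()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(random.uniform(0, base * (2 ** attempt)))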

30) Q&A — “LWT or redesign?”

Answer: If the operation is rare and correctness requires uniqueness, LWT is fine. If it’s frequent or high‑throughput, redesign the data model (e.g., reservation tokens, time‑bucketed uniqueness) to avoid per‑write consensus.

Section 4 — Clients, Data & Integrations

31) Java Driver (DataStax)

Type‑safe, token‑aware, and DC‑aware by default. Use prepared statements, reuse sessions, and expose metrics. Configure local DC and pooling carefully.

// Java (pseudo)
CqlSession session = CqlSession.builder()
  .withKeyspace("app").withLocalDatacenter("dc1").build();
PreparedStatement ps = session.prepare("INSERT INTO t (k,v) VALUES (?,?)");
session.execute(ps.bind(k, v));

32) Python Driver

Simple and powerful. Set consistency_level, reuse Cluster/Session, and prefer prepared statements. Enable execution profiles for different CLs/timeouts.

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='dc1')))
cluster = Cluster(['127.0.0.1'], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect('app')
ps = session.prepare('INSERT INTO t (k, v) VALUES (?, ?)')
session.execute(ps, (1, 'a'))

33) Node.js Driver

Use token‑aware load balancing and LOCAL_QUORUM. Reuse the client, prepare your statements, and batch only for small, related writes.

import cassandra from 'cassandra-driver';
const client = new cassandra.Client({ contactPoints:['127.0.0.1'], localDataCenter:'dc1', keyspace:'app' });
await client.execute('INSERT INTO t (k,v) VALUES (?,?)', [1,'a'], { prepare:true });

34) API Layers & GraphQL

Expose Cassandra via REST/GraphQL through a service layer. Enforce schema validation, idempotent writes, and pagination. For flexible queries, consider an API gateway that maps approved query shapes to prepared statements.

-- Example pagination
SELECT * FROM events WHERE user_id=? AND day=? LIMIT 50;
-- Use paging state from driver for next page
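
With the Python driver, the opaque paging state from one page can be handed back for the next request (sketch; assumes session, user_id, and day from earlier cards):

from cassandra.query import SimpleStatement

stmt = SimpleStatement(
    "SELECT * FROM events WHERE user_id=%s AND day=%s", fetch_size=50)
page = session.execute(stmt, (user_id, day))
rows = page.current_rows
token = page.paging_state      # return to the client; opaque bytes
# Next request: resume where the previous page ended
page2 = session.execute(stmt, (user_id, day), paging_state=token)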

35) CDC & Streaming

Enable CDC to capture row changes and stream them to Kafka/Flink for downstream processing (caches, search indexes). Keep CDC volumes manageable; filter at source and compact topics.

# cassandra.yaml (snippet)
cdc_enabled: true
# Stream CDC directory with an agent to Kafka

36) Time‑Series & IoT

Partition by device/site + time bucket; cluster by timestamp descending. Use TWCS and TTL for lifecycle. Keep wide partitions bounded (e.g., per day) to avoid hotspots and long repairs.

PRIMARY KEY ((device_id, yyyymmdd), ts)

37) Spark & Analytics

For analytical joins/aggregations, use the Spark‑Cassandra connector to read snapshots with proper split sizes. Do not overload the OLTP cluster; use a separate analytics cluster or schedule off‑peak.

spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(table="events", keyspace="app")
  .load()

38) Caching Strategy

Use client‑side caches for hot keys and Redis for derived views. Tune Cassandra’s row/key caches cautiously; most workloads benefit more from OS page cache and query redesign.

-- Consider cache‑friendly shapes
SELECT ... WHERE partition_key=? LIMIT 20;
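
A minimal read-through sketch for a hot, rarely-changing lookup (assumes the session and a prepared lookup_ps against user_by_email from card 5; invalidation not shown):

from functools import lru_cache

@lru_cache(maxsize=10_000)
def user_id_for_email(email: str):
    row = session.execute(lookup_ps, (email,)).one()
    return row.user_id if row else None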

39) Backup & Restore

Take snapshots (hard links) and ship SSTables to durable storage. For restores or migrations, use sstableloader to stream data into a fresh cluster. Verify backups with periodic test restores.

nodetool snapshot app
# restore: sstableloader -d <hosts> data/<ks>/<table>/snapshots/<id>

40) Q&A — “Cassandra vs DynamoDB/Bigtable?”

Answer: All are wide‑column/partitioned stores. Cassandra is open‑source, self‑managed, multi‑DC with tunable consistency; DynamoDB is managed with global tables and serverless ergonomics; Bigtable excels at huge throughput with a managed HBase‑like model. Choose based on control, latency locality, and ops preferences.

Section 5 — Security, Testing, Deployment, Observability & Interview Q&A

41) Security Fundamentals

Enable authentication and authorization, enforce TLS in‑transit, restrict nodes via firewalls/SGs, and rotate credentials. Use roles with least privilege and audit schema changes. Prefer private networking between app and DB.

-- Role example
CREATE ROLE app_rw WITH LOGIN = true AND PASSWORD = '***' AND SUPERUSER = false;
GRANT SELECT ON KEYSPACE app TO app_rw;
GRANT MODIFY ON KEYSPACE app TO app_rw;

42) Multi‑Tenancy

Isolate tenants by keyspace or by partition key. For strict isolation, separate clusters or DCs; for soft isolation, use tenant‑scoped partitions and role‑based access. Encrypt at rest and monitor per‑tenant quotas.

PRIMARY KEY ((tenant_id, bucket), ts)

43) Testing Strategy

Use local single‑node clusters (Docker/ccm) for unit/integration tests. Seed data, run schema migrations, and verify CL/timeout behaviors. For load tests, spin ephemeral multi‑node clusters to test compaction and repair under stress.

# Example docker‑compose service
cassandra:
  image: cassandra:4
  ports: ["9042:9042"]

44) Schema Management

Version DDL alongside app code. Apply forward‑only migrations with back‑compat readers to allow rolling deploys. Avoid disruptive changes (e.g., primary key changes) — create new tables and dual‑write, then backfill.

-- Forward‑only: add a column
ALTER TABLE orders ADD currency text;
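
A dual-write sketch during a primary-key migration (both prepared statements are assumed to exist; drop the old write once backfill and reader cutover are done):

# Hypothetical: orders_old_ps / orders_new_ps target the old and new table shapes
def save_order(o):
    session.execute(orders_old_ps, (o.shop_id, o.yyyymm, o.created_at, o.order_id, o.amount))
    session.execute(orders_new_ps, (o.shop_id, o.order_id, o.created_at, o.amount))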

45) Performance & Tuning

Measure latency percentiles, monitor pending compactions, and track GC pauses. Tune heap, off‑heap, and memtable sizes; disable swap. Use appropriate disks (local SSD/NVMe) and throttle compaction throughput to protect foreground latency. Fix data model hot spots before scaling hardware.

nodetool tpstats
nodetool tablehistograms app events

46) Deployment Options

Run on VMs/bare metal for predictable IO, on Kubernetes with StatefulSets and local SSDs, or choose managed/serverless variants. Implement readiness/liveness probes, rack/DC labels, and rolling node restarts with drain.

# K8s hints
podAntiAffinity: required
readinessProbe: nodetool statusbinary

47) Observability

Scrape JMX/metrics to Prometheus, build dashboards for latency, saturation, and errors. Ship logs centrally, tag by DC/rack/node. Alert on repair lag, dropped mutations, and pending compactions. Expose /health on app side and use nodetool status for DB state.

# Prometheus JMX exporter sidecar (conceptually)
- targets: ['cass-0:7070','cass-1:7070']

48) Prod Checklist

  • RF≥3 per DC, LOCAL_QUORUM by default
  • Bounded partitions & time buckets
  • TWCS for TTL’d time‑series tables
  • Regular incremental repairs
  • Snapshots & tested restores
  • Dashboards & runbooks for incidents

49) Common Pitfalls

Unbounded partitions, hot keys, overuse of secondary indexes, giant logged batches, skipping repairs, and deleting large ranges (tombstone storms). Avoid by query‑first modeling, bucketing, and steady maintenance.

50) Interview Q&A — 20 Practical Questions (Expanded)

1) Why Cassandra for write‑heavy workloads? Peer‑to‑peer design, append‑optimized storage, and linear scale.

2) How does tunable consistency work? Choose CL per op (e.g., LOCAL_QUORUM) to balance latency and consistency.

3) Primary vs clustering key? Primary defines distribution; clustering defines on‑disk order within a partition.

4) Avoiding hot partitions? Add time buckets/shards; distribute with salts when necessary.

5) When to use LWT? Rare uniqueness/compare‑and‑set needs; keep payloads small.

6) What are tombstones? Markers for deletes/expirations; too many slow reads until purged.

7) Repair vs read repair? Scheduled anti‑entropy vs opportunistic fixes during reads.

8) Compaction choice? STCS general, LCS read‑heavy, TWCS time‑series with TTL.

9) Multi‑DC best practice? LOCAL_QUORUM + per‑DC RF and rack‑aware snitches.

10) Secondary indexes? Use sparingly; prefer SASI or SAI where available, or model new tables.

11) Pagination? Use driver paging state; avoid large offsets or ALLOW FILTERING.

12) Backups? Snapshots + offsite copy; verify with test restores.

13) Batch usage? Small groups of related writes; avoid as bulk importer.

14) GC & memory? Tune heap/off‑heap; monitor GC pauses; avoid giant partitions.

15) Read latency spikes? Check compaction backlog, tombstones, and OS cache misses.

16) Data modeling anti‑patterns? Unbounded partitions, random clustering orders, ALLOW FILTERING reliance.

17) Node failures? Hinted handoff, repairs, and quorum keep service healthy.

18) Observability must‑haves? p99 read/write, pending compactions, dropped mutations, repair lag.

19) Rolling upgrades? One rack/node at a time, drain/stop, upgrade, verify, proceed.

20) When not to use Cassandra? Heavy multi‑row transactions, ad‑hoc joins/OLAP inside OLTP cluster.