MongoDB Pocket Book — Uplatz
50 in-depth cards • Wide layout • Readable examples • 20-Q interview block included
1) What is MongoDB?
MongoDB is a document-oriented NoSQL database storing JSON-like BSON documents. It favors flexible schemas, horizontal scaling via sharding, and rich secondary indexes, with a powerful aggregation framework and transactions (multi-document ACID).
# shell
mongosh "mongodb://localhost:27017"
2) Core Concepts
Hierarchy: Database → Collections → Documents. Each document has an _id primary key (ObjectId by default). Schemas are dynamic unless validated. CRUD via drivers or mongosh.
db.users.insertOne({ name:"Ava", age:29, skills:["mongo","node"] })
3) BSON & Types
BSON adds types beyond JSON: ObjectId, Date, Decimal128, Binary, Timestamp. Choose Decimal128 for precise monetary values and Date for timestamps, which are stored in UTC.
db.tx.insertOne({ amount: NumberDecimal("12.34"), ts: new Date() })
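Why Decimal128 for money: a bare numeric literal in mongosh is a binary double, and doubles cannot represent most decimal fractions exactly. The sketch below is plain JavaScript (no database needed). The integer-cents workaround and the helper name centsToDecimalString are assumptions for illustration, not something this card prescribes.

```javascript
// Doubles cannot represent most decimal fractions exactly.
const doubleSum = 0.1 + 0.2;
console.log(doubleSum === 0.3); // false

// Workaround sketch: keep money as integer minor units (cents).
const cents = [1234, 66]; // 12.34 and 0.66
const totalCents = cents.reduce((sum, c) => sum + c, 0);

// Format integer cents as a decimal string, e.g. to pass to NumberDecimal(...).
function centsToDecimalString(value) {
  const sign = value < 0 ? "-" : "";
  const abs = Math.abs(value);
  const whole = Math.floor(abs / 100);
  const frac = String(abs % 100).padStart(2, "0");
  return `${sign}${whole}.${frac}`;
}

console.log(centsToDecimalString(totalCents)); // 13.00
```

The resulting string can be wrapped as NumberDecimal("13.00") when writing, so the stored value stays exact.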
4) Read & Write Concerns
Write concern controls durability (w, j, wtimeout). Read concern controls consistency (local, majority, linearizable, snapshot in txns). Tune per operation.
db.orders.insertOne(doc, { writeConcern: { w:"majority", j:true } })
5) Replica Sets
Replica set = primary + secondaries for HA. Primary handles writes; secondaries replicate via the oplog. Elections pick a new primary on failure. Reads can target secondaries with read preferences.
rs.initiate({ _id:"rs0", members:[{_id:0,host:"n1"},{_id:1,host:"n2"}] })
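Elections need a majority of voting members, which is why member count matters. The arithmetic below is a minimal sketch (function names are made up for illustration): a two-member set like the one above cannot elect a primary once either member is down.

```javascript
// Number of voting members needed to elect a primary.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can fail while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`members=${n} majority=${majority(n)} tolerated=${faultTolerance(n)}`);
}
```

Three members tolerate one failure while two tolerate none, so production sets usually run an odd number of voting members.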
6) Sharding
Sharding distributes data across shards (each a replica set). A mongos router routes queries. Pick a good shard key (high cardinality, good distribution). Zones can pin data by range for locality/regulations.
sh.enableSharding("shop"); sh.shardCollection("shop.orders",{ region:1, _id:1 })
7) Storage Engine (WiredTiger)
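WiredTiger provides document-level concurrency, compression, checkpoints, and journaling. The default internal cache is the larger of 50% of (RAM minus 1 GB) or 256 MB; set it explicitly when mongod shares a host with other processes. Compression reduces I/O at a slight CPU cost.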
storage.wiredTiger.engineConfig.cacheSizeGB: 8
8) Index Basics
Indexes speed reads. The default _id index always exists. Create compound indexes for common query patterns; order matters. Cardinality, selectivity, and sort patterns drive index design.
db.users.createIndex({ email:1 }, { unique:true })
9) Aggregation Framework
Pipeline stages ($match, $group, $project, $sort, $lookup, $facet) enable analytical queries without ETL. $match and $sort stages at the start of a pipeline can use indexes.
db.sales.aggregate([{ $match:{region:"EU"} },{ $group:{ _id:"$sku", rev:{ $sum:"$amount"} } }])
10) Q&A — “MongoDB vs RDBMS?”
Answer: MongoDB trades rigid schemas and joins for flexible documents, fast iteration, and horizontal scale. It supports ACID transactions when needed but shines with denormalized schemas optimized for app reads.
11) Embedding vs Referencing
Embed when data is accessed together and bounded in size (product + reviews snapshot). Reference when many-to-many, unbounded growth, or reuse across documents. Optimize for read patterns.
// Embed example
{ _id:1, name:"Post", comments:[{ by:"u1", text:"hi" }] }
12) Schema Design Rule
“Data that is accessed together should be stored together.” Model queries first. Precompute aggregates if needed. Avoid “3rd NF” mindset—join costs can be high.
// Orders keep buyer snapshot to avoid cross-collection joins
13) Document Growth & Padding
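Under WiredTiger an update rewrites the whole document, so large, ever-growing documents make every update more expensive and can hit the 16 MB document limit (record padding was an MMAPv1 concept). Avoid unbounded arrays; use bucketing (e.g., monthly activity docs) or $push with $slice.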
db.posts.updateOne({_id:1}, { $push:{ comments:{ $each:[c], $slice:-100 } } })
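Bucketing, the other option named above, can be sketched as a helper that derives a per-month bucket document for each event. The "userId:YYYY-MM" _id format, the events/count fields, and the function names are hypothetical conventions chosen for this example.

```javascript
// Build the _id of the monthly bucket document for a user's activity.
// The "userId:YYYY-MM" format is a hypothetical convention.
function monthlyBucketId(userId, date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${userId}:${year}-${month}`;
}

// Build the filter, update, and options for appending one event to its bucket.
function bucketUpdate(userId, event) {
  return {
    filter: { _id: monthlyBucketId(userId, event.ts) },
    update: { $push: { events: event }, $inc: { count: 1 } },
    options: { upsert: true }, // the first event of a month creates the bucket
  };
}

const bucketOp = bucketUpdate("u1", { ts: new Date("2025-01-31T23:59:59Z"), type: "login" });
console.log(bucketOp.filter._id); // u1:2025-01
```

In mongosh this would be applied as db.activity.updateOne(bucketOp.filter, bucketOp.update, bucketOp.options), where the activity collection name is also an assumption. Each array stays bounded by one month of events.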
14) CRUD Examples
Use operators: $set, $inc, $push, $pull, $addToSet. Upserts combine insert + update.
db.users.updateOne({ email }, { $set:{ name }, $inc:{ logins:1 } }, { upsert:true })
15) Array Operators
Use $elemMatch and the positional $ / $[] operators to update matching elements; use $size and $slice for reads. Consider $addToSet to avoid duplicates.
db.c.updateOne({ _id:1, "tags":"js" }, { $set:{ "tags.$":"javascript" } })
16) Validation
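JSON Schema validation at collection level ensures shape and ranges. Set validationAction to error (reject the write, the default) or warn (log only), and validationLevel to strict (the default) or moderate.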
db.createCollection("users",{ validator:{ $jsonSchema:{ bsonType:"object", required:["email"] }}})
17) Bulk Writes
Batch operations with bulkWrite reduce round-trips and improve throughput. Choose ordered/unordered by failure semantics.
db.users.bulkWrite([{ insertOne:{ document:{email:"a"}} }, { updateOne:{ filter:{email:"a"}, update:{ $set:{name:"A"}}}}])
18) Transactions
Multi-document ACID transactions across a replica set or sharded cluster. Use for truly atomic multi-collection updates; keep them short.
const s = db.getMongo().startSession(); s.startTransaction();
try { s.getDatabase("shop").orders.insertOne(o); s.commitTransaction(); } catch(e){ s.abortTransaction(); }
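Transactions can fail with transient errors (for example, write conflicts) and are then expected to be retried as a whole. The helper below is a minimal, synchronous sketch of that retry loop. The names runWithRetry and isTransient are hypothetical; the only MongoDB-specific assumption is that thrown errors expose hasErrorLabel("TransientTransactionError"), as driver errors do. The demo uses a fake error, not a real server.

```javascript
// Returns true when the error carries MongoDB's TransientTransactionError label.
function isTransient(err) {
  return Boolean(err) && typeof err.hasErrorLabel === "function" &&
    err.hasErrorLabel("TransientTransactionError");
}

// Run txnBody (a function that starts, runs, and commits one transaction),
// retrying the whole body when the failure is labeled transient.
function runWithRetry(txnBody, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return txnBody(attempt);
    } catch (err) {
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
    }
  }
}

// Demo with a fake body that fails once with a transient error.
const fakeTransient = { hasErrorLabel: (label) => label === "TransientTransactionError" };
const txnResult = runWithRetry((attempt) => {
  if (attempt === 1) throw fakeTransient;
  return `committed on attempt ${attempt}`;
});
console.log(txnResult); // committed on attempt 2
```

The body passed in should abort its transaction before rethrowing, as the snippet above does. In the Node.js driver, session.withTransaction() provides this kind of retry handling for you.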
19) Change Streams
Subscribe to real-time data changes from the oplog—great for triggers, caching, search indexing. Requires replica set.
db.collection.watch([{ $match:{ "operationType":{ $in:["insert","update"]}} }])
20) Q&A — “When to use transactions?”
Answer: When invariants span multiple documents/collections (e.g., money transfer, inventory + order). Otherwise prefer single-document atomic updates with operators for better performance.
21) Compound Indexes
Index order matters: a query can use a compound index when it filters on a leftmost prefix of the index fields, and a sort is served from the index only when it follows the index key order or its exact reverse. Design around your most common $match + $sort.
db.orders.createIndex({ userId:1, createdAt:-1 })
22) Partial & Sparse Index
Partial indexes include only the docs matching a filter expression, so they are smaller and faster. Sparse indexes include only the docs where the indexed field exists.
db.users.createIndex({ "email":1 }, { partialFilterExpression:{ verified:true } })
23) TTL & Expire
TTL indexes automatically delete docs after a time-to-live or at a date field. Great for sessions, temp data.
db.sessions.createIndex({ "expiresAt":1 }, { expireAfterSeconds:0 })
24) Text & Atlas Search
Built-in text index supports stemming/scores; Atlas Search (Lucene-based) offers rich full-text, facets, and relevance tuning (managed in Atlas).
db.articles.createIndex({ content:"text", title:"text" })
25) Geospatial
2dsphere indexes for GeoJSON (Point, Polygon); queries like $near, $geoWithin. Store coords as [lng, lat].
db.places.createIndex({ loc:"2dsphere" })
db.places.find({ loc:{ $near:{ $geometry:{ type:"Point", coordinates:[-0.1,51.5] }, $maxDistance:5000 }}})
26) Aggregation Patterns
Use $match early, $project to reduce fields, and $group for rollups. Use $lookup for joins (beware big fan-out). $facet runs multiple pipelines in one pass.
db.sales.aggregate([
{ $match:{ ts:{ $gte:ISODate("2025-01-01") }}},
{ $group:{ _id:"$sku", qty:{ $sum:"$qty" }, rev:{ $sum:"$amount" } }},
{ $sort:{ rev:-1 } }, { $limit:10 }
])
27) Performance Explain
Use explain("executionStats") to inspect index use, scanned vs returned, and the stage tree. Fix COLLSCANs; add/adjust indexes.
db.orders.find({ userId:1 }).sort({ createdAt:-1 }).explain("executionStats")
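The checks described above can be automated with a small helper that walks the stage tree and compares documents examined against documents returned. This is a sketch assuming the unsharded executionStats shape (nReturned, totalDocsExamined, executionStages with nested inputStage); the sample object is hand-written for illustration, not captured from a server, and the function names are made up.

```javascript
// Collect stage names from an explain stage tree (inputStage / inputStages).
function collectStages(stage, out = []) {
  if (!stage) return out;
  out.push(stage.stage);
  if (stage.inputStage) collectStages(stage.inputStage, out);
  for (const child of stage.inputStages || []) collectStages(child, out);
  return out;
}

// Summarize the fields most often checked in executionStats.
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const stages = collectStages(stats.executionStages);
  return {
    stages,
    collectionScan: stages.includes("COLLSCAN"),
    inMemorySort: stages.includes("SORT"),
    // Documents examined per document returned; close to 1 is ideal.
    examinedPerReturned:
      stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
  };
}

// Hand-written sample shaped like explain output (not real server output).
const sampleExplain = {
  executionStats: {
    nReturned: 20,
    totalKeysExamined: 0,
    totalDocsExamined: 50000,
    executionStages: { stage: "SORT", inputStage: { stage: "COLLSCAN" } },
  },
};
const explainSummary = summarizeExplain(sampleExplain);
console.log(explainSummary.collectionScan, explainSummary.examinedPerReturned); // true 2500
```

A COLLSCAN under a SORT with thousands of documents examined per result is the classic sign that an index such as { userId:1, createdAt:-1 } is missing.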
28) Aggregation Windows
Window operators ($setWindowFields) compute moving averages, ranks, etc., over partitions, which is handy for analytics without exporting to Spark.
db.m.aggregate([{ $setWindowFields:{ partitionBy:"$userId", sortBy:{ ts:1 }, output:{ runAvg:{ $avg:"$v", window:{ documents:[-5,0] }}}}}])
29) Faceted Search
Build search pages with counts per facet using $facet to compute buckets in one pipeline, reducing extra round-trips.
db.products.aggregate([{ $match:{ inStock:true } },{ $facet:{ brands:[{ $sortByCount:"$brand"}], price:[{ $bucket:{ groupBy:"$price", boundaries:[0,50,100,200,1000], default:"other" }}] }}]) // default catches prices outside the boundaries
30) Q&A — “Compound index order?”
Answer: Put equality fields first, then sort/range fields. The index must support your $sort order; otherwise MongoDB sorts in memory. Design around your top N query patterns.
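The ordering rule in this answer (equality, then sort, then range) can be sketched as a small builder for the createIndex key spec. The function name buildIndexSpec and the total field are hypothetical, and real index design still needs to be confirmed with explain().

```javascript
// Build a compound index key spec: equality fields, then sort fields, then range fields.
// sort is an array of [field, direction] pairs so the sort order is preserved.
function buildIndexSpec({ equality = [], sort = [], range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, direction] of sort) {
    if (!(field in spec)) spec[field] = direction;
  }
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1;
  }
  return spec;
}

// Query shape: { userId: X, total: { $gt: 100 } } sorted by { createdAt: -1 }
const esrSpec = buildIndexSpec({
  equality: ["userId"],
  sort: [["createdAt", -1]],
  range: ["total"],
});
console.log(JSON.stringify(esrSpec)); // {"userId":1,"createdAt":-1,"total":1}
```

Placing the range field after the sort field lets the index return documents already ordered, so the in-memory sort is avoided.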
31) Connection Pooling
Use driver pools; avoid creating clients per request. Configure timeouts and keep-alive. Monitor connection counts to primary/secondaries.
// Node.js
import { MongoClient } from "mongodb";
const client = new MongoClient(uri, { maxPoolSize: 50 });
32) Write Patterns
Use single-document atomic ops. Avoid read-modify-write races by leveraging operators like $inc and $currentDate. Apply idempotency keys for at-least-once inputs.
db.counters.updateOne({ _id:"order" }, { $inc:{ seq:1 } }, { upsert:true })
33) Hot Partitions & Shard Keys
Avoid monotonically increasing shard keys (e.g., timestamp) that hotspot the last chunk. Use hashed shard keys or compound keys with good distribution and query targeting.
sh.shardCollection("app.events", { userId:"hashed" })
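The hotspot effect can be seen in a toy simulation: monotonically increasing keys placed by range all land in the last chunk, while hashing spreads them. The FNV-1a hash below is a stand-in chosen for the sketch, not the hash function MongoDB uses, and the chunk boundaries are invented.

```javascript
// Stand-in 32-bit FNV-1a hash; NOT the hash function MongoDB uses internally.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Ranged placement: chunk boundaries were split on the keys that existed earlier.
function rangedChunk(key, boundaries) {
  let chunk = 0;
  while (chunk < boundaries.length && key >= boundaries[chunk]) chunk++;
  return chunk;
}

const boundaries = [250, 500, 750]; // 4 chunks, split when keys ran from 0 to 999
const rangedCounts = [0, 0, 0, 0];
const hashedCounts = [0, 0, 0, 0];

// 1,000 new inserts with monotonically increasing keys 1000..1999.
for (let key = 1000; key < 2000; key++) {
  rangedCounts[rangedChunk(key, boundaries)]++;
  hashedCounts[fnv1a(String(key)) % 4]++;
}

console.log("ranged:", rangedCounts); // every insert lands in the last chunk
console.log("hashed:", hashedCounts); // inserts spread across all four
```

On a real cluster the balancer eventually splits and moves chunks, but with a monotonic ranged key the newest writes still target one shard at a time. The trade-off of hashing is that range queries on the key become scatter-gather.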
34) Backups & PITR
Use Cloud/Atlas backups or filesystem snapshots with consistent checkpoints. Oplog-based Point-in-Time Restore (PITR) enables recovering to a timestamp.
# Atlas: continuous backups + PITR window (UI/CLI)
35) Monitoring
Watch Opcounters, page faults, queue lengths, replication lag, WT cache pressure, and slow queries. Enable profiler at low sampling for hotspots.
db.setProfilingLevel(1, 100) // log ops slower than 100ms
36) Security Basics
Enable auth, enforce TLS, restrict network, use SCRAM users with least privilege, rotate keys. Encrypt at rest; use field-level encryption for PII where needed.
# mongod.conf
security:
  authorization: enabled
37) Read/Write Patterns at Scale
Batch writes, use unordered bulk for higher throughput, set appropriate write concern. For reads, use projections to return only needed fields and indexes that support sort.
db.users.find({ active:true }, { email:1, name:1 }) // mongosh: the second argument is the projection itself
38) Caching & Secondary Reads
Cache hot reads at app layer. For replicas, use readPreference: secondary (or secondaryPreferred, which falls back to the primary when no secondary is available) for non-critical analytics; understand eventual consistency.
// Node.js
coll.find(q, { readPreference: "secondaryPreferred" })
39) Multi-Tenancy
Per-tenant databases/collections or a shared collection with tenantId in the shard key and in partial unique indexes. Enforce via app and (in Atlas) data access controls.
db.items.createIndex({ tenantId:1, key:1 }, { unique:true, partialFilterExpression:{ tenantId:{ $exists:true }}})
40) Q&A — “Why is my query slow?”
Answer: Missing/inefficient indexes, low selectivity, in-memory sort, large projections, or sharding causing scatter-gather. Fix with targeted compound indexes, projections, and query routing to the correct shard.
41) Pagination
Prefer range-based (bookmark) pagination over skip/limit for large offsets. Use an indexed field (e.g., _id or createdAt).
db.posts.find({ _id:{ $gt:lastId } }).sort({ _id:1 }).limit(20)
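When paginating by createdAt rather than _id, timestamps can collide, so a tie-breaker is needed. The builder below is a sketch of a two-field bookmark (createdAt plus _id), newest first; the function name and the cursor shape are assumptions for illustration.

```javascript
// Build find() arguments for the next page, newest first.
// cursor is { createdAt, _id } of the last document on the previous page, or null.
function nextPageQuery(cursor, pageSize) {
  const filter = cursor
    ? {
        $or: [
          { createdAt: { $lt: cursor.createdAt } },
          // Same timestamp: fall back to _id so no document is skipped or repeated.
          { createdAt: cursor.createdAt, _id: { $lt: cursor._id } },
        ],
      }
    : {};
  return { filter, sort: { createdAt: -1, _id: -1 }, limit: pageSize };
}

const firstPage = nextPageQuery(null, 20);
const secondPage = nextPageQuery(
  { createdAt: new Date("2025-01-01T00:00:00Z"), _id: 42 },
  20
);
console.log(JSON.stringify(secondPage.filter));
```

In mongosh the result is used as db.posts.find(q.filter).sort(q.sort).limit(q.limit), ideally backed by an index on { createdAt:-1, _id:-1 }.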
42) Unique Constraints
Enforce uniqueness with unique indexes; for scoped uniqueness use compound or partial unique indexes.
db.users.createIndex({ tenantId:1, email:1 }, { unique:true })
43) Upsert Idempotency
Design idempotent upserts using natural keys and $setOnInsert. Prevent duplicates in retries.
db.payments.updateOne({ extId }, { $set:{status:"paid"}, $setOnInsert:{ createdAt:new Date() } }, { upsert:true })
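To make the retry behavior concrete, here is a toy in-memory model of that upsert: $set fields apply on every call, $setOnInsert fields only when the document is created. It mimics the semantics only and is not a driver API; upsertPayment and the Map store are hypothetical.

```javascript
// Toy in-memory model of updateOne(filter, { $set, $setOnInsert }, { upsert: true })
// keyed on a single natural key. It mimics the semantics only; it is not a driver API.
function upsertPayment(store, extId, set, setOnInsert) {
  const existing = store.get(extId);
  if (existing) {
    Object.assign(existing, set); // $set applies on every call
    return { inserted: false };
  }
  // $setOnInsert fields are written only when the upsert creates the document.
  store.set(extId, { extId, ...setOnInsert, ...set });
  return { inserted: true };
}

const payments = new Map();
const firstSeen = new Date("2025-01-01T00:00:00Z");
const retrySeen = new Date("2025-01-01T00:05:00Z");

const firstCall = upsertPayment(payments, "ext-1", { status: "paid" }, { createdAt: firstSeen });
// A retry of the same message arrives later; createdAt must not change.
const retryCall = upsertPayment(payments, "ext-1", { status: "paid" }, { createdAt: retrySeen });

console.log(payments.size, payments.get("ext-1").createdAt === firstSeen); // 1 true
```

On a real server, concurrent upserts on the same key can still race, so back the natural key (extId here) with a unique index.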
44) Soft Deletes
Use deletedAt timestamps and partial indexes that ignore deleted docs, or move to an archive collection.
db.items.createIndex({ email:1 }, { unique:true, partialFilterExpression:{ deletedAt:{ $exists:false }}})
45) Pre-Aggregations
Materialize aggregates (daily rollups) into separate collections via scheduled jobs/change streams to speed dashboards.
// write to reports.daily_sales { day, sku, qty, rev }
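One way to materialize such a rollup is an aggregation pipeline ending in $merge, run by a scheduled job. The builder below is a sketch: the field names ts, sku, qty, and amount follow the earlier sales examples, dailySalesRollup is a made-up name, and $dateTrunc assumes MongoDB 5.0 or later. Note that the output documents carry day and sku inside a compound _id rather than as top-level fields.

```javascript
// Build a pipeline that rolls sales up per day and SKU and merges the result
// into reports.daily_sales. Field names follow the earlier sales examples.
function dailySalesRollup(from, to) {
  return [
    { $match: { ts: { $gte: from, $lt: to } } },
    {
      $group: {
        _id: { day: { $dateTrunc: { date: "$ts", unit: "day" } }, sku: "$sku" },
        qty: { $sum: "$qty" },
        rev: { $sum: "$amount" },
      },
    },
    // Re-running the job for the same window replaces rows instead of duplicating them.
    {
      $merge: {
        into: { db: "reports", coll: "daily_sales" },
        whenMatched: "replace",
        whenNotMatched: "insert",
      },
    },
  ];
}

const rollupPipeline = dailySalesRollup(
  new Date("2025-01-01T00:00:00Z"),
  new Date("2025-01-02T00:00:00Z")
);
console.log(rollupPipeline.map((stage) => Object.keys(stage)[0]).join(" -> "));
```

In mongosh the job would run db.sales.aggregate(rollupPipeline). Because $merge matches on _id by default, re-running a window is idempotent.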
46) Time-Series Collections
Use time-series collections for telemetry: automatic bucketing, compressed storage, and optimized queries on time/metadata.
db.createCollection("metrics",{ timeseries:{ timeField:"ts", metaField:"host", granularity:"minutes" }})
47) Large Files (GridFS)
For files >16MB, use GridFS to chunk and store; otherwise prefer object storage (S3) and keep only metadata/URLs in MongoDB.
# GridFS via drivers (fs.files, fs.chunks)
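The chunking arithmetic is simple enough to sketch. GridFS splits a file into fs.chunks documents of 255 KiB by default, with one fs.files metadata document per file. The function names below are made up for illustration.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // GridFS default chunk size in bytes (255 KiB)
const MAX_BSON_DOCUMENT = 16 * 1024 * 1024; // 16 MiB document size limit

// Number of fs.chunks documents needed for a file of the given size.
function chunkCount(fileBytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  return Math.ceil(fileBytes / chunkSize);
}

// Rough check: a payload larger than the document limit cannot be stored inline.
// (The real inline limit is slightly lower, since other fields count too.)
function exceedsDocumentLimit(fileBytes) {
  return fileBytes > MAX_BSON_DOCUMENT;
}

const videoBytes = 100 * 1024 * 1024; // a 100 MiB file
console.log(chunkCount(videoBytes)); // 402
console.log(exceedsDocumentLimit(videoBytes)); // true
```

A 100 MiB file therefore costs about 400 chunk documents per read, which is one reason object storage plus a metadata document is often the better fit.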
48) Common Pitfalls
Designing schemas like SQL, unbounded arrays, deep skip/limit pages, scatter-gather queries, monotonically increasing shard keys, and missing TTLs for ephemeral data.
// Fix with embedding, bucketing, range pagination, hashed/compound shard keys
49) Mini Checklist
- Model around queries & workloads
- Compound indexes support filter+sort
- Use projections, avoid large docs
- Backups + PITR tested
- Auth/TLS + least privilege
50) Interview Q&A — 20 Practical Questions (Expanded)
1) Embedding vs referencing? Embed when data is accessed together and bounded; reference for many-to-many/unbounded or shared entities.
2) How to pick a shard key? High cardinality, even write distribution, and query targeting. Avoid hotspots; consider hashed or compound keys.
3) Write concern vs read concern? Write concern = durability guarantees; read concern = consistency level. Tune per op/txn.
4) When use transactions? Only when invariants span multiple documents/collections; keep txns short to avoid lock contention.
5) Indexes for sort + filter? Compound with equality fields first, then sort/range fields, matching sort order.
6) Avoid skip/limit? Use range pagination with indexed fields; skip still walks every skipped entry, so its cost grows O(n) with the offset.
7) Handle schema changes? Use schema versioning and migration scripts; keep validators lenient during rollout.
8) Why COLLSCAN? No matching index, low selectivity, or expression not supported by index. Add/rewrite index or reshape query.
9) Partial index use? Smaller, faster indexes when you only query a subset (e.g., active:true).
10) Change streams use cases? Cache invalidation, search indexing, CDC to warehouses, reactive systems.
11) Atlas Search vs text index? Atlas Search offers Lucene features, scoring, facets, and better relevance; text index is basic.
12) Time-series benefits? Compressed storage, automatic bucketing, faster time-bounded queries.
13) Monotonic shard key problem? Hot chunk receives all new writes; fix with hashed or compound shard keys.
14) Ensure idempotency? Natural keys + $setOnInsert, dedupe tokens, and unique indexes.
15) Geospatial basics? 2dsphere index, GeoJSON types, queries like $near/$geoWithin.
16) Profiler usage? Capture slow ops to tune indexes and queries; don’t leave at level 2 in prod.
17) Field-level encryption? Encrypt sensitive fields client-side; server stores ciphertext; keys via KMS.
18) Bulk vs single writes? Bulk reduces round-trips and improves throughput; unordered is fastest but partial failures need handling.
19) Aggregation optimization? $match early, projections, index-friendly operators, avoid unwinds on huge arrays without filters.
20) Replica set lag? Network/IO pressure; tune write concern, optimize secondaries, and monitor oplog window.