Azure Blob Storage Pocket Book — Uplatz
50+ deep-dive flashcards • Single column • Data lake & object storage • Security & networking • Performance & cost • Interview Q&A
Cheat-friendly explanations • Readable CLI/SDK snippets • Production-oriented tips
1) What is Azure Blob Storage?
Object storage for massive unstructured data—backups, logs, media, ML datasets, and data lakes (ADLS Gen2). Pay by capacity + operations + egress.
# Login & list accounts
az login
az storage account list -o table
2) Accounts, Containers, Blobs
Storage account (namespace) → container (folder-like) → blob (object). Blob types: Block (uploads/downloads), Append (log-style), Page (random I/O, disks).
3) Tiers: Hot, Cool, Archive
Hot = frequent access; Cool = infrequent (lower storage, higher access); Archive = offline, cheapest, requires rehydration (hours).
az storage blob set-tier --account-name mystg --container-name logs --name 2025-01.json --tier Cool
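The tradeoff is storage price versus access price. A toy sketch of storage-only cost per tier — the per-GB prices are made-up placeholders for illustration, not current Azure list prices:

```python
# Rough monthly cost sketch for 1 TB across tiers.
# Prices are ILLUSTRATIVE assumptions, not Azure list prices.
PRICE_PER_GB = {"Hot": 0.018, "Cool": 0.010, "Archive": 0.002}

def monthly_storage_cost(gb: float, tier: str) -> float:
    """Storage-only cost; real bills add operations, retrieval, and egress."""
    return round(gb * PRICE_PER_GB[tier], 2)

for tier in PRICE_PER_GB:
    print(tier, monthly_storage_cost(1024, tier))
```

The gap widens with volume, which is why lifecycle policies (card 26) that demote aging data pay off quickly.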
4) Redundancy (Replication)
LRS (single zone), ZRS (multi-zone), GRS (paired region), GZRS (zone+geo), RA-GRS/RA-GZRS (readable secondary). Choose based on RTO/RPO.
5) Encryption
SSE with Microsoft-managed keys by default. Use CMK (Key Vault) for control + rotation. Client-side encryption for extra assurance.
6) ADLS Gen2
Enables hierarchical namespace, POSIX ACLs, abfss:// paths, and big data engines (Spark/Synapse). Ideal for data lakes.
7) Auth Options
Azure AD (recommended), Shared Key (account key), SAS tokens (scoped+time-bound). Prefer Azure AD + RBAC for least privilege.
8) RBAC Roles
Storage Blob Data Reader/Contributor/Owner roles govern data-plane access with Azure AD. Assign to users, groups, or managed identities.
az role assignment create \
--assignee user@contoso.com \
--role "Storage Blob Data Reader" \
--scope /subscriptions/<subId>/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/mystg
9) SAS Types
Service SAS (blob/container), Account SAS (multiple services), User Delegation SAS (Azure AD + short-lived). Avoid long-lived SAS.
az storage blob generate-sas \
--account-name mystg --container-name data --name file.csv \
--permissions r --expiry 2025-08-31T23:59Z --https-only
10) Network Hardening
Private endpoints (VNet), storage firewall (selected networks), service endpoints, and “Allow trusted Microsoft services”.
11) Immutability & Legal Hold
Time-based retention (WORM) or legal hold on containers for compliance—prevents deletion/overwrite until expiry/removal.
12) Soft Delete & Versioning
Enable soft delete for blobs and containers; use versioning for object history/rollback.
az storage account blob-service-properties update \
--account-name mystg --resource-group rg \
--enable-delete-retention true --delete-retention-days 30
13) Block Blob Uploads
Parallel, resumable via blocks (up to 50k blocks). Tune block size for throughput (e.g., 8–100MB).
az storage blob upload --account-name mystg -c data -f big.parquet -n big.parquet --type block
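With a 50,000-block cap per blob, block size bounds the maximum object size. A quick sketch of picking the smallest block size that fits a file; the 8 MiB floor is an assumed tuning choice matching the range above:

```python
import math

MAX_BLOCKS = 50_000  # block blob limit: at most 50,000 committed blocks

def min_block_size(file_bytes: int, floor: int = 8 * 1024 * 1024) -> int:
    """Smallest block size (bytes) that fits file_bytes in <= 50,000 blocks,
    never dropping below an 8 MiB floor kept for throughput."""
    return max(floor, math.ceil(file_bytes / MAX_BLOCKS))

# A 5 TiB file needs blocks larger than the 8 MiB floor
print(min_block_size(5 * 1024**4) // 1024**2, "MiB")
```

Most SDKs pick a block size for you, but pinning it explicitly keeps large parallel uploads predictable.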
14) Append Blobs
Optimized for log ingestion (append-only). Great for streaming logs or incremental ETL writes.
15) Page Blobs
Random read/write in 512-byte pages—used for VHDs and premium storage scenarios.
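Because writes must start and end on page boundaries, callers typically expand arbitrary byte ranges to 512-byte alignment first. A small helper sketch:

```python
PAGE = 512  # page blobs read/write in 512-byte pages

def page_aligned_range(offset: int, length: int) -> tuple[int, int]:
    """Expand an arbitrary byte range to the enclosing 512-byte-aligned
    range, since page blob writes must land on page boundaries."""
    start = (offset // PAGE) * PAGE
    end = -(-(offset + length) // PAGE) * PAGE  # ceil to page boundary
    return start, end

print(page_aligned_range(700, 100))  # → (512, 1024)
```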
16) Snapshots & Point-in-Time
Create snapshots for quick backups/rollbacks; incremental copy works across snapshots.
az storage blob snapshot --account-name mystg -c data -n model.bin
17) Blob Index Tags
Key/value tags enable fast filtering and lifecycle actions without reading object content.
az storage blob tag set --account-name mystg -c raw -n 2025/08/file.json --tags "country=DE" "pii=false"
18) Change Feed & Events
Immutable log of blob changes for audit/ETL. Use Event Grid for event-driven automation (Functions, Logic Apps).
19) Python SDK
pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient
svc = BlobServiceClient.from_connection_string(cs)
client = svc.get_blob_client("data","file.csv")
with open("file.csv","rb") as f: client.upload_blob(f, overwrite=True)
20) Node.js SDK
npm i @azure/storage-blob
const { BlobServiceClient } = require("@azure/storage-blob");
const client = BlobServiceClient.fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING);
const block = client.getContainerClient("data").getBlockBlobClient("out.json");
const body = JSON.stringify({ ok: true });
await block.upload(body, Buffer.byteLength(body));
21) .NET SDK
var svc = new BlobServiceClient(connectionString);
var blob = svc.GetBlobContainerClient("reports").GetBlobClient("y2025.pdf");
await blob.UploadAsync(stream, overwrite:true);
22) REST Tips
Use conditional headers (If-Match
/If-None-Match
) to avoid overwrites; leases for exclusive updates; Copy Blob
for server-side copies.
23) AzCopy
High-throughput copy/mirror tool—parallel, checksummed, resumable.
azcopy copy "C:\data\*.parquet" "https://mystg.blob.core.windows.net/raw?sas" --recursive=true
24) Static Website Hosting
Serve SPA/static assets from Blob; upload to the $web container, set index + error docs.
az storage blob service-properties update --account-name mystg --static-website --index-document index.html --404-document error.html
25) Throughput Best Practices
Parallel block uploads, larger block size, reuse clients, enable TCP keepalive, colocate compute and storage regions.
26) Lifecycle Management
Policy engine to tier/expire by age, tags, or prefix—huge cost saver.
az storage account management-policy create \
--account-name mystg --resource-group rg --policy @policy.json
# policy.json: move raw/ to Cool after 30d; delete after 365d
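A minimal policy.json matching that comment can be generated with the stdlib; the rule name is an arbitrary placeholder:

```python
import json

# Sketch of the policy.json referenced above: tier blobs under raw/ to Cool
# after 30 days of no modification, delete after 365. Rule name is invented.
policy = {
    "rules": [{
        "enabled": True,
        "name": "raw-to-cool-then-delete",
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["raw/"]},
            "actions": {"baseBlob": {
                "tierToCool": {"daysAfterModificationGreaterThan": 30},
                "delete": {"daysAfterModificationGreaterThan": 365},
            }},
        },
    }]
}

print(json.dumps(policy, indent=2))
```

Keep the policy in source control next to your IaC so tier transitions are reviewed like any other infra change.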
27) Archive & Rehydrate
Archive cuts storage cost; reads require rehydration to Hot/Cool (Std or High priority). Plan lead time.
28) Premium Blob
For low-latency workloads (small object reads, transactional). Page blobs also have Premium tier for disks.
29) Object Replication (OR)
Async copy of block blobs between accounts/regions with rules. Great for cross-region DR and data pipelines.
30) Cost Levers
Choose right tier, leverage lifecycle, minimize transactions (batch ops), compress, use CDN for hot public reads, avoid unnecessary egress.
31) Hierarchical Namespace
Directories and files become first-class; rename/move are atomic; works with POSIX ACLs for granular access.
32) POSIX ACLs
Set rwx on dirs/files for users/groups; combine with RBAC. Great for multi-team data lakes.
az storage fs access set --account-name mystg --file-system curated --path "finance" --acl "user::rwx,group::r-x,other::---"
33) Big Data Engines
Use abfss:// with Spark/Synapse/Databricks for high-throughput analytics. Prefer Parquet/Delta formats.
34) Optimizing Layout
Partition by date/region; compact small files; use append-friendly staging then compact to columnar (Parquet/Delta).
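One common shape for such a layout is Hive-style key=value partitioning, which lets engines prune by predicate; the zone/dataset names here are invented examples:

```python
from datetime import date

def partition_path(zone: str, dataset: str, d: date, region: str) -> str:
    """Hive-style date/region partitioning: engines like Spark can skip
    whole directories when a query filters on region or date."""
    return (f"{zone}/{dataset}/region={region}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}/")

print(partition_path("curated", "sales", date(2025, 8, 3), "de"))
# → curated/sales/region=de/year=2025/month=08/day=03/
```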
35) Data Governance
Pair with Purview for catalogs/lineage; use tags + ACLs + access reviews; secure PROD with private endpoints.
36) Multi-Environment Strategy
Separate accounts per env; use object replication or pipelines for promotion; lock config with IaC (Bicep/Terraform).
37) Metrics & Logs
Send diagnostics to Log Analytics (Storage insights); analyze availability, latency, throttling, and egress.
// KQL example: high latency operations
StorageBlobLogs
| where TimeGenerated > ago(1h) and ServerLatencyMs > 500
| summarize count() by OperationName
38) Resilience
Use retries with backoff, idempotent uploads, leases for concurrency, and test DR (GRS/GZRS + failover).
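The retry half of that advice can be sketched as exponential backoff with full jitter; the base and cap values are illustrative defaults, not SDK settings:

```python
import random

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff with full jitter: the n-th delay is drawn
    uniformly from [0, min(cap, base * 2**n)] to spread out retries."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

print([round(d, 2) for d in backoff_delays(5)])
```

The Azure SDKs ship their own retry policies; a hand-rolled loop like this mainly helps around them (e.g. re-running a whole failed pipeline step).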
39) Security Monitoring
Alert on anonymous access, large egress spikes, SAS misuse, or frequent 403/409. Stream to Sentinel for detections.
40) Data Protection Routine
Confirm versioning + soft delete; periodic restore drills; export policy/ACL baselines; lock critical containers.
41) Cross-Account Copy
Prefer server-side Copy Blob or AzCopy S2S (no download through the client); preserve tier/metadata when feasible.
42) CORS
Configure CORS for browser apps—restrict origins/verbs/headers tightly.
43) Pattern — Data Landing Zone
Ingest to raw (immutable, tagged), validate → curated, then publish gold. Enforce ACLs and lifecycle at each layer.
44) Pattern — Static Site + CDN
Host SPA in $web, front with Azure CDN for caching/SSL/custom domain, use versioned releases for instant rollback.
45) Pattern — Secure Sharing
Prefer user-delegation SAS (short TTL, least permissions) delivered via your API; log every issuance; rotate keys regularly.
46) Interview — Block vs Append vs Page?
Answer: Block for general files/parallel upload; Append for log-style appends; Page for random I/O (VHD).
47) Interview — ADLS Gen2 advantages?
Answer: Hierarchical namespace, atomic rename, POSIX ACLs, and big-data compatibility (abfss) for analytics at scale.
48) Interview — Reduce costs quickly?
Answer: Lifecycle to Cool/Archive, compress & compact small files, limit transactions, enable CDN for hot public content, delete stale versions.
49) Interview — SAS best practices?
Answer: Prefer user-delegation SAS, minimal permissions, short TTL, IP restrictions, HTTPS only, log/alert issuance.
50) Interview — When to pick Premium?
Answer: Low-latency workloads, lots of small object transactions, or page blob scenarios needing consistent IOPS.