Azure Blob Storage Pocket Book — Uplatz

50+ deep-dive flashcards • Single column • Data lake & object storage • Security & networking • Performance & cost • Interview Q&A

Cheat-friendly explanations • Readable CLI/SDK snippets • Production-oriented tips

Section 1 — Fundamentals

1) What is Azure Blob Storage?

Object storage for massive unstructured data—backups, logs, media, ML datasets, and data lakes (ADLS Gen2). Pay by capacity + operations + egress.

# Login & list accounts
az login
az storage account list -o table

2) Accounts, Containers, Blobs

Storage account (namespace) → container (folder-like) → blob (object). Blob types: Block (uploads/downloads), Append (log-style), Page (random I/O, disks).

3) Tiers: Hot, Cool, Archive

Hot = frequent access; Cool = infrequent (lower storage, higher access); Archive = offline, cheapest, requires rehydration (hours).

az storage blob set-tier --account-name mystg -c logs -n 2025-01.json --tier Cool

4) Redundancy (Replication)

LRS (three copies in one datacenter), ZRS (across availability zones), GRS (paired region), GZRS (zone + geo), RA-GRS/RA-GZRS (readable secondary). Choose based on RTO/RPO.

5) Encryption

SSE with Microsoft-managed keys by default. Use CMK (Key Vault) for control + rotation. Client-side encryption for extra assurance.

6) ADLS Gen2

Enables hierarchical namespace, POSIX ACLs, abfss:// paths, and big data engines (Spark/Synapse). Ideal for data lakes.

Section 2 — Access & Security

7) Auth Options

Azure AD (recommended), Shared Key (account key), SAS tokens (scoped+time-bound). Prefer Azure AD + RBAC for least privilege.

8) RBAC Roles

Storage Blob Data Reader/Contributor/Owner roles govern data plane access with Azure AD. Assign to users, groups, or managed identities.

az role assignment create \
  --assignee user@contoso.com \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/<subId>/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/mystg

9) SAS Types

Service SAS (blob/container), Account SAS (multiple services), User Delegation SAS (Azure AD + short-lived). Avoid long-lived SAS.

az storage blob generate-sas \
  --account-name mystg -c data -n file.csv \
  --permissions r --expiry 2025-08-31T23:59Z --https-only
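Under the hood, every SAS variant is an HMAC-SHA256 signature over a canonical string-to-sign, computed with the account key (or a user delegation key). A simplified stdlib sketch of the signing step — the string-to-sign below is a trimmed, hypothetical stand-in, not the exact field order the service requires:

```python
import base64
import hmac
from hashlib import sha256

def sign_sas(account_key_b64: str, string_to_sign: str) -> str:
    """HMAC-SHA256 the string-to-sign with the decoded account key,
    then base64-encode the digest — the scheme SAS tokens use."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), sha256).digest()
    return base64.b64encode(digest).decode("utf-8")

# Hypothetical, simplified string-to-sign: permissions, start, expiry,
# canonical resource, identifier, protocol, service version. The real
# format has more newline-separated fields in a fixed order.
sts = "\n".join(["r", "", "2025-08-31T23:59Z",
                 "/blob/mystg/data/file.csv", "", "https", "2023-11-03"])
sig = sign_sas(base64.b64encode(b"fake-account-key").decode(), sts)
```

Because the signature covers the expiry and permissions, tampering with either invalidates the token — which is why short TTLs and minimal permissions cost nothing in security terms.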

10) Network Hardening

Private endpoints (VNet), storage firewall (selected networks), service endpoints, and “Allow trusted Microsoft services”.

11) Immutability & Legal Hold

Time-based retention (WORM) or legal hold on containers for compliance—prevents deletion/overwrite until expiry/removal.

12) Soft Delete & Versioning

Enable soft delete for blobs and containers; use versioning for object history/rollback.

az storage account blob-service-properties update \
  --account-name mystg --enable-delete-retention true --delete-retention-days 30

Section 3 — Data Model & Features

13) Block Blob Uploads

Parallel, resumable via blocks (up to 50k blocks). Tune block size for throughput (e.g., 8–100MB).

az storage blob upload --account-name mystg -c data -f big.parquet -n big.parquet --type block
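The 50,000-block ceiling is why block size matters for very large objects: the SDKs pick a size for you, but the arithmetic is worth knowing. A stdlib sketch (candidate sizes are illustrative):

```python
MAX_BLOCKS = 50_000
MIB = 1024 * 1024

def pick_block_size(file_size: int, candidates=(4, 8, 16, 32, 64, 100)) -> int:
    """Return the smallest candidate block size (MiB) that keeps the
    block count under the 50,000-block limit."""
    for mib in candidates:
        if -(-file_size // (mib * MIB)) <= MAX_BLOCKS:  # ceiling division
            return mib
    raise ValueError("file too large for these block sizes")

# A 190 GiB file fits in 4 MiB blocks (~48,640 blocks),
# but a 1 TiB file forces the block size up to 32 MiB.
```

Bigger blocks also mean fewer round trips, so throughput often improves before the limit even forces the change.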

14) Append Blobs

Optimized for log ingestion (append-only). Great for streaming logs or incremental ETL writes.

15) Page Blobs

Random read/write in 512-byte pages—used for VHDs and premium storage scenarios.

16) Snapshots & Point-in-Time

Create snapshots for quick backups/rollbacks; incremental copy works across snapshots.

az storage blob snapshot --account-name mystg -c data -n model.bin

17) Blob Index Tags

Key/value tags enable fast filtering and lifecycle actions without reading object content.

az storage blob tag set --account-name mystg -c raw -n 2025/08/file.json --tags "country=DE" "pii=false"

18) Change Feed & Events

Immutable log of blob changes for audit/ETL. Use Event Grid for event-driven automation (Functions, Logic Apps).
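A minimal sketch of an Event Grid handler for BlobCreated events — the field names follow the Event Grid storage event schema, but the payload below is a trimmed, hypothetical sample:

```python
import json

def handle_event(event: dict):
    """React only to BlobCreated events; return the blob URL, else None."""
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        return None
    return event["data"]["url"]

sample = json.loads("""
{
  "eventType": "Microsoft.Storage.BlobCreated",
  "subject": "/blobServices/default/containers/raw/blobs/2025/08/file.json",
  "data": { "url": "https://mystg.blob.core.windows.net/raw/2025/08/file.json",
            "contentLength": 1024 }
}
""")
blob_url = handle_event(sample)
```

In a real Function this body would arrive pre-parsed; filtering on eventType keeps deletes and metadata changes from triggering your pipeline.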

Section 4 — SDKs & APIs

19) Python SDK

pip install azure-storage-blob

import os
from azure.storage.blob import BlobServiceClient
cs = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
svc = BlobServiceClient.from_connection_string(cs)
client = svc.get_blob_client("data", "file.csv")
with open("file.csv", "rb") as f:
    client.upload_blob(f, overwrite=True)

20) Node.js SDK

npm i @azure/storage-blob

const { BlobServiceClient } = require("@azure/storage-blob");
const client = BlobServiceClient.fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING);
const block = client.getContainerClient("data").getBlockBlobClient("out.json");
const body = JSON.stringify({ ok: true });
await block.upload(body, Buffer.byteLength(body)); // run inside an async function (CommonJS has no top-level await)

21) .NET SDK

var svc = new BlobServiceClient(connectionString);
var blob = svc.GetBlobContainerClient("reports").GetBlobClient("y2025.pdf");
await blob.UploadAsync(stream, overwrite:true);

22) REST Tips

Use conditional headers (If-Match/If-None-Match) to avoid overwrites; leases for exclusive updates; Copy Blob for server-side copies.
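The If-Match / If-None-Match dance can be captured in a tiny helper — the header names are the standard HTTP ones the Blob REST API honors; the helper itself is just an illustration:

```python
def conditional_headers(etag):
    """If we hold an ETag, update only if the blob is unchanged (If-Match);
    otherwise create only if the blob doesn't already exist (If-None-Match: *)."""
    if etag:
        return {"If-Match": etag}   # 412 Precondition Failed if someone else wrote first
    return {"If-None-Match": "*"}   # fails if the blob already exists
```

On a 412/409, re-fetch the blob (and its fresh ETag), reapply your change, and retry — classic optimistic concurrency.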

23) AzCopy

High-throughput copy/mirror tool—parallel, checksummed, resumable.

azcopy copy "C:\data\*.parquet" "https://mystg.blob.core.windows.net/raw?sas" --recursive=true

24) Static Website Hosting

Serve SPA/static assets from Blob; upload to $web container, set index + error docs.

az storage blob service-properties update --account-name mystg --static-website --index-document index.html

Section 5 — Performance, Cost & Lifecycle

25) Throughput Best Practices

Parallel block uploads, larger block size, reuse clients, enable TCP Keepalive, colocate compute and storage regions.
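The parallel-block idea can be sketched with a thread pool and a stubbed per-block uploader — real SDKs expose this as stage-block/commit-block-list plus a max_concurrency knob; the stub here stands in for the network call:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_blocks(data: bytes, block_size: int, stage_block, workers: int = 8):
    """Split data into blocks, stage them in parallel, and return the
    ordered block ids to commit (mirrors stage-block / commit-block-list)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(stage_block, range(len(blocks)), blocks))  # stage concurrently
    return list(range(len(blocks)))  # commit order is what makes the blob correct

staged = {}
ids = upload_blocks(b"x" * 1000, 256, lambda i, b: staged.__setitem__(i, b), workers=4)
```

Staging can finish in any order; only the committed block list is ordered, which is what makes the pattern safe to parallelize.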

26) Lifecycle Management

Policy engine to tier/expire by age, tags, or prefix—huge cost saver.

az storage account management-policy create --account-name mystg --policy @policy.json
# policy.json: move raw/ to Cool after 30d; delete after 365d
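A policy.json matching that comment might look like the following — the rule name is illustrative; the shape is the management-policy schema the CLI expects:

```json
{
  "rules": [
    {
      "name": "tier-then-expire-raw",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["raw/"] },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            "delete": { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}
```

Filters and actions can also key off blob index tags, which pairs nicely with card 17.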

27) Archive & Rehydrate

Archive cuts storage cost; reads require rehydration to Hot/Cool (Standard or High priority). Plan for the lead time.

28) Premium Blob

For low-latency workloads (small object reads, transactional). Page blobs also have Premium tier for disks.

29) Object Replication (OR)

Async copy of block blobs between accounts/regions with rules. Great for cross-region DR and data pipelines.

30) Cost Levers

Choose right tier, leverage lifecycle, minimize transactions (batch ops), compress, use CDN for hot public reads, avoid unnecessary egress.

Section 6 — Data Lake (ADLS Gen2) Deep Dive

31) Hierarchical Namespace

Directories and files become first-class; rename/move are atomic; works with POSIX ACLs for granular access.

32) POSIX ACLs

Set rwx on dirs/files for users/groups; combine with RBAC. Great for multi-team data lakes.

az storage fs access set --account-name mystg -f curated --path "finance" --acl "user::rwx,group::r-x,other::---"

33) Big Data Engines

Use abfss:// with Spark/Synapse/Databricks for high-throughput analytics. Prefer parquet/delta formats.

34) Optimizing Layout

Partition by date/region; compact small files; use append-friendly staging then compact to columnar (Parquet/Delta).
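Hive-style key=value partitioning, which Spark/Synapse can prune on, is easy to generate consistently — a small sketch (prefix and partition keys are illustrative):

```python
from datetime import date

def partition_path(root: str, d: date, region: str, filename: str) -> str:
    """Build a Hive-style partitioned path (key=value segments)
    so engines can skip irrelevant partitions at query time."""
    return f"{root}/region={region}/year={d:%Y}/month={d:%m}/day={d:%d}/{filename}"

p = partition_path("curated/sales", date(2025, 8, 15), "emea", "part-0000.parquet")
```

Zero-padded months/days keep lexicographic and chronological order aligned, which matters for prefix listings.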

35) Data Governance

Pair with Purview for catalogs/lineage; use tags + ACLs + access reviews; secure PROD with private endpoints.

36) Multi-Environment Strategy

Separate accounts per env; use object replication or pipelines for promotion; lock config with IaC (Bicep/Terraform).

Section 7 — Operations & Monitoring

37) Metrics & Logs

Route storage diagnostics to Log Analytics (Storage insights); analyze availability, latency, throttling, and egress.

// KQL example: high latency operations
StorageBlobLogs
| where TimeGenerated > ago(1h) and ServerLatencyMs > 500
| summarize count() by OperationName

38) Resilience

Use retries with backoff, idempotent uploads, leases for concurrency, and test DR (GRS/GZRS + failover).
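Retry-with-backoff is worth internalizing even though the SDKs ship retry policies; a stdlib sketch with capped exponential backoff and full jitter (parameters illustrative):

```python
import random
import time

def with_retries(op, attempts: int = 5, base: float = 0.5, cap: float = 8.0):
    """Retry a transient-failure-prone operation with capped
    exponential backoff plus full jitter between attempts."""
    for i in range(attempts):
        try:
            return op()
        except ConnectionError:
            if i == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(random.uniform(0, min(cap, base * 2 ** i)))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = with_retries(flaky)
```

Jitter spreads retries out so a fleet of clients doesn't hammer the service in lockstep after a throttling event — pair it with idempotent uploads so a retried write is always safe.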

39) Security Monitoring

Alert on anonymous access, large egress spikes, SAS misuse, or frequent 403/409. Stream to Sentinel for detections.

40) Data Protection Routine

Confirm versioning + soft delete; periodic restore drills; export policy/ACL baselines; lock critical containers.

41) Cross-Account Copy

Prefer server-side Copy Blob or AzCopy S2S (no download through client); preserve tier/metadata when feasible.

42) CORS

Configure CORS for browser apps—restrict origins/verbs/headers tightly.

Section 8 — Patterns, Recipes & Interview Q&A

43) Pattern — Data Landing Zone

Ingest to raw (immutable, tagged), validate → curated, then publish gold. Enforce ACLs and lifecycle at each layer.

44) Pattern — Static Site + CDN

Host SPA in $web, front with Azure CDN for caching/SSL/custom domain, use versioned releases for instant rollback.

45) Pattern — Secure Sharing

Prefer user-delegation SAS (short TTL, least permissions) delivered via your API; log every issuance; rotate keys regularly.

46) Interview — Block vs Append vs Page?

Answer: Block for general files/parallel upload; Append for log-style appends; Page for random I/O (VHD).

47) Interview — ADLS Gen2 advantages?

Answer: Hierarchical namespace, atomic rename, POSIX ACLs, and big-data compatibility (abfss) for analytics at scale.

48) Interview — Reduce costs quickly?

Answer: Lifecycle to Cool/Archive, compress & compact small files, limit transactions, enable CDN for hot public content, delete stale versions.

49) Interview — SAS best practices?

Answer: Prefer user-delegation SAS, minimal permissions, short TTL, IP restrictions, HTTPS only, log/alert issuance.

50) Interview — When to pick Premium?

Answer: Low-latency workloads, lots of small object transactions, or page blob scenarios needing consistent IOPS.