Claude Pocket Book — Uplatz

50 fully expanded cards • One-column “ebook” layout • Real prompts & responses • Interview Q&A included

Section 1 — Foundations

1) What is Claude?

Claude is Anthropic’s family of advanced large language models built to be helpful, honest, and harmless. It excels at long-context reasoning, summarization, structured outputs, and safe tool use. Claude models power assistants, copilots, RAG systems, and enterprise workflows.

Typical strengths: long-document QA, careful reasoning, aligned outputs, JSON reliability.

2) Model Lineup & Fit

Claude typically ships in tiers (e.g., faster/cost-efficient, balanced general-purpose, and top-end reasoning). Pick based on your SLOs:

  • Fast tier: drafts, autocomplete, classification, routing.
  • General tier: chat agents, RAG answers, code help.
  • Reasoning tier: complex analysis, multi-step planning, long-doc synthesis.

3) Constitutional AI (Safety by Design)

Claude is trained with a written “constitution” of principles guiding safer behavior. It reduces harmful content and steers towards helpful explanations, refusals when appropriate, and transparent reasoning patterns (without revealing internal chain-of-thought).

4) Long Context & Use Cases

With very large context windows (hundreds of pages), Claude can ingest reports, contracts, or codebases and answer nuanced questions without complex chunking pipelines—ideal for due diligence, legal/compliance review, and research assistants.

5) Claude vs Other LLMs

Claude often leans more cautious and aligned than some peers, with strong long-context performance and reliable JSON outputs. It’s a top pick for enterprises prioritizing governance and auditability in production AI systems.

6) Access & Hosting

Access Claude via the Anthropic API or Amazon Bedrock; many SaaS platforms (e.g., productivity, analytics, customer support) integrate Claude under the hood. Enterprises can standardize access through Bedrock for simplified IAM, logging, and VPC control.

7) Prompt & Response Example — Summarization

Prompt:
"Summarize this 40-page company strategy into 6 bullets for the exec team. Include 1 risk & 1 action."

Response (excerpt):
• Revenue growth: 15–18% targeted via self-serve expansion
• Product: launch analytics copilot (Q4)
• GTM: partner-led motion in EU
• Ops: consolidate infra to cut 12% cost
• Risk: feature adoption lags without in-app guidance
• Action: embed guided tours + KPI dashboard

8) Prompt & Response — Coding

Prompt:
"Write a Python function to validate an IBAN and return True/False."

Response (excerpt):
def is_valid_iban(iban: str) -> bool:
    s = iban.replace(" ", "").upper()
    if not s.isalnum() or len(s) < 15 or len(s) > 34:
        return False
    s = s[4:] + s[:4]                      # move country code + check digits to the end
    digits = ""
    for ch in s:
        # letters map to 10-35 (A=10 ... Z=35); digits stay as-is
        digits += str(ord(ch) - 55) if ch.isalpha() else ch
    return int(digits) % 97 == 1           # ISO 7064 mod-97 check

9) Prompt & Response — Policy-Aware Refusal

Prompt:
"Give me steps to exploit a known bank website vulnerability."

Response:
"I can’t help with instructions that facilitate harm or illegal activity.
If you’re testing your own system, here’s a safe checklist:
• Run an authorized pentest with scope/documentation
• Use official scanners; report findings via responsible channels
• Patch and retest, then conduct a post-mortem"

10) Production Checklist (At a Glance)

  • Stable system prompt & safety policy
  • JSON schemas for tool calls/outputs
  • Observability: tokens, latency, refusals, JSON validity
  • Guardrails: allow/deny tool lists, redaction
  • Cost/latency controls: streaming, caching, smaller models for early steps

Section 2 — API, Prompting & Tool Use

11) Basic API Pattern (Python)

Set a system prompt for policy/role, then send the user content. Stream tokens for better UX. Keep API keys in environment variables.

# pip install anthropic
import os
import anthropic
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
resp = client.messages.create(
  model="claude-3-5-sonnet-latest",  # example model id
  max_tokens=600,
  system="You are a precise enterprise assistant. Be concise; cite sources [S#].",
  messages=[{"role":"user","content":"Give 3 strategies for reducing cloud costs."}]
)
print(resp.content[0].text)

12) Streaming & Time-to-First-Token

Streaming improves perceived latency. Consume the server-sent event (SSE) stream and append chunks as they arrive. Show a typing indicator; finalize the message when the stream closes.
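
A minimal streaming sketch with the Python SDK is shown below; the model id is an example placeholder and should match your deployment.

# Streaming sketch (Python SDK); model id is an example placeholder.
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
with client.messages.stream(
    model="claude-3-5-sonnet-latest",   # example model id
    max_tokens=400,
    messages=[{"role": "user", "content": "Explain time-to-first-token in one paragraph."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # append chunks as they arrive
print()  # stream closed; finalize the message in your UI here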

13) System Prompts That Stick

Define persona, tone, refusal behavior, and output formats up-front. Keep it stable across turns; log a version string to audit changes.

SYSTEM:
You are a senior analyst. 
• Never fabricate; if unknown, say "insufficient context".
• Prefer bullet points and short paragraphs.
• When asked for data, return JSON matching the provided schema.

14) JSON-Only Outputs & Validation

For integrations, demand JSON-only replies. Validate server-side; if invalid, resend the request with the error message so the model can self-correct (a sketch follows the example below).

SYSTEM: Return valid JSON only.
SCHEMA: {"title":"string","bullets":"string[]"}
USER: "Summarize the doc into a title + 4 bullets."

15) Few-Shot Examples (Do Less, Better)

Use 2–5 targeted exemplars. Place the most similar example first. Remove noisy or outdated examples to keep context tight.

16) Function / Tool Calling (Concept)

Claude proposes a tool call with structured arguments; your app executes the tool and returns results as context; Claude composes the final answer.

# Model proposes:
{"tool":"get_user_balance","arguments":{"user_id":"u-123"}}
# App calls backend, returns:
{"tool_result":{"balance": 325.40, "currency":"USD"}}

17) Tool Schema Design

  • Small, single-purpose functions
  • Strict JSON schema with enums/ranges
  • Idempotent operations; pass request IDs
  • Return normalized records with IDs/timestamps (a schema sketch follows below)
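
Put together, a strict single-purpose schema might look like this; the tool and field names are hypothetical.

# Hypothetical tool with a strict schema: enums, ranges, and an idempotency key.
refund_tool = {
    "name": "create_refund",
    "description": "Create a refund for a single order. Idempotent per request_id.",
    "input_schema": {
        "type": "object",
        "properties": {
            "request_id": {"type": "string"},      # idempotency / request ID
            "order_id": {"type": "string"},
            "amount": {"type": "number", "minimum": 0.01, "maximum": 10000},
            "reason": {"type": "string",
                       "enum": ["damaged", "late", "duplicate", "other"]},
        },
        "required": ["request_id", "order_id", "amount", "reason"],
        "additionalProperties": False,
    },
}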

18) Temperature, Top-p & Determinism

For factual tasks, set temperature=0–0.3. For creative tasks, raise temperature or top-p. Prefer deterministic settings in CI and evaluations.

19) Errors, Retries & Timeouts

Implement exponential backoff with jitter for rate limits; detect truncation (cut-off outputs) and retry with shorter context. Circuit-break failing tools and show user-safe fallbacks.
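
A basic backoff loop might look like the sketch below; it assumes the SDK's RateLimitError/APITimeoutError classes and a hypothetical zero-argument call_model wrapper.

# Sketch: exponential backoff with jitter around a model call.
import random, time
import anthropic

def call_with_backoff(call_model, max_attempts: int = 5):
    for attempt in range(max_attempts):
        try:
            return call_model()
        except (anthropic.RateLimitError, anthropic.APITimeoutError):
            if attempt == max_attempts - 1:
                raise                                             # circuit-break upstream
            delay = min(2 ** attempt, 30) + random.uniform(0, 1)  # jitter
            time.sleep(delay)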

20) Anti-Patterns to Avoid

  • Vague or conflicting instructions
  • Unbounded outputs (no token cap)
  • Sending entire corpora instead of retrieval
  • Skipping schema validation & audits

Section 3 — RAG, Long Documents & Multimodal

21) Classic RAG Architecture

Index docs (BM25 + vectors) → retrieve top-k → compose grounded prompt with citations → demand answers referencing [C#] → post-validate. Track “citation coverage” and “answerability”.

22) Chunking Strategy

Chunk by headings; 300–800 tokens with 10–20% overlap. Store metadata (title, section, URL, date). Retrieval filters by product/tenant/date improve precision.
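
A rough chunker in this spirit (word counts approximate tokens; sizes and overlap should be tuned to your corpus):

# Sketch: split on markdown-style headings, then window each section with overlap.
def chunk_doc(text: str, max_words: int = 500, overlap: int = 75) -> list[dict]:
    sections, current, title = [], [], "intro"
    for line in text.splitlines():
        if line.startswith("#"):                 # a heading starts a new section
            if current:
                sections.append((title, " ".join(current)))
            title, current = line.lstrip("# ").strip(), []
        else:
            current.append(line)
    if current:
        sections.append((title, " ".join(current)))

    chunks = []
    for title, body in sections:
        words = body.split()
        if not words:
            continue
        step = max_words - overlap               # ~15% overlap between windows
        for i in range(0, len(words), step):
            chunks.append({"section": title,     # store metadata with each chunk
                           "text": " ".join(words[i:i + max_words])})
    return chunks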

23) RAG Prompt Template (Example)

SYSTEM: Answer using ONLY the provided context. Cite sources as [C1],[C2].
USER:
Question: "What’s our EU data retention policy?"
Context:
[C1] ... excerpt ...
[C2] ... excerpt ...
Require: short answer + bullet list + citations

24) Handling Long PDFs

Create a “map” summary first (per section), then enable targeted QA per section with citations. For tables, ask for JSON extraction (with header normalization).
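
A sketch of the map-then-QA flow, assuming your PDF parser yields named sections and that summarize/ask are hypothetical model-call wrappers:

# Sketch: per-section "map" summaries, then QA against only the relevant sections.
def build_map(sections: dict[str, str], summarize) -> dict[str, str]:
    return {name: summarize(f"Summarize this section in 3 bullets:\n\n{text}")
            for name, text in sections.items()}

def answer(question: str, doc_map: dict[str, str], sections: dict[str, str], ask) -> str:
    # naive routing: keep sections whose map summary shares words with the question
    hits = [n for n, s in doc_map.items()
            if any(w.lower() in s.lower() for w in question.split())]
    context = "\n\n".join(f"[{n}] {sections[n]}" for n in hits[:3])
    return ask(f"Answer using ONLY this context and cite sections.\n\n{context}\n\nQ: {question}")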

25) Multimodal: Images

Attach screenshots, charts, or invoices for captioning, OCR-like extraction, or UX audits. Redact PII/logos if needed; normalize dates/currency server-side.

26) Multimodal Prompt (Invoice Extraction)

SYSTEM: Extract JSON: {"invoice_no":"string","date":"YYYY-MM-DD","total":"number","currency":"string"}.
USER: ["Extract fields from this:", <image: invoice.png>]
NOTE: If unreadable, return: {"error":"low_quality_image"}

27) SQL Copilot Pattern

Flow: identify the relevant schema/tables → generate SQL (read-only, allowlisted) → execute → summarize results with caveats. Cap rows and mask PII in the output.
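
A simple guard before execution might look like the sketch below; the keyword checks are naive, so pair them with a read-only database role. The allowlist is an example.

# Sketch: reject non-SELECT statements, enforce a table allowlist, and cap rows.
import re

ALLOWED_TABLES = {"orders", "customers", "invoices"}     # example allowlist
BANNED = re.compile(r"\b(insert|update|delete|drop|alter|grant|truncate)\b", re.I)

def check_sql(sql: str, row_cap: int = 500) -> str:
    s = sql.strip().rstrip(";")
    if not s.lower().startswith("select") or BANNED.search(s):
        raise ValueError("read-only SELECT statements only")
    tables = {t.lower() for t in re.findall(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", s, re.I)}
    if not tables <= ALLOWED_TABLES:
        raise ValueError(f"table(s) not allowlisted: {tables - ALLOWED_TABLES}")
    if " limit " not in s.lower():
        s += f" LIMIT {row_cap}"                         # cap rows
    return s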

28) Document QA Guardrails

  • Refuse if context missing: “insufficient context”
  • Always cite [C#]
  • Detect stale sources by date; prefer latest

29) Evaluating RAG Quality

Track: citation precision/recall, answerability, factuality, latency, cost/query. Keep golden sets and regress in CI.

30) Prompt Injection Awareness

Strip untrusted instructions from retrieved text; keep a strict system prompt; sanitize tool-args; never execute code/URLs from doc content without verification.
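
A crude filter over retrieved chunks is sketched below; it is a heuristic that reduces, not eliminates, injection risk, and the pattern list is illustrative.

# Sketch: drop instruction-like lines from retrieved text before it enters the prompt.
import re

SUSPICIOUS = re.compile(
    r"(ignore (all|any|previous)|disregard .*instructions|you are now|system prompt)", re.I)

def sanitize_chunk(text: str) -> str:
    kept = [line for line in text.splitlines() if not SUSPICIOUS.search(line)]
    return "\n".join(kept)

# Wrap the result in clear delimiters so the model treats it as data, not instructions:
# f"<retrieved_context>\n{sanitize_chunk(chunk)}\n</retrieved_context>"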

Section 4 — Ops, Cost, Security & Governance

31) Observability & Telemetry

Log prompts (redacted), outputs, token counts, latency, refusal rates, JSON validity, tool failures. Export metrics to APM; build dashboards for SLOs.

32) Cost Controls

  • Use smaller model for classify/retrieve; larger for final answers
  • Cache preambles/few-shots
  • Compress or summarize context
  • Stream output; cap max output tokens

33) Latency Playbook

Stream tokens, parallelize tool calls, prefetch retrieval, colocate services regionally, warm connections, prune few-shots, reuse KV cache where supported.

34) Rate Limits & Backpressure

Apply per-tenant quotas; queue requests; backoff with jitter; shed non-critical load; communicate friendly errors to the UI.

35) Security & Privacy

Redact PII, hash identifiers, use VPC/private egress, encrypt at rest/in transit, restrict tool scopes, rotate keys, and audit access.

36) Versioning & Rollouts

Version system prompts, tool schemas, and routing rules. Use blue/green and canary cohorts; keep rollback plans and feature flags.

37) Testing & CI

Golden prompts, Monte Carlo variations, schema validation tests, regression gates on quality/cost/latency. Fail builds on KPI regressions.
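
A golden test might run under pytest with deterministic settings; run_prompt is a hypothetical fixture that calls the model at temperature=0, and the thresholds are examples.

# Sketch: golden-prompt regression test with schema and latency gates (pytest).
import json, time

def test_summary_golden(run_prompt):
    start = time.time()
    reply = run_prompt("golden/summarize_strategy_doc")
    latency = time.time() - start

    data = json.loads(reply)                      # must be valid JSON
    assert set(data) >= {"title", "bullets"}      # required keys present
    assert 3 <= len(data["bullets"]) <= 6         # quality gate
    assert latency < 10.0                         # latency budget for CI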

38) Governance & Audit

Keep audit trails: prompts, outputs, tool calls, reasons for refusals. Define retention policies; tag data by sensitivity and tenant.

39) Multi-Model Routing

Route by task: classify (small), summarize (fast), final reasoning (large). If a call exceeds SLOs, fall back gracefully to faster tier with concise output.
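
A routing table plus SLO fallback can be as simple as the sketch below; the model ids, latency budget, and call wrapper are placeholders.

# Sketch: route by task type; fall back to a faster tier when a call misses its SLO.
ROUTES = {                                    # example/placeholder model ids
    "classify": "claude-3-5-haiku-latest",
    "summarize": "claude-3-5-haiku-latest",
    "reason": "claude-3-5-sonnet-latest",
}

def route_and_call(task: str, prompt: str, call, budget_s: float = 8.0) -> str:
    # call(model, prompt, timeout) is a hypothetical wrapper that raises TimeoutError
    model = ROUTES.get(task, ROUTES["reason"])
    try:
        return call(model, prompt, timeout=budget_s)
    except TimeoutError:
        # graceful fallback: faster tier, explicitly concise output
        return call(ROUTES["classify"], "Answer concisely:\n" + prompt, timeout=budget_s)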

40) Incident Response

Runbooks for outages and safety escalations; alerts for cost spikes, refusal surges, or JSON invalidity; define on-call rotations and SLAs.

Section 5 — Practical Recipes & Interview Q&A

41) Recipe — Email Thread Summarizer

Summarize long email threads into a structured JSON digest for CRM updates.

SYSTEM: Return valid JSON only: 
{"subject":"string","participants":"string[]","summary":"string","action_items":[{"owner":"string","due":"YYYY-MM-DD","task":"string"}]}
USER: (paste email thread)

42) Recipe — Support Triage

Classify intent/severity; suggest response; optionally call ticketing tool. Log refusals for out-of-policy requests.

43) Recipe — Code Review Assistant

Input: PR diff + context files. Ask for risks, complexity, test gaps, security concerns, and a minimal patch proposal. Require a test plan output.

44) Recipe — Contract Clause Extractor

Extract parties, dates, term, renewal, termination, governing law; cite clause numbers; return JSON only; allow “unknown”.

45) Recipe — Analytics Copilot

NL → identify relevant schema → tool(generate_sql, allowlist) → run (read-only) → summarize with caveats + small table. Enforce row caps & masking.

46) Common Pitfalls & Fixes

  • No schema → add strict JSON schema & validation
  • Hallucinations → RAG with citations + lower temperature
  • High cost → cache preambles, compress context, route to smaller models

47) 30-Day Adoption Plan

W1 prototype + guardrails → W2 RAG + tools + evals → W3 canary + SLOs → W4 broader rollout + cost alerts + playbooks.

48) Troubleshooting Cheats

Echo final prompt, check truncation, reduce temp/top-p, add exact few-shots, enforce JSON, split tasks, inspect tool args, log everything.

49) Security Notes (Enterprise)

PII minimization, tenant isolation, private egress/VPC, allow-listed tools/URLs only, rate limits, abuse detection, and redaction in logs.

50) Interview Q&A — 20 Practical Questions (Expanded)

1) Claude vs other LLMs? Strong long-context, cautious alignment, reliable JSON. Choose based on SLOs (latency, cost, quality) and governance needs.

2) When use fast vs reasoning tiers? Fast for classify/draft; reasoning for complex analysis and long-doc synthesis.

3) How to reduce hallucinations? Ground with RAG, demand citations, lower temperature, allow “unknown” answers.

4) Best practice for tool schemas? Small, single-purpose; strict JSON with enums; idempotent operations; include request IDs.

5) JSON reliability tips? “JSON only” system prompt + schema + server validation + auto-retry with error hints.

6) Latency tuning? Stream tokens, parallelize tools, prefetch retrieval, prune few-shots, regional routing.

7) RAG chunk size? 300–800 tokens, 10–20% overlap, metadata filters; measure citation precision/recall.

8) Handling long PDFs? Map summary → section QA with citations; structured extraction for tables (JSON).

9) Why outputs vary? Sampling. Lower temp/top-p; add exemplars; keep prompts stable; cache stable preambles.

10) Observability must-haves? Tokens, latency, refusal rate, JSON validity, tool errors, citation coverage, cost/query.

11) Prompt injection defense? Strip untrusted instructions; fixed system prompt; sanitize tool args; never auto-execute code/URLs.

12) When to fine-tune? After prompt+RAG plateaus; for style/format stability. Keep evals to detect drift.

13) Multi-tenant isolation? Separate keys, indices, prompts, quotas, and logs; data tagging and access control at every layer.

14) Cost control? Model routing, caching, context compression, output caps, batch background jobs.

15) What’s truncation? Output cut off at token limit. Fix by shorter context/higher cap; detect and retry.

16) CI for prompts? Golden tests, Monte Carlo variants, schema checks; fail on KPI regression.

17) Data privacy? Redact PII, hash IDs, private networking, retention policies, audit trails.

18) Guardrail vs post-filter? Guardrails prevent unsafe content; post-filters normalize/format outputs. Use both.

19) SQL copilot safety? Allowlist functions/tables, read-only, cap rows, show provenance, mask PII.

20) Biggest gotcha? Shipping without evals, guardrails, or observability. Start small, measure, iterate.