Claude Pocket Book — Uplatz
50 fully expanded cards • One-column “ebook” layout • Real prompts & responses • Interview Q&A included
1) What is Claude?
Claude is Anthropic’s family of advanced large language models built to be helpful, honest, and harmless. It excels at long-context reasoning, summarization, structured outputs, and safe tool use. Claude models power assistants, copilots, RAG systems, and enterprise workflows.
Typical strengths: long-document QA, careful reasoning, aligned outputs, JSON reliability.
2) Model Lineup & Fit
Claude typically ships in tiers (e.g., faster/cost-efficient, balanced general-purpose, and top-end reasoning). Pick based on your SLOs:
- Fast tier: drafts, autocomplete, classification, routing.
- General tier: chat agents, RAG answers, code help.
- Reasoning tier: complex analysis, multi-step planning, long-doc synthesis.
3) Constitutional AI (Safety by Design)
Claude is trained with a written “constitution” of principles that guide safer behavior. This reduces harmful content and steers the model toward helpful explanations, appropriate refusals, and transparent reasoning patterns (without revealing internal chain-of-thought).
4) Long Context & Use Cases
With very large context windows (hundreds of pages), Claude can ingest reports, contracts, or codebases and answer nuanced questions without complex chunking pipelines—ideal for due diligence, legal/compliance review, and research assistants.
5) Claude vs Other LLMs
Claude often leans more cautious and aligned than some peers, with strong long-context performance and reliable JSON outputs. It’s a top pick for enterprises prioritizing governance and auditability in production AI systems.
6) Access & Hosting
Access Claude via Anthropic API or AWS Bedrock; many SaaS platforms (e.g., productivity, analytics, customer support) integrate Claude under the hood. Enterprises can standardize access through Bedrock for simplified IAM, logging, and VPC control.
7) Prompt & Response Example — Summarization
Prompt:
"Summarize this 40-page company strategy into 6 bullets for the exec team. Include 1 risk & 1 action."
Response (excerpt):
• Revenue growth: 15–18% targeted via self-serve expansion
• Product: launch analytics copilot (Q4)
• GTM: partner-led motion in EU
• Ops: consolidate infra to cut 12% cost
• Risk: feature adoption lags without in-app guidance
• Action: embed guided tours + KPI dashboard
8) Prompt & Response — Coding
Prompt:
"Write a Python function to validate an IBAN and return True/False."
Response (excerpt):
def is_valid_iban(iban: str) -> bool:
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not s.isalnum():
        return False
    s = s[4:] + s[:4]  # move country code + check digits to the end
    digits = "".join(str(ord(ch) - 55) if ch.isalpha() else ch for ch in s)
    return int(digits) % 97 == 1
9) Prompt & Response — Policy-Aware Refusal
Prompt:
"Give me steps to exploit a known bank website vulnerability."
Response:
"I can’t help with instructions that facilitate harm or illegal activity.
If you’re testing your own system, here’s a safe checklist:
• Run an authorized pentest with scope/documentation
• Use official scanners; report findings via responsible channels
• Patch and retest, then conduct a post-mortem"
10) Production Checklist (At a Glance)
- Stable system prompt & safety policy
- JSON schemas for tool calls/outputs
- Observability: tokens, latency, refusals, JSON validity
- Guardrails: allow/deny tool lists, redaction
- Cost/latency controls: streaming, caching, smaller models for early steps
11) Basic API Pattern (Python)
Send a system message for policy/role, then user content. Stream tokens for better UX. Keep keys in env variables.
# pip install anthropic
import anthropic, os
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
resp = client.messages.create(
    model="claude-3-5-sonnet",  # example
    max_tokens=600,
    system="You are a precise enterprise assistant. Be concise; cite sources [S#].",
    messages=[{"role":"user","content":"Give 3 strategies for reducing cloud costs."}]
)
print(resp.content[0].text)
12) Streaming & Time-to-First-Token
Streaming improves perceived latency. Use SSE/WebSocket streams and append chunks as they arrive. Show a typing indicator; finalize when stream closes.
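A minimal streaming sketch with the Python SDK (assumes the SDK’s messages.stream helper; model id is an example):
import anthropic, os
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
# Stream the reply and print chunks as they arrive.
with client.messages.stream(
    model="claude-3-5-sonnet",  # example
    max_tokens=400,
    messages=[{"role":"user","content":"Summarize our Q3 results in 5 bullets."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)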
13) System Prompts That Stick
Define persona, tone, refusal behavior, and output formats up-front. Keep it stable across turns; log a version string to audit changes.
SYSTEM:
You are a senior analyst.
• Never fabricate; if unknown, say "insufficient context".
• Prefer bullet points and short paragraphs.
• When asked for data, return JSON matching the provided schema.
14) JSON-Only Outputs & Validation
For integrations, demand JSON-only replies. Validate server-side; if invalid, resend with the error message to self-correct.
SYSTEM: Return valid JSON only.
SCHEMA: {"title":"string","bullets":"string[]"}
USER: "Summarize the doc into a title + 4 bullets."
15) Few-Shot Examples (Do Less, Better)
Use 2–5 targeted exemplars. Place the most similar example first. Remove noisy or outdated examples to keep context tight.
16) Function / Tool Calling (Concept)
Claude proposes a tool call with structured arguments; your app executes the tool and returns results as context; Claude composes the final answer.
# Model proposes:
{"tool":"get_user_balance","arguments":{"user_id":"u-123"}}
# App calls backend, returns:
{"tool_result":{"balance": 325.40, "currency":"USD"}}
17) Tool Schema Design
- Small, single-purpose functions
- Strict JSON schema with enums/ranges
- Idempotent operations; pass request IDs
- Return normalized records with IDs/timestamps
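An illustrative schema following these rules (tool name, limits, and enums are made up):
refund_tool = {
    "name": "issue_refund",
    "description": "Issue a refund for one order. Idempotent per request_id.",
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id":   {"type": "string"},
            "amount":     {"type": "number", "minimum": 0.01, "maximum": 500},
            "reason":     {"type": "string", "enum": ["damaged", "late", "not_as_described"]},
            "request_id": {"type": "string"}
        },
        "required": ["order_id", "amount", "reason", "request_id"],
        "additionalProperties": False
    }
}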
18) Temperature, Top-p & Determinism
For factual tasks, set temperature=0–0.3. For creative tasks, raise temperature or top-p. Prefer deterministic settings in CI and evaluations.
19) Errors, Retries & Timeouts
Implement exponential backoff with jitter for rate limits; detect truncation (cut-off outputs) and retry with shorter context. Circuit-break failing tools and show user-safe fallbacks.
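A minimal backoff-with-jitter wrapper (in real code, narrow the except clause to rate-limit/timeout errors):
import random, time

def with_backoff(fn, max_attempts: int = 5, base: float = 0.5):
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base * (2 ** attempt) + random.uniform(0, base))  # jittered delay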
20) Anti-Patterns to Avoid
- Vague or conflicting instructions
- Unbounded outputs (no token cap)
- Sending entire corpora instead of retrieval
- Skipping schema validation & audits
21) Classic RAG Architecture
Index docs (BM25 + vectors) → retrieve top-k → compose grounded prompt with citations → demand answers referencing [C#] → post-validate. Track “citation coverage” and “answerability”.
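A sketch of the “compose grounded prompt” step; retrieval itself is assumed to happen upstream:
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    # Tag each retrieved chunk so the model can cite [C1], [C2], ...
    context = "\n".join(f"[C{i+1}] {c}" for i, c in enumerate(chunks))
    return ("Answer using ONLY the provided context. Cite sources as [C1], [C2].\n"
            f"Question: {question}\nContext:\n{context}")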
22) Chunking Strategy
Chunk by headings; 300–800 tokens with 10–20% overlap. Store metadata (title, section, URL, date). Retrieval filters by product/tenant/date improve precision.
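A minimal fixed-size chunker with overlap (word counts stand in for tokens; heading-aware splitting would wrap this):
def chunk(words: list[str], size: int = 500, overlap: int = 75) -> list[str]:
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]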
23) RAG Prompt Template (Example)
SYSTEM: Answer using ONLY the provided context. Cite sources as [C1],[C2].
USER:
Question: "What’s our EU data retention policy?"
Context:
[C1] ... excerpt ...
[C2] ... excerpt ...
Require: short answer + bullet list + citations
24) Handling Long PDFs
Create a “map” summary first (per section), then enable targeted QA per section with citations. For tables, ask for JSON extraction (with header normalization).
25) Multimodal: Images
Attach screenshots, charts, or invoices for captioning, OCR-like extraction, or UX audits. Redact PII/logos if needed; normalize dates/currency server-side.
26) Multimodal Prompt (Invoice Extraction)
SYSTEM: Extract JSON: {"invoice_no":"string","date":"YYYY-MM-DD","total":"number","currency":"string"}.
USER: ["Extract fields from this:", <image: invoice.png>]
NOTE: If unreadable, return: {"error":"low_quality_image"}
27) SQL Copilot Pattern
Flow: classify schema → generate SQL (read-only, allowlist) → execute → summarize results + caveats. Cap rows and mask PII in the output.
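A coarse allowlist check before execution (a real SQL parser is stricter; table names are illustrative):
import re
ALLOWED_TABLES = {"orders", "customers"}  # illustrative allowlist

def is_safe_sql(sql: str) -> bool:
    s = sql.strip().lower()
    if not s.startswith("select"):
        return False
    if re.search(r"\b(insert|update|delete|drop|alter|grant|truncate)\b", s):
        return False
    tables = set(re.findall(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", s))
    return bool(tables) and tables <= ALLOWED_TABLES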
28) Document QA Guardrails
- Refuse if context missing: “insufficient context”
- Always cite [C#]
- Detect stale sources by date; prefer latest
29) Evaluating RAG Quality
Track: citation precision/recall, answerability, factuality, latency, cost/query. Keep golden sets and regress in CI.
30) Prompt Injection Awareness
Strip untrusted instructions from retrieved text; keep a strict system prompt; sanitize tool-args; never execute code/URLs from doc content without verification.
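A coarse filter for retrieved text (keyword heuristics only; layer it with a fixed system prompt and verification for high-risk flows):
SUSPECT = ("ignore previous", "disregard the above", "you are now", "system:")

def strip_injected_instructions(chunk: str) -> str:
    return "\n".join(line for line in chunk.splitlines()
                     if not any(p in line.lower() for p in SUSPECT))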
31) Observability & Telemetry
Log prompts (redacted), outputs, token counts, latency, refusal rates, JSON validity, tool failures. Export metrics to APM; build dashboards for SLOs.
32) Cost Controls
- Use smaller model for classify/retrieve; larger for final answers
- Cache preambles/few-shots
- Compress or summarize context
- Stream output; cap max output tokens
33) Latency Playbook
Stream tokens, parallelize tool calls, prefetch retrieval, colocate services regionally, warm connections, prune few-shots, reuse KV cache where supported.
34) Rate Limits & Backpressure
Apply per-tenant quotas; queue requests; backoff with jitter; shed non-critical load; communicate friendly errors to the UI.
35) Security & Privacy
Redact PII, hash identifiers, use VPC/private egress, encrypt at rest/in transit, restrict tool scopes, rotate keys, and audit access.
36) Versioning & Rollouts
Version system prompts, tool schemas, and routing rules. Use blue/green and canary cohorts; keep rollback plans and feature flags.
37) Testing & CI
Golden prompts, Monte Carlo variations, schema validation tests, regression gates on quality/cost/latency. Fail builds on KPI regressions.
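A toy regression gate over a golden set (substring match stands in for a real scorer; run_model is your own wrapper):
def golden_gate(golden: list[dict], run_model, threshold: float = 0.9) -> None:
    hits = sum(1 for case in golden if case["expect"] in run_model(case["prompt"]))
    rate = hits / len(golden)
    assert rate >= threshold, f"golden pass rate {rate:.0%} below gate"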
38) Governance & Audit
Keep audit trails: prompts, outputs, tool calls, reasons for refusals. Define retention policies; tag data by sensitivity and tenant.
39) Multi-Model Routing
Route by task: classify (small), summarize (fast), final reasoning (large). If a call exceeds SLOs, fall back gracefully to faster tier with concise output.
40) Incident Response
Runbooks for outages and safety escalations; alerts for cost spikes, refusal surges, or JSON invalidity; define on-call rotations and SLAs.
41) Recipe — Email Thread Summarizer
Summarize long email threads into a structured JSON digest for CRM updates.
SYSTEM: Return valid JSON only:
{"subject":"string","participants":"string[]","summary":"string","action_items":[{"owner":"string","due":"YYYY-MM-DD","task":"string"}]}
USER: (paste email thread)
42) Recipe — Support Triage
Classify intent/severity; suggest response; optionally call ticketing tool. Log refusals for out-of-policy requests.
43) Recipe — Code Review Assistant
Input: PR diff + context files. Ask for risks, complexity, test gaps, security concerns, and a minimal patch proposal. Require a test plan output.
44) Recipe — Contract Clause Extractor
Extract parties, dates, term, renewal, termination, governing law; cite clause numbers; return JSON only; allow “unknown”.
45) Recipe — Analytics Copilot
NL → classify schema → tool(generate_sql, allowlist) → run (RO) → summarize with caveats + small table. Enforce row caps & masking.
46) Common Pitfalls & Fixes
- No schema → add strict JSON schema & validation
- Hallucinations → RAG with citations + lower temperature
- High cost → cache preambles, compress context, route to smaller models
47) 30-Day Adoption Plan
W1 prototype + guardrails → W2 RAG + tools + evals → W3 canary + SLOs → W4 broader rollout + cost alerts + playbooks.
48) Troubleshooting Cheats
Echo final prompt, check truncation, reduce temp/top-p, add exact few-shots, enforce JSON, split tasks, inspect tool args, log everything.
49) Security Notes (Enterprise)
PII minimization, tenant isolation, private egress/VPC, allow-listed tools/URLs only, rate limits, abuse detection, and redaction in logs.
50) Interview Q&A — 20 Practical Questions (Expanded)
1) Claude vs other LLMs? Strong long-context, cautious alignment, reliable JSON. Choose based on SLOs (latency, cost, quality) and governance needs.
2) When use fast vs reasoning tiers? Fast for classify/draft; reasoning for complex analysis and long-doc synthesis.
3) How to reduce hallucinations? Ground with RAG, demand citations, lower temperature, allow “unknown” answers.
4) Best practice for tool schemas? Small, single-purpose; strict JSON with enums; idempotent operations; include request IDs.
5) JSON reliability tips? “JSON only” system prompt + schema + server validation + auto-retry with error hints.
6) Latency tuning? Stream tokens, parallelize tools, prefetch retrieval, prune few-shots, regional routing.
7) RAG chunk size? 300–800 tokens, 10–20% overlap, metadata filters; measure citation precision/recall.
8) Handling long PDFs? Map summary → section QA with citations; structured extraction for tables (JSON).
9) Why outputs vary? Sampling. Lower temp/top-p; add exemplars; keep prompts stable; cache primers.
10) Observability must-haves? Tokens, latency, refusal rate, JSON validity, tool errors, citation coverage, cost/query.
11) Prompt injection defense? Strip untrusted instructions; fixed system prompt; sanitize tool args; never auto-execute code/URLs.
12) When to fine-tune? After prompt+RAG plateaus; for style/format stability. Keep evals to detect drift.
13) Multi-tenant isolation? Separate keys, indices, prompts, quotas, and logs; data tagging and access control at every layer.
14) Cost control? Model routing, caching, context compression, output caps, batch background jobs.
15) What’s truncation? Output cut off at token limit. Fix by shorter context/higher cap; detect and retry.
16) CI for prompts? Golden tests, Monte Carlo variants, schema checks; fail on KPI regression.
17) Data privacy? Redact PII, hash IDs, private networking, retention policies, audit trails.
18) Guardrail vs post-filter? Guardrails prevent unsafe content; post-filters normalize/format outputs. Use both.
19) SQL copilot safety? Allowlist functions/tables, read-only, cap rows, show provenance, mask PII.
20) Biggest gotcha? Shipping without evals, guardrails, or observability. Start small, measure, iterate.