Agentic AI — Interview Questions Booklet (50 Q&A)
Comprehensive answers • Production-oriented patterns • Tooling, guardrails & evals • Practical snippets
1) What is an agentic AI system?
Answer: An agentic AI system runs a goal-seeking loop: it perceives context, reasons about next steps, takes actions via tools/APIs, observes outcomes, and iterates until it succeeds or hits its constraints. Unlike a reactive chatbot, it maintains state and memory across steps. This enables multi-hop problem solving, real-world integration, and autonomous workflows.
2) How is an agent different from a typical LLM chatbot?
Answer: A standard chatbot generates text responses but lacks structured planning, tool use, and persistent state. An agent coordinates planning, retrieval, tool execution, and evaluation with explicit constraints like budgets or step caps. Practically, the agent achieves outcomes rather than merely answering questions.
3) What core components make up an AI agent?
Answer: Typical components include context assembly (incl. RAG), a planner/reasoner, a tool executor or actuator, memory/state, and evaluators/guardrails. Often there’s a programmatic controller (graph or state machine) orchestrating steps and handling errors. Observability and storage of artifacts are essential for reliability and audit.
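Illustrative sketch of how these components might be wired together (the class and method names are assumptions, not a specific framework):
# Illustrative wiring of the core components; interfaces are assumptions.
class Agent:
    def __init__(self, planner, retriever, tools, memory, checker):
        self.planner = planner      # plans/reasons about the next step
        self.retriever = retriever  # context assembly / RAG
        self.tools = tools          # tool executor / actuator
        self.memory = memory        # working state + long-term memory
        self.checker = checker      # evaluator / guardrails

    def run(self, goal, max_steps=10):
        state = self.memory.load(goal)
        for _ in range(max_steps):
            context = self.retriever.fetch(state)
            step = self.planner.next_step(goal, state, context)
            result = self.tools.execute(step)
            state = self.memory.update(state, step, result)
            if self.checker.is_done(goal, state):
                break
        return state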
4) What problem types benefit most from agentic approaches?
Answer: Multi-step tasks requiring external data or actions (e.g., booking workflows, data analysis, report generation, research with citations) benefit the most. Agents also shine in environments with uncertainty where iterative exploration and feedback improve results. They are less useful when a single static response suffices.
5) What are common failure modes for agents?
Answer: Ambiguous goals, brittle prompts, unreliable or poorly specified tools, and missing guardrails often cause loops to stall or wander. Long-horizon tasks fail without checkpoints, verifiers, and resumability. Retrieval drift, outdated memory, and tool hallucination can cascade into incorrect actions.
6) What is the role of constraints (budget, max steps) in agents?
Answer: Constraints bound exploration to reduce cost, latency, and risk. Budgets and step caps force prioritization and early stopping with partial results when necessary. They also provide safety guarantees and predictable behavior for SLAs.
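A minimal sketch of such constraints as a budget object checked before every step (the Budget fields and limits are illustrative):
from dataclasses import dataclass, field
import time

@dataclass
class Budget:                          # hypothetical constraint container
    max_steps: int = 20
    max_cost_gbp: float = 3.00
    max_seconds: float = 120.0
    steps: int = 0
    cost_gbp: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def exceeded(self) -> bool:        # checked before every agent step
        return (self.steps >= self.max_steps
                or self.cost_gbp >= self.max_cost_gbp
                or time.monotonic() - self.started >= self.max_seconds)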
7) How do agents maintain state and memory?
Answer: Agents persist working state (current plan, step counters, open threads), episodic memory (dialog turns), and semantic/procedural knowledge (facts or how-tos) in external stores. Summaries and embeddings compress long contexts. Strong memory hygiene (TTL, redaction) prevents bloat and privacy risk.
8) What planning strategies are commonly used?
Answer: ReAct couples reasoning with tool actions in a loop; Chain-of-Thought and Tree-of-Thoughts prompting encourage structured multi-step reasoning; hierarchical planners split work into manager/worker roles. Programmatic planners (graphs, FSMs) encode deterministic control. The choice depends on task complexity and reliability needs.
9) Why use a programmatic controller (e.g., state machine or graph)?
Answer: Controllers give determinism, typed I/O contracts, retries with backoff, and resumability after failure or redeploy. They separate concerns between flexible LLM reasoning and strict workflow safety. This is vital for production reliability, auditing, and rollbacks.
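A toy state-machine controller, with placeholder handlers so the sketch is self-contained (all names are illustrative):
# Placeholder handlers so the sketch runs end to end; all names are illustrative.
def plan_step(ctx):
    ctx["plan"] = f"plan for {ctx['goal']}"
    return "execute"

def execute_step(ctx):
    ctx["result"] = {"ok": True}       # stand-in for a real tool call
    return "verify"

def verify_step(ctx):
    return "done" if ctx["result"]["ok"] else "plan"

HANDLERS = {"plan": plan_step, "execute": execute_step, "verify": verify_step}

def run_graph(ctx, start="plan", max_transitions=20):
    state = start
    for _ in range(max_transitions):   # hard cap keeps the workflow bounded
        if state == "done":
            return ctx
        state = HANDLERS[state](ctx)   # each handler is a small, testable unit
    raise RuntimeError("exceeded max transitions")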
10) How do you prevent runaway loops?
Answer: Apply max_steps, time/cost budgets, and circuit breakers tied to success criteria. Enforce JSON schemas for tool calls/outputs to avoid invalid states. On exit, return a structured PartialResult with achieved artifacts and next-best steps.
11) How do you represent a plan step?
Answer: Use a compact schema that tools and checkers can validate. Include action type, arguments, and success criteria. Example:
{
  "id": "step-003",
  "action": "http.get",
  "args": {"url": "https://api.example.com?q=flights&limit=5"},
  "success_criteria": ["price <= 180", "duration <= 2h"]
}
12) How do you decompose long-horizon tasks?
Answer: Break goals into milestones with explicit deliverables and checkers. Persist artifacts at each checkpoint and snapshot memory to enable resume. Idempotent steps and rollback bundles reduce recovery time after partial failure.
13) What is the role of a verifier/checker agent?
Answer: Verifiers validate outputs against rubrics (e.g., budget respected, citations present, safety rules met). They gate progression to the next step or finalization. This reduces hallucinations and improves trust.
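An illustrative rubric check (the output and rubric field names are assumptions):
def verify(output: dict, rubric: dict) -> tuple[bool, list[str]]:
    failures = []
    if output.get("total_cost", 0) > rubric.get("max_cost", float("inf")):
        failures.append("budget exceeded")
    if rubric.get("require_citations") and not output.get("citations"):
        failures.append("missing citations")
    if output.get("safety_flags"):
        failures.append("safety rule violated")
    return (not failures, failures)    # gate the next step on the boolean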
14) How do you handle nondeterminism?
Answer: Constrain outputs using JSON schemas and type checks, seed where possible, and use self-consistency (n-best) with majority vote. Cache successful traces and tool results to stabilize behavior. Prefer programmatic controllers for critical paths.
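A sketch of self-consistency via majority vote; generate() stands in for the model call:
from collections import Counter

def self_consistent_answer(generate, prompt, n=5):
    # Sample n candidates and keep the most frequent answer with its vote share.
    samples = [generate(prompt) for _ in range(n)]
    answer, votes = Counter(samples).most_common(1)[0]
    return answer, votes / n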
15) What memory types are useful in agents?
Answer: Episodic memory stores dialogues and events; semantic memory stores facts as vectors; procedural memory captures how-to steps; and working memory holds the current plan/state. Balancing these improves consistency and reduces context-window pressure.
16) What are common RAG pitfalls?
Answer: Poor chunking, irrelevant retrieval, schema/version drift, and stale indexes degrade quality. Use hybrid search (keyword+vector), rerankers, metadata filters, and freshness signals. Monitor retrieved-to-used ratio to detect noise.
17) How do you design a retrieval tool contract?
Answer: Keep inputs minimal (query, k, filters) and outputs typed. Include source IDs and metadata for citation and audit. Example:
tool: retrieve(query:str, k:int=5, filters:dict=None) -> List[Doc]
Doc = { "id":str, "text":str, "source":str, "meta":dict }
18) How do you govern memory growth and privacy?
Answer: Apply TTLs, selective retention, summarization/compression, and PII redaction/anonymization. Provide user controls to view/delete stored memory. Encrypt at rest and in transit; restrict access by role.
19) How to detect retrieval drift?
Answer: Track index freshness, reranker scores, citation coverage in final outputs, and stale-source rates. Alert on drops in coverage or spikes in unused retrievals. Periodically re-chunk and re-index.
20) How to reduce hallucinations with RAG?
Answer: Enforce cite-or-abstain policies, prefer retrieval-first prompting, and require the answer to reference retrieved spans. Use verifier agents to cross-check claims against sources. Penalize unsupported claims in reward shaping.
21) What makes a good tool design?
Answer: Small, composable functions with strict JSON schemas and deterministic errors enable reliable orchestration. Tools should be idempotent or report side effects explicitly. Clear error codes and backoff hints improve automatic recovery.
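An illustrative tool contract with strict schemas and deterministic error codes (field names are assumptions):
HTTP_GET_TOOL = {
    "name": "http.get",
    "input_schema": {                                  # validated before every call
        "type": "object",
        "properties": {"url": {"type": "string"}, "timeout_s": {"type": "number"}},
        "required": ["url"],
        "additionalProperties": False,
    },
    "output_schema": {
        "type": "object",
        "properties": {"status": {"type": "integer"}, "body": {"type": "string"}},
    },
    "errors": ["TIMEOUT", "FORBIDDEN_DOMAIN", "RATE_LIMITED"],   # deterministic codes
    "idempotent": True,                                          # safe to retry
}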
22) How do you secure code execution tools?
Answer: Run in sandboxes (container/VM), apply CPU/memory/time/network caps, and enforce package allowlists. Jail the filesystem, redact logs, and isolate tenants. Maintain a test harness and audit trail for executed code.
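An illustrative set of sandbox caps expressed as configuration (keys and values are assumptions, not tied to a specific runtime):
SANDBOX_LIMITS = {                     # illustrative caps for a code-exec sandbox
    "cpu_cores": 1,
    "memory_mb": 512,
    "wall_clock_s": 30,
    "network": "none",                 # no egress by default
    "filesystem": "tmpfs:/work",       # jailed scratch directory only
    "allowed_packages": ["pandas", "numpy", "matplotlib"],
}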
23) How should agents handle tool errors?
Answer: Use typed exceptions and retry policies (exponential backoff with jitter) for transient faults, deterministic fallbacks for known errors, and escalation to HITL when guardrails trigger. Always log traces with inputs/outputs for diagnosis.
on_error:
- retry: exponential_backoff(max=3, jitter=true)
- fallback: "use_cached_result"
- escalate: "request_human_review"
24) Why insist on structured outputs?
Answer: JSON (or similar) allows schema validation, safer parsing, and deterministic downstream processing. It simplifies verification and storage as artifacts. Unstructured prose increases post-processing fragility and cost.
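A minimal validation sketch using only the standard library (the required fields are assumptions):
import json

def parse_tool_call(raw: str) -> dict:
    # Parse and validate a model-produced tool call; fail fast on malformed output.
    call = json.loads(raw)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    missing = {"name", "args"} - call.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(call["args"], dict):
        raise ValueError("args must be an object")
    return call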
25) What is a router and when do you need one?
Answer: A router dispatches tasks to appropriate models or tools based on complexity, latency, cost, or safety. Use cheap models for retrieval/routing and strong models for planning/verification. This reduces spend while preserving quality.
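A hedged routing sketch; the model tiers and task fields are placeholders:
def route(task: dict) -> str:
    # Pick a model tier from task traits; tier names and fields are placeholders.
    if task.get("safety_sensitive"):
        return "strong-with-guardrails"
    if task.get("estimated_steps", 1) <= 2 and not task.get("needs_tools"):
        return "small"                 # cheap model for retrieval/simple routing
    return "strong"                    # strong model for planning/verification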
26) How do you design allow/deny lists and spend caps?
Answer: Maintain an explicit catalog of permitted tools/domains and block high-risk endpoints by default. Enforce per-task and per-user budget caps with real-time counters. Example:
allow_tools = ["search","retrieve","http.get","code.exec"]
deny_domains = ["*.prod-internal.example"]
spend_cap_gbp = 3.00
27) How do you test tools independently of agents?
Answer: Provide a unit-test suite and golden IO fixtures for each tool. Include contract tests for schemas and negative tests for error codes. Mock external dependencies to reproduce edge cases deterministically.
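A sketch of golden-fixture and negative tests with pytest (mytools.retrieve and the fixture layout are hypothetical):
import json
from pathlib import Path
import pytest
from mytools import retrieve            # hypothetical tool module under test

GOLDEN = Path("tests/fixtures/retrieve")

@pytest.mark.parametrize("case", sorted(GOLDEN.glob("*.json")))
def test_retrieve_golden(case):
    fixture = json.loads(case.read_text())
    docs = retrieve(**fixture["input"])                 # contract: typed inputs
    assert [d["id"] for d in docs] == fixture["expected_ids"]

def test_retrieve_rejects_bad_k():
    with pytest.raises(ValueError):                     # negative test for error handling
        retrieve(query="flights", k=-1)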
28) What guardrails should wrap tool calls?
Answer: Pre-call policy checks (scope, budget) and post-call validators (schema, safety) reduce risk. Dry-run simulators expose side effects before execution. High-risk actions require human approval or staged rollout.
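A sketch of wrapping tools with pre/post checks via a decorator (the policy and validator interfaces are assumptions):
import functools

def guarded(policy, validator):
    # Wrap a tool with a pre-call policy check and a post-call output validation.
    def decorate(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            policy.check(tool.__name__, kwargs)      # scope/budget; raises if denied
            result = tool(*args, **kwargs)
            validator.check(tool.__name__, result)   # schema/safety; raises if invalid
            return result
        return wrapper
    return decorate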
29) How do you evaluate agents beyond accuracy?
Answer: Track task success, steps-to-success, tool error rate, hallucination rate, latency, and cost per task. Include user satisfaction and HITL intervention rates. Evaluate both offline (golden tasks) and online (canaries).
30) What does an offline eval harness look like?
Answer: Build synthetic and curated real tasks with known optima, run multiple trials per agent version, and compute success/variance. Gate deployments on thresholds and regression tests. Archive traces for forensic analysis.
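A minimal harness sketch; agent.run and the task/checker fields are assumptions:
def evaluate(agent, tasks, trials=3):
    # Run each golden task several times and report per-task and overall success.
    per_task = {}
    for task in tasks:
        wins = sum(task["checker"](agent.run(task["goal"])) for _ in range(trials))
        per_task[task["id"]] = wins / trials
    overall = sum(per_task.values()) / len(per_task)
    return overall, per_task           # gate deployment if overall drops below threshold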
31) How do you manage compliance and audits?
Answer: Maintain immutable traces, redact PII, log model/tool usage, and document incident response. Respect data residency and right-to-delete requirements. Conduct regular red-team and safety reviews with evidence trails.
32) How can reward shaping improve behavior?
Answer: Add positive signals for meeting constraints, citing sources, and finishing early; penalize unsupported claims, extra steps, and unsafe calls. Rewards can be intrinsic (heuristics) or learned (from human feedback). Keep the rubric transparent and auditable.
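An illustrative heuristic reward; the weights and trace fields are assumptions and should be tuned and audited:
def shaped_reward(trace: dict) -> float:
    r = 0.0
    r += 1.0 if trace["constraints_met"] else -1.0
    r += 0.2 * trace["citations"]              # reward grounded claims
    r -= 0.1 * trace["extra_steps"]            # penalize wandering
    r -= 0.5 * trace["unsupported_claims"]
    r -= 1.0 * trace["unsafe_calls"]           # strongly penalize unsafe actions
    return r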
33) What HITL triggers make sense?
Answer: Low confidence, budget risk, conflicting tool outcomes, or safety flags should queue human review. Provide reviewers with complete traces, artifacts, and a one-click approve/deny interface. Log outcomes to refine triggers.
34) How do you set enterprise SLAs for agents?
Answer: Define uptime/SLOs, maximum time-to-resolution (TTR), accuracy bands by task class, and incident reporting windows. Include data deletion guarantees and rollback time targets. Align SLAs with risk tiers and business impact.
35) What observability is non-negotiable?
Answer: Per-step traces (prompts, tool inputs/outputs), token/latency/cost counters, structured errors with stack/context, and artifact storage. Correlate logs across services with request IDs. Build dashboards for live and historical analysis.
36) How do you control cost without hurting quality?
Answer: Use routers to select cheaper models for simple tasks, cache retrievals and tool results, compress prompts, and enforce early exits. Batch retrieval where possible and de-duplicate repeated steps. Track cost per task and cache hit rates.
37) Why version prompts, tools, and graphs?
Answer: Versioning enables reproducibility, safe rollbacks, and differential evaluation. Treat each as code artifacts with CI/CD, tests, and change logs. Bundle versions for atomic rollouts and rollbacks.
38) How do you optimize latency?
Answer: Parallelize independent tool calls, stream partial outputs, and reduce context via summaries and retrieval filters. Prefer structured outputs to minimize parsing. Warm pools and connection reuse cut cold-start time.
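A sketch of parallelizing independent retrievals with asyncio (search is a stand-in for an async tool call):
import asyncio

async def fetch_context(queries, search):
    # Fan out independent retrievals concurrently instead of running them in sequence.
    results = await asyncio.gather(*(search(q) for q in queries))
    return dict(zip(queries, results))

# usage: asyncio.run(fetch_context(["flights MAN-CDG", "hotels Paris"], search_tool))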
39) What KPIs matter for production agents?
Answer: Success rate, average steps, MTTR for failures, cost per task, HITL percentage, and retrieval freshness/drift. Segment by task class to avoid averaging away issues. Tie KPIs to alert thresholds.
40) How do you design rollbacks?
Answer: Keep rollback bundles (model, prompt, graph, tool versions) with known-good evals. Canary new versions to a small cohort and monitor deltas before ramping. Automate rollback on KPI regression or error spikes.
41) When should you prefer multi-agent over single agent?
Answer: Use multi-agent when specialization (planner, coder, tester), parallelism, or negotiation adds value. Ensure communication protocols and arbitration rules are explicit. For simpler tasks, a single agent is often more robust.
42) How do agents communicate safely and productively?
Answer: Define message types (Plan, Task, Result, Critique) with schemas and budgets. Require citations and confidence scores. Use an arbiter to resolve conflicts and enforce global constraints.
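An illustrative message envelope (fields are assumptions):
from dataclasses import dataclass, field

@dataclass
class Message:                         # illustrative message envelope
    kind: str                          # "Plan" | "Task" | "Result" | "Critique"
    sender: str
    payload: dict
    citations: list = field(default_factory=list)
    confidence: float = 0.0
    budget_remaining_gbp: float = 0.0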
43) What is tool hallucination and how to prevent it?
Answer: Tool hallucination occurs when agents invent unsupported tools or endpoints. Prevent it with strict catalogs, schema validation, and compile-time checks for tool names/args. Fall back to human review when contracts fail repeatedly.
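A minimal catalog check before execution (the catalog mirrors the allow-list example in Q26):
ALLOWED_TOOLS = {"search", "retrieve", "http.get", "code.exec"}   # mirrors the Q26 catalog

def validate_call(call: dict) -> dict:
    # Reject invented tool names or malformed arguments before execution.
    if call.get("name") not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    if not isinstance(call.get("args"), dict):
        raise ValueError("args must be a JSON object matching the tool schema")
    return call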
44) How do you balance autonomy and oversight?
Answer: Assign autonomy levels by task risk, with HITL gates for high impact actions. Provide transparent traces and simulators for review. Start with constrained autonomy and expand as metrics prove reliability.
45) What protocol ensures repeatable collaboration?
Answer: Use a simple contract like: planner proposes plan → workers execute tasks → checker evaluates → arbiter decides continue/stop. Enforce budgets and step caps at each stage. Persist artifacts for handoffs and audits.
46) Design a travel-booking agent with constraints.
Answer: Start with goal and constraints (budget, dates, nonstop preference). Plan: search flights (cap price), compare durations, hold best, search hotels within nightly cap, summarize with links and citations. Add a checker to validate price/duration and a HITL gate for payment.
Example goal: Manchester → Paris weekend, budget £250, nonstop preferred
47) Build a data-analysis agent for CSV files.
Answer: Tools: file loader, profiler, SQL/df engine, charting, and exporter. Plan: profile schema, infer types, answer user question with SQL/df ops, produce chart + explanation, export notebook. Checker validates statistical soundness and missing-data handling.
48) Draft a minimal ReAct loop with caps.
Answer: The loop alternates Thought → Action → Observation while enforcing MAX_STEPS and budget. Persist state at each step for resume. Stop with a PartialResult if constraints are hit.
while steps < MAX_STEPS and cost <= BUDGET and not done:
    thought = model.think(state)      # reason about the next move
    action = choose_tool(thought)     # select a tool call from the catalog
    obs = tool(action)                # execute and observe the result
    state.update(obs)                 # persist for resume
    done = check_success(state)       # stop once success criteria are met
    steps += 1
49) Propose a KPI set and dashboards for production agents.
Answer: Dashboards: success rate, steps/task, MTTR, cost/task, latency distributions, HITL%, tool error breakdown, retrieval freshness. Segment by task class and cohort (canary vs. stable). Alert on regression beyond set thresholds.
50) Final interview advice: how to stand out?
Answer: Treat prompts, tools, and control graphs as first-class code with tests, versions, and rollbacks. Bring real traces showing planning, tool I/O, checker feedback, and final artifacts. Demonstrate how you diagnose failures and trade off cost, latency, and quality.