Course: Building Agentic AI Systems | Chapter: 21 | Difficulty: advanced | Estimated Time: 600 min

Chapter 21: Domain-Specific Agents

Learning Objectives

By the end of this chapter, you will be able to:

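  • Explain how domain constraints (tools, safety requirements, regulation, failure modes) shape the design of domain-specific agents.
  • Apply domain-specific design patterns to build reliable, production-grade agents for coding, research, and high-stakes domains.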
  • Recognize operational trade-offs in tool use, orchestration, safety, and cost.

Coding, research, computer-use, data analysis, and high-stakes domains

Domain Constraints Change Everything

The general-purpose agent design from Chapter 2 is the starting point, but each domain adds its own constraints: specific tools, safety requirements, user expectations, regulatory obligations, and failure modes. The table below summarizes the major categories.

Domain | Primary Capability | Key Constraint | Benchmark
Coding | Code generation, test execution, bug fixing | Must run in a sandboxed environment; test-driven verification | SWE-bench (% issues resolved)
Deep Research | Multi-source retrieval, synthesis, citation | Accuracy and source attribution; must not hallucinate facts | GAIA, BrowseComp
Computer-Use | UI navigation, form filling, screenshot-based control | Slow; brittle to UI changes; high error cost on write actions | WebArena, OSWorld
Data Analysis | SQL queries, pandas, visualization generation | Data privacy; correct statistical interpretation | DS-1000, BIRD
Customer Support | Triage, FAQ, escalation, CRM integration | Cannot make promises; must escalate edge cases | Business-specific KPIs
Healthcare / Legal | Information retrieval, document analysis | Regulatory compliance; cannot give medical/legal advice directly | Domain-specific, human-in-loop required

Coding Agents

Coding agents are among the most mature and commercially deployed categories of agent. Claude 3.5 Sonnet reached 49% issue resolution on SWE-bench Verified in 2024; Devin 2 and similar systems reach 55%+ in 2025. Because the test suite acts as the verifier, these agents are well suited to RLVR fine-tuning (Chapter 20).

Perception
  • Repository Context: file tree, relevant code files, issue description, test output
Action
  • read_file: read file contents at a path
  • edit_file: apply a targeted edit (unified diff)
  • run_tests: execute the test suite in a sandbox; return pass/fail
  • search_codebase: semantic or lexical search across the repo
Loop
  • Understand → Locate → Edit → Test → Iterate, until all tests pass or the maximum number of iterations is reached
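
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `propose_edit`, `apply_edit`, and `run_tests` are hypothetical callables standing in for the LLM call, the edit_file tool, and the sandboxed run_tests tool, and `SuiteResult` is an assumed return type.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SuiteResult:
    passed: bool
    output: str  # raw test output, fed back to the agent as perception


def run_agent_loop(
    issue: str,
    propose_edit: Callable[[str, str], str],  # hypothetical: (issue, feedback) -> edit
    apply_edit: Callable[[str], None],        # hypothetical: applies the edit in the sandbox
    run_tests: Callable[[], SuiteResult],     # hypothetical: runs the suite in the sandbox
    max_iterations: int = 5,
) -> Optional[int]:
    """Return the iteration on which the tests passed, or None if the budget ran out."""
    feedback = ""  # no test output is available before the first edit
    for iteration in range(1, max_iterations + 1):
        edit = propose_edit(issue, feedback)
        apply_edit(edit)
        result = run_tests()
        if result.passed:
            return iteration
        feedback = result.output  # the failing output drives the next attempt
    return None
```

The iteration cap matters as much as the success check: without it, an agent that cannot fix the issue keeps spending tokens and sandbox time indefinitely.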

Key engineering decisions for coding agents

1. Targeted edits over full rewrites: Have the agent emit unified diffs or targeted function replacements rather than rewrite whole files. This reduces errors and makes review easier.
2. Test harness as oracle: After every edit, run the test suite. The result is the ground-truth feedback signal, and it is much more reliable than LLM self-evaluation of code correctness.
3. Repo-map for context efficiency: Instead of providing full file contents, use a repository map (file tree + function signatures) to help the agent navigate to relevant files before reading them in full.
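
As an illustration of the repo-map idea, the sketch below builds a file tree plus function and class signatures for a Python repository using the standard-library `ast` module. The function names `signatures` and `repo_map` are assumptions made for this example, and the map deliberately omits details (type annotations, defaults, docstrings) that a production repo map might include.

```python
import ast
from pathlib import Path


def signatures(source: str) -> list[str]:
    """List class and function signatures in source order, indented by nesting."""
    out: list[str] = []

    def visit(body: list[ast.stmt], indent: str) -> None:
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                out.append(f"{indent}def {node.name}({args})")
            elif isinstance(node, ast.ClassDef):
                out.append(f"{indent}class {node.name}")
                visit(node.body, indent + "  ")  # include methods under the class

    visit(ast.parse(source).body, "")
    return out


def repo_map(root: str) -> str:
    """Return one line per file, followed by its indented signatures."""
    lines: list[str] = []
    for path in sorted(Path(root).rglob("*.py")):
        lines.append(path.relative_to(root).as_posix())
        try:
            sigs = signatures(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # keep the path in the map, skip files that cannot be parsed
        lines.extend(f"  {sig}" for sig in sigs)
    return "\n".join(lines)
```

The agent receives the output of `repo_map` in its context and then calls read_file only on the files whose signatures look relevant, which keeps most of the repository out of the prompt.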

Deep Research Agents

Research agents (Perplexity Deep Research, ChatGPT Deep Research, OpenAI o3 with search) synthesize information from multiple sources into comprehensive reports. The central challenge is source attribution: preventing hallucination by ensuring that every factual claim is backed by a retrieved source.

Research Query
  1. Plan Searches: decompose the query into sub-queries
  2. Parallel Search: N agents, N sub-queries
  3. Read & Extract: verified facts + citations
  4. Synthesize: report with inline citations
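
The pipeline above can be sketched with the standard-library `concurrent.futures` module. This is a simplified illustration under stated assumptions: `plan` and `search` are hypothetical callables standing in for the LLM planner and the search-and-extract agents, `Finding` is an assumed data type, and the "synthesis" step here only concatenates claims with numbered citations, where a real system would have a model write the report.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    claim: str
    source_url: str


def research(
    query: str,
    plan: Callable[[str], list[str]],        # hypothetical: query -> sub-queries
    search: Callable[[str], list[Finding]],  # hypothetical: sub-query -> extracted findings
    max_workers: int = 4,
) -> str:
    sub_queries = plan(query)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as sub_queries
        batches = list(pool.map(search, sub_queries))
    findings = [f for batch in batches for f in batch]

    # Number each distinct source once, in order of first appearance.
    refs: dict[str, int] = {}
    lines: list[str] = []
    for f in findings:
        n = refs.setdefault(f.source_url, len(refs) + 1)
        lines.append(f"{f.claim} [{n}]")
    lines.append("")
    lines.extend(f"[{n}] {url}" for url, n in refs.items())
    return "\n".join(lines)
```

Carrying the source URL alongside each claim from extraction through synthesis is the design choice that matters: a claim that enters the report without a `Finding` behind it has no citation to attach.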

Common failure: citation hallucination

The agent cites a real URL but attributes to it a claim that does not appear in the source. Mitigation: after synthesis, run a citation verification pass. For each cited claim, retrieve the source and verify that the claim appears in it. Flag or remove uncorroborated claims.
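
A citation verification pass might look like the sketch below. It is illustrative only: `fetch` is a hypothetical callable that returns the text of a source URL, `CitedClaim` is an assumed data type, and the token-overlap check with a 0.8 threshold is a crude stand-in chosen for this example. A production system would more likely use an entailment model or an LLM judge to decide whether the source supports the claim.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CitedClaim:
    text: str
    source_url: str


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(claim: str, source_text: str, threshold: float = 0.8) -> bool:
    """Crude lexical check: share of claim tokens that also occur in the source."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & _tokens(source_text)) / len(claim_tokens)
    return overlap >= threshold


def verify_citations(
    claims: list[CitedClaim],
    fetch: Callable[[str], str],  # hypothetical: URL -> source text
) -> tuple[list[CitedClaim], list[CitedClaim]]:
    """Split claims into (supported, flagged) by re-reading each cited source."""
    supported: list[CitedClaim] = []
    flagged: list[CitedClaim] = []
    for claim in claims:
        try:
            source_text = fetch(claim.source_url)
        except Exception:
            source_text = ""  # an unreachable source cannot corroborate anything
        target = supported if is_supported(claim.text, source_text) else flagged
        target.append(claim)
    return supported, flagged
```

Flagged claims can then be removed from the report or routed to a human reviewer, depending on how costly a wrong citation is in the target domain.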

High-Stakes Domains: Healthcare & Legal

Agents in regulated domains face hard constraints that do not apply to general assistants. Understanding these constraints prevents both regulatory risk and harm to end users.

Healthcare Agent Constraints

  • Cannot diagnose or prescribe — can only provide general information
  • Must disclose AI-generated nature of responses
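  • HIPAA (US): PHI may be stored or processed only for permitted uses or with patient authorization, and under appropriate safeguards such as encryption and access controls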
  • Must recommend professional consultation for symptoms
  • Audit trail required for all interactions
  • Human-in-the-loop for any clinical decision support

Legal Agent Constraints

  • Cannot provide specific legal advice — only general legal information
  • Unauthorized practice of law (UPL) risk varies by jurisdiction
  • Confidentiality obligations if user shares privileged information
  • Must cite jurisdiction-specific statutes, not generalizations
  • Attorney supervision required for client-facing applications in most jurisdictions

The deployment boundary is not a technical decision

For healthcare and legal agents, the line between "providing information" and "practicing medicine/law" is a legal question, not an engineering one. Always consult a domain expert or compliance officer before deploying in these fields. The regulatory landscape is also rapidly changing in response to AI agent deployments.

The safest high-stakes pattern

Design the agent to: (1) gather information efficiently (replacing the tedious parts), (2) present structured findings to a human expert, and (3) let the human make the final decision. This "research and present" pattern avoids direct decision-making while still providing significant efficiency gains.
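
The "research and present" pattern can be enforced in the data model itself. In the sketch below, all names (`CaseBrief`, `prepare_brief`, `record_decision`) are hypothetical and chosen for illustration. The point is structural: the agent-facing function has no way to set a decision, and the decision can only be recorded together with the name of a human reviewer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseBrief:
    question: str
    findings: list[str]
    sources: list[str]
    decision: Optional[str] = None    # stays None until a human decides
    decided_by: Optional[str] = None  # identity of the human reviewer


def prepare_brief(question: str, findings: list[str], sources: list[str]) -> CaseBrief:
    """Agent step: gather and structure findings; never sets a decision."""
    return CaseBrief(question, list(findings), list(sources))


def record_decision(brief: CaseBrief, reviewer: str, decision: str) -> CaseBrief:
    """Human step: a decision is only accepted with a named reviewer attached."""
    if not reviewer.strip():
        raise ValueError("a human reviewer must be identified")
    brief.decision = decision
    brief.decided_by = reviewer
    return brief
```

Storing `decided_by` alongside the decision also gives the audit trail a concrete record of who made each call.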

Chapter 21 Quiz

1. Why is SWE-bench particularly well-suited for evaluating coding agents?

2. What is "citation hallucination" in research agents?

3. What is the recommended deployment pattern for agents in regulated healthcare/legal domains?