Chapter 21: Domain-Specific Agents
Domain-Specific Agents in Building Agentic AI Systems.
Learning Objectives
By the end of this chapter, you will be able to:
- Explain the agentic AI concept behind Domain-Specific Agents.
- Apply Domain-Specific Agents to design reliable, production-grade agent systems.
- Recognize operational trade-offs in tool use, orchestration, safety, and cost.
Coding, research, computer-use, data analysis, and high-stakes domains
Domain Constraints Change Everything
The general-purpose agent design from Chapter 2 is the starting point, but each domain adds its own constraints: specific tools, safety requirements, user expectations, regulatory obligations, and failure modes. This chapter covers the major categories.
| Domain | Primary Capability | Key Constraint | Benchmark |
|---|---|---|---|
| Coding | Code generation, test execution, bug fixing | Must run in a sandboxed environment; test-driven verification | SWE-bench (% issues resolved) |
| Deep Research | Multi-source retrieval, synthesis, citation | Accuracy and source attribution; must not hallucinate facts | GAIA, BrowseComp |
| Computer-Use | UI navigation, form filling, screenshot-based control | Slow; brittle to UI changes; high error cost on write actions | WebArena, OSWorld |
| Data Analysis | SQL queries, pandas, visualization generation | Data privacy; correct statistical interpretation | DS-1000, BIRD |
| Customer Support | Triage, FAQ, escalation, CRM integration | Cannot make promises; must escalate edge cases | Business-specific KPIs |
| Healthcare / Legal | Information retrieval, document analysis | Regulatory compliance; cannot give medical/legal advice directly | Domain-specific, human-in-loop required |
Coding Agents
Coding agents are among the most mature and commercially deployed categories. Claude 3.5 Sonnet reached 49% issue resolution on SWE-bench Verified in 2024; Devin 2 and similar systems reach 55%+ in 2025. Because the test suite acts as the verifier, these agents are well suited to fine-tuning with reinforcement learning with verifiable rewards (RLVR, Chapter 20).
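To make "the test suite is the verifier" concrete, here is a minimal sketch that writes a candidate solution and its tests into a throwaway directory and reports pass or fail from the exit code. The function name `run_tests` and the file names are hypothetical. A child process in a temporary directory is an assumption chosen for brevity and is not a real sandbox; a production system would add a container or VM boundary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_tests(candidate_source: str, test_source: str, timeout_s: int = 30) -> bool:
    """Return True if the candidate code passes the given test script.

    Hypothetical helper: isolates files in a temp dir and bounds run time,
    but does NOT restrict network, filesystem, or memory access.
    """
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "solution.py").write_text(candidate_source)
        Path(workdir, "test_solution.py").write_text(test_source)
        try:
            result = subprocess.run(
                [sys.executable, "test_solution.py"],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A hung candidate counts as a failure, not as an unknown.
            return False
        # Exit code 0 means every assertion in the test script held.
        return result.returncode == 0


TESTS = "from solution import add\nassert add(2, 3) == 5\n"
good_patch = "def add(a, b):\n    return a + b\n"
bad_patch = "def add(a, b):\n    return a - b\n"
```

Calling `run_tests(good_patch, TESTS)` and `run_tests(bad_patch, TESTS)` gives the agent a binary signal it can use to accept a patch, retry, or feed back as a reward.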
Key engineering decisions for coding agents
Deep Research Agents
Research agents (Perplexity Deep Research, ChatGPT Deep Research, OpenAI o3 + search) synthesize multi-source information into comprehensive reports. The central challenge is source attribution — preventing hallucination by ensuring every factual claim is backed by a retrieved source.
The research pipeline has four stages:
1. Decompose into sub-queries
2. N agents, N sub-queries
3. Verified facts + citations
4. Report with inline citations
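The stages listed above can be sketched as a small orchestration function. This is an illustration under stated assumptions, not how any named product works: `Finding`, `research`, and the three callables (`decompose`, `investigate`, `is_supported`) are hypothetical names, and in a real system each callable would wrap a model or retrieval call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    claim: str
    source_url: str


def research(
    question: str,
    decompose: Callable[[str], list[str]],
    investigate: Callable[[str], list[Finding]],
    is_supported: Callable[[Finding], bool],
) -> str:
    # Stage 1: split the question into sub-queries.
    sub_queries = decompose(question)
    # Stage 2: one worker per sub-query; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max(1, len(sub_queries))) as pool:
        per_query = list(pool.map(investigate, sub_queries))
    # Stage 3: keep only findings that pass verification.
    verified = [f for findings in per_query for f in findings if is_supported(f)]
    # Stage 4: emit claims with numbered inline citations plus a source list.
    sources = list(dict.fromkeys(f.source_url for f in verified))
    body = [f"{f.claim} [{sources.index(f.source_url) + 1}]" for f in verified]
    refs = [f"[{i}] {url}" for i, url in enumerate(sources, start=1)]
    return "\n".join(body + refs)


# Stub callables standing in for model and search calls.
report = research(
    "example question",
    decompose=lambda q: ["a", "b"],
    investigate=lambda sq: [Finding(f"claim {sq}", f"https://example.com/{sq}")],
    is_supported=lambda f: f.claim != "claim b",
)
```

Passing the stages in as callables keeps the orchestration testable with stubs and makes the verification step impossible to skip by accident.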
Common failure: citation hallucination
The agent cites a real URL but attributes a claim to it that does not appear in the source. Mitigation: after synthesis, run a citation verification pass — for each cited claim, retrieve the source and verify the claim appears in it. Flag or remove uncorroborated claims.
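A citation verification pass might look like the sketch below. The support check is a deliberately naive word-overlap heuristic, chosen only to keep the example self-contained; it is an assumption, not a recommended method, and a real system would more likely use an entailment model or an LLM judge. `verify_citations`, `fetch_source`, and `min_overlap` are hypothetical names.

```python
import re
from typing import Callable


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used for a crude overlap check."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def verify_citations(
    cited_claims: list[tuple[str, str]],
    fetch_source: Callable[[str], str],
    min_overlap: float = 0.8,
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split (claim, url) pairs into corroborated and flagged lists."""
    kept, flagged = [], []
    for claim, url in cited_claims:
        claim_words = _words(claim)
        source_words = _words(fetch_source(url))
        # Fraction of the claim's words that actually occur in the source.
        overlap = len(claim_words & source_words) / len(claim_words) if claim_words else 0.0
        (kept if overlap >= min_overlap else flagged).append((claim, url))
    return kept, flagged


# Stub source store standing in for a real retrieval step.
URL = "https://example.com/a"
PAGES = {URL: "The library was released in 2019 under the MIT license."}
kept, flagged = verify_citations(
    [
        ("The library was released in 2019", URL),
        ("The library has ten million users", URL),
    ],
    fetch_source=PAGES.__getitem__,
)
```

Here the first claim is kept and the second is flagged, because the cited page is real but says nothing about user counts. Flagged claims can then be removed or sent back for re-retrieval.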
High-Stakes Domains: Healthcare & Legal
Agents in regulated domains face hard constraints that do not apply to general assistants. Understanding these constraints prevents both regulatory risk and harm to end users.
Healthcare Agent Constraints
- Cannot diagnose or prescribe — can only provide general information
- Must disclose AI-generated nature of responses
- HIPAA: no storage of PHI without patient consent and encryption
- Must recommend professional consultation for symptoms
- Audit trail required for all interactions
- Human-in-the-loop for any clinical decision support
Legal Agent Constraints
- Cannot provide specific legal advice — only general legal information
- Unauthorized practice of law (UPL) risk varies by jurisdiction
- Confidentiality obligations if user shares privileged information
- Must cite jurisdiction-specific statutes, not generalizations
- Attorney supervision required for client-facing applications in most jurisdictions
The deployment boundary is not a technical decision
For healthcare and legal agents, the line between "providing information" and "practicing medicine/law" is a legal question, not an engineering one. Always consult a domain expert or compliance officer before deploying in these fields. The regulatory landscape is also rapidly changing in response to AI agent deployments.
The safest high-stakes pattern
Design the agent to: (1) gather information efficiently (replacing the tedious parts), (2) present structured findings to a human expert, and (3) let the human make the final decision. This "research and present" pattern avoids direct decision-making while still providing significant efficiency gains.
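As a rough sketch of the "research and present" pattern, the function below gathers findings, hands a structured brief to a human reviewer, and records the outcome in an audit log. All names (`Brief`, `Decision`, `research_and_present`, `gather`, `ask_expert`) are hypothetical, and the in-memory list stands in for whatever durable audit store a real deployment would require.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Brief:
    question: str
    findings: list[str]


@dataclass
class Decision:
    approved: bool
    reviewer: str
    rationale: str


def research_and_present(
    question: str,
    gather: Callable[[str], list[str]],
    ask_expert: Callable[[Brief], Decision],
    audit_log: list[dict],
) -> Decision:
    # (1) The agent does the tedious information gathering.
    brief = Brief(question=question, findings=gather(question))
    # (2) Structured findings go to a human expert; the agent never decides.
    decision = ask_expert(brief)
    # (3) The human's decision is recorded alongside what they were shown.
    audit_log.append(
        {
            "question": question,
            "findings": brief.findings,
            "approved": decision.approved,
            "reviewer": decision.reviewer,
            "rationale": decision.rationale,
        }
    )
    return decision


# Stubs standing in for the agent's retrieval step and the review UI.
log: list[dict] = []
outcome = research_and_present(
    "example question",
    gather=lambda q: ["finding 1", "finding 2"],
    ask_expert=lambda b: Decision(False, "reviewer-1", "needs more context"),
    audit_log=log,
)
```

The design choice worth noting is that the function has no code path that acts on the findings itself: the only output is the human's `Decision`.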
Chapter 21 Quiz
1. Why is SWE-bench particularly well-suited for evaluating coding agents?
2. What is "citation hallucination" in research agents?
3. What is the recommended deployment pattern for agents in regulated healthcare/legal domains?