Learning Objectives

By the end of this chapter, you will be able to:

Explain the agentic AI concept behind Domain-Specific Agents.
Apply Domain-Specific Agents to design reliable, production-grade agent systems.
Recognize operational trade-offs in tool use, orchestration, safety, and cost.

Section 5 — Advanced Topics & Frontiers

Chapter 21: Domain-Specific Agents

Coding, research, computer-use, data analysis, and high-stakes domains

Domain Constraints Change Everything

The general-purpose agent design from Chapter 2 is the starting point, but each domain adds its own constraints: specific tools, safety requirements, user expectations, regulatory obligations, and failure modes. This chapter covers the major categories.

Domain	Primary Capability	Key Constraint	Benchmark
Coding	Code generation, test execution, bug fixing	Must run in sandboxed environment; test-driven verification	SWE-bench (% issues resolved)
Deep Research	Multi-source retrieval, synthesis, citation	Accuracy and source attribution; must not hallucinate facts	GAIA, BrowseComp
Computer-Use	UI navigation, form filling, screenshot-based control	Slow; brittle to UI changes; high error cost on write actions	WebArena, OSWorld
Data Analysis	SQL queries, pandas, visualization generation	Data privacy; correct statistical interpretation	DS-1000, BIRD
Customer Support	Triage, FAQ, escalation, CRM integration	Cannot make promises; must escalate edge cases	Business-specific KPIs
Healthcare / Legal	Information retrieval, document analysis	Regulatory compliance; cannot give medical/legal advice directly	Domain-specific, human-in-loop required

Coding Agents

Coding agents are among the most mature and commercially deployed agent categories. Claude 3.5 Sonnet reached 49% issue resolution on SWE-bench Verified in 2024; Devin 2 and similar systems reach 55%+ in 2025. Because the test suite acts as the verifier, these agents are well suited to fine-tuning with reinforcement learning with verifiable rewards (RLVR, Chapter 20).

Perception

Repository Context: file tree, relevant code files, issue description, test output

Action

read_file: Read file contents at a path

edit_file: Apply a targeted edit (unified diff)

run_tests: Execute the test suite in a sandbox; return pass/fail

search_codebase: Semantic or lexical search across the repo

Loop

Understand → Locate → Edit → Test → Iterate, until all tests pass or the maximum number of iterations is reached
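The loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `propose_edit`, `apply_edit`, and `run_tests` are hypothetical callables standing in for the model call, the diff applier, and the sandboxed test runner, and `SuiteResult` is an illustrative type.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SuiteResult:
    passed: bool
    output: str  # raw test-runner output, fed back to the agent on failure


def agent_loop(
    issue: str,
    propose_edit: Callable[[str, Optional[str]], str],  # hypothetical model call
    apply_edit: Callable[[str], None],                  # hypothetical diff applier
    run_tests: Callable[[], SuiteResult],               # hypothetical sandboxed runner
    max_iterations: int = 5,
) -> bool:
    """Edit and test until the suite passes or the iteration budget runs out."""
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        diff = propose_edit(issue, feedback)  # Understand / Locate / Edit
        apply_edit(diff)
        result = run_tests()                  # Test
        if result.passed:
            return True
        feedback = result.output              # Iterate with the failure output
    return False
```

The iteration cap matters as much as the success check: without it, an agent that cannot fix the issue keeps spending tokens and sandbox time.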

Key engineering decisions for coding agents

1. Targeted edits over full rewrites: Have the agent emit unified diffs or targeted function replacements rather than rewriting whole files. This reduces errors and makes review easier.
2. Test harness as oracle: After every edit, run the test suite. The result is the ground-truth feedback signal, and it is much more reliable than LLM self-evaluation of code correctness.
3. Repo-map for context efficiency: Instead of providing full file contents, use a repository map (file tree plus function signatures) to help the agent navigate to relevant files before reading them in full.
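One way to build the repository map from point 3 for Python code is with the standard-library `ast` module. The sketch below is illustrative only: the function name `repo_map` and the output layout are assumptions, it takes sources as an in-memory dictionary to stay self-contained, and it lists positional parameters only.

```python
import ast


def repo_map(sources: dict[str, str]) -> str:
    """Return a compact outline: each file path followed by its function signatures."""
    lines = []
    for path in sorted(sources):
        lines.append(path)
        tree = ast.parse(sources[path])
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Positional parameters only; *args, **kwargs and defaults are omitted.
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"  def {node.name}({args})  # line {node.lineno}")
    return "\n".join(lines)
```

The outline is usually far smaller than the files themselves, so the agent can scan it first and call `read_file` only on the paths that look relevant.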

Deep Research Agents

Research agents (Perplexity Deep Research, ChatGPT Deep Research, OpenAI o3 + search) synthesize multi-source information into comprehensive reports. The central challenge is source attribution — preventing hallucination by ensuring every factual claim is backed by a retrieved source.

Research Query → Plan Searches (decompose into sub-queries) → Parallel Search (N agents, N sub-queries) → Read & Extract (verified facts + citations) → Synthesize (report with inline citations)
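The parallel search and extraction stages might look like the sketch below. It assumes a hypothetical `search` callable that returns a list of `{"url": ..., "text": ...}` dictionaries for one sub-query; planning and synthesis are left to the model and are not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_research(
    sub_queries: list[str],
    search: Callable[[str], list[dict]],  # hypothetical search tool
    max_workers: int = 4,
) -> list[dict]:
    """Run one search per sub-query in parallel and keep every fact with its source."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(search, sub_queries))  # preserves sub-query order
    evidence = []
    for query, hits in zip(sub_queries, results):
        for hit in hits:
            # Carrying the URL alongside the text is what makes inline citation possible.
            evidence.append({"query": query, "url": hit["url"], "text": hit["text"]})
    return evidence
```

Keeping the source URL attached to each extracted passage from the start means the synthesis step never has to reconstruct where a fact came from.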

Common failure: citation hallucination

The agent cites a real URL but attributes a claim to it that does not appear in the source. Mitigation: after synthesis, run a citation verification pass — for each cited claim, retrieve the source and verify the claim appears in it. Flag or remove uncorroborated claims.
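A citation verification pass can be sketched as follows. The token-overlap check here is a deliberately crude stand-in, chosen so the example runs without external services; a real system would more likely use an entailment model or an LLM judge. The names `verify_citations` and `min_overlap` are assumptions for illustration.

```python
import re


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def verify_citations(claims, sources, min_overlap=0.8):
    """Split claims into supported and flagged.

    claims:  list of (claim_text, cited_url) pairs
    sources: dict mapping url -> retrieved page text
    """
    supported, flagged = [], []
    for claim, url in claims:
        source_text = sources.get(url)
        claim_tokens = _tokens(claim)
        if not source_text or not claim_tokens:
            flagged.append((claim, url))  # source missing or claim empty
            continue
        # Fraction of the claim's words that also appear in the cited source.
        overlap = len(claim_tokens & _tokens(source_text)) / len(claim_tokens)
        (supported if overlap >= min_overlap else flagged).append((claim, url))
    return supported, flagged
```

Flagged claims can then be removed from the report or routed back to the agent for another retrieval attempt.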

High-Stakes Domains: Healthcare & Legal

Agents in regulated domains face hard constraints that do not apply to general assistants. Understanding these constraints prevents both regulatory risk and harm to end users.

Healthcare Agent Constraints

Cannot diagnose or prescribe — can only provide general information
Must disclose AI-generated nature of responses
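HIPAA (US): do not store or process PHI without a permitted basis (such as patient authorization) and appropriate safeguards (such as encryption and access controls)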
Must recommend professional consultation for symptoms
Audit trail required for all interactions
Human-in-the-loop for any clinical decision support

Legal Agent Constraints

Cannot provide specific legal advice — only general legal information
Unauthorized practice of law (UPL) risk varies by jurisdiction
Confidentiality obligations if user shares privileged information
Must cite jurisdiction-specific statutes, not generalizations
Attorney supervision required for client-facing applications in most jurisdictions

The deployment boundary is not a technical decision

For healthcare and legal agents, the line between "providing information" and "practicing medicine/law" is a legal question, not an engineering one. Always consult a domain expert or compliance officer before deploying in these fields. The regulatory landscape is also rapidly changing in response to AI agent deployments.

The safest high-stakes pattern

Design the agent to: (1) gather information efficiently (replacing the tedious parts), (2) present structured findings to a human expert, and (3) let the human make the final decision. This "research and present" pattern avoids direct decision-making while still providing significant efficiency gains.
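The "research and present" pattern can be enforced in the data model itself. In the sketch below, all names (`Finding`, `CaseFile`, `present`, `record_decision`) are hypothetical; the point is only that the agent can add findings, while a decision can be recorded solely through a function that requires a named human reviewer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Finding:
    statement: str
    source: str  # where the agent found it, so the expert can check


@dataclass
class CaseFile:
    question: str
    findings: list[Finding] = field(default_factory=list)
    decision: Optional[str] = None
    decided_by: Optional[str] = None  # a human reviewer, never the agent


def present(case: CaseFile) -> str:
    """Format findings for the expert; the agent offers no recommendation."""
    lines = [f"Question: {case.question}"]
    lines += [f"- {f.statement} [{f.source}]" for f in case.findings]
    return "\n".join(lines)


def record_decision(case: CaseFile, decision: str, reviewer: str) -> CaseFile:
    """Record the final decision; refuses to proceed without a named reviewer."""
    if not reviewer.strip():
        raise ValueError("a human reviewer must be named")
    case.decision = decision
    case.decided_by = reviewer
    return case
```

Because the agent's tools never call `record_decision`, the boundary between gathering information and deciding is visible in the code and in any audit of the case file.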