Course: Building Agentic AI Systems · Chapter 2 · Difficulty: advanced · Estimated time: 600 min

Chapter 2: Anatomy of an Agent

Learning Objectives

By the end of this chapter, you will be able to:

  • Name the core components of an agent (LLM core, memory, tool interface, planner, action space) and explain what each one does.
  • Apply this component map to design reliable, production-grade agent systems.
  • Recognize operational trade-offs in tool use, orchestration, safety, and cost.

Every component explained — from backbone to action space

Inside Every Agent

The agent loop from Chapter 1 is the high-level view. Zooming in, every agent is built from a specific set of components. Knowing what each component does — and what it does not do — is what separates a working agent from a fragile demo.

Interface
  • User / Operator Channel: incoming goals, system prompts, constraints

Reasoning
  • LLM Core: GPT-4o / Claude / Gemini / local model
  • Scratchpad / CoT: internal reasoning trace

State
  • Working Memory: in-context conversation state
  • Long-term Memory: vector DB · knowledge graph · session store

Execution
  • Tool Registry: available tools, schemas, permissions
  • Tool Executor: dispatches calls, validates outputs
  • Planner: task decomposition strategy

Component Deep-Dive

Orchestrator vs Executor

This distinction is critical in multi-agent systems (Chapter 10). The orchestrator decides what to do next; the executor carries out one specific action. A single agent often acts as both, but in scaled systems they are separate processes with separate permissions.
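
The split described above can be sketched in a few lines. This is a minimal illustration, not a production design: the Action, Executor, and Orchestrator names are hypothetical, and the plan is hard-coded where a real orchestrator would ask an LLM for the next action. The point is that only the executor holds tool references, so permissions can be granted to it separately.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    tool: str
    args: dict


class Executor:
    """Carries out one action at a time, limited to the tools it was granted."""

    def __init__(self, tools: dict[str, Callable[..., str]]) -> None:
        self._tools = tools

    def execute(self, action: Action) -> str:
        if action.tool not in self._tools:
            raise PermissionError(f"executor has no access to '{action.tool}'")
        return self._tools[action.tool](**action.args)


class Orchestrator:
    """Decides what to do next; it never touches tools directly."""

    def __init__(self, executor: Executor) -> None:
        self._executor = executor

    def run(self, plan: list[Action]) -> list[str]:
        # Assumption: the plan is fixed here so the control flow stays visible.
        return [self._executor.execute(action) for action in plan]


executor = Executor({"search": lambda query: f"results for {query}"})
orchestrator = Orchestrator(executor)
print(orchestrator.run([Action("search", {"query": "agents"})]))
```

Because the orchestrator only sees the executor's execute method, swapping in an executor with a narrower tool set changes what the system can do without touching the decision logic.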

1 — LLM Core (Backbone)

The LLM is the only component that cannot be replaced with deterministic code. Its job is to map (context + instructions + tool list) → (next action decision). Model choice matters:

Model Family | Strength in Agent Context | Weakness
GPT-4o / GPT-4.1 | Strong function calling, broad knowledge | Cost at scale
o3 / o4-mini | Long-horizon reasoning, complex multi-step | Higher latency per step
Claude 3.5 / 4 | Instruction following, long context | Different tool calling syntax
Gemini 2.0 / 2.5 | Multimodal, natively long context | Newer, less agent tooling maturity
Qwen / DeepSeek | Cost-effective, open weights for fine-tuning | Smaller ecosystem

2 — Memory Subsystems (Preview)

Memory is covered in depth in Chapter 7. Here, the key insight is that memory is not the context window — it is a set of retrieval mechanisms that surface relevant information into the context window on demand.

1. Working Memory: the active context window. Cheapest, fastest, smallest (32K–2M tokens).
2. Episodic Memory: timestamped logs of past interactions. "What happened before?"
3. Semantic Memory: vector embeddings of knowledge. "What do I know about X?"
4. Procedural Memory: stored tool schemas, prompt templates, learned skills.
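
To make the "retrieval into the context window" idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: ToyMemory and build_working_memory are hypothetical names, the stored facts are made up, and keyword overlap stands in for the vector similarity search a real semantic memory would use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ToyMemory:
    episodic: list[tuple[datetime, str]] = field(default_factory=list)  # what happened
    semantic: dict[str, str] = field(default_factory=dict)              # what is known
    procedural: dict[str, str] = field(default_factory=dict)            # templates, skills

    def log_event(self, text: str) -> None:
        self.episodic.append((datetime.now(timezone.utc), text))

    def recall_recent(self, n: int = 3) -> list[str]:
        return [text for _, text in self.episodic[-n:]]

    def lookup(self, query: str) -> list[str]:
        # Keyword overlap stands in for embedding similarity search.
        words = set(query.lower().split())
        return [fact for topic, fact in self.semantic.items() if topic.lower() in words]


def build_working_memory(memory: ToyMemory, query: str) -> str:
    """Surface only the relevant slices of long-term memory into the context."""
    lines = ["Recent events:"] + memory.recall_recent()
    lines += ["Known facts:"] + memory.lookup(query)
    return "\n".join(lines)


memory = ToyMemory()
memory.log_event("User asked for a refund policy summary")
memory.semantic["refund"] = "Refunds are allowed within 30 days"
memory.semantic["shipping"] = "Shipping takes 5 days"
print(build_working_memory(memory, "what is the refund window"))
```

Only the refund fact reaches the assembled text; the shipping fact stays in long-term storage because the query never asked for it.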

3 — Tool Interface

Tools are functions the agent can invoke. Each tool has a JSON schema describing its name, description, and parameter types. The LLM reads these schemas at inference time and generates a structured call when it decides to use one. Chapter 5 covers this fully.
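
As a hedged illustration, the schema below follows the same shape that Tool.to_schema produces in the class skeleton later in this chapter. The get_weather tool, its parameters, and the validate_call helper are hypothetical, and the check is deliberately minimal rather than full JSON Schema validation.

```python
import json

# Hypothetical tool schema in the function-calling format used in this chapter.
get_weather_schema = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Oslo"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Shape of a structured call a model might emit: arguments arrive as a JSON string.
model_call = {"name": "get_weather", "arguments": '{"city": "Oslo", "unit": "celsius"}'}


def validate_call(call: dict, schema: dict) -> dict:
    """Check name, required keys, unknown keys, and enum values before dispatch."""
    spec = schema["function"]
    if call["name"] != spec["name"]:
        raise ValueError(f"unknown tool '{call['name']}'")
    args = json.loads(call["arguments"])
    params = spec["parameters"]
    missing = [key for key in params["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for key, value in args.items():
        if key not in params["properties"]:
            raise ValueError(f"unexpected argument '{key}'")
        allowed = params["properties"][key].get("enum")
        if allowed and value not in allowed:
            raise ValueError(f"'{value}' is not allowed for '{key}'")
    return args


print(validate_call(model_call, get_weather_schema))
```

Validating before dispatch matters because the model generates the arguments as text; nothing guarantees they match the schema it was shown.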

4 — Planner

The planner translates a goal into a sequence of steps. In simple agents, the LLM is the planner (implicit planning in CoT). In advanced agents, the planner is a separate system prompt or even a separate LLM call dedicated to decomposition before execution begins.
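
A dedicated planner call might look like the sketch below. It assumes the LLM is wrapped as a plain function from prompt text to reply text; PLANNER_PROMPT, plan, and fake_llm are hypothetical names, and fake_llm returns a canned reply so the example runs without a model.

```python
import json
from typing import Callable

PLANNER_PROMPT = (
    "Break the goal into an ordered list of short steps. "
    "Reply with a JSON array of strings and nothing else.\n\nGoal: {goal}"
)


def plan(goal: str, llm: Callable[[str], str], max_steps: int = 8) -> list[str]:
    """Ask the model for a step list before execution begins."""
    raw = llm(PLANNER_PROMPT.format(goal=goal))
    try:
        steps = json.loads(raw)
    except json.JSONDecodeError:
        return [goal]  # fall back to treating the goal as a single step
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        return [goal]
    return steps[:max_steps] or [goal]


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; the reply is canned for illustration.
    return '["Search for recent papers", "Summarize the top three", "Draft the report"]'


print(plan("Write a literature summary", fake_llm))
```

The fallback paths are the design choice worth noting: a planner that returns malformed output should degrade to implicit planning rather than stop the agent.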

5 — Action Space

Everything the agent is allowed to do. This is a designed constraint, not an emergent property. The action space should be the minimum set of capabilities needed to accomplish the goal — broader action spaces mean larger attack surfaces.

Read-only actions

  • Web search
  • Database query
  • File read
  • API GET request
  • Memory retrieval

Write / side-effect actions

  • Send email / Slack
  • Create / modify files
  • API POST / PATCH / DELETE
  • Execute code
  • Spawn sub-agents

Principle of Least Privilege

Never give an agent a write action it does not need for the current task. A research agent that can only search and read is far safer than one that also has email access — even if it never uses email.
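
One way to enforce this is to build the action space per task from an explicit allowlist, with write actions requiring a second opt-in. The sketch below is an assumption about how that could look; ToolSpec, ALL_TOOLS, and action_space are hypothetical names and the tool bodies are stubs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    fn: Callable[..., str]
    side_effects: bool  # True for write / side-effect actions


ALL_TOOLS = [
    ToolSpec("web_search", lambda query: f"results for {query}", side_effects=False),
    ToolSpec("file_read", lambda path: f"contents of {path}", side_effects=False),
    ToolSpec("send_email", lambda to, body: f"sent to {to}", side_effects=True),
]


def action_space(allowed: set[str], allow_writes: bool = False) -> dict[str, ToolSpec]:
    """Return only explicitly allowed tools; write actions need a second opt-in."""
    selected: dict[str, ToolSpec] = {}
    for tool in ALL_TOOLS:
        if tool.name not in allowed:
            continue
        if tool.side_effects and not allow_writes:
            raise PermissionError(f"'{tool.name}' has side effects; pass allow_writes=True")
        selected[tool.name] = tool
    return selected


research_tools = action_space({"web_search", "file_read"})
print(sorted(research_tools))  # ['file_read', 'web_search']
```

The research agent built from research_tools cannot send email because the function is simply absent from its registry, which is a stronger guarantee than instructing the model not to use it.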

Selecting the Right Backbone

The backbone model determines the ceiling on your agent's reasoning quality. But it is not the only factor — prompt structure, tool schemas, and memory retrieval quality often matter more than model choice for a given task.

Inputs assembled into the prompt:

  • System Prompt: role, goals, constraints
  • Tool Schemas: available actions
  • Memory Retrieval: relevant past context
  • Conversation: turns + observations

Output:

  • LLM Decision: next action

The LLM sees all of this assembled into a single prompt. Context assembly quality is as important as model quality. If the assembled context is noisy, contradictory, or overflowing, even the best model will produce poor decisions.
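
A rough sketch of budgeted context assembly follows. It is illustrative only: assemble_context and estimate_tokens are hypothetical helpers, the four-characters-per-token estimate is a crude heuristic (use the model's tokenizer in practice), and tool schemas are left out for brevity.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic, roughly 4 characters per token.
    return max(1, len(text) // 4)


def assemble_context(
    system_prompt: str,
    memory_snippets: list[str],
    conversation: list[dict[str, str]],
    budget_tokens: int,
) -> list[dict[str, str]]:
    """Build the message list, dropping the oldest turns and lowest-ranked memory first."""
    messages = [{"role": "system", "content": system_prompt}]
    used = estimate_tokens(system_prompt)

    # Keep the most recent turns first, since they carry the latest observations.
    kept_turns: list[dict[str, str]] = []
    for turn in reversed(conversation):
        cost = estimate_tokens(turn["content"])
        if used + cost > budget_tokens:
            break
        kept_turns.append(turn)
        used += cost
    kept_turns.reverse()

    # Spend what is left on memory snippets, assumed to be sorted by relevance.
    kept_memory: list[str] = []
    for snippet in memory_snippets:
        cost = estimate_tokens(snippet)
        if used + cost > budget_tokens:
            break
        kept_memory.append(snippet)
        used += cost

    if kept_memory:
        messages.append(
            {"role": "system", "content": "Relevant past context:\n" + "\n".join(kept_memory)}
        )
    return messages + kept_turns


history = [
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
tight = assemble_context("You are a research agent.", ["m" * 40], history, budget_tokens=30)
print([m["role"] for m in tight])
```

With the tight budget, the oldest turn and the memory snippet are dropped so the prompt does not overflow; with a larger budget, both are included.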

Agent Class Skeleton

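Here is a minimal but well-structured Python class that expresses the component map above. It assumes an OpenAI-style chat-completions client (the code calls llm_client.chat.completions.create) and a memory backend that exposes search and store methods. Chapters 5–12 will expand each method significantly.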

python — agent_core.py
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    description: str
    parameters: dict[str, Any]
    fn: Callable[..., str]

    def to_schema(self) -> dict[str, Any]:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class AgentCore:
    """
    Single-agent loop with a pluggable LLM client, tool registry,
    and swappable memory backend.
    """

    def __init__(
        self,
        llm_client: Any,
        model: str,
        system_prompt: str,
        tools: list[Tool] | None = None,
        memory_backend: Any | None = None,
        max_iterations: int = 25,
    ) -> None:
        self.llm = llm_client
        self.model = model
        self.system_prompt = system_prompt
        self.tools: dict[str, Tool] = {t.name: t for t in (tools or [])}
        self.memory = memory_backend
        self.max_iterations = max_iterations
        self._messages: list[dict[str, Any]] = [
            {"role": "system", "content": system_prompt}
        ]

    # ── Public API ──────────────────────────────────────────────────────────

    def run(self, goal: str) -> str:
        self._messages.append({"role": "user", "content": goal})
        self._inject_memory(goal)          # retrieve relevant past context

        for _ in range(self.max_iterations):
            response = self._call_llm()
            choice = response.choices[0]

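            if choice.finish_reason == "tool_calls":
                # Record the assistant turn that requested the tools; each
                # "tool" message must follow the assistant message it answers.
                self._messages.append({
                    "role": "assistant",
                    "content": choice.message.content,
                    "tool_calls": [
                        {"id": c.id, "type": "function", "function": {
                            "name": c.function.name,
                            "arguments": c.function.arguments}}
                        for c in choice.message.tool_calls
                    ],
                })
                for call in choice.message.tool_calls:
                    result = self._execute_tool(call)
                    self._messages.append({
                        "role": "tool",
                        "content": result,
                        "tool_call_id": call.id,
                    })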

            elif choice.finish_reason == "stop":
                answer = choice.message.content
                self._persist_memory(goal, answer)   # store to long-term memory
                return answer

        raise RuntimeError("Max iterations reached without FINISH signal")

    # ── Private helpers ──────────────────────────────────────────────────────

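    def _call_llm(self) -> Any:
        kwargs: dict[str, Any] = {"model": self.model, "messages": self._messages}
        if self.tools:
            # Only send `tools` when some are registered; a null or empty
            # list may be rejected by OpenAI-style endpoints.
            kwargs["tools"] = [t.to_schema() for t in self.tools.values()]
        return self.llm.chat.completions.create(**kwargs)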

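    def _execute_tool(self, call: Any) -> str:
        import json  # local import keeps this helper self-contained

        tool = self.tools.get(call.function.name)
        if tool is None:
            return f"Error: unknown tool '{call.function.name}'"
        try:
            # Arguments arrive as a JSON string generated by the model.
            args = json.loads(call.function.arguments)
            return str(tool.fn(**args))
        except Exception as exc:
            # Report the failure to the model instead of crashing the loop.
            return f"Error: tool '{tool.name}' failed: {exc}"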

    def _inject_memory(self, query: str) -> None:
        if self.memory:
            relevant = self.memory.search(query, top_k=5)
            if relevant:
                self._messages.append({
                    "role": "system",
                    "content": "Relevant past context:\n" + "\n".join(relevant),
                })

    def _persist_memory(self, query: str, answer: str) -> None:
        if self.memory:
            self.memory.store(query=query, answer=answer)

Chapter 2 Quiz

1. What is the primary role of the LLM Core in an agent?

2. Which type of memory answers the question "what happened in our last conversation?"

3. Why should an agent's action space be the minimum required set?