Chapter 2: Agent Architecture Components
Building Blocks of Agents
Learning Objectives
- Understand the five core components of agent architecture
- Master the mathematical foundations of agent state, tool selection, memory updates, and planning
- Learn practical implementations of the memory system, tool interface, and planner
- Apply the concepts through worked examples
- Recognize how each component is used in real-world agent systems
Agent Architecture Components
🎯 The Building Blocks
Every AI agent is built from five core components that work together to enable autonomous behavior. Understanding these components is essential for building effective agents.
🏗️ Agent Architecture Components Diagram
- LLM Core: reasoning engine
- Memory System: short- and long-term storage
- Tool Interface: function calling
- Planner: task breakdown
- Action Executor: execute and observe

Key: All components work together. The LLM reasons using memory, the planner creates the strategy, the tool interface executes actions, and the executor observes results.
🔄 Component Interactions
How the components work together in one cycle (a minimal code sketch follows the list):
- LLM Core receives input and uses Memory for context
- Planner breaks down task into steps using LLM reasoning
- Tool Interface selects and calls appropriate tools
- Action Executor executes actions and observes results
- Results update Memory, and the cycle continues
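In code, one pass through this cycle might look like the following minimal sketch. The `llm`, `memory`, `planner`, `tools`, and `executor` objects and their methods are hypothetical stand-ins for the components described above, not any specific framework's API:

```python
def agent_cycle(goal, llm, memory, planner, tools, executor):
    """One pass through the component interaction cycle (hypothetical interfaces)."""
    context = memory.get_context()              # 1. LLM Core uses Memory for context
    plan = planner.create_plan(goal, context)   # 2. Planner breaks the task into steps
    for step in plan:
        tool = tools.select_tool(step, llm)     # 3. Tool Interface picks a tool
        result = executor.execute(tool, step)   # 4. Action Executor runs and observes
        memory.add_observation(step, result)    # 5. Results update Memory; cycle continues
    return llm.generate(f"Summarize the results for goal: {goal}")
```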
Key Concepts
🔑 Component 1: LLM as Reasoning Engine
The LLM is the "brain" of the agent - it processes information, reasons about tasks, and makes decisions.
What the LLM Does
- Understanding: Interprets user requests and context
- Reasoning: Thinks through problems step by step
- Decision Making: Chooses what action to take next
- Planning: Breaks down complex tasks into steps
Example: LLM Reasoning Process
User: "Research quantum computing and write a summary"
LLM thinks:
- "I need to search for information about quantum computing"
- "Then I need to read and extract key points"
- "Finally, I need to write a summary"
Output: Plan with 3 steps
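A minimal sketch of this reasoning step in code, assuming a simple `llm.generate(prompt)` interface (stubbed here so the snippet runs standalone; a real agent would call a model provider's SDK):

```python
class StubLLM:
    """Stand-in for a real model client; replace with your provider's SDK."""
    def generate(self, prompt: str) -> str:
        # Canned response illustrating the expected plan shape
        return ("1. Search for information about quantum computing\n"
                "2. Read and extract key points\n"
                "3. Write a summary")

llm = StubLLM()
prompt = ("Task: Research quantum computing and write a summary.\n"
          "Think step by step and return a numbered plan.")
print(llm.generate(prompt))  # -> the 3-step plan above
```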
Component 2: Memory System
Memory allows agents to remember past interactions and maintain context across conversations.
Memory System Architecture
Short-Term Memory
- Current conversation
- Recent actions
- Immediate context
- Last 10-20 turns
Long-Term Memory
- User preferences
- Important facts
- Past learnings
- Persistent storage
Memory Example
Conversation 1:
- User: "I prefer meetings in the morning"
- Agent stores in long-term memory: {"preference": "morning_meetings"}
Conversation 2 (weeks later):
- User: "Schedule a meeting"
- Agent retrieves preference → Schedules for morning
Component 3: Tool Interface
The tool interface allows agents to interact with external systems and APIs.
How Tool Interface Works
- Tool Definition: Each tool has a name, description, and parameters
- Tool Selection: LLM decides which tool to use based on task
- Tool Execution: Agent calls the tool with appropriate parameters
- Result Integration: Tool results are fed back to LLM for further reasoning
Tool Interface Flow
1. LLM Decision: "Use weather API"
2. Tool Selection: get_weather()
3. Execute: call the weather API
4. Result: 72°F, sunny
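Concretely, each tool is described by a small schema that the LLM can read when deciding what to call. A minimal sketch as a Python dict; the exact field names here are an illustrative assumption (frameworks such as OpenAI function calling use a similar, but not identical, shape):

```python
# Illustrative tool definition; the field names are assumptions, not a standard.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "city": {"type": "string", "description": "City name, e.g. 'New York'"},
    },
}
```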
Component 4: Planner
The planner breaks down complex tasks into manageable steps.
Planning Process
Task: "Research AI trends and create a presentation"
Planner creates:
1. Search for "AI trends 2024"
2. Read and extract key points from articles
3. Organize information into categories
4. Create presentation slides
5. Review and refine the presentation
Component 5: Action Executor
The executor carries out actions and observes results.
Execution Process
- Execute: Runs the selected action (tool call, API request, etc.)
- Observe: Captures the result or outcome
- Validate: Checks if action succeeded
- Update State: Updates agent state with new information
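A minimal sketch of these four steps, assuming tools are plain Python callables and that agent state is a dict threaded through the loop (both are illustrative assumptions):

```python
from typing import Any, Callable, Dict

def execute_action(action: Callable, params: Dict[str, Any],
                   state: Dict[str, Any]) -> Dict[str, Any]:
    """Execute -> Observe -> Validate -> Update State."""
    try:
        observation = action(**params)        # Execute: run the tool or API call
        success = observation is not None     # Validate: naive success check (assumption)
    except Exception as exc:
        observation, success = str(exc), False  # Observe: capture failures too
    state.setdefault("history", []).append({    # Update State: record the outcome
        "action": getattr(action, "__name__", str(action)),
        "params": params,
        "observation": observation,
        "success": success,
    })
    return state
```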
Mathematical Formulations
Agent State Representation
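state = (M, T, G, H)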
What This Measures
This formula represents the complete internal state of an AI agent at any given moment. It captures all the information the agent needs to make decisions: what it remembers, what it can do, what it's trying to achieve, and what it has done. This state representation is the foundation for agent decision-making.
Breaking It Down
- M (Memory): Memory system containing both short-term (recent conversation turns, immediate context buffer) and long-term (persistent facts, learned patterns, user preferences stored in vector databases or knowledge graphs). Memory enables the agent to maintain context across interactions and learn from experience.
- T (Tools): Available tools - the set of functions, APIs, or capabilities the agent can use to interact with external systems. This includes tool definitions, descriptions, parameters, and execution functions. Tools extend the agent's capabilities beyond text generation.
- G (Goal): Current goal or task - the objective the agent is trying to achieve. This guides all decision-making and helps the agent determine when a task is complete. Goals can be high-level (e.g., "answer user's question") or specific (e.g., "get weather for New York").
- H (History): History of actions - a record of what the agent has done, including past actions, their results, and the sequence of decisions. History helps the agent avoid repeating mistakes, track progress, and understand the context of the current situation.
Where This Is Used
This state representation is used throughout the agent's operation. It's passed to the decision function to determine the next action, updated after each action based on observations, stored persistently for long-term memory, and used to track progress toward goals. The state is the agent's "memory" of its current situation and past experiences.
Why This Matters
A comprehensive state representation is essential for intelligent agent behavior. Without proper state tracking, agents cannot maintain context, learn from experience, or make informed decisions. This formula ensures all critical information (memory, capabilities, objectives, history) is captured and available for decision-making, enabling agents to operate autonomously and adaptively.
Example Calculation
Given: An agent helping a user with research
- M = {"short_term": ["User asked about quantum computing"], "long_term": {"user_interests": ["AI", "physics"], "preferred_format": "detailed"}}
- T = ["search_web", "read_document", "summarize", "write_report"]
- G = "Research quantum computing and provide detailed summary"
- H = [{"action": "search_web", "query": "quantum computing 2024", "result": "found 5 articles"}]
State: state = (M, T, G, H)
Interpretation: The agent knows the user's interests and preferences (from M), has access to research tools (T), is working toward providing a detailed quantum computing summary (G), and has already searched the web (H). This complete state enables the agent to make informed next decisions, such as reading the found articles or generating the summary.
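A minimal sketch of this state tuple as a Python dataclass; the field types are illustrative choices, not a prescribed schema:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class AgentState:
    """state = (M, T, G, H)"""
    memory: Dict[str, Any] = field(default_factory=dict)          # M: short- and long-term memory
    tools: List[str] = field(default_factory=list)                # T: available tool names
    goal: str = ""                                                # G: current objective
    history: List[Dict[str, Any]] = field(default_factory=list)   # H: past actions and results

state = AgentState(
    memory={"long_term": {"user_interests": ["AI", "physics"]}},
    tools=["search_web", "read_document", "summarize", "write_report"],
    goal="Research quantum computing and provide detailed summary",
    history=[{"action": "search_web", "query": "quantum computing 2024"}],
)
```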
Tool Selection Function
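t^* = argmax_{t ∈ T} P(tool = t | state, goal)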
What This Measures
This function determines which tool the agent should use from its available toolset. It calculates the probability that each tool is appropriate given the current state and goal, then selects the tool with the highest probability. This enables intelligent tool selection based on context rather than random or fixed choices.
Breaking It Down
- T: Set of available tools - all functions, APIs, or capabilities the agent can use (e.g., ["get_weather", "search_web", "calculate", "send_email"]). The agent evaluates each tool in this set.
- state: Current agent state - includes memory, history, current observations, and any relevant context. The state provides information about what the agent knows and what situation it's in.
- goal: Current goal - the objective the agent is trying to achieve. The goal helps determine which tool would be most useful (e.g., if goal is "get weather", weather tools have higher probability).
- P(tool = t | state, goal): Probability that tool t is appropriate given the state and goal. This is typically calculated using LLM reasoning (the LLM evaluates tool descriptions against the current context) or learned models. Higher probability means the tool is more likely to help achieve the goal.
- t^*: Selected tool - the tool with maximum probability. This is the optimal tool choice that maximizes the likelihood of successfully completing the goal.
Where This Is Used
This function is called during the "Reason" step of the agent loop when the agent determines it needs to use a tool. The agent evaluates all available tools, calculates their appropriateness probabilities, and selects the best one. This happens before tool execution, ensuring the agent uses the most suitable tool for the current situation.
Why This Matters
Intelligent tool selection is crucial for agent effectiveness. Using the wrong tool wastes resources and fails to achieve goals, while using the right tool efficiently accomplishes tasks. This formula enables agents to make context-aware tool choices rather than random selection, dramatically improving success rates and efficiency. Without proper tool selection, agents would either use tools inappropriately or need to try multiple tools, leading to wasted resources and poor performance.
Example Calculation
Given:
- T = ["get_weather", "search_web", "calculate", "send_email"]
- state = "User asked: 'What's the weather in New York?'"
- goal = "Provide accurate weather information"
Step 1: Calculate P(tool | state, goal) for each tool:
- P(get_weather | state, goal) = 0.95 (very high - directly relevant)
- P(search_web | state, goal) = 0.3 (moderate - could work but less direct)
- P(calculate | state, goal) = 0.05 (very low - not relevant)
- P(send_email | state, goal) = 0.02 (very low - not relevant)
Step 2: Find maximum: max P = 0.95
Result: t^* = get_weather (tool with P = 0.95)
Interpretation: The agent correctly identified that get_weather is the most appropriate tool for answering a weather question. The high probability (0.95) reflects that this tool directly matches the user's request and the agent's goal. This demonstrates how the tool selection function enables context-aware decision-making.
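The argmax itself is simple to sketch in Python. The probabilities below are hard-coded from the worked example; in a real agent they would come from LLM reasoning or a learned scorer:

```python
# Scores copied from the worked example; normally produced by the LLM.
tool_scores = {
    "get_weather": 0.95,
    "search_web": 0.30,
    "calculate": 0.05,
    "send_email": 0.02,
}

# t^* = argmax over tools of P(tool = t | state, goal)
best_tool = max(tool_scores, key=tool_scores.get)
print(best_tool)  # -> "get_weather"
```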
Memory Update Function
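M_{t+1} = Update(M_t, action_t, observation_t, importance)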
What This Measures
This function describes how the agent's memory system evolves over time. It takes the current memory, the action that was taken, the observation that resulted, and an importance score, then updates the memory accordingly. This enables agents to learn from experience and maintain relevant information for future decisions.
Breaking It Down
- M_t: Memory at time t - the current state of the agent's memory system, including short-term buffer (recent conversation turns, immediate context) and long-term store (persistent facts, learned patterns, important information). This is the agent's knowledge base before the update.
- action_t: Action taken at time t - the specific action the agent executed (e.g., called a tool, generated text, asked a question). The action provides context about what the agent was trying to do.
- observation_t: Result observed from the action - what happened as a result (tool output, user response, error, environmental change). Observations are the new information that needs to be incorporated into memory.
- importance: How important the information is to remember - a score (typically 0-1) that determines whether information goes to short-term memory (low importance, temporary) or long-term memory (high importance, persistent). Importance can be determined by: user explicitly marking as important, agent reasoning about relevance, frequency of similar information, or success/failure of actions.
- M_{t+1}: Updated memory after incorporating the new information. The Update function: adds observation to short-term buffer, evaluates importance to decide if it should be stored long-term, updates existing memories if new information contradicts or enhances them, and manages memory capacity (may remove low-importance old information).
Where This Is Used
This update happens after every action in the agent loop, specifically after observing the result of an action. The memory update is a critical step that ensures the agent learns from experience. It's used to: maintain conversation context (short-term), store important facts for future use (long-term), update beliefs when new information contradicts old information, and manage memory capacity by prioritizing important information.
Why This Matters
Effective memory updates are essential for agent learning and adaptation. Without proper memory updates, agents cannot learn from experience, maintain context across conversations, or adapt their behavior based on outcomes. This function enables agents to: remember user preferences, learn from successful and failed actions, maintain conversation context, and build a knowledge base over time. This is what makes agents "intelligent" rather than just reactive - they learn and improve.
Example Calculation
Given:
- M_t = {"short_term": ["User asked about weather"], "long_term": {"user_preference": "Celsius"}}
- action_t = "call get_weather(city='New York')"
- observation_t = {"temp": 22, "condition": "sunny", "date": "2024-12-10"}
- importance = 0.3 (moderate - weather data is useful but time-sensitive)
Step 1: Add observation to short-term memory (always done for recent context)
Step 2: Evaluate importance (0.3) → below threshold (0.5) → store in short-term only
Step 3: Update short-term: add weather data, keep last 10 turns
Step 4: Long-term memory unchanged (importance too low)
Result: M_{t+1} = {"short_term": ["User asked about weather", "Weather: 22°C sunny on 2024-12-10"], "long_term": {"user_preference": "Celsius"}}
Interpretation: The weather data was added to short-term memory (for current conversation context) but not to long-term memory (because it's time-sensitive and will be outdated soon). The user preference remains in long-term memory as it's still relevant. This demonstrates how importance scores guide memory storage decisions, ensuring important information persists while temporary data is kept only for immediate context.
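A minimal sketch of this update rule, using the 0.5 importance threshold from the example (the threshold value and dict layout are illustrative assumptions):

```python
from typing import Any, Dict

IMPORTANCE_THRESHOLD = 0.5  # example value; tune per application

def update_memory(memory: Dict[str, Any], action: str, observation: Any,
                  importance: float, max_turns: int = 10) -> Dict[str, Any]:
    """M_{t+1} = Update(M_t, action_t, observation_t, importance)"""
    # Always record the observation in short-term memory for immediate context.
    short = memory.setdefault("short_term", [])
    short.append(f"{action} -> {observation}")
    memory["short_term"] = short[-max_turns:]  # keep only recent turns
    # Promote to long-term storage only if important enough to persist.
    if importance >= IMPORTANCE_THRESHOLD:
        memory.setdefault("long_term", {})[action] = observation
    return memory
```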
Planning Function
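plan = Planner(goal, state, constraints) = [action_1, action_2, ..., action_n]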
What This Measures
This function generates a sequence of actions (a plan) to achieve a goal. It takes the desired outcome, current state, and any constraints, then produces an ordered sequence of actions that, when executed, should accomplish the goal. This enables agents to handle complex, multi-step tasks by breaking them down into manageable steps.
Breaking It Down
- goal: Desired outcome - the objective the agent wants to achieve (e.g., "Research quantum computing and write a summary", "Help user book a flight"). The goal defines what success looks like and guides the planning process.
- state: Current state - the agent's current understanding including memory, available tools, and current situation. The state tells the planner what resources and information are available, and what the starting point is.
- constraints: Limitations that must be considered - time limits, resource constraints (API rate limits, token budgets), dependencies between actions, safety requirements, or user preferences. Constraints ensure the plan is feasible and acceptable.
- plan: Sequence of actions [action_1, action_2, ..., action_n] - an ordered list of steps to execute. Each action in the sequence builds on previous actions, with later actions depending on results from earlier ones. The plan provides a roadmap from current state to goal.
Where This Is Used
This function is called when the agent receives a complex goal that requires multiple steps. The planner analyzes the goal, evaluates the current state, considers constraints, and generates a step-by-step plan. The plan is then executed sequentially, with the agent following each step and adapting if needed. Planning typically happens at the start of a task, though plans can be revised if circumstances change.
Why This Matters
Planning enables agents to handle complex tasks that require multiple coordinated actions. Without planning, agents would make decisions reactively, one step at a time, without considering the full path to the goal. This leads to inefficient behavior, missed dependencies, and failure on complex tasks. Planning allows agents to: break down complex goals into manageable steps, identify dependencies between actions, optimize the sequence of actions, and anticipate potential issues. This is what distinguishes planning agents from simple reactive agents.
Example Calculation
Given:
- goal = "Research quantum computing developments in 2024 and write a 500-word summary"
- state = {"memory": ["User interested in AI"], "tools": ["search_web", "read_document", "summarize", "write"]}
- constraints = {"max_steps": 10, "time_limit": "5 minutes", "token_budget": 10000}
Step 1: Planner analyzes goal → needs research, reading, and writing
Step 2: Planner checks state → has necessary tools available
Step 3: Planner considers constraints → must complete in 10 steps, 5 minutes
Step 4: Planner generates sequence:
- action_1 = "search_web(query='quantum computing 2024')"
- action_2 = "read_document(articles from search)"
- action_3 = "extract_key_points(documents)"
- action_4 = "write_summary(key_points, length=500_words)"
Result: plan = [search_web, read_document, extract_key_points, write_summary]
Interpretation: The planner broke down the complex goal into 4 sequential steps: first search for information, then read the found documents, extract important points, and finally write the summary. The plan respects constraints (4 steps < 10 max, estimated time < 5 minutes) and follows logical dependencies (can't write before reading, can't read before searching). This demonstrates how planning enables systematic approach to complex tasks.
Detailed Examples
Example 1: Email Agent - Component Interaction
Task: "Check my emails and summarize important ones"
This example demonstrates how all agent components work together to complete a complex task. The agent must coordinate between its reasoning engine, memory system, planning module, tool interface, and action executor to successfully retrieve and summarize emails.
Component Interaction Flow
1. LLM Core receives "Check emails and summarize important ones" and retrieves email preferences from Memory
2. Planner creates a plan: 1) fetch emails, 2) filter important ones, 3) summarize
3. Tool Interface selects fetch_emails()
4. Action Executor calls the email API and receives 20 emails
5. Memory stores the email count and the important email IDs
6. LLM Core summarizes the important emails using the retrieved data
Example 2: Memory System in Action
Demonstrating short-term vs long-term memory:
Memory System Example
| Turn | Short-Term Memory | Long-Term Memory |
|---|---|---|
| 1 | User: "I'm John" | Stored: name = "John" |
| 2 | User: "Schedule meeting"; Agent uses name from memory | name = "John" (retrieved) |
| 3 | User: "What's my name?"; Agent: "John" (from long-term) | name = "John" (persistent) |
Implementation
Memory System Implementation
```python
from typing import Dict, List, Any
from datetime import datetime

class MemorySystem:
    """Agent memory system with short-term and long-term storage"""

    def __init__(self):
        self.short_term = []        # Recent conversation (last N turns)
        self.long_term = {}         # Persistent facts and preferences
        self.max_short_term = 20    # Keep last 20 turns

    def add_to_short_term(self, role: str, content: str):
        """Add to short-term memory (conversation history)"""
        self.short_term.append({
            'role': role,  # 'user' or 'agent'
            'content': content,
            'timestamp': datetime.now().isoformat()
        })
        # Keep only recent turns
        if len(self.short_term) > self.max_short_term:
            self.short_term = self.short_term[-self.max_short_term:]

    def add_to_long_term(self, key: str, value: Any, importance: float = 0.5):
        """
        Add to long-term memory

        Parameters:
            key: Memory key (e.g., 'user_name', 'preference')
            value: Memory value
            importance: How important (0-1), affects retention
        """
        self.long_term[key] = {
            'value': value,
            'importance': importance,
            'timestamp': datetime.now().isoformat(),
            'access_count': 0
        }

    def retrieve_from_long_term(self, key: str) -> Any:
        """Retrieve from long-term memory"""
        if key in self.long_term:
            self.long_term[key]['access_count'] += 1
            return self.long_term[key]['value']
        return None

    def get_context(self, max_turns: int = 10) -> List[Dict]:
        """Get recent conversation context"""
        return self.short_term[-max_turns:]

    def search_long_term(self, query: str) -> List[Dict]:
        """Search long-term memory by key or value"""
        results = []
        query_lower = query.lower()
        for key, data in self.long_term.items():
            if query_lower in key.lower() or query_lower in str(data['value']).lower():
                results.append({'key': key, 'value': data['value']})
        return results

# Example usage
memory = MemorySystem()

# Add to short-term (conversation)
memory.add_to_short_term('user', "What's the weather?")
memory.add_to_short_term('agent', "I'll check the weather for you.")

# Add to long-term (preferences)
memory.add_to_long_term('user_name', 'John', importance=0.9)
memory.add_to_long_term('preferred_timezone', 'EST', importance=0.7)

# Retrieve context
context = memory.get_context(max_turns=5)
print("Context:", context)

# Retrieve from long-term
user_name = memory.retrieve_from_long_term('user_name')
print(f"User name: {user_name}")
```
Tool Interface Implementation
```python
from typing import Dict, List, Callable, Any
import inspect
import json

class ToolInterface:
    """Tool interface for agent tool management and execution"""

    def __init__(self):
        self.tools: Dict[str, Dict] = {}

    def register_tool(self, name: str, description: str, function: Callable, parameters: Dict):
        """
        Register a tool

        Parameters:
            name: Tool name
            description: What the tool does
            function: Python function to execute
            parameters: Parameter schema (name -> type)
        """
        self.tools[name] = {
            'name': name,
            'description': description,
            'function': function,
            'parameters': parameters
        }

    def list_tools(self) -> List[Dict]:
        """List all available tools with descriptions"""
        return [
            {
                'name': tool['name'],
                'description': tool['description'],
                'parameters': tool['parameters']
            }
            for tool in self.tools.values()
        ]

    def select_tool(self, task_description: str, llm) -> str:
        """
        Use the LLM to select an appropriate tool

        Parameters:
            task_description: What needs to be done
            llm: Language model for tool selection

        Returns:
            Selected tool name
        """
        available_tools = self.list_tools()
        prompt = f"""
Task: {task_description}
Available tools: {json.dumps(available_tools, indent=2)}
Select the most appropriate tool. Return only the tool name.
"""
        return llm.generate(prompt).strip()

    def execute_tool(self, tool_name: str, parameters: Dict) -> Any:
        """
        Execute a tool

        Parameters:
            tool_name: Name of tool to execute
            parameters: Parameters for the tool

        Returns:
            Tool execution result
        """
        if tool_name not in self.tools:
            raise ValueError(f"Tool {tool_name} not found")
        tool = self.tools[tool_name]
        func = tool['function']

        # Validate that all required parameters are supplied
        sig = inspect.signature(func)
        missing = [name for name, p in sig.parameters.items()
                   if p.default is inspect.Parameter.empty and name not in parameters]
        if missing:
            return {'success': False, 'tool': tool_name,
                    'error': f"Missing parameters: {missing}"}

        # Call the function with the parameters
        try:
            result = func(**parameters)
            return {'success': True, 'tool': tool_name, 'result': result}
        except Exception as e:
            return {'success': False, 'tool': tool_name, 'error': str(e)}

# Example tools
def get_weather(city: str) -> str:
    """Get weather for a city"""
    # In a real implementation, call a weather API
    return f"Weather in {city}: 72°F, sunny"

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression"""
    # Note: eval() is unsafe on untrusted input; used here for brevity only.
    try:
        return str(eval(expression))
    except Exception:
        return "Invalid expression"

# Example usage
tool_interface = ToolInterface()

# Register tools
tool_interface.register_tool(
    name='get_weather',
    description='Get current weather for a city',
    function=get_weather,
    parameters={'city': 'string'}
)
tool_interface.register_tool(
    name='calculate',
    description='Calculate a mathematical expression',
    function=calculate,
    parameters={'expression': 'string'}
)

# List tools
tools = tool_interface.list_tools()
print("Available tools:", tools)

# Execute a tool
result = tool_interface.execute_tool('get_weather', {'city': 'New York'})
print("Result:", result)
```
Planner Implementation
```python
from typing import List, Dict
import json

class Planner:
    """Agent planner for breaking down tasks into steps"""

    def __init__(self, llm):
        self.llm = llm

    def create_plan(self, goal: str, available_tools: List[str], constraints: Dict = None) -> List[Dict]:
        """
        Create a plan to achieve a goal

        Parameters:
            goal: The goal to achieve
            available_tools: List of available tool names
            constraints: Constraints (max_steps, time_limit, etc.)

        Returns:
            List of planned steps
        """
        constraints = constraints or {}
        max_steps = constraints.get('max_steps', 10)
        # Double braces ({{ }}) render as literal braces in the f-string.
        prompt = f"""
Goal: {goal}
Available tools: {', '.join(available_tools)}
Maximum steps: {max_steps}

Create a step-by-step plan to achieve this goal.
Return a JSON array of steps, each with:
- step_number: int
- action: string (tool name or "reason")
- description: string
- parameters: dict (if using a tool)

Example:
[
  {{"step_number": 1, "action": "search_web", "description": "Search for information", "parameters": {{"query": "..."}}}},
  {{"step_number": 2, "action": "reason", "description": "Analyze results", "parameters": {{}}}}
]
"""
        plan_json = self.llm.generate(prompt)
        return json.loads(plan_json)

    def refine_plan(self, current_plan: List[Dict], new_information: str) -> List[Dict]:
        """
        Refine a plan based on new information

        Parameters:
            current_plan: Current plan steps
            new_information: New information that affects the plan

        Returns:
            Refined plan
        """
        prompt = f"""
Current plan: {json.dumps(current_plan, indent=2)}
New information: {new_information}

Refine the plan based on this new information.
Return the updated plan as a JSON array.
"""
        refined_plan_json = self.llm.generate(prompt)
        return json.loads(refined_plan_json)

# Example usage
# planner = Planner(llm=my_llm)
# plan = planner.create_plan(
#     goal="Research quantum computing and write summary",
#     available_tools=['search_web', 'read_document', 'write_document']
# )
# print("Plan:", plan)
```
Real-World Applications
🌍 Component Usage in Real Systems
Each component is critical in production agent systems:
1. Customer Support Agents
- LLM Core: Understands customer queries, generates responses
- Memory: Remembers customer history, preferences, past issues
- Tool Interface: Accesses order database, CRM system, knowledge base
- Planner: Creates resolution steps (check order → verify issue → process refund)
- Executor: Executes database queries, updates records
2. Research Agents
- LLM Core: Analyzes research questions, synthesizes information
- Memory: Stores research findings, source citations
- Tool Interface: Uses web search, academic databases, PDF readers
- Planner: Plans research strategy (search → read → extract → synthesize)
- Executor: Executes searches, processes documents
3. Code Generation Agents
- LLM Core: Understands requirements, generates code
- Memory: Remembers codebase patterns, user preferences
- Tool Interface: Uses code editor, compiler, test runner, git
- Planner: Plans development steps (design → implement → test → refactor)
- Executor: Writes files, runs tests, commits code
📊 Component Criticality by Agent Type
| Component | Simple Agent | Tool Agent | Planning Agent |
|---|---|---|---|
| LLM Core | Critical | Critical | Critical |
| Memory | Optional | Important | Critical |
| Tool Interface | None | Critical | Critical |
| Planner | None | Optional | Critical |
| Executor | None | Important | Critical |