Chapter 3: Tool-Using Agents
Agents That Can Use Tools
Learning Objectives
- Understand the fundamentals of tool-using agents
- Master the mathematical foundations
- Learn practical implementation
- Apply knowledge through examples
- Recognize real-world applications
Tool-Using Agents
What are Tool-Using Agents?
Tool-using agents can interact with external systems through function calls, APIs, and tools. This capability transforms agents from simple text generators into powerful autonomous systems that can affect the real world.
Think of tool-using agents like a Swiss Army knife:
- Traditional LLM: Like a single tool - can only generate text
- Tool-Using Agent: Like a multi-tool - can use different tools for different tasks
- Key advantage: Can access real-time data, perform calculations, interact with systems
🔧 Tool-Using Agent Architecture

User Request ("Get weather in NYC")
→ 🧠 LLM Reasoning: analyzes request, decides "Need weather tool"
→ Available tools: Tool 1 (Weather API), Tool 2 (Calculator), Tool 3 (Web Search)
→ ⚡ Tool Execution: calls get_weather("New York"), returns "72°F, sunny"
→ Response: "Weather in NYC: 72°F, sunny"
Why Tools Matter
Tools extend agent capabilities beyond text generation:
- Real-time data: Access current information (weather, stock prices, news)
- Computations: Perform calculations, data analysis
- System integration: Interact with databases, APIs, software systems
- Actions: Send emails, create files, update records
Key Concepts
🔑 Function Calling Mechanism
Function calling allows LLMs to request tool execution:
How Function Calling Works
- Tool Definition: Define tools with name, description, and parameters
- LLM Decision: LLM analyzes request and decides which tool to use
- Function Call: LLM generates function call with parameters
- Tool Execution: System executes the function
- Result Integration: Tool result is fed back to LLM
- Response Generation: LLM generates final response using tool result
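A minimal sketch of this six-step loop is shown below. The `llm_choose_tool` and `llm_answer` callables are hypothetical stand-ins for whatever interface your LLM client exposes, not a real API.

```python
import json

# 1. Tool definition: name, description, and parameter schema
TOOLS = [{
    "name": "get_weather",
    "description": "Get current weather for a specific city",
    "parameters": {"city": {"type": "string"}},
}]

def get_weather(city: str) -> dict:
    return {"temperature": 72, "condition": "sunny"}  # stub implementation

def handle(request: str, llm_choose_tool, llm_answer) -> str:
    # 2-3. The LLM analyzes the request and emits a function call as JSON,
    # e.g. {"name": "get_weather", "arguments": {"city": "New York"}}
    call = json.loads(llm_choose_tool(request, TOOLS))
    # 4. The host system (not the LLM) executes the function
    result = get_weather(**call["arguments"])
    # 5-6. The tool result is fed back and the LLM generates the final response
    return llm_answer(request, tool_result=json.dumps(result))
```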
🔧 Function Calling Flow: 1. Tool Def (define tools) → 2. LLM Select (choose tool) → 3. Execute (run function) → 4. Response (generate answer)
Tool Definition Schema
Tools must be properly defined for LLMs to understand and use them:
Tool Schema Components
- name: Unique identifier for the tool
- description: What the tool does (critical for LLM selection!)
- parameters: Input parameters with types and descriptions
- returns: What the tool returns
Example Tool Definition
```json
{
  "name": "get_weather",
  "description": "Get current weather for a specific city",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "The city name, e.g., 'New York'"
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Temperature units"
      }
    },
    "required": ["city"]
  },
  "returns": {
    "type": "object",
    "properties": {
      "temperature": {"type": "number"},
      "condition": {"type": "string"},
      "humidity": {"type": "number"}
    }
  }
}
```
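Before executing a call, the agent can validate the LLM-supplied arguments against this schema. A sketch using the third-party `jsonschema` package (an assumption, installed via `pip install jsonschema`):

```python
from jsonschema import validate, ValidationError  # pip install jsonschema

# The "parameters" block of the tool definition above
params_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

try:
    validate({"city": "New York", "units": "celsius"}, params_schema)
except ValidationError as e:
    # Catch malformed LLM output before it reaches the tool
    print(f"Bad tool arguments: {e.message}")
```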
Tool Selection Process
The LLM selects tools based on task requirements:
Selection Criteria
- Task match: Does tool description match the task?
- Parameter availability: Can we provide required parameters?
- Relevance: Is this the best tool for the job?
Tool Selection Decision Tree:
User Request → LLM Analyzes Request → matches intent (Weather? Calculate? Search?) → Selected Tool: get_weather(city="New York")
⚙️ Error Handling in Tool Usage
Agents must handle tool execution errors gracefully:
Common Error Scenarios
- Tool not found: Requested tool doesn't exist
- Invalid parameters: Wrong parameter types or missing required params
- Tool failure: Tool execution fails (API error, network issue)
- Timeout: Tool takes too long to execute
Error Handling Strategies
- Retry: Retry failed operations (with backoff)
- Fallback: Use alternative tool or approach
- Clarification: Ask user for more information
- Graceful degradation: Provide partial answer if possible
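Retry and fallback are implemented later in this chapter; the remaining scenario, timeouts, can be handled with Python's standard library. A minimal sketch, assuming tools are plain callables:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_pool = ThreadPoolExecutor(max_workers=4)  # shared worker pool

def execute_with_timeout(tool_func, parameters: dict, timeout_s: float = 5.0) -> dict:
    """Run a tool in a worker thread; report a timeout if it exceeds timeout_s.

    Note: the worker thread keeps running to completion in the background;
    the agent just stops waiting for its result.
    """
    future = _pool.submit(tool_func, **parameters)
    try:
        return {"status": "success", "result": future.result(timeout=timeout_s)}
    except TimeoutError:
        return {"status": "timeout", "error": f"Tool exceeded {timeout_s}s"}
    except Exception as e:
        return {"status": "error", "error": str(e)}
```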
Mathematical Formulations
Tool Selection Probability
\[
P(\text{tool} = t \mid \text{request}, \text{tools}) = \text{softmax}\big(\text{similarity}(\text{request}, t.\text{description})\big)
\]
What This Measures
This formula calculates the probability that a specific tool is the right choice for a given user request. It uses semantic similarity between the request and tool descriptions to determine how well each tool matches the user's need, then converts similarity scores into probabilities using softmax. The tool with highest probability is selected.
Breaking It Down
- request: User's request - the natural language query or instruction from the user (e.g., "What's the weather in New York?", "Calculate 15 * 23", "Search for quantum computing articles"). This is what the agent needs to fulfill.
- tool.description: Tool description - a text description of what each tool does, its purpose, and when to use it (e.g., "Get current weather for a city", "Perform mathematical calculations", "Search the web for information"). Descriptions are typically provided to the LLM to help it understand tool capabilities.
- similarity(request, tool.description): Semantic similarity score - a numerical value (typically 0-1) measuring how semantically similar the request is to the tool description. Higher scores indicate better matches. This is typically calculated using embedding models (cosine similarity between embeddings) or LLM-based scoring.
- softmax(...): Softmax function - converts raw similarity scores into probabilities that sum to 1. Softmax ensures that: all probabilities are between 0 and 1, probabilities sum to 1 (one tool must be selected), and higher similarity scores get higher probabilities. Formula: \(P_i = \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}\) where \(s_i\) is similarity score for tool i.
- P(tool = t | ...): Final probability - the probability that tool t is the correct choice. The agent selects the tool with maximum probability.
Where This Is Used
This probability calculation happens during the "Reason" step when the agent determines it needs to use a tool. The agent: (1) embeds the user request and all tool descriptions, (2) calculates similarity scores between request and each tool, (3) applies softmax to get probabilities, (4) selects the tool with highest probability. This enables intelligent tool selection based on semantic understanding rather than keyword matching.
Why This Matters
Semantic similarity-based tool selection is crucial for handling natural language requests. Users don't always use exact keywords that match tool names - they might say "weather" when the tool is called "get_weather", or "calculate" when the tool is "math_calculator". By using semantic similarity, agents can match user intent to tool capabilities even when wording differs. The softmax ensures probabilities are properly normalized and the selection is probabilistic (allowing for uncertainty handling), making tool selection robust and intelligent.
Example Calculation
Given:
- request = "What's the weather like in New York?"
- available_tools = ["get_weather", "search_web", "calculate", "send_email"]
- tool descriptions = ["Get current weather for a city", "Search the internet", "Perform calculations", "Send email messages"]
Step 1: Calculate similarity scores:
- similarity(request, "Get current weather for a city") = 0.92 (very high - direct match)
- similarity(request, "Search the internet") = 0.45 (moderate - could find weather info)
- similarity(request, "Perform calculations") = 0.08 (very low - not relevant)
- similarity(request, "Send email messages") = 0.05 (very low - not relevant)
Step 2: Apply softmax to get probabilities:
- P(get_weather) = e^0.92 / (e^0.92 + e^0.45 + e^0.08 + e^0.05) = 2.51 / (2.51 + 1.57 + 1.08 + 1.05) = 2.51 / 6.21 = 0.40
- P(search_web) = 1.57 / 6.21 = 0.25
- P(calculate) = 1.08 / 6.21 = 0.17
- P(send_email) = 1.05 / 6.21 = 0.17
Result: P(get_weather) = 0.40 (highest probability)
Interpretation: The semantic similarity correctly identified that "get_weather" is the most appropriate tool, even though the user said "weather like" rather than using the exact tool name. The softmax normalized the scores into proper probabilities, with get_weather having the highest probability (0.40) despite other tools having non-zero probabilities. This demonstrates how semantic understanding enables robust tool selection.
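The worked example above can be reproduced in a few lines of Python. The similarity scores are taken as given (in practice an embedding model would produce them):

```python
import math

# Similarity scores from Step 1 of the example
scores = {"get_weather": 0.92, "search_web": 0.45,
          "calculate": 0.08, "send_email": 0.05}

# Softmax: exponentiate each score, then normalize by the sum
exp = {tool: math.exp(s) for tool, s in scores.items()}
total = sum(exp.values())
probs = {tool: e / total for tool, e in exp.items()}

print(probs)                      # {'get_weather': 0.404, 'search_web': 0.252, ...}
print(max(probs, key=probs.get))  # 'get_weather'
```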
Tool Execution Function
\[
\text{Execute}(\text{tool}, \text{params}) =
\begin{cases}
\text{result} & \text{if } \text{tool}(\text{params}) \text{ succeeds} \\
\text{error message} & \text{if } \text{tool}(\text{params}) \text{ fails}
\end{cases}
\]
What This Measures
This function executes a tool with given parameters and returns either a successful result or an error message. It encapsulates the actual execution of tool functions, handling both success and failure cases. This is the mechanism by which agents interact with external systems and perform actions beyond text generation.
Breaking It Down
- tool: The tool function to execute - the actual function, API call, or capability that will be invoked (e.g., get_weather function, web search API, calculator function). This is the executable code that performs the action.
- parameters: Input parameters for the tool - the arguments needed to execute the tool (e.g., {"city": "New York", "unit": "Celsius"} for weather tool, {"query": "quantum computing"} for search tool). Parameters are typically extracted from the user request by the LLM.
- tool(params): Tool execution with parameters - calling the tool function with the provided parameters. This is the actual execution step where the tool performs its operation (API call, computation, database query, etc.).
- success case: If execution succeeds, the function returns the tool's output result. This could be: data (weather data, search results), computed values (calculation results), status messages (email sent, file saved), or any other tool-specific output. The result is then used by the agent to inform its next decision.
- error case: If execution fails, the function returns an error message. Failures can occur due to: invalid parameters, API failures, network issues, rate limits, authentication problems, or tool-specific errors. Error messages help the agent understand what went wrong and decide how to proceed (retry, use alternative tool, ask for clarification).
Where This Is Used
This function is called during the "Act" step of the agent loop after the agent has selected a tool. The execution happens synchronously (waits for result) or asynchronously (handles result when ready), depending on the tool type. The result (success or error) is then observed by the agent, which uses it to update state and make the next decision.
Why This Matters
Tool execution is what enables agents to take actions beyond text generation. Without this function, agents would be limited to generating text based on training data. Tool execution allows agents to: access real-time information (weather APIs, databases), perform computations (calculators, data processing), interact with systems (send emails, update databases), and affect the external world. The error handling is crucial - agents must gracefully handle failures, retry when appropriate, and adapt their behavior when tools don't work as expected. This robustness is essential for production agent systems.
Example Calculation
Given:
- tool = get_weather function
- parameters = {"city": "New York", "unit": "Celsius"}
Case 1: Success
- Execute(get_weather, {"city": "New York", "unit": "Celsius"})
- Tool executes: calls weather API with parameters
- API returns: {"temp": 22, "condition": "sunny", "humidity": 65}
- result = {"success": true, "data": {"temp": 22, "condition": "sunny", "humidity": 65}}
Case 2: Error
- Execute(get_weather, {"city": "InvalidCity123", "unit": "Celsius"})
- Tool executes: calls weather API
- API returns: {"error": "City not found"}
- result = {"success": false, "error": "City not found: InvalidCity123"}
Interpretation: In Case 1, the tool execution succeeded and returned weather data, which the agent can use to answer the user. In Case 2, the execution failed due to invalid parameters, and the error message helps the agent understand the problem and potentially ask the user for clarification or try a different approach. This demonstrates how tool execution enables agents to interact with external systems and handle both success and failure cases.
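A compact sketch of such an execution wrapper, matching the success/error envelope used in the two cases above (simplified; the full agent implementation later in this chapter also records history and status enums):

```python
from typing import Any, Callable, Dict

def execute(tool: Callable[..., Any], parameters: Dict) -> Dict:
    """Run a tool and wrap the outcome in a uniform success/error envelope."""
    try:
        return {"success": True, "data": tool(**parameters)}
    except Exception as e:
        return {"success": False, "error": str(e)}

# execute(get_weather, {"city": "New York", "unit": "Celsius"})
# -> {"success": True, "data": {"temp": 22, "condition": "sunny", "humidity": 65}}
```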
Tool Result Integration
\[
\text{response} = \text{LLM}(\text{request}, \text{tool\_result}, \text{context})
\]
What This Measures
This function generates the agent's final response by combining the original user request, tool execution results, and conversation context. It takes the raw tool output and synthesizes it into a natural language response that addresses the user's question. This is how agents transform tool results into human-readable answers.
Breaking It Down
- request: Original user request - the initial question or instruction from the user (e.g., "What's the weather in New York?", "Calculate 15 * 23"). This provides the context of what the user wants to know.
- tool_result: Result from tool execution - the output from the tool that was called (e.g., weather data {"temp": 22, "condition": "sunny"}, calculation result {"answer": 345}, search results [list of articles]). This is the raw data that needs to be interpreted and presented to the user.
- context: Conversation context and memory - additional information including: previous conversation turns, user preferences from memory, relevant facts, and any other contextual information that helps generate a better response. Context ensures the response is coherent with the conversation and personalized to the user.
- LLM(...): Language model generation - the LLM processes all inputs (request, tool_result, context) and generates a natural language response. The LLM: understands the user's question, interprets the tool result, incorporates context, and synthesizes everything into a coherent answer.
- response: Final generated response - the natural language answer that addresses the user's request using the tool result and context. The response should be: accurate (based on tool result), relevant (addresses the request), coherent (makes sense in context), and helpful (provides value to the user).
Where This Is Used
This function is called after tool execution succeeds, during the response generation phase of the agent loop. The agent: (1) receives the tool result, (2) retrieves relevant context from memory, (3) constructs a prompt with request, tool_result, and context, (4) calls the LLM to generate the response, (5) returns the response to the user. This happens in the final step before the agent completes its task.
Why This Matters
Tool result integration is what makes agents useful to humans. Raw tool outputs (JSON data, API responses, calculation results) are not user-friendly - they need to be interpreted and presented in natural language. This function enables agents to: transform technical data into understandable responses, incorporate context to make answers relevant and personalized, handle edge cases (errors, missing data) gracefully, and provide explanations along with data. Without proper integration, tool results would be meaningless to users - this function bridges the gap between tool outputs and human understanding.
Example Calculation
Given:
- request = "What's the weather in New York?"
- tool_result = {"temp": 22, "condition": "sunny", "humidity": 65, "unit": "Celsius"}
- context = {"memory": ["User prefers Celsius", "Last asked about weather yesterday"], "conversation": ["User: What's the weather?"]}
Step 1: LLM receives inputs: request, tool_result, context
Step 2: LLM processes: understands user wants weather, sees temperature is 22°C, notes user prefers Celsius, sees it's sunny
Step 3: LLM generates response incorporating all information
Result: response = "The weather in New York is currently 22°C (72°F) and sunny with 65% humidity. It's a pleasant day!"
Interpretation: The LLM took the raw tool result (JSON data) and transformed it into a natural, informative response. It used the context (user prefers Celsius) to present temperature in Celsius first, and added a friendly comment ("pleasant day") to make the response more conversational. This demonstrates how tool result integration creates user-friendly responses from technical data.
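One way to assemble these three inputs into a generation prompt; the prompt wording and the `llm.generate` interface are illustrative assumptions:

```python
import json

def integrate_result(llm, request: str, tool_result: dict, context: dict) -> str:
    """Combine the request, raw tool output, and context into a final answer."""
    prompt = (
        f"User request: {request}\n"
        f"Tool result: {json.dumps(tool_result)}\n"
        f"Context: {json.dumps(context)}\n\n"
        "Answer the user's request in natural language, using the tool result "
        "and respecting any user preferences found in the context."
    )
    return llm.generate(prompt)
```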
Detailed Examples
Example 1: Weather Agent - Complete Tool Usage
Task: "What's the weather in New York and Paris?"
Multi-Tool Execution Flow
| Step | Action | Tool Call | Result |
|---|---|---|---|
| 1 | LLM analyzes request | - | Decides: Need 2 weather calls |
| 2 | Call weather tool | get_weather("New York") | 72°F, sunny |
| 3 | Call weather tool | get_weather("Paris") | 65°F, cloudy |
| 4 | LLM generates response | - | "NYC: 72°F sunny, Paris: 65°F cloudy" |
Example 2: Calculator Agent - Tool Chaining
Task: "Calculate (15 + 27) × 3 and convert to binary"
🔗 Tool Chaining Example
Step 1: calculate("15+27") → 42
Step 2: calculate("42*3") → 126
Step 3: convert_to_binary(126) → "1111110"
Key: the agent chains multiple tools, using the output of one tool as the input to the next!
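The same chain in plain Python, with `calculate` and `convert_to_binary` as the illustrative tools named above:

```python
def calculate(expression: str) -> int:
    # Illustrative only; see the safety note on eval() in the implementation below
    return eval(expression)

def convert_to_binary(n: int) -> str:
    return format(n, "b")

step1 = calculate("15+27")        # 42
step2 = calculate(f"{step1}*3")   # 126
step3 = convert_to_binary(step2)  # '1111110'
```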
Implementation
Complete Tool-Using Agent Implementation
```python
from datetime import datetime
from enum import Enum
from typing import Callable, Dict, List
import json


class ToolExecutionStatus(Enum):
    SUCCESS = "success"
    ERROR = "error"
    TIMEOUT = "timeout"


class ToolUsingAgent:
    """Agent that can use tools to interact with external systems"""

    def __init__(self, llm):
        self.llm = llm
        self.tools: Dict[str, Dict] = {}
        self.tool_execution_history: List[Dict] = []

    def register_tool(self, name: str, description: str, function: Callable,
                      parameters: Dict, returns: Dict = None):
        """
        Register a tool for the agent to use.

        Parameters:
            name: Tool name
            description: What the tool does (critical for LLM selection!)
            function: Python function to execute
            parameters: Parameter schema
            returns: Return type schema
        """
        self.tools[name] = {
            'name': name,
            'description': description,
            'function': function,
            'parameters': parameters,
            'returns': returns
        }

    def get_tool_descriptions(self) -> str:
        """Get formatted tool descriptions for the LLM."""
        descriptions = []
        for tool in self.tools.values():
            desc = f"- {tool['name']}: {tool['description']}"
            if tool['parameters']:
                params = ', '.join(tool['parameters'].keys())
                desc += f" (parameters: {params})"
            descriptions.append(desc)
        return '\n'.join(descriptions)

    def select_tool(self, user_request: str) -> Dict:
        """
        Use the LLM to select an appropriate tool.

        Returns:
            Dict with tool name and parameters
        """
        tool_descriptions = self.get_tool_descriptions()
        # Doubled braces escape literal JSON braces inside the f-string
        prompt = f"""
User request: {user_request}

Available tools:
{tool_descriptions}

Select the most appropriate tool and parameters.
Return JSON: {{"tool": "tool_name", "parameters": {{"param": "value"}}}}
If no tool is needed, return: {{"tool": "respond", "parameters": {{}}}}
"""
        response = self.llm.generate(prompt)
        return json.loads(response)

    def execute_tool(self, tool_name: str, parameters: Dict) -> Dict:
        """
        Execute a tool.

        Returns:
            Dict with status and result/error
        """
        if tool_name not in self.tools:
            return {
                'status': ToolExecutionStatus.ERROR,
                'error': f'Tool {tool_name} not found',
                'result': None
            }
        func = self.tools[tool_name]['function']
        try:
            result = func(**parameters)
            self.tool_execution_history.append({
                'tool': tool_name,
                'parameters': parameters,
                'status': ToolExecutionStatus.SUCCESS,
                'result': result,
                'timestamp': datetime.now().isoformat()
            })
            return {
                'status': ToolExecutionStatus.SUCCESS,
                'result': result,
                'error': None
            }
        except Exception as e:
            self.tool_execution_history.append({
                'tool': tool_name,
                'parameters': parameters,
                'status': ToolExecutionStatus.ERROR,
                'error': str(e),
                'timestamp': datetime.now().isoformat()
            })
            return {
                'status': ToolExecutionStatus.ERROR,
                'result': None,
                'error': str(e)
            }

    def process_with_tools(self, user_request: str, max_iterations: int = 5) -> str:
        """
        Process a user request using tools.

        Parameters:
            user_request: User's request
            max_iterations: Maximum tool calls allowed

        Returns:
            Final response
        """
        context = []
        for _ in range(max_iterations):
            # Step 1: Select tool
            decision = self.select_tool(user_request)
            tool_name = decision.get('tool')
            parameters = decision.get('parameters', {})

            if not tool_name or tool_name == 'respond':
                # LLM decided to respond directly.
                # default=str makes the status enum JSON-serializable.
                prompt = f"""
User request: {user_request}
Context: {json.dumps(context, indent=2, default=str)}

Generate a response.
"""
                return self.llm.generate(prompt)

            # Step 2: Execute tool
            execution_result = self.execute_tool(tool_name, parameters)
            context.append({'tool': tool_name, 'result': execution_result})

            # Step 3: Check whether we have enough information
            if execution_result['status'] == ToolExecutionStatus.SUCCESS:
                # Generate a response using the tool result
                prompt = f"""
User request: {user_request}
Tool result: {json.dumps(execution_result['result'], indent=2)}

Generate a response using the tool result.
"""
                return self.llm.generate(prompt)

            # Tool failed: feed the error back and try a different approach
            user_request = (f"Tool {tool_name} failed: {execution_result['error']}. "
                            "Please try a different approach.")

        return "Agent reached maximum iterations without completing the task."


# Example tools
def get_weather(city: str) -> Dict:
    """Get weather for a city (stub; a real implementation would call a weather API)."""
    return {"temperature": 72, "condition": "sunny", "city": city}


def calculate(expression: str) -> float:
    """Calculate a mathematical expression.

    Warning: eval() executes arbitrary code and is unsafe on untrusted input;
    use a real expression parser in production.
    """
    try:
        return eval(expression)
    except Exception:
        raise ValueError(f"Invalid expression: {expression}")


# Example usage
# agent = ToolUsingAgent(llm=my_llm)
#
# agent.register_tool(
#     name='get_weather',
#     description='Get current weather for a city',
#     function=get_weather,
#     parameters={'city': 'string'}
# )
# agent.register_tool(
#     name='calculate',
#     description='Calculate a mathematical expression',
#     function=calculate,
#     parameters={'expression': 'string'}
# )
#
# response = agent.process_with_tools("What's the weather in New York?")
# print(response)
```
Error Handling in Tool Execution
```python
import time
from typing import Callable, Dict


class RobustToolExecutor:
    """Tool executor with error handling and retry logic"""

    def __init__(self, max_retries: int = 3, retry_delay: float = 1.0):
        self.max_retries = max_retries
        self.retry_delay = retry_delay

    def execute_with_retry(self, tool_func: Callable, parameters: Dict) -> Dict:
        """
        Execute a tool with retry logic.

        Parameters:
            tool_func: Tool function to execute
            parameters: Parameters for the function

        Returns:
            Execution result with status
        """
        last_error = None
        for attempt in range(self.max_retries):
            try:
                result = tool_func(**parameters)
                return {
                    'success': True,
                    'result': result,
                    'attempts': attempt + 1
                }
            except Exception as e:
                last_error = e
                if attempt < self.max_retries - 1:
                    # Linearly increasing backoff: wait longer after each failure
                    time.sleep(self.retry_delay * (attempt + 1))

        # All retries failed
        return {
            'success': False,
            'error': str(last_error),
            'attempts': self.max_retries
        }

    def execute_with_fallback(self, primary_tool: Callable, fallback_tool: Callable,
                              parameters: Dict) -> Dict:
        """
        Execute a tool with a fallback.

        Parameters:
            primary_tool: Primary tool to try
            fallback_tool: Fallback tool if the primary fails
            parameters: Parameters for the tools

        Returns:
            Execution result
        """
        # Try the primary tool
        result = self.execute_with_retry(primary_tool, parameters)
        if result['success']:
            return result

        # Primary failed, try the fallback
        print("Primary tool failed, trying fallback...")
        fallback_result = self.execute_with_retry(fallback_tool, parameters)
        if fallback_result['success']:
            fallback_result['used_fallback'] = True
            return fallback_result

        # Both failed
        return {
            'success': False,
            'error': "Both primary and fallback tools failed",
            'primary_error': result.get('error'),
            'fallback_error': fallback_result.get('error')
        }


# Example usage
# executor = RobustToolExecutor(max_retries=3)
# result = executor.execute_with_retry(get_weather, {'city': 'New York'})
# print(result)
```
Real-World Applications
🌍 Tool-Using Agents in Production
Tool-using agents power many real-world applications:
1. Customer Support Agents
- Tools: Order database, CRM system, knowledge base
- Use cases: Check order status, process returns, answer questions
- Example: "Check my order #12345" → Agent calls order_lookup tool
2. Research Agents
- Tools: Web search, academic databases, PDF readers
- Use cases: Research topics, gather information, cite sources
- Example: "Research quantum computing" → Agent uses search and read tools
3. Data Analysis Agents
- Tools: Database connectors, statistical libraries, visualization tools
- Use cases: Query data, analyze trends, create charts
- Example: "Analyze sales data" → Agent queries DB, runs analysis, creates visualizations
4. Code Generation Agents
- Tools: Code editor, compiler, test runner, git
- Use cases: Write code, run tests, commit changes
- Example: "Create a REST API" → Agent writes code, tests it, commits to git
Tool Categories
Common Tool Categories
🌐 Information Tools
- Web search
- Database queries
- API calls
🔢 Computation Tools
- Calculator
- Data analysis
- Statistical functions
💼 Business Tools
- CRM systems
- Email systems
- Calendar APIs
🛠️ Development Tools
- Code editors
- Version control
- Build systems