Chapter 5: Tool Use
Tool Use in Building Agentic AI Systems.
Learning Objectives
By the end of this chapter, you will be able to:
- Explain the agentic AI concept behind Tool Use.
- Apply Tool Use to design reliable, production-grade agent systems.
- Recognize operational trade-offs in tool use, orchestration, safety, and cost.
Chapter 5: Tool Use
Anatomy, schemas, parallel calls, failures, and integration
Tool Use — The Agent's Hands
Tool use is what transforms a language model into an agent that can affect the world. Without tools, an agent can only generate text. With tools, it can search the web, run code, call APIs, read/write files, query databases, and trigger workflows.
How tool calling works at inference time
The model receives the tool schemas alongside the conversation. When it decides to use a tool, it generates a structured JSON object containing the tool name and arguments instead of a free-form text response. Your application code intercepts that JSON, executes the real function, and returns the result as a new message. The model never directly executes anything — it only decides what to call.
Generates JSON call
Parses & routes
Executes
Returned as tool message
Reads result, continues
Tool Anatomy
A well-designed tool has three parts: a clear name, a precise description, and a strict parameter schema. The description is read by the LLM at inference time — it is documentation for the model, not for developers.
from pydantic import BaseModel, Field
from typing import Literal
# Use Pydantic for schema generation — avoids JSON schema hand-rolling
class WebSearchInput(BaseModel):
query: str = Field(
description="Specific, focused search query. Be precise — vague queries return poor results."
)
max_results: int = Field(
default=5,
ge=1, le=20,
description="Number of results to retrieve. Use 3-5 for general queries."
)
def search_web(query: str, max_results: int = 5) -> str:
"""
Search the web for current information not in the training data.
Use for: current events, real-time data, recent publications.
Do NOT use for: mathematical computation, code execution, or static factual knowledge.
Returns: top results as formatted text with titles, URLs, and snippets.
"""
# Implementation: call search API (Brave, Serper, Tavily, etc.)
results = _call_search_api(query, max_results)
return _format_results(results)
# Convert to OpenAI tool schema
import openai_schema_generator # or build manually
TOOLS = [
{
"type": "function",
"function": {
"name": "search_web",
"description": search_web.__doc__,
"parameters": WebSearchInput.model_json_schema(),
"strict": True, # Forces exact schema adherence (OpenAI structured outputs)
}
}
]
Tool Design Rules
{"error": "...", "retryable": true} insteadTool Call Flow & Failure Handling
import json
import time
from typing import Any
def dispatch_tool_call(
tool_registry: dict[str, callable],
tool_name: str,
raw_arguments: str,
max_retries: int = 2,
retry_delay: float = 1.0,
) -> str:
# 1. Parse arguments — catch malformed JSON from the LLM
try:
args = json.loads(raw_arguments)
except json.JSONDecodeError as e:
return json.dumps({"error": f"Invalid tool arguments: {e}", "retryable": False})
# 2. Resolve tool — catch hallucinated tool names
tool_fn = tool_registry.get(tool_name)
if tool_fn is None:
return json.dumps({
"error": f"Unknown tool '{tool_name}'. Available: {list(tool_registry.keys())}",
"retryable": False,
})
# 3. Execute with retry for transient failures (rate limits, timeouts)
for attempt in range(max_retries + 1):
try:
result = tool_fn(**args)
return result if isinstance(result, str) else json.dumps(result)
except RateLimitError:
if attempt < max_retries:
time.sleep(retry_delay * (2 ** attempt)) # exponential back-off
continue
return json.dumps({"error": "Rate limit exceeded after retries", "retryable": True})
except TimeoutError:
return json.dumps({"error": "Tool timed out", "retryable": True})
except Exception as e:
return json.dumps({"error": str(e), "retryable": False})
Never let a tool exception propagate to the LLM loop
If a tool raises an uncaught exception, your agent loop crashes and the user sees an error. Wrap every tool execution in a try/except and return a structured error message. The LLM can then decide to retry, use an alternative tool, or report the problem gracefully.
Parallel vs Sequential Tool Calls
Modern LLMs (GPT-4o, Claude 3.5+) can request multiple tool calls in a single response turn. This is called parallel function calling. When the calls are independent, it can cut total latency dramatically.
Sequential
- Call 1 → wait → Call 2 → wait → Call 3
- Total latency = sum of all tool latencies
- Necessary when: Call 2 depends on Call 1's result
- Example: search → read top result → summarize
Parallel
- Calls 1, 2, 3 dispatched simultaneously
- Total latency = max(individual latencies)
- Necessary when: all calls are independent
- Example: search 3 topics simultaneously for a report
import asyncio
from openai.types.chat import ChatCompletionMessageToolCall
async def execute_parallel_tools(
tool_calls: list[ChatCompletionMessageToolCall],
tool_registry: dict[str, callable],
) -> list[dict]:
"""Execute all tool calls concurrently; return results in the same order."""
async def _run_one(call: ChatCompletionMessageToolCall) -> dict:
result = await asyncio.to_thread(
dispatch_tool_call, tool_registry, call.function.name, call.function.arguments
)
return {"role": "tool", "tool_call_id": call.id, "content": result}
return await asyncio.gather(*[_run_one(c) for c in tool_calls])
Tool selection when the list is large
Models perform poorly when given more than ~30 tools at once — the schema descriptions overload the context and the model struggles to select correctly. For large tool sets (50+), use semantic tool routing: embed the user query, find the top-K most semantically relevant tool descriptions, and only include those K tools in the current turn's context.
Chapter 5 Quiz
1. When the LLM "calls a tool," what actually executes the function?
2. Why should a tool's description specify when NOT to use it?
3. When are parallel tool calls NOT appropriate?