Chapter 8: Planning
Planning in Building Agentic AI Systems.
Learning Objectives
By the end of this chapter, you will be able to:
- Explain the agentic AI concept behind Planning.
- Apply Planning to design reliable, production-grade agent systems.
- Recognize operational trade-offs in tool use, orchestration, safety, and cost.
ReAct loop, Plan-and-Execute, HTN, MCTS, and dynamic replanning
Planning: Decomposing Goals Into Actions
Planning is the process of transforming a high-level goal into an ordered sequence of executable steps. In simple agents, planning is implicit: the LLM decides the next action in the context of current observations. In sophisticated agents, planning is a separate, explicit phase with its own prompt, model, or algorithm.
Implicit Planning (ReAct)
- Each step: observe → think → act
- No upfront plan document
- Adapts naturally to surprises
- Can make locally good but globally poor decisions
- Good for short-to-medium tasks
Explicit Planning (Plan-and-Execute)
- Phase 1: generate a plan (full task graph)
- Phase 2: execute each step
- Better global coherence
- Replanning needed when plan fails
- Good for long-horizon, structured tasks
ReAct: The Default Planning Loop
ReAct (Yao et al., 2022) is one of the most widely deployed planning patterns. The agent interleaves Thought (explicit reasoning about what to do next), Action (a tool call), and Observation (the tool's result) until it decides to emit a final answer.
Thought: "I need to check X before doing Y"
Action: tool_name(args)
Observation: {result from tool}
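The Thought/Action/Observation cycle above can be sketched as a plain loop. This is a minimal sketch, not a production implementation: `scripted_llm` is a hypothetical stand-in for a real model call that returns either a tool call or a final answer, and the `TOOLS` registry with its `lookup` tool is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    thought: str
    action: Optional[str] = None   # tool name, or None when the agent answers
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None   # set only when the agent decides to finish


# Hypothetical tool registry: one lookup tool over a fixed dict.
FACTS = {"capital_of_france": "Paris"}
TOOLS: dict[str, Callable[..., str]] = {
    "lookup": lambda key: FACTS.get(key, "not found"),
}


def scripted_llm(history: list[str]) -> Step:
    """Hypothetical stand-in for an LLM: act once, then answer from the observation."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        return Step(thought="I need to look up the capital before answering.",
                    action="lookup", args={"key": "capital_of_france"})
    result = observations[-1].removeprefix("Observation: ")
    return Step(thought="I have what I need.", answer=result)


def react(question: str, llm: Callable[[list[str]], Step], max_steps: int = 5) -> str:
    """Interleave Thought, Action and Observation until a final answer or the step cap."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm(history)
        history.append(f"Thought: {step.thought}")
        if step.answer is not None:          # the model chose to finish
            return step.answer
        history.append(f"Action: {step.action}({step.args})")
        observation = TOOLS[step.action](**step.args)
        history.append(f"Observation: {observation}")
    return "Stopped: step limit reached"


print(react("What is the capital of France?", scripted_llm))  # prints Paris
```

Keeping the full history as the model's input is what lets each Thought condition on every earlier Observation; the `max_steps` cap is the simplest guard against a model that never emits an answer.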
Preventing Infinite Loops
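Two common guards against a ReAct agent that never terminates are a hard cap on iterations and a state-change check that stops the agent when it keeps repeating the same action and getting the same result. The sketch below assumes both rules; the `LoopGuard` class and its thresholds (`max_steps`, `max_repeats`) are hypothetical choices for illustration, not a standard API, and tool arguments are assumed to be JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopGuard:
    """Hypothetical loop guard; thresholds are illustrative, not recommended values."""
    max_steps: int = 15    # hard cap on Thought/Action/Observation cycles
    max_repeats: int = 2   # identical (action, args, observation) triples tolerated
    steps: int = 0
    seen: dict[str, int] = field(default_factory=dict)

    def _fingerprint(self, action: str, args: dict, observation: str) -> str:
        # If the action, its arguments and its result are all unchanged,
        # the agent's state has not changed either.
        payload = json.dumps([action, args, observation], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def check(self, action: str, args: dict, observation: str) -> Optional[str]:
        """Return a stop reason, or None if the agent may continue."""
        self.steps += 1
        if self.steps > self.max_steps:
            return f"step limit ({self.max_steps}) reached"
        key = self._fingerprint(action, args, observation)
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            return f"no state change: {action} repeated with identical result"
        return None


guard = LoopGuard(max_steps=10, max_repeats=2)
for _ in range(3):
    reason = guard.check("search", {"q": "weather"}, "no results")
print(reason)  # prints: no state change: search repeated with identical result
```

Hashing the observation together with the action matters: retrying the same call is legitimate if the result changes (for example, polling a job), so only repeats with an identical outcome count as "no progress".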
Plan-and-Execute Pattern
For tasks that require many ordered steps, an upfront planning phase produces a better global strategy than reactive step-by-step decisions.
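import json
from typing import Literal
from pydantic import BaseModel, Field

class PlanStep(BaseModel):
    step_id: str
    description: str
    tool: str
    tool_args: dict = Field(default_factory=dict)
    depends_on: list[str] = Field(default_factory=list)
    status: Literal["pending", "running", "done", "failed"] = "pending"

class Plan(BaseModel):
    goal: str
    steps: list[PlanStep]

# Embedding the schema in the prompt tells the model which JSON shape to emit.
PLAN_SCHEMA = json.dumps(Plan.model_json_schema())

def generate_plan(goal: str, available_tools: list[str], llm_client) -> Plan:
    """Phase 1: ask the LLM to decompose the goal into ordered steps."""
    system = (
        "You are a planning agent. Given a goal and available tools, "
        "produce a JSON execution plan with ordered steps. "
        "Each step must specify exactly which tool to call and with what arguments. "
        f"Respond with JSON matching this schema: {PLAN_SCHEMA}"
    )
    response = llm_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": f"Goal: {goal}\nTools: {available_tools}"},
        ],
        response_format={"type": "json_object"},
    )
    return Plan.model_validate_json(response.choices[0].message.content)

def replan(plan: Plan, failed: PlanStep, error: str, llm_client) -> Plan:
    """Minimal sketch (assumed): ask the LLM for a revised plan that routes around the failure."""
    response = llm_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Revise this plan so the goal can still be met. Respond with JSON matching: {PLAN_SCHEMA}"},
            {"role": "user", "content": f"Plan: {plan.model_dump_json()}\nFailed step: {failed.step_id}\nError: {error}"},
        ],
        response_format={"type": "json_object"},
    )
    return Plan.model_validate_json(response.choices[0].message.content)

def dispatch_tool_call(tool_registry: dict, tool: str, tool_args: dict) -> str:
    """Minimal stand-in (assumed): run a registered tool; failures come back as a JSON {"error": ...} string."""
    if tool not in tool_registry:
        return json.dumps({"error": f"unknown tool: {tool}"})
    try:
        return str(tool_registry[tool](**tool_args))
    except Exception as exc:
        return json.dumps({"error": str(exc)})

def execute_plan(plan: Plan, tool_registry: dict, llm_client, max_replans: int = 3) -> str:
    """Phase 2: execute steps in dependency order; replan on failure."""
    results: dict[str, str] = {}
    replans = 0
    while True:
        pending = [s for s in plan.steps if s.status == "pending"]
        if not pending:
            break
        # Run the first pending step whose dependencies have all completed
        step = next((s for s in pending if all(d in results for d in s.depends_on)), None)
        if step is None:
            return "Plan did not complete: unsatisfiable dependencies"
        step.status = "running"
        result = dispatch_tool_call(tool_registry, step.tool, step.tool_args)
        if '"error"' in result:
            step.status = "failed"
            if replans >= max_replans:
                return f"Plan failed at step {step.step_id}: {result}"
            replans += 1
            plan = replan(plan, step, result, llm_client)  # dynamic replanning
        else:
            step.status = "done"
            results[step.step_id] = result
    return results.get(plan.steps[-1].step_id, "Plan did not complete")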
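Because an LLM-generated plan can reference steps that do not exist or contain dependency cycles, it can help to validate the task graph before `execute_plan` runs it. The sketch below uses the standard-library `graphlib.TopologicalSorter` to compute an order that respects `depends_on`; it works on plain dicts so it stands alone, and `execution_order` is a hypothetical helper name, not part of any framework.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps: list[dict]) -> list[str]:
    """Hypothetical helper: return step_ids in dependency order, or raise ValueError."""
    ids = {s["step_id"] for s in steps}
    graph: dict[str, set[str]] = {}
    for s in steps:
        missing = set(s["depends_on"]) - ids
        if missing:
            raise ValueError(f"step {s['step_id']} depends on unknown steps {sorted(missing)}")
        graph[s["step_id"]] = set(s["depends_on"])  # node -> its prerequisites
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle
        raise ValueError(f"plan contains a dependency cycle: {err.args[1]}") from err


steps = [
    {"step_id": "draft", "depends_on": ["research"]},
    {"step_id": "research", "depends_on": []},
    {"step_id": "edit", "depends_on": ["draft"]},
]
print(execution_order(steps))  # prints ['research', 'draft', 'edit']
```

Rejecting a malformed plan up front is cheaper than discovering mid-execution that a step can never become ready; the error message can also be fed back to the planner as a repair hint.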
Advanced Planning: HTN and MCTS
Hierarchical Task Networks (HTN)
HTN planning decomposes goals hierarchically: a high-level task is broken into sub-tasks, which are broken into primitive actions. This mirrors how humans plan: first at a high level ("prepare a report"), then at a mid level ("research, draft, edit"), then at a primitive level ("call search API with query X").
Goal: Write report on Topic X
Sub-tasks: Research · Draft · Edit
Primitive actions: search() · write() · revise()
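The report example can be expressed as a tiny HTN-style decomposer: each compound task maps to a method (an ordered list of sub-tasks), and decomposition recurses until only primitive actions remain. This is a simplified sketch assuming exactly one method per task and no preconditions on world state; full HTN planners choose among several methods depending on the current state. The `METHODS` table and `PRIMITIVES` set are illustrative.

```python
# Illustrative method table (hypothetical): compound task -> ordered sub-tasks.
METHODS: dict[str, list[str]] = {
    "write_report": ["research", "draft", "edit"],
    "research": ["search"],
    "draft": ["write"],
    "edit": ["revise"],
}
# Primitive actions map directly to tool calls and are not decomposed further.
PRIMITIVES = {"search", "write", "revise"}


def decompose(task: str) -> list[str]:
    """Recursively expand a task until only primitive actions remain."""
    if task in PRIMITIVES:
        return [task]
    if task not in METHODS:
        raise ValueError(f"no method to decompose task: {task}")
    plan: list[str] = []
    for subtask in METHODS[task]:
        plan.extend(decompose(subtask))
    return plan


print(decompose("write_report"))  # prints ['search', 'write', 'revise']
```

Because the method table encodes domain knowledge, the LLM (or a human) only has to supply decompositions one level at a time rather than a flat list of every primitive call.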
MCTS with LLM Guidance
Monte Carlo Tree Search (MCTS) explores the action space as a tree. At each node (a state), the LLM proposes candidate next actions (expansion), a fast evaluator scores them (simulation), and high-scoring paths are explored more deeply (selection). It is used in AlphaApollo and similar systems for tasks whose solution space branches heavily.
Selection
Traverse the tree from the root using the UCB1 formula, balancing exploitation (high-scoring nodes) against exploration (less-visited nodes)
Expansion
The LLM proposes N candidate next actions from the current node's state
Simulation
Fast rollout: run actions to a terminal state, then score it with a reward function or LLM-as-judge
Backpropagation
Propagate the score back up the tree so that nodes leading to high-reward outcomes get higher visit priority
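The four phases can be made concrete with a compact UCT-style search (MCTS with UCB1 selection) on a toy task. Assumptions: `propose_actions` stands in for the LLM proposer and `reward` for the reward function or LLM-as-judge; the task (recovering a hidden 3-step sequence) and the exploration constant c = sqrt(2) are illustrative. Selection scores each child as UCB1 = value/visits + c * sqrt(ln(parent visits) / visits).

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

TARGET = (2, 0, 1)   # hidden optimum for the toy task (illustrative)
ACTIONS = (0, 1, 2)  # each decision picks one of three actions


def propose_actions(state: tuple[int, ...]) -> list[int]:
    """Stand-in for the LLM proposer: candidate next actions, empty when terminal."""
    return list(ACTIONS) if len(state) < len(TARGET) else []


def reward(state: tuple[int, ...]) -> float:
    """Stand-in for a reward function or LLM-as-judge on a terminal state."""
    return sum(a == b for a, b in zip(state, TARGET)) / len(TARGET)


@dataclass
class Node:
    state: tuple[int, ...]
    parent: Optional["Node"] = None
    children: dict[int, "Node"] = field(default_factory=dict)
    untried: list[int] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # sum of rewards backed up through this node

    def ucb1(self, c: float) -> float:
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts(iterations: int = 300, c: float = math.sqrt(2), seed: int = 0) -> int:
    rng = random.Random(seed)
    root = Node(state=(), untried=propose_actions(()))
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCB1 while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children.values(), key=lambda ch: ch.ucb1(c))
        # 2. Expansion: add one untried action as a new child.
        if node.untried:
            action = node.untried.pop(rng.randrange(len(node.untried)))
            state = node.state + (action,)
            child = Node(state=state, parent=node, untried=propose_actions(state))
            node.children[action] = child
            node = child
        # 3. Simulation: random rollout to a terminal state, then score it.
        state = node.state
        while moves := propose_actions(state):
            state = state + (rng.choice(moves),)
        r = reward(state)
        # 4. Backpropagation: add the reward to every node on the path to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Commit to the most-visited root action, a common and robust choice.
    return max(root.children, key=lambda a: root.children[a].visits)


print(mcts())
```

With this reward, rollouts that start with action 2 always score at least 1/3 while the others score at most 2/3, so the visits should concentrate on action 2. In a real agent, each `propose_actions` and `reward` call would be an LLM request, which is why the simulation budget dominates cost.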
When to use MCTS vs ReAct
ReAct is greedy: it always takes the action that looks best at the current step. MCTS explores alternatives before committing. Use MCTS when (1) wrong early decisions are costly or irreversible, (2) the task has multiple valid solution paths, and (3) you have a reliable reward signal. MCTS is compute-intensive: a typical budget is 10 to 50 simulations per decision node.
Chapter 8 Quiz
1. In Plan-and-Execute, what triggers the Replanner?
2. What is the primary purpose of "Backpropagation" in MCTS?
3. Which state change detection rule helps prevent an agent from being stuck in a ReAct loop?