Learning Objectives

By the end of this chapter, you will be able to:

Explain the agentic AI concept behind Planning.
Apply Planning to design reliable, production-grade agent systems.
Recognize operational trade-offs in tool use, orchestration, safety, and cost.

Section 2 — Core Building Blocks

Chapter 8: Planning

ReAct loop, Plan-and-Execute, HTN, MCTS, and dynamic replanning

Planning: Decomposing Goals Into Actions

Planning is the process of transforming a high-level goal into an ordered sequence of executable steps. In simple agents, planning is implicit — the LLM decides the next action in the context of current observations. In sophisticated agents, planning is a separate, explicit phase with its own prompt, model, or algorithm.

Implicit Planning (ReAct)

Each step: observe → think → act
No upfront plan document
Adapts naturally to surprises
Can make locally good but globally poor decisions
Good for short-to-medium tasks

Explicit Planning (Plan-and-Execute)

Phase 1: generate a plan (full task graph)
Phase 2: execute each step
Better global coherence
Replanning needed when plan fails
Good for long-horizon, structured tasks

ReAct: The Default Planning Loop

ReAct (Yao et al., 2022) is one of the most widely deployed planning patterns. The agent interleaves Thought (explicit reasoning about what to do), Action (a tool call), and Observation (the tool's result) until it decides to emit a final answer.

Thought ("I need to check X before doing Y") → Action (tool_name(args)) → Observation ({result from tool}) → Loop
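The loop above can be sketched in a few lines. This is a minimal illustration, not the reference implementation from Yao et al.: the `Action: tool[argument]` text format, the `FINISH` action, the toy `calc` tool, and the scripted `fake_llm` (standing in for a real model call) are all assumptions made so the example runs offline.

```python
import math
import re
from typing import Callable

# Assumed text protocol: each model turn ends with "Action: tool_name[argument]".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: dict[str, Callable[[str], str]],
    max_iterations: int = 8,
) -> str:
    """Interleave Thought, Action, and Observation until the model emits FINISH."""
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        turn = llm(transcript)                 # model writes a Thought and an Action
        transcript += turn + "\n"
        match = ACTION_RE.search(turn)
        if match is None:
            transcript += "Observation: no parseable Action line\n"
            continue
        name, arg = match.group(1), match.group(2)
        if name == "FINISH":
            return arg                         # the final answer
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"
    raise RuntimeError(f"no FINISH after {max_iterations} iterations")


def calc(expr: str) -> str:
    """Toy tool (hypothetical): multiplies integers written as 'a*b*c'."""
    return str(math.prod(int(x) for x in expr.split("*")))


def fake_llm(transcript: str) -> str:
    """Scripted stand-in for a real model."""
    if "Observation:" not in transcript:
        return "Thought: I should use the calculator.\nAction: calc[17*23]"
    last_observation = transcript.strip().splitlines()[-1].removeprefix("Observation: ")
    return f"Thought: The tool gave me the answer.\nAction: FINISH[{last_observation}]"


answer = react_loop("What is 17 * 23?", fake_llm, {"calc": calc})
print(answer)  # 391
```

The whole transcript is passed back to the model on every turn, which is what lets ReAct adapt to each Observation; the iteration cap is the simplest of the loop guards listed next.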

Preventing Infinite Loops

1. Max iterations guard: a hard cap; if the agent reaches N iterations without a FINISH action, abort and surface an error.
2. State change detection: if the last 3 Thoughts and Actions are identical, the agent is stuck in a loop, so break out.
3. Tool call deduplication: if the agent calls the same tool with the same arguments twice, flag it and give the agent a hint to try a different approach.
4. Time budget: a hard wall-clock timeout, important for user-facing agents where response time matters.
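The four guards can be combined into one per-run checker. This is a minimal sketch assuming the agent loop calls `check` once per iteration, after the model proposes an action and before the tool runs; `LoopGuard`, `AgentLoopError`, and the default thresholds are hypothetical names and values, not part of any library. Checks run in the order 1, 2, 4, 3 so that hard stops take precedence over the softer deduplication hint.

```python
import json
import time
from typing import Optional


class AgentLoopError(RuntimeError):
    """Raised when a guard decides the agent must stop."""


class LoopGuard:
    """Per-run checks for the four guards; thresholds are illustrative defaults."""

    def __init__(self, max_iterations: int = 15, repeat_window: int = 3,
                 time_budget_s: float = 60.0):
        self.max_iterations = max_iterations
        self.repeat_window = repeat_window
        self.deadline = time.monotonic() + time_budget_s
        self.history: list[tuple[str, str]] = []   # (thought, tool call) per iteration
        self.seen_calls: set[str] = set()

    def check(self, thought: str, tool: str, args: dict) -> Optional[str]:
        """Record one iteration; raise to abort, return a hint string, or return None."""
        call = f"{tool}:{json.dumps(args, sort_keys=True)}"  # args must be JSON-serializable
        self.history.append((thought, call))

        # 1. Max iterations guard: allows max_iterations iterations, stops on the next one
        if len(self.history) > self.max_iterations:
            raise AgentLoopError(f"no FINISH after {self.max_iterations} iterations")
        # 2. State change detection: the last N (thought, action) pairs are identical
        recent = self.history[-self.repeat_window:]
        if len(recent) == self.repeat_window and len(set(recent)) == 1:
            raise AgentLoopError("agent is repeating itself")
        # 4. Time budget (wall clock)
        if time.monotonic() > self.deadline:
            raise AgentLoopError("time budget exhausted")
        # 3. Tool call deduplication: same tool and arguments seen before
        if call in self.seen_calls:
            return f"You already called {tool} with these arguments; try a different approach."
        self.seen_calls.add(call)
        return None


guard = LoopGuard(max_iterations=10)
print(guard.check("look it up", "search", {"q": "topic X"}))  # None
print(guard.check("look it up", "search", {"q": "topic X"}))  # deduplication hint
try:
    guard.check("look it up", "search", {"q": "topic X"})
except AgentLoopError as err:
    print("stopped:", err)  # stopped: agent is repeating itself
```

Returning a hint instead of raising on a duplicate call follows guard 3: the agent gets a nudge (for example, appended to its next Observation) and a chance to recover before the stricter guards fire.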

Plan-and-Execute Pattern

For tasks that require many ordered steps, an upfront planning phase produces a better global strategy than reactive step-by-step decisions.

Phase 1. Planner LLM call: given goal + context, produce an ordered JSON task list.

Plan:
Step 1: search "topic X" (depends_on: [])
Step 2: summarize results (depends_on: [step_1])
Step 3: write report (depends_on: [step_2])

Phase 2. Executor: runs each step; on failure, triggers the Replanner.
Replanner: receives the updated state and revises the remaining steps.

python — plan-and-execute skeleton

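import json
from typing import Literal

from pydantic import BaseModel


class PlanStep(BaseModel):
    step_id: str
    description: str
    tool: str
    tool_args: dict
    depends_on: list[str]
    status: Literal["pending", "running", "done", "failed"] = "pending"


class Plan(BaseModel):
    goal: str
    steps: list[PlanStep]


def _request_plan(llm_client, system: str, user: str) -> Plan:
    """Call the LLM in JSON mode and validate the reply against the Plan schema."""
    schema = json.dumps(Plan.model_json_schema())
    response = llm_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"{system}\nReturn JSON matching this schema: {schema}"},
            {"role": "user", "content": user},
        ],
        response_format={"type": "json_object"},
    )
    return Plan.model_validate_json(response.choices[0].message.content)


def generate_plan(goal: str, available_tools: list[str], llm_client) -> Plan:
    """Phase 1: ask the LLM to decompose the goal into ordered steps."""
    system = (
        "You are a planning agent. Given a goal and available tools, "
        "produce a JSON execution plan with ordered steps. "
        "Each step must specify exactly which tool to call and with what arguments."
    )
    return _request_plan(llm_client, system, f"Goal: {goal}\nTools: {available_tools}")


def replan(plan: Plan, failed: PlanStep, error: str, llm_client) -> Plan:
    """Illustrative helper: ask the LLM to revise the plan after a step fails."""
    system = (
        "You are a replanning agent. One step of the plan failed. Return the full "
        "revised plan, keeping completed steps and their step_ids unchanged."
    )
    user = f"Plan: {plan.model_dump_json()}\nFailed step: {failed.step_id}\nError: {error}"
    return _request_plan(llm_client, system, user)


def dispatch_tool_call(tool_registry: dict, tool: str, args: dict) -> str:
    """Illustrative helper: run a registered tool; failures come back as {"error": ...}."""
    if tool not in tool_registry:
        return json.dumps({"error": f"unknown tool: {tool}"})
    try:
        return str(tool_registry[tool](**args))
    except Exception as exc:  # surface tool failures to the replanner instead of crashing
        return json.dumps({"error": str(exc)})


def execute_plan(plan: Plan, tool_registry: dict, llm_client, max_replans: int = 3) -> str:
    """Phase 2: execute steps in dependency order; replan on failure."""
    results: dict[str, str] = {}
    replans = 0

    while True:
        # A step is ready once every step it depends on has a recorded result
        ready = [s for s in plan.steps
                 if s.status == "pending" and all(d in results for d in s.depends_on)]
        if not ready:
            break  # all steps done, or the remaining ones are blocked

        step = ready[0]
        step.status = "running"
        result = dispatch_tool_call(tool_registry, step.tool, step.tool_args)

        if '"error"' in result:  # crude check; a structured result type is more robust
            step.status = "failed"
            if replans == max_replans:
                return f"Plan failed at {step.step_id}: {result}"
            replans += 1
            plan = replan(plan, step, result, llm_client)  # dynamic replanning
            # Trust our own record of completed work, not statuses echoed by the LLM
            for s in plan.steps:
                s.status = "done" if s.step_id in results else "pending"
        else:
            step.status = "done"
            results[step.step_id] = result

    return results.get(plan.steps[-1].step_id, "Plan did not complete")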
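`execute_plan` trusts the planner's `depends_on` lists. A planner LLM can emit a dependency cycle or reference a step that does not exist, in which case the affected steps never become runnable. One way to catch this before Phase 2 is a topological sort using Python's standard-library `graphlib`. `validate_plan` is a hypothetical helper that works on plain dicts with the same `step_id` and `depends_on` fields as `PlanStep` (with the pydantic models above, `[s.model_dump() for s in plan.steps]` would produce them).

```python
from graphlib import CycleError, TopologicalSorter


def validate_plan(steps: list[dict]) -> list[str]:
    """Return step_ids in an executable order, or raise ValueError for a malformed plan."""
    ids = [s["step_id"] for s in steps]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate step_id in plan")
    graph: dict[str, set[str]] = {}
    for s in steps:
        unknown = set(s["depends_on"]) - set(ids)
        if unknown:
            raise ValueError(f"{s['step_id']} depends on unknown steps {sorted(unknown)}")
        graph[s["step_id"]] = set(s["depends_on"])  # node -> the steps it waits on
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the detected cycle
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


plan = [
    {"step_id": "step_3", "depends_on": ["step_2"]},
    {"step_id": "step_1", "depends_on": []},
    {"step_id": "step_2", "depends_on": ["step_1"]},
]
print(validate_plan(plan))  # ['step_1', 'step_2', 'step_3']
```

If validation fails, the error message can be fed straight back to the planner as a cheap first round of replanning, before any tool has run.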

Advanced Planning: HTN and MCTS

Hierarchical Task Networks (HTN)

HTN planning decomposes goals hierarchically: a high-level task is broken into sub-tasks, which are broken into primitive actions. This mirrors how humans plan — first at a high level ("prepare a report"), then at a mid level ("research, draft, edit"), then at a primitive level ("call search API with query X").

Goal: Write report on Topic X → Sub-tasks: Research · Draft · Edit → Primitives: search() · write() · revise()
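The hierarchy above can be sketched as a recursive decomposition. This is a simplified illustration under stated assumptions: each compound task lists methods as (precondition, subtasks) pairs tried in order, any task without methods is a primitive, and the `have_sources` flag is a hypothetical precondition. Full HTN planners also apply each primitive's effects to the state as they go and backtrack when a chosen method fails further down; this sketch does neither.

```python
from typing import Callable

State = dict[str, bool]
# Each compound task has methods tried in order: (precondition on state, ordered subtasks).
# Any task without an entry here is a primitive action.
METHODS: dict[str, list[tuple[Callable[[State], bool], list[str]]]] = {
    "write_report": [(lambda s: True, ["research", "draft", "edit"])],
    "research": [
        (lambda s: s.get("have_sources", False), []),  # sources already on hand: nothing to do
        (lambda s: True, ["search"]),
    ],
    "draft": [(lambda s: True, ["write"])],
    "edit": [(lambda s: True, ["revise"])],
}


def decompose(task: str, state: State, methods=METHODS) -> list[str]:
    """Expand a task depth-first into primitive actions using the first applicable method."""
    if task not in methods:
        return [task]  # primitive: executable directly
    for precondition, subtasks in methods[task]:
        if precondition(state):
            plan: list[str] = []
            for sub in subtasks:
                plan.extend(decompose(sub, state, methods))
            return plan
    raise ValueError(f"no applicable method for {task!r}")


print(decompose("write_report", {}))                      # ['search', 'write', 'revise']
print(decompose("write_report", {"have_sources": True}))  # ['write', 'revise']
```

The same goal yields different primitive plans depending on state, which is the main advantage of HTN methods over a single fixed decomposition.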

MCTS with LLM Guidance

Monte Carlo Tree Search (MCTS) explores the action space as a tree whose nodes are states. Each iteration selects a promising node (selection), has the LLM propose candidate next actions from it (expansion), scores the resulting state with a fast rollout or evaluator (simulation), and propagates that score back up the tree (backpropagation), so high-scoring paths are explored more deeply over time. It is used in AlphaApollo and similar systems for tasks whose solution space branches heavily.

Selection

Traverse the tree from the root using the UCB1 formula, balancing exploitation (high-scoring nodes) against exploration (less-visited nodes).

Expansion

The LLM proposes N candidate next actions from the current node's state.

Simulation

Fast rollout: run actions to a terminal state, then score it with a reward function or LLM-as-judge.

Backpropagation

Propagate the score back up the tree; nodes on paths to high-reward outcomes are favored in later selections.
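The four steps can be sketched on a toy problem. Selection uses UCB1, which scores a child as w/n + c·sqrt(ln N / n), where w is the child's total reward, n its visit count, N its parent's visit count, and c = √2 a common default exploration constant. The LLM expansion step is replaced by a hypothetical `propose_actions` function and the reward by a toy score (the fraction of positions matching a target sequence); in a real agent both would come from the model and a task-specific reward or LLM-as-judge. This sketch adds one child per iteration and commits to the most-visited first action.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

ACTIONS = ["a", "b", "c"]
TARGET = ("c", "a", "b")  # toy goal; a real agent would use a task-specific reward
DEPTH = len(TARGET)


def propose_actions(state: tuple) -> list[str]:
    """Hypothetical stand-in for LLM expansion: candidate next actions from a state."""
    return [] if len(state) == DEPTH else list(ACTIONS)


def reward(state: tuple) -> float:
    """Toy reward: fraction of positions matching TARGET (0.0 to 1.0)."""
    return sum(x == y for x, y in zip(state, TARGET)) / DEPTH


@dataclass(eq=False)
class Node:
    state: tuple
    parent: Optional["Node"] = None
    children: dict = field(default_factory=dict)  # action -> Node
    visits: int = 0
    total: float = 0.0

    def ucb1(self, c: float = math.sqrt(2)) -> float:
        if self.visits == 0:
            return math.inf  # unvisited children are tried first
        exploit = self.total / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts(root_state: tuple, iterations: int, rng: random.Random) -> str:
    root = Node(root_state)
    for _ in range(iterations):
        # Selection: descend through fully expanded nodes by UCB1
        node = root
        while node.children and len(node.children) == len(propose_actions(node.state)):
            node = max(node.children.values(), key=Node.ucb1)
        # Expansion: add one untried action, if any remain
        untried = [a for a in propose_actions(node.state) if a not in node.children]
        if untried:
            action = rng.choice(untried)
            child = Node(node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # Simulation: random rollout to a terminal state, then score it
        state = node.state
        while propose_actions(state):
            state = state + (rng.choice(propose_actions(state)),)
        value = reward(state)
        # Backpropagation: update statistics from the new node up to the root
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Commit to the most-visited first action
    return max(root.children, key=lambda a: root.children[a].visits)


best = mcts((), iterations=1000, rng=random.Random(0))
print(best)
```

On this toy reward, starting with "c" is worth roughly a third more reward than the alternatives, so after 1,000 iterations the search should settle on "c". Choosing the most-visited rather than the highest-mean child is a common, more conservative commit rule.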

When to use MCTS vs ReAct

ReAct is greedy: at each step it commits to the action that currently looks best. MCTS explores alternatives before committing. Use MCTS when (1) wrong early decisions are costly or irreversible, (2) the task has multiple valid solution paths, and (3) you have a reliable reward signal. MCTS is compute-intensive; a typical budget is 10–50 simulations per decision node.