Course: Building Agentic AI Systems · Chapter 8 · Difficulty: advanced · Estimated time: 600 min

Chapter 8: Planning

Planning in Building Agentic AI Systems.

Learning Objectives

By the end of this chapter, you will be able to:

  • Explain the agentic AI concept behind Planning.
  • Apply Planning to design reliable, production-grade agent systems.
  • Recognize operational trade-offs in tool use, orchestration, safety, and cost.

ReAct loop, Plan-and-Execute, HTN, MCTS, and dynamic replanning

Planning: Decomposing Goals Into Actions

Planning is the process of transforming a high-level goal into an ordered sequence of executable steps. In simple agents, planning is implicit: the LLM decides the next action in the context of current observations. In sophisticated agents, planning is a separate, explicit phase with its own prompt, model, or algorithm.

Implicit Planning (ReAct)

  • Each step: observe → think → act
  • No upfront plan document
  • Adapts naturally to surprises
  • Can make locally good but globally poor decisions
  • Good for short-to-medium tasks

Explicit Planning (Plan-and-Execute)

  • Phase 1: generate a plan (full task graph)
  • Phase 2: execute each step
  • Better global coherence
  • Replanning needed when plan fails
  • Good for long-horizon, structured tasks

ReAct: The Default Planning Loop

ReAct (Yao et al., 2022) is the most widely deployed planning pattern. The agent interleaves Thought (explicit reasoning about what to do), Action (tool call), and Observation (tool result) until it decides to emit a final answer.

Thought: "I need to check X before doing Y" → Action: tool_name(args) → Observation: {result from tool} → Loop
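
The Thought, Action, Observation cycle can be sketched as a small loop. This is a minimal illustration, not a reference implementation: `scripted_policy` is a hypothetical stand-in for the LLM call, and `TOOLS`, `react_loop`, and the `FINISH` action name are assumptions made for the example.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}

def scripted_policy(history: list[dict]) -> dict:
    """Stand-in for the LLM: returns the next thought and action.

    A real agent would send `history` to a model and parse its reply.
    """
    observations = [h for h in history if h["type"] == "observation"]
    if not observations:
        return {"thought": "I need the sum before answering.",
                "action": "add", "args": {"a": 2, "b": 3}}
    return {"thought": "I have the sum.", "action": "FINISH",
            "args": {"answer": observations[-1]["content"]}}

def react_loop(goal: str, policy: Callable[[list[dict]], dict],
               max_iterations: int = 5) -> str:
    history: list[dict] = [{"type": "goal", "content": goal}]
    for _ in range(max_iterations):
        decision = policy(history)                        # Thought + Action
        history.append({"type": "thought", "content": decision["thought"]})
        if decision["action"] == "FINISH":
            return decision["args"]["answer"]
        tool = TOOLS[decision["action"]]
        observation = tool(**decision["args"])            # Observation
        history.append({"type": "observation", "content": observation})
    raise RuntimeError("max iterations reached without FINISH")

print(react_loop("What is 2 + 3?", scripted_policy))      # prints 5
```

There is no plan object anywhere in the loop. Each decision is made from the history so far, which is what makes this style of planning implicit.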

Preventing Infinite Loops

  1. Max iterations guard: a hard cap. If the agent reaches N iterations without a FINISH action, abort and surface an error.
  2. State change detection: if the last 3 Thoughts and Actions are identical, the agent is stuck in a loop, so break out.
  3. Tool call deduplication: if the agent calls the same tool with the same arguments twice, flag it and provide a hint to try a different approach.
  4. Time budget: a hard wall-clock timeout; important for user-facing agents where response time matters.
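
The four guards above can be combined in one small helper that the agent loop calls before each tool invocation. This is a sketch under stated assumptions: the class name `LoopGuard`, its default limits, and the hint wording are all hypothetical, and a production agent would likely log or report these events rather than only raising.

```python
import json
import time
from typing import Optional

class LoopGuard:
    """Iteration cap, repeated-state check, duplicate-call hint, time budget."""

    def __init__(self, max_iterations: int = 10, repeat_window: int = 3,
                 time_budget_s: float = 30.0):
        self.max_iterations = max_iterations
        self.repeat_window = repeat_window
        self.deadline = time.monotonic() + time_budget_s
        self.steps: list[tuple[str, str]] = []   # (thought, call signature)
        self.seen_calls: set[str] = set()

    def check(self, thought: str, tool: str, args: dict) -> Optional[str]:
        """Record one step. Raise on a hard stop; return a hint on a duplicate call."""
        signature = f"{tool}:{json.dumps(args, sort_keys=True)}"
        self.steps.append((thought, signature))

        if len(self.steps) > self.max_iterations:        # guard 1
            raise RuntimeError("max iterations exceeded")
        if time.monotonic() > self.deadline:             # guard 4
            raise TimeoutError("time budget exceeded")
        recent = self.steps[-self.repeat_window:]
        if len(recent) == self.repeat_window and len(set(recent)) == 1:
            raise RuntimeError("agent is stuck: last steps are identical")  # guard 2
        if signature in self.seen_calls:                 # guard 3
            return "You already made this exact call; try a different approach."
        self.seen_calls.add(signature)
        return None

guard = LoopGuard()
print(guard.check("look it up", "search", {"q": "x"}))        # None
print(guard.check("look it up again", "search", {"q": "x"}))  # hint string
```

The hint returned for a duplicate call is meant to be appended to the agent's context as an extra observation, so the model gets a chance to change course before a hard stop is triggered.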

Plan-and-Execute Pattern

For tasks that require many ordered steps, an upfront planning phase produces a better global strategy than reactive step-by-step decisions.

Phase 1
Planner LLM call: given goal + context → produce an ordered JSON task list
Plan
Step 1: search "topic X" (depends_on: [])
Step 2: summarize results (depends_on: [step_1])
Step 3: write report (depends_on: [step_2])
Phase 2
Executor: runs each step; on failure → triggers Replanner
Replanner: receives updated state → revises remaining steps
python: plan-and-execute skeleton
from pydantic import BaseModel
from typing import Literal

class PlanStep(BaseModel):
    step_id: str
    description: str
    tool: str
    tool_args: dict
    depends_on: list[str]
    status: Literal["pending", "running", "done", "failed"] = "pending"

class Plan(BaseModel):
    goal: str
    steps: list[PlanStep]

def generate_plan(goal: str, available_tools: list[str], llm_client) -> Plan:
    """Phase 1: ask the LLM to decompose the goal into ordered steps."""
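    system = (
        "You are a planning agent. Given a goal and available tools, "
        "produce a JSON execution plan with ordered steps. "
        "Each step must specify exactly which tool to call and with what arguments. "
        # Include the schema so the reply can be validated against Plan below.
        f"The JSON must conform to this schema: {Plan.model_json_schema()}"
    )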
    response = llm_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": f"Goal: {goal}\nTools: {available_tools}"},
        ],
        response_format={"type": "json_object"},
    )
    return Plan.model_validate_json(response.choices[0].message.content)


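def execute_plan(plan: Plan, tool_registry: dict, llm_client, max_replans: int = 3) -> str:
    """Phase 2: execute each step in dependency order; replan on failure.

    dispatch_tool_call and replan are helpers assumed to be defined elsewhere:
    dispatch_tool_call returns the tool result as a JSON string, and replan
    returns a revised Plan in which completed steps keep status "done".
    max_replans is an assumed safety cap, not part of the pattern itself.
    """
    results: dict[str, str] = {}
    replans = 0

    while True:
        # A step is ready once every dependency has produced a result.
        ready = [
            s for s in plan.steps
            if s.status == "pending" and all(dep in results for dep in s.depends_on)
        ]
        if not ready:
            break   # all steps are done, or the remaining steps are blocked

        step = ready[0]
        step.status = "running"
        result = dispatch_tool_call(tool_registry, step.tool, step.tool_args)

        if '"error"' in result:   # assumes tools report failures as {"error": ...}
            step.status = "failed"
            replans += 1
            if replans > max_replans:
                break   # stop instead of replanning forever
            plan = replan(plan, step, result, llm_client)   # dynamic replanning
        else:
            step.status = "done"
            results[step.step_id] = result

    if not plan.steps:
        return "Plan did not complete"
    return results.get(plan.steps[-1].step_id, "Plan did not complete")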
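
The `depends_on` fields form a dependency graph, and an LLM-generated plan can contain unknown step IDs or cycles. One way to catch both before execution is sketched below. It uses the standard-library `graphlib` module (Python 3.9+) and plain dicts in place of the `PlanStep` model so the example has no third-party dependencies; the function name `execution_order` is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def execution_order(steps: list[dict]) -> list[str]:
    """Return step IDs in an order that respects depends_on, or raise ValueError."""
    graph = {s["step_id"]: set(s["depends_on"]) for s in steps}

    # Reject references to steps that are not in the plan.
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes forming the cycle.
        raise ValueError(f"plan has a dependency cycle: {exc.args[1]}") from exc

# The three-step plan from the diagram, deliberately listed out of order.
steps = [
    {"step_id": "step_3", "depends_on": ["step_2"]},
    {"step_id": "step_1", "depends_on": []},
    {"step_id": "step_2", "depends_on": ["step_1"]},
]
print(execution_order(steps))   # ['step_1', 'step_2', 'step_3']
```

Validating the graph up front turns a malformed plan into an immediate, explainable error, which can itself be fed to the replanner instead of surfacing as a stalled executor.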

Advanced Planning: HTN and MCTS

Hierarchical Task Networks (HTN)

HTN planning decomposes goals hierarchically: a high-level task is broken into sub-tasks, which are broken into primitive actions. This mirrors how humans plan: first at a high level ("prepare a report"), then at a mid level ("research, draft, edit"), then at a primitive level ("call search API with query X").

Goal: Write report on Topic X → Sub-tasks: Research · Draft · Edit → Primitives: search() · write() · revise()
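
The goal, sub-task, and primitive levels can be sketched as a recursive expansion over a method table. This is a deliberately simplified illustration: full HTN planners also track world state, preconditions, and several alternative methods per task. The names `METHODS`, `PRIMITIVES`, and `decompose` are assumptions made for the example.

```python
# Hypothetical method table: compound task -> ordered list of sub-tasks.
METHODS: dict[str, list[str]] = {
    "write_report": ["research", "draft", "edit"],
    "research": ["search"],
    "draft": ["write"],
    "edit": ["revise"],
}

# Primitive actions map directly to tool calls and are not decomposed further.
PRIMITIVES: set[str] = {"search", "write", "revise"}

def decompose(task: str, depth: int = 0, max_depth: int = 10) -> list[str]:
    """Expand a task into a flat, ordered list of primitive actions."""
    if task in PRIMITIVES:
        return [task]
    if depth >= max_depth:
        raise RecursionError(f"decomposition of {task!r} exceeded max depth")
    if task not in METHODS:
        raise KeyError(f"no decomposition method for task {task!r}")

    plan: list[str] = []
    for subtask in METHODS[task]:
        plan.extend(decompose(subtask, depth + 1, max_depth))
    return plan

print(decompose("write_report"))   # ['search', 'write', 'revise']
```

In an LLM-based agent, the method table is often replaced by a model call that proposes sub-tasks for a given task, with the depth limit guarding against runaway decomposition.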

MCTS with LLM Guidance

Monte Carlo Tree Search (MCTS) explores the action space as a tree. At each node (state), the LLM proposes candidate next actions (expansion), a fast evaluator scores them (simulation), and high-scoring paths are explored deeper (selection). Used in AlphaApollo and similar systems for tasks where the solution space has strong branching.

Selection

Traverse the tree from the root using the UCB1 formula, which balances exploitation (high-score nodes) against exploration (less-visited nodes)

Expansion

LLM proposes N candidate next actions from the current node's state

Simulation

Fast rollout: run actions to terminal state; score with reward function or LLM-as-judge

Backpropagation

Update scores back up the tree; nodes leading to high-reward outcomes get higher visit priorities
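
The four phases can be sketched on a toy problem. Everything domain-specific here is an assumption made for illustration: `propose` stands in for the LLM suggesting candidate actions, `reward` stands in for a reward function or LLM-as-judge, and the two-action, depth-3 task with a reward weighted toward the first decision is invented so the result is easy to check. The exploration constant `c=1.4` is a common choice, not a value taken from this chapter.

```python
import math
import random

ACTIONS = ["safe", "risky"]   # toy action set; a real agent would ask the LLM
MAX_DEPTH = 3
WEIGHTS = (0.6, 0.2, 0.2)     # toy reward: the first decision matters most

class Node:
    def __init__(self, state, parent=None):
        self.state = state            # tuple of actions taken so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def propose(state):
    """Stand-in for the LLM proposing candidate next actions."""
    return ACTIONS

def is_terminal(state):
    return len(state) >= MAX_DEPTH

def reward(state):
    """Stand-in for a reward function or LLM-as-judge; higher is better."""
    return sum(w for w, action in zip(WEIGHTS, state) if action == "safe")

def ucb1(node, c=1.4):
    if node.visits == 0:
        return math.inf               # always try unvisited children first
    exploit = node.total_reward / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def mcts(root_state=(), n_simulations=50, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        while node.children:                          # Selection
            node = max(node.children, key=ucb1)
        if not is_terminal(node.state):               # Expansion
            node.children = [Node(node.state + (a,), node)
                             for a in propose(node.state)]
            node = rng.choice(node.children)
        state = node.state                            # Simulation: random rollout
        while not is_terminal(state):
            state = state + (rng.choice(propose(state)),)
        score = reward(state)
        while node is not None:                       # Backpropagation
            node.visits += 1
            node.total_reward += score
            node = node.parent
    # Commit to the most-visited first action.
    best = max(root.children, key=lambda n: n.visits)
    return best.state[-1]

print(mcts())   # prints safe
```

Choosing the final action by visit count rather than by average reward is a common convention, since visit counts are less sensitive to a few lucky rollouts.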

When to use MCTS vs ReAct

ReAct is greedy: it always takes the next best action. MCTS explores alternatives before committing. Use MCTS when: (1) wrong decisions early are costly/irreversible, (2) the task has multiple valid solution paths, (3) you have a reliable reward signal. MCTS is compute-intensive; a typical budget is 10–50 simulations per decision node.

Chapter 8 Quiz

1. In Plan-and-Execute, what triggers the Replanner?

2. What is the primary purpose of "Backpropagation" in MCTS?

3. Which state change detection rule helps prevent an agent from being stuck in a ReAct loop?