Learning Objectives

By the end of this chapter, you will be able to:

Explain what an agent is and how it differs from a single prompt-response exchange with an LLM.
Apply the agent definition and the agent loop to the design of reliable, production-grade agent systems.
Recognize operational trade-offs in tool use, orchestration, safety, and cost.

Section 1 — Foundations & Mental Models

Chapter 1: What is an Agent?

From prompt-response to goal-directed autonomous systems

What is an Agent?

Definition

An agent is a system that perceives its environment, reasons about a goal, chooses actions from a set of tools, executes them, and uses the results to continue until the goal is met — without human intervention at each step.

Every LLM you have used is stateless by default: you send a prompt, you get a response. The LLM does not remember the exchange tomorrow, cannot search the web, cannot write a file, and cannot retry when something fails. Agents break all four of those constraints.

Four Defining Properties

1. Perceive: Receive observations from the environment, such as user messages, tool outputs, API responses, and file contents.
2. Reason: Use an LLM to interpret the observation, recall relevant memory, and decide what to do next.
3. Plan: Decompose a complex goal into an ordered sequence of sub-tasks and contingencies.
4. Act: Execute a tool call, send a message, write to storage, or signal task completion.
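
To make the four properties concrete, here is a toy sketch in which each property is one method. Everything in it is hypothetical: the CounterAgent class, its method names, and the dictionary used as the environment are illustrations, and plain arithmetic stands in where a real agent would call an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class CounterAgent:
    """Toy agent whose goal is to raise a counter to `target`."""
    target: int
    memory: list = field(default_factory=list)  # observations seen so far

    def perceive(self, environment: dict) -> int:
        # Perceive: read an observation from the environment.
        observation = environment["counter"]
        self.memory.append(observation)
        return observation

    def reason(self, observation: int) -> int:
        # Reason: interpret the observation relative to the goal.
        # A real agent would call an LLM here; this stand-in is arithmetic.
        return self.target - observation

    def plan(self, gap: int) -> list:
        # Plan: decompose the remaining work into ordered sub-tasks.
        return ["increment"] * gap if gap > 0 else ["finish"]

    def act(self, step: str, environment: dict) -> bool:
        # Act: change the environment, or signal completion by returning True.
        if step == "increment":
            environment["counter"] += 1
            return False
        return True


env = {"counter": 0}
agent = CounterAgent(target=3)
done = False
while not done:
    gap = agent.reason(agent.perceive(env))
    next_step = agent.plan(gap)[0]  # run only the first planned step, then re-observe
    done = agent.act(next_step, env)
```

Executing only the first planned step and then observing again is a deliberate choice in this sketch: it lets the plan be revised whenever the environment turns out differently than expected.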
                            

Why this matters now

Function calling was available in GPT-4 in 2023. What changed in 2024–2026 is reliability: reasoning models like o1, o3, and DeepSeek-R1 lowered the error rate per step enough that long multi-step tasks became practical in production.

The Agent Loop

Every agent, regardless of its sophistication, runs some variation of this loop. Understanding it deeply is the foundation for reasoning about every design decision in later chapters.

[Figure: the agent loop. Goal (high-level task) → Observe (read environment) → Think & Plan (reason, decide action) → Act (execute tool / output) → back to Observe.]

The loop runs until the agent emits a FINISH signal or reaches a step/time limit.
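
The termination rule can be sketched without any model at all. In the example below, run_loop, the decide callback, and the FINISH constant are hypothetical names chosen for illustration; decide stands in for the think-and-plan step, and the limits are arbitrary defaults.

```python
import time

FINISH = "FINISH"


def run_loop(decide, max_steps=10, max_seconds=5.0):
    """Call `decide(history)` until it returns FINISH or a limit is hit.

    Returns a (stop_reason, history) tuple.
    """
    history = []
    deadline = time.monotonic() + max_seconds
    for _ in range(max_steps):
        if time.monotonic() > deadline:
            return "time_limit", history
        action = decide(history)
        if action == FINISH:
            return "finished", history
        # A real agent would execute the action here and record the observation.
        history.append(action)
    return "step_limit", history


# One policy that finishes after three actions, and one that never finishes.
reason_ok, steps_ok = run_loop(lambda h: FINISH if len(h) >= 3 else f"step-{len(h)}")
reason_stuck, steps_stuck = run_loop(lambda h: "retry", max_steps=4)
```

Returning the stop reason alongside the history lets the caller tell a completed task apart from one that was cut off by a limit.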

What can go wrong in the loop?

Failure Mode	Typical Cause	Mitigation
Infinite loop	No convergence signal, LLM repeats same action	Max iterations + state change check
Hallucinated tool result	LLM infers result instead of calling the tool	Force tool-use mode; don't allow free-form text as "observations"
Context overflow	History grows unbounded across iterations	Summarize or evict old observations (Chapter 9)
Irreversible action	Agent deletes or sends without confirmation	Human-in-the-loop gate for write operations (Chapter 16)
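
As an illustration of the first mitigation, one simple proxy for a "state change check" is to look for identical consecutive tool calls. The is_stalled helper and the window size below are assumptions made for this sketch; a production check might compare environment state rather than actions.

```python
import json


def is_stalled(history, window=3):
    """Return True when the last `window` tool calls are identical.

    `history` is a list of (tool_name, arguments_dict) tuples. Identical
    consecutive calls suggest the agent is not making progress.
    """
    if len(history) < window:
        return False
    # Serialize with sorted keys so argument order does not matter.
    recent = [
        (name, json.dumps(args, sort_keys=True))
        for name, args in history[-window:]
    ]
    return len(set(recent)) == 1


calls = [
    ("search_web", {"query": "agent loop"}),
    ("search_web", {"query": "agent loop"}),
]
stalled_after_two = is_stalled(calls)
calls.append(("search_web", {"query": "agent loop"}))
stalled_after_three = is_stalled(calls)
```

A check like this complements a maximum iteration count: the counter bounds the worst case, while the repetition check can stop a stuck agent several steps earlier.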

Agent vs Chatbot

The distinction is not about model size or intelligence — it is about who executes side effects.

Chatbot / LLM

Single-turn input → output
No persistent memory across sessions
No tool execution (unless explicitly added)
Goal is to generate good text
User drives every step
Fixed control flow (one model call per user turn)

Agent

Multi-step, goal-directed loop
Persistent memory (episodic, semantic)
Calls tools, APIs, databases
Goal is to complete a task
Agent drives its own steps
Emergent, conditional control flow

When an agent is the wrong choice

Not every task benefits from an agent loop. If the problem can be solved in a single, well-structured prompt, adding an agent loop only introduces latency, cost, and failure surfaces. Use an agent when the task genuinely requires multiple steps, external data, or decisions based on intermediate results.

Two Foundational Paradigms

Modern agentic architectures fall into two lineages that differ in how they represent knowledge and make decisions. Most production systems blend both.

Symbolic / Classical

Explicit rules and logic
Planning via search (A*, STRIPS, HTN)
Deterministic, auditable
Brittle under distribution shift
Dominates safety-critical domains (healthcare, aviation)

Neural / Generative

LLM-driven reasoning and generation
Planning via prompt instructions
Flexible, adaptive, natural language
Hallucination and non-determinism
Dominates data-rich, adaptive domains

Neuro-Symbolic (Hybrid)

LLM for understanding + symbolic planner for execution
Combines flexibility with verifiability
Emerging research area for LLM-based agents (2025–)
Aims for the best of both paradigms
Future direction for high-stakes agents
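
A rough sketch of the hybrid idea: a generative component proposes a plan, and a symbolic component checks it against explicit preconditions before anything runs. The action names, the ACTIONS table, and validate_plan are all hypothetical, and the "LLM output" is hard-coded here rather than produced by a model.

```python
# Symbolic side: each action has explicit preconditions and effects (STRIPS-style).
ACTIONS = {
    "fetch_record": {"requires": set(), "adds": {"record_loaded"}},
    "redact_fields": {"requires": {"record_loaded"}, "adds": {"record_redacted"}},
    "send_report": {"requires": {"record_redacted"}, "adds": {"report_sent"}},
}


def validate_plan(plan, initial_facts=frozenset()):
    """Return (ok, message), rejecting the plan at the first unmet precondition."""
    facts = set(initial_facts)
    for step in plan:
        if step not in ACTIONS:
            return False, f"unknown action: {step}"
        missing = ACTIONS[step]["requires"] - facts
        if missing:
            return False, f"{step} is missing {sorted(missing)}"
        facts |= ACTIONS[step]["adds"]
    return True, "ok"


# Neural side (stubbed): pretend these plans came back from an LLM.
good_plan = ["fetch_record", "redact_fields", "send_report"]
bad_plan = ["fetch_record", "send_report"]  # skips redaction

good = validate_plan(good_plan)
bad = validate_plan(bad_plan)
```

The validator is deterministic and auditable, so a rejected plan can be returned to the model with the exact reason it failed, which is where the verifiability of the symbolic side comes from.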

python — minimal agent skeleton

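import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"]
            }
        }
    }
]


def execute_tool(name, arguments):
    # Placeholder dispatcher: `arguments` arrives as a JSON string.
    args = json.loads(arguments)
    if name == "search_web":
        return f"(stub) no search backend wired up for: {args['query']}"
    return f"Unknown tool: {name}"


messages = [{"role": "user", "content": "What is the latest GPT-5 benchmark result?"}]

MAX_STEPS = 10  # hard cap so a confused model cannot loop forever

for _ in range(MAX_STEPS):
    response = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    )
    choice = response.choices[0]

    # Agent decided to call a tool
    if choice.finish_reason == "tool_calls":
        # The assistant turn that requested the calls must precede the tool results
        messages.append(choice.message)
        for call in choice.message.tool_calls:
            result = execute_tool(call.function.name, call.function.arguments)
            messages.append({"role": "tool", "content": result, "tool_call_id": call.id})

    # Agent decided it has enough information
    elif choice.finish_reason == "stop":
        print(choice.message.content)
        break   # Exit the loop: task complete

    # Truncated or filtered output: fail loudly instead of re-sending the same request
    else:
        raise RuntimeError(f"Unexpected finish_reason: {choice.finish_reason}")
else:
    print("Stopped: step limit reached before the agent finished")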
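
The skeleton calls execute_tool(name, arguments), where arguments is a JSON string produced by the model. A more defensive version of that helper might look like the sketch below. The TOOL_REGISTRY name and the placeholder search_web body are assumptions; no real search backend is implied.

```python
import json


def search_web(query: str) -> str:
    # Placeholder: a real implementation would call a search API here.
    return f"(no search backend configured) query={query!r}"


# Map tool names exposed to the model onto the Python callables that run them.
TOOL_REGISTRY = {"search_web": search_web}


def execute_tool(name: str, arguments: str) -> str:
    """Run a registered tool and always return a string the model can read."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments)  # arguments arrive as a JSON string
        return str(TOOL_REGISTRY[name](**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong parameters: report it instead of crashing the loop.
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})
```

Returning errors as text rather than raising keeps the loop alive and gives the model a chance to correct its own call on the next iteration.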

Chapter 1 takeaway

An agent is a loop. The LLM is the brain, tools are the hands, and memory is what makes it coherent across steps. Everything in the remaining 21 chapters is about making that loop reliable, safe, and scalable.