Course Building Agentic AI Systems Chapter 9 Difficulty advanced Estimated Time 600 min

Chapter 9: Context Management

Context Management in Building Agentic AI Systems.

41% complete

Learning Objectives

By the end of this chapter, you will be able to:

  • Explain the agentic AI concept behind Context Management.
  • Apply Context Management to design reliable, production-grade agent systems.
  • Recognize operational trade-offs in tool use, orchestration, safety, and cost.

Chapter 9: Context Management

Budget, summarization, system prompt design, and instruction hierarchy

Context is the Agent's Working Memory

The context window is both the most important and most constrained resource in an agent. Everything the LLM sees — the system prompt, tool schemas, conversation history, memory retrievals, tool results — must fit inside it. Managing what goes in is a first-class engineering concern.

Cost reality check

A GPT-4o-mini agent with a 16K token context per turn, running 10 turns to complete a task, with 100 tasks per day = 16M tokens/day just for input. At current pricing, context management decisions directly translate to infrastructure cost. A 40% reduction in average context size = 40% cost reduction for the same throughput.

Context Budget Management

Budget management is the practice of deciding, for each turn, what to include in the context window and at what token cost.

Fixed
System Prompt ~500–2000 tokens · Constant per request
Tool Schemas ~50–200 tokens per tool · Semi-constant
Dynamic
Memory Retrievals ~500–3000 tokens · Varies with query
Recent Conversation ~1000–8000 tokens · Grows with turns
Tool Results ~200–5000 tokens per call · Varies with tool

Eviction Strategies

StrategyWhat gets evictedTrade-off
Sliding windowOldest N messagesSimple; loses early context that may still be relevant
SummarizationOldest N messages → replaced by summaryPreserves information; costs 1 LLM call per eviction
Importance scoringMessages with lowest relevance scoreBest quality; requires scoring (embedding similarity or LLM)
Tool result compressionFull tool output → compressed key-valueMajor token savings; requires extraction step

System Prompt Design for Agents

The system prompt is the single most impactful piece of text in your agent. It defines the agent's identity, goals, constraints, available tools, and failure behavior. Most production bugs trace back to a poorly structured system prompt.

text — well-structured agent system prompt template
## Identity
You are a research assistant agent. Your role is to help users find, summarize,
and synthesize information from the web and internal knowledge bases.

## Goal
Complete the user's research task as accurately and concisely as possible.
Always cite sources. Never fabricate information.

## Constraints
- Do NOT search the web for personal information about real individuals.
- Do NOT produce content that could be used to harm others.
- Do NOT execute code unless the user explicitly requests it.
- If you are unsure about a fact, say so and offer to search for it.

## Available Tools
- search_web: Search the internet for current information. Use for real-time data.
- read_document: Read a specific URL or file path. Use when you have a concrete source to read.
- write_note: Save a note to the session for later reference.
- finish: Signal task completion with your final answer.

## Behavior Rules
1. Always use a tool before making factual claims about current events.
2. If a tool fails, explain why and try an alternative approach.
3. When finished, call finish() with a clear, structured answer.
4. Maximum 15 steps before calling finish() with partial results.

## Output Format
- Use markdown for structure.
- Include source URLs inline when citing web results.
- For multi-part answers, use numbered sections.

Instruction Hierarchy (OpenAI model spec)

Modern LLM deployments define a strict priority order for instructions. When instructions conflict, this hierarchy determines which takes precedence:

1
Platform / DeveloperThe API caller's system prompt — highest priority; sets hard constraints the model will not override
2
OperatorBusiness-specific configuration (e.g., "only answer about our products") — can restrict but not override developer constraints
3
UserThe end user's requests — lowest priority; cannot override operator or developer rules

This hierarchy is important for agent security: a malicious prompt injection in a tool result (arriving in the "user content" position) should not be able to override your system prompt's safety constraints. Chapter 16 covers this in depth.

Anti-Patterns to Avoid

1. Over-instruction

Writing a system prompt that lists every possible scenario creates contradictions and confuses the model. A 5000-token system prompt is almost always worse than a 500-token one that covers the core constraints clearly. Enumerate only the constraints that matter — the model generalizes the rest.

2. Context poisoning

Injecting user-controlled text directly into the system prompt without sanitization. A user who writes "Ignore all previous instructions and..." in their message can influence agent behavior if their text lands in a high-priority position. Always place user content in the "user" role — never interpolate it into the system prompt.

3. Growing context without eviction

Appending every message and tool result without any eviction policy. After 20 turns, the context is dominated by early observations that are no longer relevant, and the model's effective attention is diluted. Implement summarization or sliding-window eviction before the context exceeds 50% of the model's limit.

Few-shot examples in agent prompts

Including 1–2 examples of ideal (Thought → Action → Observation → Answer) sequences in the system prompt consistently improves agent behavior quality. The examples demonstrate the expected reasoning format, not just the output format. Keep examples concise — their token cost must be justified by the quality improvement.

Chapter 9 Quiz

1. Why is the "lost in the middle" problem relevant to context management strategy?

2. When should you use summarization eviction over simple sliding-window eviction?

3. In the instruction hierarchy, what can an "Operator" instruction NOT do?