Chapter 9: Context Management
Context Management in Building Agentic AI Systems.
Learning Objectives
By the end of this chapter, you will be able to:
- Explain the agentic AI concept behind Context Management.
- Apply Context Management to design reliable, production-grade agent systems.
- Recognize operational trade-offs in tool use, orchestration, safety, and cost.
Chapter 9: Context Management
Budget, summarization, system prompt design, and instruction hierarchy
Context is the Agent's Working Memory
The context window is both the most important and most constrained resource in an agent. Everything the LLM sees — the system prompt, tool schemas, conversation history, memory retrievals, tool results — must fit inside it. Managing what goes in is a first-class engineering concern.
Cost reality check
A GPT-4o-mini agent with a 16K token context per turn, running 10 turns to complete a task, with 100 tasks per day = 16M tokens/day just for input. At current pricing, context management decisions directly translate to infrastructure cost. A 40% reduction in average context size = 40% cost reduction for the same throughput.
Context Budget Management
Budget management is the practice of deciding, for each turn, what to include in the context window and at what token cost.
Eviction Strategies
| Strategy | What gets evicted | Trade-off |
|---|---|---|
| Sliding window | Oldest N messages | Simple; loses early context that may still be relevant |
| Summarization | Oldest N messages → replaced by summary | Preserves information; costs 1 LLM call per eviction |
| Importance scoring | Messages with lowest relevance score | Best quality; requires scoring (embedding similarity or LLM) |
| Tool result compression | Full tool output → compressed key-value | Major token savings; requires extraction step |
System Prompt Design for Agents
The system prompt is the single most impactful piece of text in your agent. It defines the agent's identity, goals, constraints, available tools, and failure behavior. Most production bugs trace back to a poorly structured system prompt.
## Identity
You are a research assistant agent. Your role is to help users find, summarize,
and synthesize information from the web and internal knowledge bases.
## Goal
Complete the user's research task as accurately and concisely as possible.
Always cite sources. Never fabricate information.
## Constraints
- Do NOT search the web for personal information about real individuals.
- Do NOT produce content that could be used to harm others.
- Do NOT execute code unless the user explicitly requests it.
- If you are unsure about a fact, say so and offer to search for it.
## Available Tools
- search_web: Search the internet for current information. Use for real-time data.
- read_document: Read a specific URL or file path. Use when you have a concrete source to read.
- write_note: Save a note to the session for later reference.
- finish: Signal task completion with your final answer.
## Behavior Rules
1. Always use a tool before making factual claims about current events.
2. If a tool fails, explain why and try an alternative approach.
3. When finished, call finish() with a clear, structured answer.
4. Maximum 15 steps before calling finish() with partial results.
## Output Format
- Use markdown for structure.
- Include source URLs inline when citing web results.
- For multi-part answers, use numbered sections.
Instruction Hierarchy (OpenAI model spec)
Modern LLM deployments define a strict priority order for instructions. When instructions conflict, this hierarchy determines which takes precedence:
This hierarchy is important for agent security: a malicious prompt injection in a tool result (arriving in the "user content" position) should not be able to override your system prompt's safety constraints. Chapter 16 covers this in depth.
Anti-Patterns to Avoid
1. Over-instruction
Writing a system prompt that lists every possible scenario creates contradictions and confuses the model. A 5000-token system prompt is almost always worse than a 500-token one that covers the core constraints clearly. Enumerate only the constraints that matter — the model generalizes the rest.
2. Context poisoning
Injecting user-controlled text directly into the system prompt without sanitization. A user who writes "Ignore all previous instructions and..." in their message can influence agent behavior if their text lands in a high-priority position. Always place user content in the "user" role — never interpolate it into the system prompt.
3. Growing context without eviction
Appending every message and tool result without any eviction policy. After 20 turns, the context is dominated by early observations that are no longer relevant, and the model's effective attention is diluted. Implement summarization or sliding-window eviction before the context exceeds 50% of the model's limit.
Few-shot examples in agent prompts
Including 1–2 examples of ideal (Thought → Action → Observation → Answer) sequences in the system prompt consistently improves agent behavior quality. The examples demonstrate the expected reasoning format, not just the output format. Keep examples concise — their token cost must be justified by the quality improvement.
Chapter 9 Quiz
1. Why is the "lost in the middle" problem relevant to context management strategy?
2. When should you use summarization eviction over simple sliding-window eviction?
3. In the instruction hierarchy, what can an "Operator" instruction NOT do?