Course: Building Agentic AI Systems | Chapter 7 | Difficulty: advanced | Estimated time: 600 min

Chapter 7: Memory Systems

Memory Systems in Building Agentic AI Systems.


Learning Objectives

By the end of this chapter, you will be able to:

  • Explain the agentic AI concept behind Memory Systems.
  • Apply Memory Systems to design reliable, production-grade agent systems.
  • Recognize operational trade-offs in tool use, orchestration, safety, and cost.


Four memory types, temporal knowledge graphs, and hybrid retrieval

Memory is Not the Context Window

A common misconception in agent design is treating the context window as memory. The context window is a fixed-size working buffer. Memory is a set of external storage systems that selectively surface relevant information into that buffer on demand.

Why context ≠ memory

  • Context windows are expensive: providers such as OpenAI bill GPT-4o per input token, so a turn that resends 10K tokens of conversation history costs roughly five times as much in input tokens as a turn that sends only the 2K relevant tokens, even though both fit easily in a 128K window
  • Context windows are ephemeral: they disappear when the session ends
  • Context windows degrade at long range: models use facts placed in the middle of a very long context measurably worse than facts at the start or end (the "lost in the middle" problem)
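One common mitigation for the "lost in the middle" problem is to reorder retrieved items so the most relevant ones sit at the start and end of the injected block and the weakest drift to the middle. The sketch below assumes the input list is already sorted most-relevant first; `reorder_for_long_context` is a hypothetical helper name used only for illustration.

```python
def reorder_for_long_context(items_by_relevance: list[str]) -> list[str]:
    """Hypothetical helper: place the most relevant items at both edges of the context.

    Input must be sorted most-relevant first. Items alternate between the
    front and the back, so the least relevant ones end up in the middle.
    """
    front: list[str] = []
    back: list[str] = []
    for i, item in enumerate(items_by_relevance):
        if i % 2 == 0:
            front.append(item)  # ranks 1, 3, 5, ... fill from the start
        else:
            back.append(item)   # ranks 2, 4, 6, ... fill from the end
    return front + back[::-1]


ranked = ["r1", "r2", "r3", "r4", "r5"]
print(reorder_for_long_context(ranked))  # ['r1', 'r3', 'r5', 'r4', 'r2']
```

The top two items land at the two edges, where long-context models tend to use information most reliably.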

The goal of a memory system is to decide what is worth putting in the context window for a given query, and to persist information across sessions.
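A minimal sketch of "deciding what is worth putting in the context window": greedily pack the highest-scoring memories into a fixed token budget. It assumes each memory already carries a relevance score from retrieval, and it approximates tokens by whitespace-separated words; a real system would count with the model's own tokenizer. `Memory` and `select_for_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    score: float  # relevance to the current query, higher is better


def approx_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def select_for_context(memories: list[Memory], budget: int) -> list[Memory]:
    """Hypothetical helper: keep the most relevant memories that fit in `budget` tokens."""
    chosen: list[Memory] = []
    used = 0
    for mem in sorted(memories, key=lambda m: m.score, reverse=True):
        cost = approx_tokens(mem.text)
        if used + cost <= budget:  # skip items that would overflow, keep trying smaller ones
            chosen.append(mem)
            used += cost
    return chosen


pool = [
    Memory("Alice prefers Python for backend work", 0.91),
    Memory("The API refactor was completed on 2026-04-01", 0.85),
    Memory("Weather small talk from last month", 0.12),
]
for m in select_for_context(pool, budget=13):
    print(m.score, m.text)
```

With a 13-token budget the two relevant facts (6 + 7 words) fit and the small talk is dropped. Stopping at the first item that does not fit is an equally reasonable, stricter policy.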

Four Memory Types

1. Working memory (in-context state): the active conversation messages. Sub-10 ms access, used for the current task, cleared when the session ends.
2. Episodic memory (what happened): a timestamped log of past interactions, tool calls, and outcomes. Supports temporal queries such as "what did we do last Tuesday?" Stored in a relational DB or time-series store.
3. Semantic memory (what I know): facts and knowledge stored as vector embeddings. Supports similarity search such as "what do I know about topic X?" Stored in a vector DB (pgvector, Chroma, Qdrant).
4. Procedural memory (how to do things): stored tool schemas, system prompt templates, workflow patterns, and learned skills. Consulted when the agent needs to know its own capabilities.
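Episodic memory's temporal queries map naturally onto a relational table indexed by timestamp. A minimal sketch using Python's built-in `sqlite3` with an in-memory database; the `episodes` schema and the `log_episode` / `episodes_between` helpers are hypothetical. Timestamps are stored as ISO-8601 UTC strings with a fixed format, which sort lexicographically in time order, so a plain range comparison works as a time filter.

```python
import sqlite3
from datetime import datetime, timezone


def open_episodic_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE episodes ("
        " id INTEGER PRIMARY KEY,"
        " ts TEXT NOT NULL,"       # ISO-8601 UTC, fixed format so strings sort by time
        " kind TEXT NOT NULL,"     # e.g. 'message', 'tool_call', 'outcome'
        " content TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_episodes_ts ON episodes (ts)")
    return conn


def _iso(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


def log_episode(conn: sqlite3.Connection, ts: datetime, kind: str, content: str) -> None:
    conn.execute("INSERT INTO episodes (ts, kind, content) VALUES (?, ?, ?)", (_iso(ts), kind, content))


def episodes_between(conn: sqlite3.Connection, start: datetime, end: datetime) -> list[tuple[str, str, str]]:
    """Return (ts, kind, content) rows with start <= ts < end, oldest first."""
    rows = conn.execute(
        "SELECT ts, kind, content FROM episodes WHERE ts >= ? AND ts < ? ORDER BY ts",
        (_iso(start), _iso(end)),
    )
    return rows.fetchall()


utc = timezone.utc
conn = open_episodic_store()
log_episode(conn, datetime(2026, 4, 6, 9, 0, tzinfo=utc), "tool_call", "ran test suite")
log_episode(conn, datetime(2026, 4, 7, 14, 30, tzinfo=utc), "outcome", "merged FastAPI migration PR")
log_episode(conn, datetime(2026, 4, 8, 11, 0, tzinfo=utc), "message", "asked about auth module")

# "What did we do last Tuesday?" (2026-04-07 is a Tuesday)
tuesday = episodes_between(conn, datetime(2026, 4, 7, tzinfo=utc), datetime(2026, 4, 8, tzinfo=utc))
print(tuesday)
```

The half-open range (`>= start`, `< end`) avoids double-counting events that fall exactly on a day boundary.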

Temporal Knowledge Graphs

Standard vector databases only answer "what is similar to this query?" A temporal knowledge graph (e.g. Zep/Graphiti) also answers "how did things change over time?" and "what relationships exist between entities?" It combines semantic search with entity extraction, relationship modeling, and time-range filtering.

Temporal Knowledge Graph Structure

  User: Alice --prefers--> Python
  User: Alice --worked_on--> Project: API Refactor
  Project: API Refactor --completed--> 2026-04-01
  Project: API Refactor --related_to--> FastAPI migration
  FastAPI migration --blocked_by--> Auth module

Edges carry timestamps, so the graph can answer "what was true before/after date X?"

Zep reports that its Graphiti-based memory reaches 94.8% accuracy on the Deep Memory Retrieval (DMR) benchmark, outperforming the baselines it was compared against. The authors attribute the improvement to graph traversal, which chains related facts together instead of returning independent document chunks.
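To see why traversal helps, consider "what is blocking Alice's work?" No single stored fact answers it; the answer needs the chain Alice → API Refactor → FastAPI migration → Auth module. A minimal breadth-first sketch over the triples from the structure diagram, ignoring timestamps; `expand` is a hypothetical helper, not Graphiti's API.

```python
from collections import deque

TRIPLES = [
    ("Alice", "prefers", "Python"),
    ("Alice", "worked_on", "API Refactor"),
    ("API Refactor", "completed", "2026-04-01"),
    ("API Refactor", "related_to", "FastAPI migration"),
    ("FastAPI migration", "blocked_by", "Auth module"),
]


def expand(start: str, triples: list[tuple[str, str, str]], max_hops: int = 3) -> list[tuple[str, str, str]]:
    """Hypothetical helper: collect facts reachable from `start` within `max_hops` edges (BFS)."""
    outgoing: dict[str, list[tuple[str, str, str]]] = {}
    for t in triples:
        outgoing.setdefault(t[0], []).append(t)

    facts: list[tuple[str, str, str]] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop limit reached; do not follow this node's edges
        for fact in outgoing.get(node, []):
            facts.append(fact)
            target = fact[2]
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return facts


for s, r, t in expand("Alice", TRIPLES):
    print(f"{s} --{r}--> {t}")
```

A flat similarity search for "Alice" would likely surface the first two facts but not the `blocked_by` edge, which never mentions Alice; the three-hop expansion reaches it.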

Retrieval Strategies

Query: "Tell me about Alice's recent work"
  → Embed: dense vector (e.g., text-embedding-3)
  → Hybrid search: dense + BM25 + rerank
  → Top-K chunks: injected into context
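The sparse half of the hybrid step is usually BM25. A from-scratch sketch of the Okapi BM25 formula, using k1 = 1.2 and b = 0.75 and the non-negative IDF variant that Lucene uses; the naive lowercase whitespace tokenizer is an assumption, and in practice a library or search engine would do this for you.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer; real systems also strip punctuation, stem, etc.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            # Non-negative IDF: rare terms (like an error code) weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "error code E1234 raised by the auth module",
    "the auth module handles login and tokens",
    "notes on the FastAPI migration",
]
print(bm25_scores("E1234 auth", docs))
```

The exact token `E1234` gives the first document a large boost, which is the kind of match dense embeddings can miss.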

Dense vs Sparse vs Hybrid

Strategy | Mechanism | Best for | Weakness
Dense (vector) | Cosine similarity on embeddings | Semantic similarity, paraphrases | Misses exact keyword matches
Sparse (BM25) | Term-weighted keyword matching (TF-IDF family, with term-frequency saturation and length normalization) | Exact terms, codes, IDs | Misses semantic similarity
Hybrid | Dense + sparse, with score or rank fusion | General purpose; usually the best recall | More complex pipeline, higher latency
Reranking | Cross-encoder re-scores the top-K candidates | Precision at the top of the ranking | Added latency; requires a second model pass

Delta compression for multi-turn agents

RetainDB reports that delta compression of episodic memory achieves 50–90% token savings in multi-turn scenarios by storing only what changed between turns rather than the full conversation state. This is especially valuable for long-running agent sessions, where memory accumulates quickly.
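A minimal sketch of the idea, assuming agent state can be represented as a flat dict of JSON-serializable values: the first state is stored in full, each later turn stores only the keys that were added or changed plus the keys that were removed, and any turn is rebuilt by replaying deltas. `diff_state` and `apply_delta` are hypothetical helpers, not RetainDB's implementation; real savings depend on how much state actually changes per turn.

```python
from typing import Any

State = dict[str, Any]


def diff_state(old: State, new: State) -> dict[str, Any]:
    """Hypothetical helper: record only what changed between two turns."""
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "unset": [k for k in old if k not in new],
    }


def apply_delta(state: State, delta: dict[str, Any]) -> State:
    """Hypothetical helper: rebuild the next state from the previous one plus a delta."""
    result = {k: v for k, v in state.items() if k not in delta["unset"]}
    result.update(delta["set"])
    return result


turns = [
    {"goal": "migrate API to FastAPI", "step": 1, "blocker": None},
    {"goal": "migrate API to FastAPI", "step": 2, "blocker": "auth module"},
    {"goal": "migrate API to FastAPI", "step": 3},
]

# Store the first state in full, then only deltas between consecutive turns.
deltas = [diff_state(prev, cur) for prev, cur in zip(turns, turns[1:])]
print(deltas)

# Replay the deltas to reconstruct the latest state.
state = turns[0]
for d in deltas:
    state = apply_delta(state, d)
print(state == turns[-1])
```

The unchanged `goal` string is stored once instead of on every turn; for long sessions, most of the state is usually like that.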

Memory System Implementation

Python: a pluggable memory backend for an agent
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
import os
import uuid


def _utcnow() -> datetime:
    # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12).
    return datetime.now(timezone.utc)


@dataclass
class MemoryEntry:
    content: str
    # Chroma metadata values must be scalars (str, int, float, bool); no nested dicts or lists.
    metadata: dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=_utcnow)
    memory_type: str = "episodic"   # episodic | semantic | procedural


class MemoryBackend(ABC):
    """Interface every memory store implements, so backends can be swapped."""

    @abstractmethod
    def store(self, entry: MemoryEntry) -> str:
        """Store a memory entry; return its ID."""

    @abstractmethod
    def search(self, query: str, top_k: int = 5, memory_type: str | None = None) -> list[MemoryEntry]:
        """Retrieve the most relevant memories for the query."""


class ChromaMemoryBackend(MemoryBackend):
    """Semantic memory using ChromaDB and OpenAI embeddings."""

    def __init__(self, collection_name: str = "agent_memory") -> None:
        import chromadb
        from chromadb.utils import embedding_functions

        # PersistentClient writes to disk, so memories survive process restarts.
        self._client = chromadb.PersistentClient(path="./chroma_store")
        self._ef = embedding_functions.OpenAIEmbeddingFunction(
            # Pass the key explicitly rather than relying on library-specific env-var defaults.
            api_key=os.environ["OPENAI_API_KEY"],
            model_name="text-embedding-3-small",
        )
        self._collection = self._client.get_or_create_collection(
            name=collection_name, embedding_function=self._ef
        )

    def store(self, entry: MemoryEntry) -> str:
        entry_id = str(uuid.uuid4())
        self._collection.add(
            documents=[entry.content],
            # Store type and timestamp alongside user metadata so search can filter on them.
            metadatas=[{**entry.metadata, "type": entry.memory_type, "ts": entry.timestamp.isoformat()}],
            ids=[entry_id],
        )
        return entry_id

    def search(self, query: str, top_k: int = 5, memory_type: str | None = None) -> list[MemoryEntry]:
        # Optional metadata filter restricts results to a single memory type.
        where = {"type": memory_type} if memory_type else None
        results = self._collection.query(
            query_texts=[query], n_results=top_k, where=where
        )
        entries = []
        # query() returns one result list per query text; we sent one query, so take index 0.
        for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
            meta = dict(meta)
            # Pop the internal keys so callers get back only their own metadata.
            ts = meta.pop("ts", None)
            mem_type = meta.pop("type", "semantic")
            entries.append(MemoryEntry(
                content=doc,
                metadata=meta,
                timestamp=datetime.fromisoformat(ts) if ts else _utcnow(),
                memory_type=mem_type,
            ))
        return entries

Chapter 7 Quiz

1. What is the "lost in the middle" problem?

2. What advantage does a temporal knowledge graph have over a flat vector database?

3. Why does hybrid search (dense + BM25) outperform either approach alone?