
Agent Memory Architectures: Working, Episodic, Semantic, and Procedural

Synthara ML Team
Engineering Team

The hardest property to engineer into an agent is not reasoning. It is useful continuity. A reasoner that starts from zero every turn is a chatbot. A reasoner with structured memory across the four tiers below is something that compounds value with use.

TL;DR — The Four Tiers

| Tier | What it stores | Storage | Retrieval rule |
|------------|----------------------------------|--------------------------------|------------------------------------------|
| Working | Current task state | In-process (LangGraph state) | Always in context |
| Episodic | Summaries of past sessions | Vector store + metadata | Retrieved by similarity to current query |
| Semantic | Extracted facts about user/world | Relational schema | Retrieved by entity + relevance |
| Procedural | Learned routines and policies | Versioned documents (git or DB)| Retrieved by intent + role |

Build all four. They are not interchangeable; each compensates for a different failure mode.

Why Context Window Is Not Enough

The pitch from frontier-model providers has been that ever-longer context windows make memory engineering obsolete. The reality is more nuanced: three properties of long context cannot be ignored.

  1. Lost-in-the-middle. Multiple controlled studies (NIAH, RULER, BABILong) show accuracy degrading 30–50% on facts placed in the middle of a 200k-token context, versus the same facts at the start or end.
  2. Cost is linear, attention is not. A 200k-token request costs 200× a 1k-token request, but the model's effective use of the context does not scale 200×.
  3. Latency is linear in input length. Even with caching, prefill time at 200k tokens dominates TTFT.

Long context is the delivery vehicle. Memory architecture is the curation system that decides what gets delivered.

Working Memory

Working memory is the current task's running state. It lives in the orchestration layer.

In LangGraph terms, it is the typed state object that flows through nodes. In a custom loop, it is the state: dict that you pass between steps.
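A minimal sketch of such a state object for a custom loop, using a plain TypedDict; the field names are illustrative, not a prescribed schema:

```python
from typing import Any, TypedDict


class WorkingMemory(TypedDict):
    """Transient per-task state; nothing here survives a process restart."""
    message: str                  # the user's current message
    recent_turns: list[str]       # last N turns, newest last
    plan: list[str]               # current sub-goals, in order
    tool_results: dict[str, Any]  # projected (not raw) tool outputs
    pending_approvals: list[str]  # interrupts waiting on a human
    steps_remaining: int          # budget; stop the loop when it hits zero


def init_state(message: str, budget: int = 10) -> WorkingMemory:
    return WorkingMemory(
        message=message,
        recent_turns=[message],
        plan=[],
        tool_results={},
        pending_approvals=[],
        steps_remaining=budget,
    )
```

Each node in the loop receives this dict, mutates or replaces fields, and passes it on; that is the whole lifetime of working memory.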

What belongs here:

  • The user's current message and recent turns (last N)
  • The current plan or sub-goals
  • Intermediate results from tool calls
  • Open approvals or interrupts waiting on humans
  • Step counter / budget remaining

What does NOT belong here:

  • Anything that should survive process restart (put it in episodic)
  • Anything you want to query later (put it in semantic)
  • Anything that describes how to do work (put it in procedural)

The working-memory failure mode is state explosion: tools dump their entire output into state, the state grows past the context window, and the agent slows to a crawl. Fix: every tool output passes through a projection function that extracts only what the next step needs.
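That projection step can be sketched as a per-tool lookup table; the tool name and output fields below are invented for illustration:

```python
from typing import Any, Callable

# Map each tool to a projection that keeps only what downstream steps need.
PROJECTIONS: dict[str, Callable[[dict], dict]] = {
    "search_orders": lambda out: {
        "order_ids": [o["id"] for o in out["results"][:5]],  # cap the fan-out
        "total_matches": out["total"],
    },
}


def project(tool_name: str, raw_output: dict[str, Any]) -> dict[str, Any]:
    """Shrink a tool's raw output before it enters working memory."""
    projection = PROJECTIONS.get(tool_name)
    if projection is None:
        # Unknown tool: keep a truncated string, never the full payload.
        return {"raw": str(raw_output)[:500]}
    return projection(raw_output)
```

The key design choice is that the default path truncates: a tool without a registered projection cannot silently flood the state.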

Episodic Memory

Episodic memory is "what happened, summarized." It is the agent's diary.

Implementation pattern:

```python
from datetime import datetime
from uuid import UUID

from pydantic import BaseModel

# Outcome, FactRef, User, embed, vector_store, and reranker are defined elsewhere.


class Episode(BaseModel):
    id: UUID
    user_id: UUID
    started_at: datetime
    ended_at: datetime
    summary: str                    # natural language, ~3 sentences
    outcome: Outcome                # success | escalated | abandoned
    topics: list[str]               # tagged for filtering
    embedding: list[float]          # for similarity retrieval
    facts_extracted: list[FactRef]  # foreign keys into semantic store


# Retrieval at session start
async def recall_relevant_episodes(query: str, user: User) -> list[Episode]:
    embedding = await embed(query)
    candidates = await vector_store.search(
        embedding,
        filter={"user_id": user.id},
        top_k=8,
    )
    return await reranker.rerank(query, candidates, top_n=3)
```

Three design rules that make episodic memory useful:

  1. Summaries, not transcripts. Storing whole conversations is cheap but useless — retrieval surfaces too much noise. Summarize at session end with a structured template.
  2. Per-user scoping. Filter by user_id in the vector search itself, not in post-processing. Without this, you have a privacy bug waiting to happen.
  3. Decay. Older episodes contribute less to retrieval. Multiply similarity by exp(-age_days / half_life) and re-sort.
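Rule 3 can be sketched as a pure re-scoring step over retrieval hits; the 90-day half-life is an assumed default, tune it per domain:

```python
import math


def decayed_scores(
    hits: list[tuple[str, float, float]],  # (episode_id, similarity, age_days)
    half_life_days: float = 90.0,
) -> list[tuple[str, float]]:
    """Re-rank similarity hits with exponential recency decay."""
    rescored = [
        (episode_id, similarity * math.exp(-age_days / half_life_days))
        for episode_id, similarity, age_days in hits
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

A two-and-a-half-year-old episode needs to be an order of magnitude more similar than a fresh one to win the same slot, which is usually the behavior you want.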

Semantic Memory

Semantic memory stores extracted facts. Where episodic answers what happened, semantic answers what is true.

Schema sketch:

```sql
CREATE TABLE facts (
    id                 uuid PRIMARY KEY,
    user_id            uuid NOT NULL,
    subject            text NOT NULL,   -- "user", "user.company", "user.preferences"
    predicate          text NOT NULL,   -- "uses_database", "industry", "timezone"
    object             text NOT NULL,   -- "postgres-16", "fintech", "America/New_York"
    confidence         float NOT NULL,  -- 0..1
    source_episode_id  uuid REFERENCES episodes(id),
    recorded_at        timestamptz NOT NULL,
    superseded_at      timestamptz NULL,
    superseded_by      uuid REFERENCES facts(id)
);

CREATE INDEX ON facts (user_id, subject, predicate)
    WHERE superseded_at IS NULL;
```

The structure deliberately mirrors RDF triples. Facts are immutable; updates produce a new row that supersedes the old one. This gives you:

  • Auditability. Every fact has a source episode and a timestamp.
  • Temporal queries. "What did we believe about this user on April 3?" is a SQL query.
  • Safe deletion. GDPR erasure becomes a single DELETE scoped to user_id.

The retrieval rule for semantic memory differs from episodic: you fetch facts by entity identity, not similarity. "Tell me everything I know about this user's company" is a join, not a vector search.
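That lookup can be sketched against a simplified in-memory copy of the schema; sqlite3 stands in for the production database here, and the sample rows are invented:

```python
import sqlite3


def fetch_entity_facts(conn: sqlite3.Connection, user_id: str, subject: str):
    """Fetch active facts for one entity: an identity lookup, not a vector search."""
    rows = conn.execute(
        """
        SELECT predicate, object, confidence
        FROM facts
        WHERE user_id = ? AND subject = ? AND superseded_at IS NULL
        ORDER BY predicate
        """,
        (user_id, subject),
    ).fetchall()
    return {predicate: (obj, conf) for predicate, obj, conf in rows}


# Demo: simplified column types, same shape as the schema above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
        user_id text, subject text, predicate text, object text,
        confidence real, superseded_at text)"""
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("u1", "user.company", "industry", "fintech", 0.9, None),
        ("u1", "user.company", "industry", "retail", 0.5, "2026-01-01"),  # superseded
        ("u1", "user", "timezone", "America/New_York", 0.8, None),
    ],
)
```

Note that the superseded row never surfaces; the `superseded_at IS NULL` filter is what makes the immutable-row design queryable.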

Procedural Memory

Procedural memory stores learned routines: how to handle specific intent classes, escalation paths, policies the agent must follow.

Concretely, these are versioned documents the agent retrieves on demand:

  • "Refund procedure for orders under $200"
  • "Escalation policy for HIPAA-regulated questions"
  • "Tone guide for first-time vs returning customers"
  • "Decision flowchart for sales-qualifying a lead"

Where to store: in git, in a content-managed DB with versioning, or in your existing knowledge base — as long as each procedure is independently retrievable and versioned.

Where NOT to store: the system prompt. System prompts that try to encode every procedure become brittle, expensive (prefill cost grows on every request), and impossible to A/B test.

The retrieval rule: at intent classification time, fetch the procedures matching (intent, user_role). Include them in context only for the steps that need them.
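A minimal sketch of that lookup, assuming a registry keyed by (intent, role); the procedure names and the wildcard-role convention are invented for illustration:

```python
# Illustrative procedure registry; values are versioned document ids.
PROCEDURES: dict[tuple[str, str], str] = {
    ("refund_request", "customer"): "refund-procedure-v3.md",
    ("refund_request", "vip"): "refund-procedure-vip-v1.md",
    ("hipaa_question", "*"): "hipaa-escalation-policy-v2.md",
}


def fetch_procedures(intent: str, role: str) -> list[str]:
    """Return procedure doc ids for this intent: exact role first, then wildcard."""
    docs = []
    for candidate_role in (role, "*"):
        doc = PROCEDURES.get((intent, candidate_role))
        if doc is not None:
            docs.append(doc)
    return docs
```

Because each value is a versioned document id, swapping `refund-procedure-v3.md` for a v4 is a registry change you can roll back, not a prompt edit.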

Memory Consolidation — The Layer Most People Skip

Without consolidation, memory rots. Facts contradict; episodes accumulate noise; procedural docs drift.

A minimal consolidation pipeline runs nightly:

```python
async def consolidate_user_memory(user_id: UUID):
    # 1. Dedupe facts: cluster by (subject, predicate); keep the most-recent,
    #    highest-confidence fact in each cluster. Older facts are marked superseded.
    facts = await db.fetch_active_facts(user_id)
    for cluster in group_by_subject_predicate(facts):
        canonical = max(cluster, key=lambda f: (f.recorded_at, f.confidence))
        for other in cluster:
            if other.id != canonical.id:
                await db.supersede(other.id, by=canonical.id)

    # 2. Compact episodes: merge episodes older than 30 days within the same topic.
    old_episodes = await db.fetch_episodes(user_id, older_than=30)
    for topic_group in group_by_topic(old_episodes):
        merged_summary = await summarizer.summarize_episodes(topic_group)
        await db.replace_episodes(topic_group, with_summary=merged_summary)

    # 3. Detect contradictions: flag facts that conflict with source-of-truth systems.
    for fact in await db.fetch_active_facts(user_id):
        truth = await source_of_truth.check(fact)
        if truth.contradicts:
            await db.flag_for_review(fact, reason=truth.reason)
```

This is the layer that turns "agent gets weirder over time" into "agent gets sharper over time."

Putting It Together: A Request Lifecycle

1. User sends a message.
2. Working memory is initialized with state = {message, user_id, history_summary}.
3. Episodic recall fires: top-3 relevant past episodes loaded.
4. Semantic recall fires: facts about user + entities in message loaded.
5. Procedural recall fires: procedures matching the classified intent loaded.
6. Orchestrator runs nodes; tool outputs projected into working memory.
7. On task completion: write an episode summary. Extract new facts (with
   confidence + source) into semantic store.
8. Async: consolidation job runs nightly. Outdated facts superseded;
   stale episodes compacted.

Seven of those eight steps are cheap. The expensive one is fact extraction: a separate LLM call after each session. Run it asynchronously after returning the user's response; the user does not pay the latency.
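Deferring that extraction call can be sketched with asyncio; the extractor below is a stand-in for the real LLM round-trip, and the log list exists only to make the ordering visible:

```python
import asyncio


async def extract_facts(session_id: str, log: list[str]) -> None:
    """Stand-in for the post-session LLM fact-extraction call."""
    await asyncio.sleep(0.01)  # simulate the slow LLM round-trip
    log.append(f"facts extracted for {session_id}")


async def handle_request(session_id: str, log: list[str]) -> str:
    response = "here is your answer"  # produced by the orchestrator
    # Schedule extraction without blocking the response path.
    task = asyncio.create_task(extract_facts(session_id, log))
    log.append(f"responded to {session_id}")
    await task  # in production, a task group or job queue would own this
    return response


log: list[str] = []
asyncio.run(handle_request("s1", log))
```

In a real server you would hand the task to a background queue rather than awaiting it in the handler; the await here just keeps the demo deterministic.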

Common Anti-Patterns

  • Auto-remember everything. Writing facts after every turn produces a stew of low-confidence noise. Make the agent explicitly call remember(fact).
  • One vector store for everything. Episodic and semantic have different retrieval semantics. Mixing them ruins both.
  • No decay. A user's preferred language from 2024 is not relevant in 2026 if newer evidence contradicts it. Bake recency into retrieval and consolidation.
  • Memory in the system prompt. Inflates every request. Prefer just-in-time retrieval scoped to the current task.
  • No erasure path. GDPR Article 17 is real. Plan for "delete everything about this user" before regulators ask.
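The fix for the first anti-pattern, an explicit and validated remember tool, can be sketched as follows; the 0.6 confidence threshold is an assumption, not a recommendation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    object: str
    confidence: float


STORE: list[Fact] = []


def remember(subject: str, predicate: str, object: str, confidence: float) -> bool:
    """Deliberate memory write: reject malformed or low-confidence facts."""
    if not (subject and predicate and object):
        return False                # no empty triple components
    if not 0.0 <= confidence <= 1.0:
        return False                # confidence must be a valid probability
    if confidence < 0.6:            # assumed threshold: do not store guesses
        return False
    STORE.append(Fact(subject, predicate, object, confidence))
    return True
```

The returned bool lets the agent see that a write was rejected and decide whether to gather more evidence, instead of silently accumulating noise.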

Frequently Asked Questions

Why isn't the LLM's context window enough memory?

Context windows grow but attention degrades — the 'lost-in-the-middle' effect means accuracy on facts buried in the middle of a 200k-token context can drop by 30–50%. Long context is a delivery mechanism for selected memory, not a substitute for it.

What's the difference between episodic and semantic memory?

Episodic memory stores summaries of past interactions ("On May 3, the user asked about refunds and was satisfied with the policy"). Semantic memory stores extracted facts ("User uses Postgres 16"). Episodic answers "what happened?"; semantic answers "what is true?"

How do I prevent the agent from remembering wrong things?

Make memory writes deliberate, not automatic. The agent calls an explicit "remember" tool, the input is schema-validated, and a periodic consolidation job deduplicates and verifies entries against the source-of-truth system.

Where should I store agent memory?

Working memory in the orchestrator's state object. Episodic in a vector store with structured metadata. Semantic in a relational schema you can query and audit. Procedural in versioned policy documents the agent retrieves on-demand.

Does this work for stateless / serverless agents?

Yes — working memory still lives in the request, but episodic, semantic, and procedural stores are externalised and survive across invocations. The state object is just a transient view.

Key Takeaways

  • Memory is four distinct concerns — working, episodic, semantic, procedural — and each needs its own storage and retrieval rules.
  • Make memory writes deliberate, not automatic, to prevent compounding errors.
  • Long context windows are a delivery mechanism for selected memory, not a substitute for memory architecture.
  • Consolidation jobs that deduplicate and verify memory are the difference between an agent that gets smarter and one that gets weirder.
  • Plan for erasure from day one — both for trust and for compliance.

Article Taxonomy
#agent-memory #long-term-memory #episodic-memory #context-engineering #agent-architecture
© 2026 Synthara Technologies Private Limited. Engineered in India • Deploying Strategic Nodes Globally.