
Consolidation

Over time, your memory store accumulates duplicates, stale information, and raw conversation chunks. Consolidation is the background process that cleans this up — merging duplicates, fading unused memories, expiring old chunks, and promoting raw data into clean insights.

Think of it as sleep-time compute for your agent’s memory. The agent works fast during the day (zero-LLM writes), and consolidation does the heavy lifting in the background.

Four operations

1. Deduplication

Finds pairs of memories with cosine similarity above a threshold (default: 0.92). When two memories say essentially the same thing, they get merged.

  • Without LLM: keeps the longer or newer memory, soft-deletes the other
  • With LLM: merges both into a single, more concise memory

Example: “User prefers dark mode” and “The user likes dark mode for all their apps” get merged into one clean memory.
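The non-LLM path can be sketched as a pairwise cosine-similarity scan. This is a minimal illustration, not Unforget's implementation. The `Memory` dataclass, `cosine_similarity`, and `dedup_without_llm` are hypothetical names. The tie-break ("prefer the longer text, then the newer one") is one assumed reading of "keeps the longer or newer memory".

```python
import math
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass
class Memory:  # hypothetical record shape for illustration, not Unforget's model
    id: str
    text: str
    embedding: list[float]
    created_at: datetime
    deleted: bool = False  # soft-delete flag: the record is kept, just hidden


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_without_llm(memories: list[Memory], threshold: float = 0.92) -> list[tuple[str, str]]:
    """Soft-delete one memory from each near-duplicate pair.

    Returns (kept_id, deleted_id) pairs. Assumed tie-break: longer text wins,
    and the newer memory wins when lengths are equal.
    """
    merged = []
    for a, b in combinations(memories, 2):
        if a.deleted or b.deleted:
            continue  # already merged away earlier in this pass
        if cosine_similarity(a.embedding, b.embedding) < threshold:
            continue
        keep = max(a, b, key=lambda m: (len(m.text), m.created_at))
        drop = b if keep is a else a
        drop.deleted = True
        merged.append((keep.id, drop.id))
    return merged


m1 = Memory("m1", "User prefers dark mode", [1.0, 0.0, 0.1], datetime(2024, 1, 1))
m2 = Memory("m2", "The user likes dark mode for all their apps", [0.98, 0.05, 0.12], datetime(2024, 2, 1))
m3 = Memory("m3", "User works at Acme Corp", [0.0, 1.0, 0.0], datetime(2024, 1, 5))
print(dedup_without_llm([m1, m2, m3]))  # [('m2', 'm1')]
```

A real store would use a vector index rather than an O(n²) pairwise scan, but the merge decision per pair is the same idea.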

2. Importance decay

Memories that nobody asks about gradually lose importance:

| Condition | Effect |
| --- | --- |
| Not accessed for 7+ days | importance × 0.95 |
| Not accessed for 30+ days | importance × 0.80 |
| Importance drops below 0.1 | Soft-deleted |

This keeps your memory store focused on information that’s actually useful. Memories that get recalled regularly maintain their importance automatically.
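The decay rules in the table can be sketched as a single function applied once per consolidation run. This sketch makes several assumptions: the tiers do not stack (a memory idle for 30+ days gets only the × 0.80 factor), idle time is measured from `last_accessed`, and `Memory` and `decay` are hypothetical names rather than Unforget's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:  # hypothetical record shape for illustration
    importance: float
    last_accessed: datetime
    deleted: bool = False


def decay(memory: Memory, now: datetime, floor: float = 0.1) -> None:
    """Apply one run of importance decay, soft-deleting below `floor`."""
    idle = now - memory.last_accessed
    # Check the stronger tier first so the two multipliers never both apply.
    if idle >= timedelta(days=30):
        memory.importance *= 0.80
    elif idle >= timedelta(days=7):
        memory.importance *= 0.95
    if memory.importance < floor:
        memory.deleted = True


now = datetime(2024, 6, 1)
stale = Memory(importance=0.11, last_accessed=now - timedelta(days=45))
decay(stale, now)
print(round(stale.importance, 3), stale.deleted)  # 0.088 True
```

Because the factors compound across runs, how quickly an idle memory crosses 0.1 depends on how often consolidation runs.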

3. Raw expiry

Raw memories are temporary by design: they are conversation chunks waiting to be processed. After 30 days, unprocessed raw memories are soft-deleted. This keeps the store from growing without bound as conversations are ingested.
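The expiry rule amounts to a filter over raw memories. In this minimal sketch, `Memory`, `expire_raw`, and the `consolidated` flag are hypothetical names. It assumes that age is measured from `created_at` and that "after 30 days" means strictly older than 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RAW_TTL = timedelta(days=30)  # the 30-day window described above


@dataclass
class Memory:  # hypothetical record shape for illustration
    kind: str  # "raw" or "insight"
    created_at: datetime
    consolidated: bool = False  # set once promotion has processed the chunk
    deleted: bool = False


def expire_raw(memories: list[Memory], now: datetime) -> int:
    """Soft-delete unprocessed raw memories older than RAW_TTL; return the count."""
    expired = 0
    for m in memories:
        if (m.kind == "raw" and not m.consolidated and not m.deleted
                and now - m.created_at > RAW_TTL):
            m.deleted = True
            expired += 1
    return expired


now = datetime(2024, 6, 1)
memories = [
    Memory("raw", now - timedelta(days=40)),                     # expires
    Memory("raw", now - timedelta(days=5)),                      # too recent
    Memory("raw", now - timedelta(days=40), consolidated=True),  # already processed
    Memory("insight", now - timedelta(days=40)),                 # not raw
]
print(expire_raw(memories, now))  # 1
```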

4. Promotion (requires LLM)

The most interesting step. Promotion takes raw conversation chunks and distills them into clean insight memories:

```text
raw:     "user: I just moved to Denver, used to live in Austin"
raw:     "user: I work at Acme Corp as a senior engineer"
             ↓ LLM distillation
insight: "User lives in Denver (previously Austin)"
insight: "User is a senior engineer at Acme Corp"
```

The LLM reads the raw text, extracts the key facts, and stores them as insights with higher importance. The original raw chunks are marked as consolidated.
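The distillation step can be sketched as one prompt per raw chunk, parsing the reply into one insight per line. This is a hypothetical illustration. The prompt wording, the one-fact-per-line reply format, `promote`, and `fake_llm` are all assumptions, not Unforget's actual prompt. Storing insights and marking chunks as consolidated are omitted.

```python
import asyncio

# Hypothetical prompt; the real consolidation prompt may differ.
PROMPT = (
    "Extract the key facts about the user from the conversation below.\n"
    "Return one short fact per line, with no numbering.\n\n{chunk}"
)


async def promote(raw_chunks: list[str], llm) -> list[str]:
    """Ask the LLM to distill each raw chunk; each non-empty reply line is an insight."""
    insights = []
    for chunk in raw_chunks:
        reply = await llm(PROMPT.format(chunk=chunk))
        insights.extend(line.strip() for line in reply.splitlines() if line.strip())
    return insights


async def fake_llm(prompt: str) -> str:
    # Offline stand-in for a real model so the sketch runs without an API key.
    if "Denver" in prompt:
        return "User lives in Denver (previously Austin)"
    return "User is a senior engineer at Acme Corp"


chunks = [
    "user: I just moved to Denver, used to live in Austin",
    "user: I work at Acme Corp as a senior engineer",
]
print(asyncio.run(promote(chunks, fake_llm)))
# ['User lives in Denver (previously Austin)', 'User is a senior engineer at Acme Corp']
```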

Without an LLM, consolidation still handles dedup, decay, and expiry — just not promotion.

Running consolidation

Manual trigger

Run consolidation once for a specific agent:

```python
report = await store.consolidate(
    org_id="my-app",
    agent_id="bot",
    llm=my_llm_callable,        # optional; enables promotion
    similarity_threshold=0.92,  # dedup threshold
)

print(f"Duplicates merged:   {report.duplicates_merged}")
print(f"Importance decayed:  {report.memories_decayed}")
print(f"Raw expired:         {report.memories_expired}")
print(f"Promoted to insight: {report.memories_promoted}")
```

Automatic scheduling

For production, attach a scheduler that runs consolidation on a regular cadence:

```python
from unforget import ConsolidationScheduler

scheduler = ConsolidationScheduler(
    store,
    interval_seconds=3600,  # every hour
    write_threshold=50,     # or after every 50 writes, whichever comes first
    llm=my_llm_callable,    # optional
)
store.attach_scheduler(scheduler)
await scheduler.start()
```

The scheduler automatically discovers all active agents (by querying distinct org_id/agent_id pairs) and consolidates each one. If an agent has no new data since the last consolidation, it’s skipped.
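The "whichever comes first" trigger and the skip-if-unchanged rule can be sketched as two small predicates. `should_trigger` and `agents_to_consolidate` are hypothetical helpers, not Unforget's API. The sketch assumes timestamps are plain seconds and that "new data" means a write newer than the agent's last consolidation.

```python
def should_trigger(now: float, last_run: float, writes_since_run: int,
                   interval_seconds: float = 3600, write_threshold: int = 50) -> bool:
    """Fire when either limit is reached, whichever comes first."""
    return now - last_run >= interval_seconds or writes_since_run >= write_threshold


def agents_to_consolidate(last_write: dict[tuple[str, str], float],
                          last_consolidated: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Return (org_id, agent_id) pairs with writes newer than their last consolidation."""
    return [
        key for key, written_at in last_write.items()
        if written_at > last_consolidated.get(key, float("-inf"))
    ]


print(should_trigger(now=1000.0, last_run=0.0, writes_since_run=3))   # False
print(should_trigger(now=1000.0, last_run=0.0, writes_since_run=50))  # True

last_write = {("my-app", "bot"): 100.0, ("my-app", "helper"): 50.0}
last_consolidated = {("my-app", "bot"): 90.0, ("my-app", "helper"): 60.0}
print(agents_to_consolidate(last_write, last_consolidated))  # [('my-app', 'bot')]
```

Agents that have never been consolidated fall back to negative infinity, so they are always included on their first pass.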

Failures are handled with exponential backoff, capped at 1 hour.
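A capped exponential backoff can be computed as a pure function of the consecutive-failure count. This is a sketch under stated assumptions: the 60-second base delay is hypothetical (the source only states the 1-hour cap), and `backoff_delay` is not part of Unforget's API.

```python
def backoff_delay(failures: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Seconds to wait after `failures` consecutive failed runs (0 means no delay)."""
    if failures <= 0:
        return 0.0
    # Clamp the exponent so very long failure streaks cannot overflow a float.
    return min(cap, base * 2 ** min(failures - 1, 32))


print([backoff_delay(n) for n in range(1, 8)])
# [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0, 3600.0]
```

Keeping the delay a pure function makes the retry policy easy to unit-test separately from the scheduler loop. A successful run would reset the failure count to zero.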

The LLM callable

Consolidation accepts any async function with the signature async (prompt: str) -> str. Use whatever LLM you prefer:

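```python
# OpenAI
from openai import AsyncOpenAI

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def my_llm(prompt: str) -> str:
    response = await openai_client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""  # content can be None

# Anthropic
from anthropic import AsyncAnthropic

anthropic_client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def my_llm(prompt: str) -> str:
    response = await anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# Local via Ollama
import httpx

async def my_llm(prompt: str) -> str:
    # Local models can be slow; httpx defaults to a 5-second timeout.
    async with httpx.AsyncClient(timeout=60.0) as client:
        r = await client.post(
            "http://localhost:11434/api/generate",
            json={"model": "llama3.2", "prompt": prompt, "stream": False},
        )
        r.raise_for_status()
        return r.json()["response"]
```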

A small, fast model works best for consolidation. The prompts are simple (merge these two texts, extract facts from this chunk) — you don’t need a large model.
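For tests that should not call a real model, a deterministic stub can satisfy the same `async (prompt: str) -> str` contract. In this sketch, `echo_llm` and `check_llm_callable` are hypothetical helpers. Whether Unforget itself validates the callable this way is not stated, so treat the check as a guard for your own code.

```python
import asyncio
import inspect


async def echo_llm(prompt: str) -> str:
    """Deterministic stand-in: returns the last non-empty line of the prompt."""
    lines = [line for line in prompt.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def check_llm_callable(fn) -> None:
    """Raise TypeError unless fn (or its __call__) is declared with async def."""
    call = getattr(fn, "__call__", None)
    if not (inspect.iscoroutinefunction(fn) or inspect.iscoroutinefunction(call)):
        raise TypeError("llm must be async: async (prompt: str) -> str")


check_llm_callable(echo_llm)
print(asyncio.run(echo_llm("Merge these two texts:\nUser prefers dark mode")))
# User prefers dark mode
```

Checking `__call__` as well lets class instances with an `async def __call__` pass, which matches the bare signature rather than requiring a plain function.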

When to consolidate

  • High-write apps (chat bots, support agents): schedule every 30-60 minutes or after every 100 writes
  • Low-write apps (personal assistants): once a day is enough
  • Batch ingestion: run manually after ingesting a large dataset
Apache 2.0 · Unforget