Retrieval

Unforget doesn’t rely on a single search strategy. Every recall() fires four different search channels in parallel — semantic similarity, keyword matching, entity overlap, and temporal recency — then fuses the results into a single ranked list. This happens in one SQL round trip.

The four channels

| Channel | How it works | What it's good at | Index |
|---|---|---|---|
| Semantic | Cosine similarity on embeddings | Finding conceptually similar content, even with different words | HNSW (pgvector) |
| BM25 | PostgreSQL full-text search with stemming | Exact keyword matches, names, technical terms | GIN (tsvector) |
| Entity | Named entity array overlap | Queries about specific people, places, dates, or products | GIN (text[]) |
| Temporal | Ordered by last access time | “What did we just discuss?” or recent context | B-tree |

Each channel returns its top candidates independently. A memory might rank #1 in semantic but #15 in BM25. The fusion step combines these signals.

Why four channels?

No single search method works for everything. Semantic search is great at “user settings” matching “dark mode preference” but misses exact names. BM25 nails “PostgreSQL 16” but can’t handle paraphrases. Entity overlap catches “When did Caroline go camping?” when the semantic embedding doesn’t strongly connect “Caroline” to camping. Temporal recency helps with “what were we just talking about?”

By running all four in parallel, Unforget covers gaps that any single method would miss.

Reciprocal Rank Fusion (RRF)

After each channel returns its ranked results, RRF combines them into a single score:

score(memory) = Σ weight[channel] × (1 / (k + rank[channel]))

A memory that ranks high in multiple channels gets a much higher fused score than one that ranks high in only one channel. The parameter k (default: 60) controls how much top ranks dominate — lower k means the #1 result from any channel gets a proportionally bigger boost.

Example: A memory that’s #2 in semantic and #3 in BM25 will outscore a memory that’s #1 in semantic but nowhere in BM25. Consensus across channels is rewarded.
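
The formula and the example above can be checked with a few lines of Python. This is a minimal sketch, not Unforget's implementation: the function name `rrf_fuse`, the rank-dictionary input shape, and the fallback weight of 1.0 are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60):
    """Fuse per-channel rankings with weighted Reciprocal Rank Fusion.

    rankings: {channel: {memory_id: rank}}, ranks start at 1.
    A memory missing from a channel contributes nothing for that channel.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for channel, ranks in rankings.items():
        w = weights.get(channel, 1.0)  # assumed fallback weight of 1.0
        for memory_id, rank in ranks.items():
            scores[memory_id] += w * (1.0 / (k + rank))
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Memory A: #2 in semantic, #3 in BM25. Memory B: #1 in semantic only.
rankings = {
    "semantic": {"A": 2, "B": 1},
    "bm25": {"A": 3},
}
fused = rrf_fuse(rankings)
```

With k=60 and equal weights, A scores 1/62 + 1/63 (about 0.0320) while B scores 1/61 (about 0.0164), so A ranks first even though B holds the top semantic spot.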

Type boosts

After fusion, each memory’s score is multiplied by its type boost:

| Type | Boost | Why |
|---|---|---|
| insight | ×1.5 | Distilled facts are more useful than raw conversation |
| event | ×1.0 | Baseline |
| raw | ×0.5 | Raw chunks are noisy; prefer insights when available |
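
To make the effect of the multipliers concrete, here is a small sketch that applies them to already-fused scores. The boost values come from the table above; the function name `apply_type_boosts`, the input shapes, and the fallback of 1.0 for unknown types are assumptions for illustration only.

```python
TYPE_BOOSTS = {"insight": 1.5, "event": 1.0, "raw": 0.5}


def apply_type_boosts(fused, memory_types, boosts=TYPE_BOOSTS):
    """Multiply each fused score by the boost for the memory's type.

    fused: list of (memory_id, score) pairs.
    memory_types: {memory_id: type name}.
    """
    boosted = [
        # Unknown types fall back to 1.0 here; that fallback is an assumption.
        (memory_id, score * boosts.get(memory_types[memory_id], 1.0))
        for memory_id, score in fused
    ]
    return sorted(boosted, key=lambda item: item[1], reverse=True)


# Hypothetical fused scores: the raw chunk starts out ahead of the insight.
fused = [("raw_chunk", 0.030), ("insight_fact", 0.016)]
types = {"raw_chunk": "raw", "insight_fact": "insight"}
boosted = apply_type_boosts(fused, types)
```

In this toy case the raw chunk drops to 0.015 and the insight rises to 0.024, so the insight ends up first despite its lower fused score.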

Cross-encoder reranking

After the 4-channel fusion produces a candidate list, an optional cross-encoder model (ms-marco-MiniLM-L-6-v2) reranks the top results. This adds ~10ms but catches cases where the embedding-based ranking got the order wrong.

The reranker scores each query-memory pair jointly instead of comparing two precomputed embeddings, so it can pick up on how the query and the memory text relate to each other. It’s especially helpful for ambiguous queries.

Disable it with rerank=False if you need the lowest possible latency.
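
The reranking step can be pictured as re-sorting the head of the candidate list with a pairwise scorer. The sketch below is not Unforget's code: `rerank`, `overlap_score`, and `top_n` are hypothetical names, and the word-overlap scorer is only a toy stand-in so the example runs without downloading a model. In a real setup the scorer would be a cross-encoder, for example one loaded through the `CrossEncoder` class in the sentence-transformers library.

```python
def rerank(query, candidates, score_pair, top_n=10):
    """Re-score the first top_n candidates jointly with the query.

    candidates: list of memory texts, already ordered by fused score.
    score_pair: callable (query, text) -> float, higher is more relevant.
    """
    head, tail = candidates[:top_n], candidates[top_n:]
    reranked = sorted(head, key=lambda text: score_pair(query, text), reverse=True)
    # Candidates beyond top_n keep their fused order.
    return reranked + tail


def overlap_score(query, text):
    """Toy stand-in for a cross-encoder: fraction of query words found in text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / max(len(query_words), 1)


candidates = [
    "User likes hiking on weekends",
    "User prefers dark mode in the editor",
    "Deploy failed on Friday",
]
result = rerank("dark mode preference", candidates, overlap_score, top_n=2)
```

Only the first two candidates are re-scored here, which mirrors the idea of spending the expensive model on the top of the list and leaving the rest untouched.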

The full pipeline

1. Query
2. Embed query (~3ms)
3. 4-channel SQL CTE: semantic + BM25 + entity + temporal (~5ms)
4. RRF fusion + type boosting
5. Cross-encoder reranking (~10ms)
6. Deduplicate overlapping results
7. Return top-k results (~25ms total)

All of this happens in a single recall() call.

Usage

    # Basic recall: uses all defaults
    results = await memory.recall("user preferences", limit=10)

    # Filtered by type
    results = await memory.recall("deploy issues", memory_type="event", limit=5)

    # Without reranking (faster, slightly less precise)
    results = await memory.recall("recent conversations", rerank=False)

    # With a minimum score threshold
    results = await memory.recall("team structure", threshold=0.1)

    # Skip cache for fresh results
    results = await memory.recall("latest updates", use_cache=False)

Auto-recall for LLM prompts

auto_recall() wraps recall with formatting — ready to drop into a system prompt:

    context = await memory.auto_recall(
        "What does the user prefer?",
        max_tokens=2000,  # token budget
        limit=10,
    )
    # Returns:
    # "[Memory Context]
    # - User prefers dark mode
    # - User is allergic to shellfish
    # - User works in Berlin"
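
One way to picture the formatting step is a function that appends memories to a header until a token budget runs out. This is a sketch under stated assumptions, not how auto_recall() is implemented: `format_memory_context` and `estimate_tokens` are hypothetical helpers, and the rule of roughly 4 characters per token is a crude approximation rather than a real tokenizer.

```python
def estimate_tokens(text):
    """Crude token estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def format_memory_context(memories, max_tokens=2000):
    """Build a prompt-ready block, adding memories until the budget is spent.

    memories: list of memory texts, most relevant first.
    """
    lines = ["[Memory Context]"]
    used = estimate_tokens(lines[0])
    for text in memories:
        line = f"- {text}"
        cost = estimate_tokens(line)
        if used + cost > max_tokens:
            break  # stop at the first memory that does not fit
        lines.append(line)
        used += cost
    return "\n".join(lines)


memories = [
    "User prefers dark mode",
    "User is allergic to shellfish",
    "User works in Berlin",
]
context = format_memory_context(memories, max_tokens=2000)
short_context = format_memory_context(memories, max_tokens=12)
```

With the generous budget all three memories are included; with the tiny budget only the first one fits, which shows why passing memories in relevance order matters.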

Tuning

Different use cases benefit from different channel weights. Here are some starting points:

Customer support (exact terms matter):

channel_weights={"semantic": 1.0, "bm25": 1.2, "entity": 0.8, "temporal": 0.2}

Personal assistant (conversational, recency matters):

channel_weights={"semantic": 1.0, "bm25": 0.8, "entity": 0.6, "temporal": 0.8}

Knowledge base (factual, entity-heavy):

channel_weights={"semantic": 1.0, "bm25": 1.0, "entity": 1.0, "temporal": 0.1}
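
To see how a preset changes the outcome, the sketch below runs the weighted RRF formula over the same hypothetical ranks with the customer support and personal assistant weights listed above. The helper `fused_scores`, the memory names, and the ranks are invented for illustration; only the weight values come from this page.

```python
def fused_scores(rankings, weights, k=60):
    """Weighted RRF: sum of weight / (k + rank) over channels that returned the memory."""
    scores = {}
    for channel, ranks in rankings.items():
        for memory_id, rank in ranks.items():
            scores[memory_id] = scores.get(memory_id, 0.0) + weights[channel] / (k + rank)
    return scores


# Hypothetical per-channel ranks for two memories.
rankings = {
    "semantic": {"exact_match": 2, "recent_chat": 1},
    "bm25": {"exact_match": 1, "recent_chat": 20},
    "temporal": {"exact_match": 30, "recent_chat": 1},
}

support = {"semantic": 1.0, "bm25": 1.2, "entity": 0.8, "temporal": 0.2}
assistant = {"semantic": 1.0, "bm25": 0.8, "entity": 0.6, "temporal": 0.8}

support_scores = fused_scores(rankings, support)
assistant_scores = fused_scores(rankings, assistant)

support_winner = max(support_scores, key=support_scores.get)
assistant_winner = max(assistant_scores, key=assistant_scores.get)
```

Under the support weights the keyword hit wins (about 0.0380 vs 0.0347), while the assistant weights favor the recently accessed memory (about 0.0395 vs 0.0381). Replaying a handful of real queries this way is a cheap check before changing weights in production.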
Apache 2.0 · Unforget