
Long-Term Memory

Master how AI agents store and retrieve knowledge across sessions using persistent memory systems

Optimizing Memory Retrieval

Having a database full of memories is only useful if you can retrieve the right information quickly. Retrieval strategies determine what gets returned, how fast, and how relevant it is.

Let's explore the trade-offs between different retrieval approaches.

Retrieval Strategy Explorer

🔒 Dense Retrieval (Vector Search)

Converts the query to an embedding and finds its nearest neighbors in vector space. Best for semantic similarity.

✅ Strengths
  • Understands meaning
  • Cross-lingual search
  • Handles synonyms
❌ Weaknesses
  • Misses exact phrases
  • Embedding cost per query
  • Requires an embedding model
# Dense retrieval example; embed() and db are placeholders for your
# embedding model and vector store clients.
query_vec = embed("password reset")
results = db.search(query_vec, top_k=5)
# Returns semantically similar docs, even without exact keyword overlap
Scorecard:
  • Precision: 85% (good)
  • Recall: 78% (good)
  • Speed: 50% (slow)
  • Cost: high (per 1K queries)
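To make the mechanics concrete, here is a minimal, self-contained sketch of dense retrieval over a toy corpus. The embed() stub is a hypothetical stand-in for a real embedding model; its random vectors only demonstrate the mechanics, and genuinely semantic scores come from a trained encoder.

import numpy as np

# Hypothetical stand-in for a real embedding model. A real encoder maps
# text to a vector whose direction captures meaning; this stub just
# produces deterministic unit vectors so the example runs end to end.
def embed(text: str, dim: int = 8) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

corpus = [
    "How do I reset my password?",
    "Shipping times for EU orders",
    "Changing your account email",
]
doc_vecs = np.stack([embed(d) for d in corpus])  # precompute doc embeddings

def dense_search(query: str, top_k: int = 2):
    q = embed(query)
    scores = doc_vecs @ q                      # cosine similarity (unit vectors)
    best = np.argsort(scores)[::-1][:top_k]    # highest scores first
    return [(corpus[i], float(scores[i])) for i in best]

print(dense_search("password reset"))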

πŸ” Metadata Filtering

Before retrieving, you can filter by metadata: attributes like date, user ID, document type, or tags. This narrows the search space and improves relevance.

Without Filtering

results = db.search(
    query_vector,
    top_k=5
)
Searches all 10M documents → slower, less relevant

With Filtering

results = db.search(
    query_vector,
    top_k=5,
    filter={"user_id": "123", "date": ">2024-01-01"}
)
Searches only 5K relevant documents → faster, more relevant
💡 Pro Tip:

Always include user_id in your metadata for multi-tenant systems. This prevents cross-user data leakage and improves retrieval speed.
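One way to make that rule hard to violate is to route every search through a wrapper that pins the tenant filter. This is a hypothetical helper around the db.search() interface shown above; the exact filter syntax varies by vector store.

def search_for_user(db, query_vector, user_id, extra_filter=None, top_k=5):
    # Merge caller-supplied filters first, then pin user_id last so a
    # buggy caller can never widen the search to other tenants' data.
    combined = dict(extra_filter or {})
    combined["user_id"] = user_id
    return db.search(query_vector, top_k=top_k, filter=combined)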

⚡ Caching for Speed

Many queries repeat. Cache common results to avoid redundant database calls.

🔍 Semantic Caching

Cache by similarity: if a query is 98% similar to a cached query, return the cached result.

if similar_query_in_cache(query, threshold=0.98):
    return cached_results
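The similar_query_in_cache() call above is pseudocode. A minimal working version, assuming a hypothetical embed_fn that returns unit-length vectors (so a dot product equals cosine similarity), might look like this:

import numpy as np

class SemanticCache:
    def __init__(self, embed_fn, threshold=0.98):
        self.embed_fn = embed_fn    # assumed to return unit-length vectors
        self.threshold = threshold
        self.entries = []           # list of (query_vector, results) pairs

    def get(self, query):
        q = self.embed_fn(query)
        for vec, results in self.entries:
            # Dot product of unit vectors = cosine similarity
            if float(np.dot(vec, q)) >= self.threshold:
                return results
        return None                 # cache miss

    def put(self, query, results):
        self.entries.append((self.embed_fn(query), results))

The linear scan is fine for a few thousand cached queries; at larger scale, the cached query vectors belong in an index of their own.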
📦 Result Caching

Cache entire result sets with TTL (time-to-live). Refresh periodically.

cache.set(query_hash, results, ttl=3600) # 1 hour
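The cache.set() call above assumes a Redis-style client. As a sketch, here is a minimal in-process equivalent with the same interface:

import time

class TTLCache:
    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl=3600):
        self._store[key] = (time.time() + ttl, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.time() > expires_at:  # expired: evict and report a miss
            del self._store[key]
            return None
        return value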
🔥 Hot Path Optimization

Precompute results for common queries (e.g., "What are your hours?")

HOT_QUERIES = {"hours": [...], "refund": [...]}
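A hot-path lookup can then sit in front of the retriever. The substring match below is an illustrative assumption, not a prescription; production systems often use an intent classifier instead.

def answer(query, db, embed, top_k=5):
    q = query.lower()
    for keyword, results in HOT_QUERIES.items():
        if keyword in q:             # cheap keyword check on the hot path
            return results
    return db.search(embed(query), top_k=top_k)  # cold path: vector search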

💡 Key Insight

Production systems use multi-stage retrieval: (1) Fast broad search with filters, (2) Rerank top candidates, (3) Cache common results. This balances speed, relevance, and cost at scale.
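As a rough sketch of how those stages compose (db, embed, rerank, and cache here are hypothetical components standing in for your own):

def retrieve(query, user_id, db, embed, rerank, cache, ttl=3600):
    key = (user_id, query)
    cached = cache.get(key)          # stage 0: repeated query? serve from cache
    if cached is not None:
        return cached
    # Stage 1: fast, broad candidate search, scoped by metadata filters
    candidates = db.search(embed(query), top_k=50, filter={"user_id": user_id})
    # Stage 2: rerank the small candidate set with a slower, more accurate model
    results = rerank(query, candidates)[:5]
    cache.set(key, results, ttl=ttl) # stage 3: cache for next time
    return results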
