
Vector Databases for Memory

Master how AI agents use vector databases to store, search, and retrieve embeddings for semantic memory

Similarity Metrics & Search

Once data is stored as embeddings, vector databases enable similarity search: finding vectors closest to a query vector. The choice of distance metric determines how "similarity" is calculated.

The three primary metrics are Cosine Similarity (direction only), Dot Product (direction and magnitude), and Euclidean Distance (straight-line distance between the two points).
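For reference, here are the standard textbook definitions of the three metrics for two n-dimensional vectors. The symbols a, b, and n are notation chosen for this note, not taken from the lesson itself.

```latex
\begin{aligned}
\text{Cosine similarity:}\quad & \cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert} \\
\text{Dot product:}\quad & \mathbf{a}\cdot\mathbf{b} = \sum_{i=1}^{n} a_i b_i \\
\text{Euclidean distance:}\quad & d(\mathbf{a},\mathbf{b}) = \lVert\mathbf{a}-\mathbf{b}\rVert = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \\
\text{For unit vectors } (\lVert\mathbf{a}\rVert = \lVert\mathbf{b}\rVert = 1):\quad & \cos(\mathbf{a},\mathbf{b}) = \mathbf{a}\cdot\mathbf{b}, \qquad d(\mathbf{a},\mathbf{b})^2 = 2 - 2\,\mathbf{a}\cdot\mathbf{b}
\end{aligned}
```

The last line explains why the choice of metric matters less once vectors are normalized: all three then produce the same ranking of neighbors.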

Example: Similarity Metric Calculator

Vector 1 (Query): [0.80, 0.60]
Vector 2 (Document): [0.60, 0.80]
Cosine similarity score: 0.9600 (range: -1 to 1; higher = more similar)

📐 Comparing Distance Metrics

📏 Cosine Similarity

Measures the angle between vectors (direction, not magnitude)

✓ Range: -1 (opposite direction) to 1 (same direction)
✓ Best for: Text, normalized embeddings
✓ Most common for semantic search

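Dot Product

Combines direction and magnitude

✓ Range: Unbounded
✓ Best for: Vectors already normalized to unit length (where it equals cosine similarity), or cases where magnitude should count
✓ Cheaper to compute than cosine (no norms, no division)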

📍 Euclidean Distance

Straight-line distance between two points in the vector space

✓ Range: 0 (identical) to ∞
✓ Best for: Coordinate data, images
✓ Lower values = more similar
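To make the comparison concrete, here is a minimal pure-Python sketch of the three metrics. The function names and the `doc_scaled` vector are illustrative choices, not part of any particular vector database API. Real systems typically use optimized native code rather than Python loops.

```python
import math


def dot(a, b):
    """Dot product: sum of element-wise products (direction and magnitude)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (direction only)."""
    denom = norm(a) * norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / denom


def euclidean_distance(a, b):
    """Straight-line distance between a and b (lower means more similar)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [0.8, 0.6]
doc = [0.6, 0.8]
doc_scaled = [1.2, 1.6]  # hypothetical: same direction as doc, twice the length

for name, vec in (("doc", doc), ("doc_scaled", doc_scaled)):
    print(
        name,
        "cosine:", round(cosine_similarity(query, vec), 4),
        "dot:", round(dot(query, vec), 4),
        "euclidean:", round(euclidean_distance(query, vec), 4),
    )
```

Scaling the document vector leaves the cosine score at about 0.96, while the dot product doubles to about 1.92 and the Euclidean distance grows. That is the practical meaning of "direction only" versus "direction and magnitude".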

🔍 Nearest Neighbor Search

Vector databases find the K nearest neighbors to a query vector. Here's how semantic clustering works:

Example: Query = "cat" [0.80, 0.30] (scores are cosine similarity to the query)
cat [0.80, 0.30]: 100.0%
kitten [0.75, 0.35]: 99.7%
dog [0.70, 0.40]: 98.7%
computer [0.20, 0.90]: 54.6%
laptop [0.25, 0.85]: 60.1%
Observation: Animal terms (cat, kitten, dog) cluster together with high similarity. Tech terms (computer, laptop) form a separate cluster. This is semantic organization!
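The "cat" example can be reproduced with a brute-force K-nearest-neighbor search, sketched below. It scores every stored vector against the query and keeps the top K. The names `EMBEDDINGS` and `k_nearest` are made up for illustration. Production vector databases usually replace this exhaustive scan with an approximate index to stay fast at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def k_nearest(query, items, k):
    """Return the k (label, score) pairs most similar to query, best first.

    Brute force: every stored vector is compared with the query.
    """
    scored = [(label, cosine_similarity(query, vec)) for label, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D embeddings taken from the example in the text.
EMBEDDINGS = {
    "cat": [0.80, 0.30],
    "kitten": [0.75, 0.35],
    "dog": [0.70, 0.40],
    "computer": [0.20, 0.90],
    "laptop": [0.25, 0.85],
}

top3 = k_nearest(EMBEDDINGS["cat"], EMBEDDINGS, k=3)
for label, score in top3:
    print(f"{label}: {score:.1%}")
```

With K = 3 the search returns only the animal terms (cat, kitten, dog). The tech terms rank last because their vectors point in a clearly different direction.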

⚙️ Practical Considerations

1. Vector Normalization: Normalize embeddings to unit length so that dot product matches cosine similarity and scores stay comparable (many models do this automatically).
2. Top K Results: Retrieve only the K most similar vectors (e.g., top 5). A larger K improves recall; a smaller K keeps the results relevant.
3. Similarity Threshold: Discard results below a minimum similarity score (e.g., 0.7) to ensure quality.
4. Metadata Filtering: Combine vector search with attribute filters ("Find similar documents WHERE author = 'Alice'").
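The four considerations above can be combined in one small search function, sketched here in pure Python. Everything in it is hypothetical: the `Record` class, the `search` signature, and the author metadata are invented for illustration and do not correspond to any specific vector database. Real databases differ in whether they apply the metadata filter before or after the vector search.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    """A stored item: text, its embedding, and arbitrary attributes."""
    text: str
    vector: list
    metadata: dict = field(default_factory=dict)


def normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in v]


def search(query, records, k=5, min_score=0.7, where=None):
    """Return up to k (score, record) pairs, best first.

    where: optional dict of metadata key/value pairs that must all match.
    min_score: results scoring below this threshold are discarded.
    """
    q = normalize(query)
    where = where or {}
    hits = []
    for rec in records:
        # Metadata filter: skip records whose attributes do not match.
        if any(rec.metadata.get(key) != value for key, value in where.items()):
            continue
        # Both vectors are unit length, so the dot product is the cosine.
        score = sum(x * y for x, y in zip(q, normalize(rec.vector)))
        if score >= min_score:  # similarity threshold
            hits.append((score, rec))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:k]  # top K


# Vectors from the "cat" example; the author values are made up.
RECORDS = [
    Record("cat", [0.80, 0.30], {"author": "Alice"}),
    Record("kitten", [0.75, 0.35], {"author": "Bob"}),
    Record("dog", [0.70, 0.40], {"author": "Alice"}),
    Record("computer", [0.20, 0.90], {"author": "Alice"}),
    Record("laptop", [0.25, 0.85], {"author": "Bob"}),
]

results = search([0.80, 0.30], RECORDS, k=5, min_score=0.7, where={"author": "Alice"})
for score, rec in results:
    print(f"{rec.text}: {score:.3f}")
```

In this run the metadata filter removes Bob's records, and the 0.7 threshold then removes "computer", leaving "cat" and "dog" even though K is 5. Top K is a cap on the result count, not a guarantee.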