Vector Databases for Memory

Master how AI agents use vector databases to store, search, and retrieve embeddings for semantic memory

Similarity Metrics & Search

Once data is stored as embeddings, vector databases enable similarity search: finding vectors closest to a query vector. The choice of distance metric determines how "similarity" is calculated.

Three primary metrics: Cosine Similarity (direction), Dot Product (magnitude + direction), and Euclidean Distance (geometric distance).

Example: Cosine Similarity of Two Vectors

Vector 1 (Query): [0.80, 0.60]

Vector 2 (Document): [0.60, 0.80]

Similarity Score: 0.9600 (higher = more similar; range: -1 to 1)
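
As a minimal sketch of how such a score can be computed, the three metrics can be written in plain Python and applied to the query and document vectors above. The function names (`dot`, `cosine_similarity`, `euclidean_distance`) are illustrative choices, not part of any particular vector database API.

```python
import math

def dot(a, b):
    # Sum of element-wise products; sensitive to both direction and magnitude.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Straight-line distance between the two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [0.80, 0.60]
document = [0.60, 0.80]

cos_score = cosine_similarity(query, document)
dot_score = dot(query, document)
dist = euclidean_distance(query, document)

print(f"cosine:    {cos_score:.4f}")  # 0.9600
print(f"dot:       {dot_score:.4f}")  # 0.9600
print(f"euclidean: {dist:.4f}")       # 0.2828
```

Cosine and dot product agree here only because both vectors happen to have length 1.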

📐 Comparing Distance Metrics

📏 Cosine Similarity

Measures angle between vectors (direction, not magnitude)

✓ Range: -1 (opposite direction) to 1 (same direction)

✓ Best for: Text, normalized embeddings

✓ Most common for semantic search

⚡ Dot Product

Combines direction and magnitude

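✓ Range: Unbounded (-1 to 1 when both vectors have unit length)

✓ Best for: Unit-normalized vectors, where it equals cosine similarity

✓ Cheaper to compute than cosine (no division by the vector lengths)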

📍 Euclidean Distance

Straight-line distance in space

✓ Range: 0 (identical) to ∞

✓ Best for: Coordinate data, images

✓ Lower values = more similar
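
To make the "direction vs. magnitude" distinction concrete, the sketch below compares a vector `a` against `b` and against `b` scaled to twice its length. The vectors and helper names are illustrative assumptions; only the standard library is used.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [0.80, 0.60]
b = [0.60, 0.80]
b_scaled = [2 * x for x in b]  # same direction as b, twice the length

scores = {}
for name, other in [("b", b), ("2*b", b_scaled)]:
    scores[name] = (
        cosine_similarity(a, other),
        dot(a, other),
        euclidean_distance(a, other),
    )
    cos, dp, dist = scores[name]
    print(f"{name:<4} cosine={cos:.4f}  dot={dp:.4f}  euclidean={dist:.4f}")

# b    cosine=0.9600  dot=0.9600  euclidean=0.2828
# 2*b  cosine=0.9600  dot=1.9200  euclidean=1.0770
```

Scaling changes the dot product and the Euclidean distance but leaves the cosine similarity untouched, which is why cosine is a common default when vector length carries no meaning.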

🔍 Nearest Neighbor Search

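Vector databases find the K nearest neighbors of a query vector: the K stored vectors that score best under the chosen metric. The example below scores five word vectors against the query by cosine similarity, shown as a percentage: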

Example: Query = "cat" [0.8, 0.3]

cat [0.80, 0.30]: 100.0%

kitten [0.75, 0.35]: 99.7%

dog [0.70, 0.40]: 98.7%

laptop [0.25, 0.85]: 60.1%

computer [0.20, 0.90]: 54.6%

Observation: Animal terms (cat, kitten, dog) cluster together with high similarity. Tech terms (computer, laptop) form a separate cluster. This is semantic organization!
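
The ranking above can be reproduced with a brute-force scan, sketched here in plain Python. The `nearest_neighbors` helper is hypothetical and compares the query against every stored vector; production vector databases typically rely on approximate indexes rather than an exhaustive scan like this.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy 2-D "embeddings" taken from the example above.
vectors = {
    "cat": [0.80, 0.30],
    "kitten": [0.75, 0.35],
    "dog": [0.70, 0.40],
    "computer": [0.20, 0.90],
    "laptop": [0.25, 0.85],
}

def nearest_neighbors(query, vectors, k):
    # Score every stored vector, sort best-first, keep the top k.
    scored = [(word, cosine_similarity(query, vec)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

top3 = nearest_neighbors(vectors["cat"], vectors, k=3)
for word, score in top3:
    print(f"{word:<8} {score:.1%}")

# cat      100.0%
# kitten   99.7%
# dog      98.7%
```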

⚙️ Practical Considerations

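Vector Normalization: Normalize embeddings to unit length so that dot product equals cosine similarity and Euclidean distance produces the same ranking (many models do this automatically). Cosine similarity itself is unaffected by vector length.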

Top K Results: Retrieve the K most similar vectors (e.g., top 5). A larger K improves recall (fewer relevant items missed) but pulls in less relevant results; a smaller K keeps results focused but may miss useful matches.

Similarity Threshold: Filter results below a minimum similarity score (e.g., 0.7) to ensure quality.

Metadata Filtering: Combine vector search with attribute filters ("Find similar documents WHERE author = 'Alice'").
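
The four considerations above can be combined in one query path. The following is a minimal sketch, assuming records are stored as plain dictionaries with `id`, `vector`, and `metadata` keys; the `search` helper, its parameters, and the sample records are all hypothetical and do not reflect any specific database's API.

```python
import math

def normalize(vec):
    # Scale a vector to unit length so dot product equals cosine similarity.
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]

def search(query, records, k=5, min_score=0.7, where=None):
    """Hypothetical helper: metadata filter, score, threshold, then top k."""
    q = normalize(query)
    hits = []
    for rec in records:
        # Metadata filtering: skip records that fail any attribute condition.
        if where and any(rec["metadata"].get(key) != value for key, value in where.items()):
            continue
        score = sum(x * y for x, y in zip(q, normalize(rec["vector"])))
        # Similarity threshold: drop weak matches.
        if score >= min_score:
            hits.append((rec["id"], score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:k]  # Top K results

# Made-up records for illustration only.
records = [
    {"id": "doc1", "vector": [0.75, 0.35], "metadata": {"author": "Alice"}},
    {"id": "doc2", "vector": [0.70, 0.40], "metadata": {"author": "Bob"}},
    {"id": "doc3", "vector": [0.20, 0.90], "metadata": {"author": "Alice"}},
]

results = search([0.80, 0.30], records, k=5, min_score=0.7, where={"author": "Alice"})
print(results)
```

With these sample records only `doc1` is returned: `doc2` is excluded by the author filter and `doc3` falls below the 0.7 threshold. This sketch filters before scoring; whether a real system filters before or after the vector search is a design choice that affects both speed and recall.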
