Short-Term Memory
Master how AI agents manage conversation context and working memory
Attention: The Focus Mechanism
While context windows define what information is available, attention mechanisms determine what information matters. Think of attention as a spotlight that can illuminate different parts of the context with varying intensity.
Instead of treating all tokens equally, attention assigns weights to each token based on its relevance to the current generation step. This allows models to "focus" on important words while ignoring filler.
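For intuition, here is a toy example in Python (the sentence and the weights are invented for illustration, not taken from a real model): the weights form a distribution over tokens, with most of the mass on the tokens relevant to the current one.

```python
# Hypothetical attention weights for the query token "it" in
# "The cat sat because it was tired"; the weights sum to 1.
weights = {
    "The": 0.02, "cat": 0.55, "sat": 0.08,
    "because": 0.03, "it": 0.10, "was": 0.07, "tired": 0.15,
}
print(sum(weights.values()))  # ~1.0: a probability distribution over tokens
```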
[Interactive: attention weight visualization. Token brightness corresponds to attention weight.]
Types of Attention
Self-Attention
Each token attends to every token in the same sequence, including itself. Used within a single input (e.g., understanding a sentence).
Analogy: Reading a sentence and understanding each word based on the surrounding words.
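A minimal sketch of the setup, assuming random placeholder projection matrices (W_q, W_k, W_v; a trained model learns these): in self-attention, the queries, keys, and values are all derived from the same sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 16                     # 5 tokens, 16-dimensional embeddings
X = rng.normal(size=(n, d))      # one sequence; its tokens attend to each other

# Self-attention: Q, K, and V are all projections of the SAME sequence X.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v   # each has shape (n, d)

scores = Q @ K.T / np.sqrt(d)         # (n, n): every token scored against every token
```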
How Attention Works (Simplified)
Step 1: Compute Scores
For each token pair, calculate a similarity score (how related they are).
score(Q, K) = Q · K^T / √d_k
Step 2: Softmax Normalization
Convert scores to weights that sum to 1 (probability distribution).
weights = softmax(scores)
Step 3: Weighted Sum
Multiply each token's value by its weight and sum them up.
output = Σ (weights[i] × V[i])
Result: Each token gets a representation that's a weighted combination of all tokens, with relevant ones contributing more.
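Putting the three steps together, here is a minimal NumPy sketch (the function and variable names are illustrative, not from any particular library):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (n, d_k); V: (n, d_v). Returns (n, d_v) outputs and (n, n) weights."""
    d_k = Q.shape[-1]
    # Step 1: similarity scores between every query and every key.
    scores = Q @ K.T / np.sqrt(d_k)
    # Step 2: normalize each row into a probability distribution.
    weights = softmax(scores, axis=-1)
    # Step 3: weighted sum of the value vectors.
    output = weights @ V
    return output, weights

# Toy usage: 4 tokens with 8-dimensional representations (self-attention: Q = K = V).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(X, X, X)
print(w.sum(axis=-1))  # each row of weights sums to 1
```

The returned weights matrix is what attention visualizations like the one above render: row i shows how strongly token i attends to every token in the sequence.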
Why Attention Matters for Agents
✅ Enables
- Selective focus: Ignore filler, focus on key info
- Long-range dependencies: Connect distant tokens
- Parallel processing: All tokens computed at once
- Interpretability: Visualize what the model focuses on
⚠️ Limitations
- Quadratic complexity: O(n²) for sequence length n (see the quick cost estimate after this list)
- Memory intensive: Stores the full attention matrix
- Still bounded: Can't attend beyond the context window
- Computational cost: Expensive for very long contexts
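To make the quadratic cost concrete, here is a back-of-the-envelope estimate (the context lengths and 4-byte float32 scores are assumptions for illustration, for a single head and layer):

```python
# Size of the full (n x n) attention score matrix at various context lengths,
# assuming 4-byte float32 entries, one head, one layer.
for n in (1_000, 10_000, 100_000):
    entries = n * n
    mib = entries * 4 / 2**20
    print(f"n={n:>7,}: {entries:>15,} scores = {mib:12,.2f} MiB")
# At n = 100,000 the matrix alone is roughly 37 GiB, which is why
# long-context models rely on attention approximations and careful memory layouts.
```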