Haystack Agents

Master Haystack for building production-ready RAG agents and NLP pipelines

Building RAG Agents with Haystack

RAG (Retrieval-Augmented Generation) agents combine semantic search with LLM generation to provide accurate, grounded responses based on your documents. Haystack makes building these agents straightforward.
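Before the full pipeline, the core loop can be sketched in plain Python. This is a conceptual sketch only: toy word-overlap scoring stands in for embeddings, and the finished prompt would be handed to an LLM client of your choice.

```python
import re

def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query (a stand-in for embeddings)."""
    q = tokenize(query)
    return sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)[:top_k]

def build_prompt(question, context_docs):
    context = "\n".join(context_docs)
    return (
        "Answer the question based on the provided context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

docs = [
    "RAG combines retrieval with generation for accurate answers.",
    "Haystack is an open-source framework for NLP pipelines.",
]
prompt = build_prompt("What is RAG?", retrieve("What is RAG?", docs))
```

A real pipeline replaces each of these steps with a Haystack component, as the implementation below shows.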

Complete RAG Pipeline

Full RAG Implementation
from haystack import Pipeline, Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# Initialize document store and index documents
document_store = InMemoryDocumentStore()
documents = [
    Document(content="RAG combines retrieval with generation for accurate answers."),
    Document(content="Haystack is an open-source framework for NLP pipelines."),
    # ... more documents
]

# Embed and write documents
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs_with_embeddings = doc_embedder.run(documents)
document_store.write_documents(docs_with_embeddings["documents"])

# Build RAG pipeline
rag_pipeline = Pipeline()

# Add components
rag_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))  # must match the document embedder's model
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=5))
rag_pipeline.add_component("ranker", TransformersSimilarityRanker(top_k=3))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template="""
Answer the question based on the provided context.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:
"""))
rag_pipeline.add_component("generator", OpenAIGenerator(model="gpt-4"))  # reads OPENAI_API_KEY from the environment

# Connect pipeline
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "ranker.documents")
rag_pipeline.connect("ranker.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "generator.prompt")

# Run query
result = rag_pipeline.run({
    "text_embedder": {"text": "What is RAG?"},
    "ranker": {"query": "What is RAG?"},  # the ranker needs the query to score documents
    "prompt_builder": {"question": "What is RAG?"}
})

print(result["generator"]["replies"][0])
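Note that the question fans out to every component that consumes it: the text embedder, the similarity ranker (whose query input is required), and the prompt builder. A small wrapper, a hypothetical `ask` helper rather than anything Haystack provides, keeps call sites to a single line:

```python
# Hypothetical helper (not part of Haystack): route one question to every
# pipeline input that needs it.

def ask(pipeline, question):
    return pipeline.run({
        "text_embedder": {"text": question},
        "ranker": {"query": question},
        "prompt_builder": {"question": question},
    })
```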

Advanced RAG Patterns

🔄 Conversational RAG

Maintain conversation history and context across multiple turns for chat-like experiences.

# Haystack core does not ship a ConversationMemory component; keep the
# running history yourself as a list of ChatMessage objects and render
# it into the prompt template on each turn
from haystack.dataclasses import ChatMessage

history = []
history.append(ChatMessage.from_user("What is RAG?"))
history.append(ChatMessage.from_assistant(reply))  # reply from the previous run
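However the history is stored, it will eventually overflow the model's context window. A minimal sketch (plain Python, not a Haystack component) of a memory that keeps only the most recent turns:

```python
# Sliding-window conversation memory sketch: store (role, text) turns and
# keep only the most recent ones when rendering for the prompt.

class SlidingWindowMemory:
    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turns = []  # list of (role, text)

    def add(self, role, text):
        self.turns.append((role, text))

    def render(self):
        recent = self.turns[-self.max_turns:]
        return "\n".join(f"{role}: {text}" for role, text in recent)

memory = SlidingWindowMemory(max_turns=2)
memory.add("user", "What is RAG?")
memory.add("assistant", "Retrieval-augmented generation.")
memory.add("user", "Who builds Haystack?")
rendered = memory.render()
```

The rendered string can be injected into the prompt template alongside the retrieved context.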

🎯 Filtered Retrieval

Apply metadata filters to retrieve only relevant document subsets (by date, author, category).

# Filter by metadata (Haystack 2.x filter syntax)
result = pipeline.run({
    "retriever": {"filters": {
        "operator": "AND",
        "conditions": [
            {"field": "meta.author", "operator": "==", "value": "John Doe"},
            {"field": "meta.year", "operator": ">=", "value": 2023},
        ],
    }}
})
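Haystack 2.x replaced the older Mongo-style operators with nested condition dicts. Their semantics can be sketched in a few lines of plain Python (illustrative only, not Haystack's implementation):

```python
import operator

# Illustrative evaluator for Haystack 2.x-style filter dicts
# (not Haystack's actual implementation).
OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}

def matches(meta, filt):
    if "conditions" in filt:  # logical node: AND / OR over sub-conditions
        results = [matches(meta, c) for c in filt["conditions"]]
        return all(results) if filt["operator"] == "AND" else any(results)
    field = filt["field"].removeprefix("meta.")
    return OPS[filt["operator"]](meta.get(field), filt["value"])

filt = {"operator": "AND", "conditions": [
    {"field": "meta.author", "operator": "==", "value": "John Doe"},
    {"field": "meta.year", "operator": ">=", "value": 2023},
]}
hit = matches({"author": "John Doe", "year": 2024}, filt)
miss = matches({"author": "John Doe", "year": 2021}, filt)
```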

📊 Hybrid Search

Combine BM25 keyword search with semantic embeddings for best of both worlds.

# Use both BM25 and embedding retrieval, then merge the two result lists
from haystack.components.joiners import DocumentJoiner
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

pipeline.add_component("bm25_retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component("embedding_retriever", InMemoryEmbeddingRetriever(document_store=document_store))
pipeline.add_component("joiner", DocumentJoiner(join_mode="reciprocal_rank_fusion"))
pipeline.connect("bm25_retriever.documents", "joiner.documents")
pipeline.connect("embedding_retriever.documents", "joiner.documents")
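Reciprocal rank fusion, one common way to merge result lists, scores each document by the sum of 1/(k + rank) over every list it appears in, so documents ranked highly by either retriever float to the top. A plain-Python sketch:

```python
# Reciprocal rank fusion sketch (plain Python, illustrative only).

def reciprocal_rank_fusion(result_lists, k=60):
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]
embedding_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
```

Here doc_b wins the merge because it ranks near the top of both lists, even though neither retriever put it first on its own.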

🎯 RAG Best Practices

  • Chunk documents wisely: 200-500 words per chunk balances context and precision
  • Use re-ranking: a ranker filters and re-orders retrieved docs before they reach the LLM, improving precision
  • Include metadata: date, author, and source help with filtering and attribution
  • Monitor retrieval quality: track whether the relevant documents actually appear in the retrieved set (e.g. recall@k)
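The chunking advice above amounts to a word-window splitter with overlap, sketched here in plain Python (Haystack's DocumentSplitter component offers the same idea via its split_by, split_length, and split_overlap parameters):

```python
# Word-window chunker sketch with overlap: consecutive chunks share `overlap`
# words so sentences straddling a boundary are not lost.

def chunk_words(text, chunk_size=300, overlap=30):
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

chunks = chunk_words("word " * 700, chunk_size=300, overlap=30)
```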