
Mastering RAG with RAPTOR: A comprehensive guide using LlamaIndex

Saif Ali
Sep 06, 2024
17 min read
Contents
What is RAG?
How RAG works
Benefits of RAG
Challenges with traditional RAG
What is RAPTOR?
How does RAPTOR work?
1. Preprocessing
2. Recursive processing
3. Tree construction
4. Retrieval (Inference)
What is LlamaIndex?
Implementation of RAPTOR using LlamaIndex
1. Setup and configuration
2. Document loading and vector store setup
3. RAPTOR pack configuration
4. Retrieval and query engine
From RAGs to riches

In today’s dynamic AI landscape, mastering advanced techniques like Retrieval-Augmented Generation (RAG) is crucial for Data Engineers, Data Scientists, and ML Engineers. 

RAG combines information retrieval with natural language generation to enhance AI responses with accuracy and context. However, traditional RAG methods have limitations, which are now being addressed through the innovative technique: RAPTOR (Recursive Abstractive Processing for Tree Organized Retrieval). 

Today, we’ll explore the mechanics of RAG, its benefits, and its challenges. We’ll then discuss how RAPTOR overcomes traditional RAG’s limitations and walk through a practical implementation using LlamaIndex.

What is RAG?

RAG is a hybrid approach that combines the strengths of information retrieval and generative models to enhance the quality and relevance of generated text. Unlike traditional models that only use their training data, RAG utilizes additional context to give better responses.

How RAG works

RAG tackles the limitations of large language models (LLMs) by incorporating external knowledge into their generation process.

Here’s a breakdown of how it works (a minimal code sketch follows the figure below):

  • Retrieval: The first step involves gathering relevant information. RAG acts like a skilled researcher when a user presents a question or prompt. It consults a vast knowledge base, which could be the entire internet, a company’s internal documents, or any other source of textual data. This retrieval process ensures that the LLM can access the most up-to-date and potentially relevant information to address the user’s query.

  • Augmentation: Imagine feeding the retrieved information directly to the LLM. It might be overwhelming! RAG employs various augmentation techniques to make this knowledge more digestible. These techniques can involve summarizing the key points of the retrieved passages or encoding them in a way the LLM can understand efficiently. This augmentation step improves the raw information so the LLM can use it better.

  • Generation: The LLM generates the response with its inherent understanding of language and the augmented knowledge from the retrieval stage. This response can take various forms depending on the user’s intent. It could be a direct answer to a question, a creative text format inspired by a prompt, or any other kind of textual output. By combining its language skills with augmented knowledge, the LLM aims to deliver a response that is not just creative but also grounded in factual accuracy.

The RAG workflow
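
To make the three stages concrete, here is a minimal, self-contained sketch of the retrieve-augment-generate loop. The toy corpus, the word-overlap scoring, and the generate() stub are illustrative stand-ins, not a production retriever or a real LLM call.

# Toy corpus standing in for a real knowledge base.
corpus = [
    "RAG combines retrieval with generation to ground answers in external data.",
    "RAPTOR builds a hierarchical tree of summaries over document chunks.",
    "Beam search keeps only the top k candidates at each step of a search.",
]

def retrieve(query, k=2):
    """Retrieval: score each passage by naive word overlap and keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: len(q_words & set(p.lower().split())), reverse=True)
    return scored[:k]

def augment(query, passages):
    """Augmentation: fold the retrieved passages into a single grounded prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt):
    """Generation: placeholder for an LLM call (e.g., an OpenAI chat completion)."""
    return f"[LLM answer grounded in]\n{prompt}"

question = "How does RAPTOR organize documents?"
print(generate(augment(question, retrieve(question))))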

Benefits of RAG

RAG offers several advantages over traditional generative models:

  • Enhanced accuracy and relevance: RAG incorporates external information during generation, leading to more accurate and contextually relevant responses. This reduces the risk of factual errors or irrelevant information commonly found in models trained on static data.

  • Improved knowledge coverage: Unlike models limited to their training data, RAG can access and leverage up-to-date information from external sources. This expands the model’s knowledge base and ensures responses reflect current information.

  • Reduced hallucinations: Generative models can sometimes generate believable but incorrect information (“hallucinations”). RAG, which uses real-world data, helps reduce this problem by promoting factual responses.

  • Increased adaptability: RAG models can be tailored to specific domains by incorporating relevant knowledge bases and retrieval techniques. This allows them to excel in areas like customer support (company policies) or legal research (case law).

  • Enhanced user trust: RAG’s ability to cite sources builds user trust by demonstrating transparency and accountability. Users can verify the information and dive deeper if desired.

  • Cost-effective development: RAG leverages pretrained generative models like GPT-3, GPT-4, Gemini, or Claude, reducing development costs compared to building a model from scratch. Additionally, the focus on retrieval allows for efficient updates by incorporating new information sources.

Challenges with traditional RAG

RAG provides a strong method, but it faces challenges in some situations:

  • Context deficiency with long documents: Dividing long documents into uniform chunks for retrieval (a common practice) disrupts information flow and makes it challenging for the LLM to grasp the overarching context. Essential relationships between concepts spread across chunks might be overlooked, leading to inaccurate or incomplete responses.

  • Flat retrieval structure: In standard RAG, all retrieved information is treated equally when generating responses. This method doesn’t recognize that important information could be buried deep within the documents. As a result, the LLM may struggle to prioritize and use that information effectively.

  • Limited reasoning and fact-checking: While RAG can access external information, its ability to reason over that information or perform robust fact-checking can be limited. This can lead to outputs that combine factual elements with inconsistencies or illogical connections.

  • Bias and fairness: The quality and bias inherent in the retrieved documents can be reflected in the RAG output.

  • Interpretability and explainability: Understanding how RAG generates its outputs can be difficult because of the complex interaction between retrieving information and generating responses. This lack of interpretability can make it harder to debug and build trust in the RAG system.

What is RAPTOR?

RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) is a new approach to information retrieval for LLMs (Sarthi, Parth, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval." arXiv preprint arXiv:2401.18059, 2024). It builds a tree structure from documents, allowing the model to consider information at different levels of detail, and is claimed to be more efficient and context-aware than traditional methods.

How does RAPTOR work?

The following illustration elaborates on how RAPTOR works:

The RAPTOR workflow

Here’s a breakdown of the RAPTOR algorithm step-by-step:

1. Preprocessing

The document is segmented into smaller units like sentences or paragraphs. These units are then converted into dense vector embeddings, numerical representations capturing the document’s semantic meaning. This allows for efficient similarity comparisons during retrieval.

RAPTOR is specifically designed to work with textual data. This means it is best suited for processing and analyzing information presented in written format. Keep this in mind as you explore the capabilities of RAPTOR.
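
As a rough sketch of this preprocessing step, the snippet below chunks a plain-text file and embeds each chunk with the same LlamaIndex utilities used later in this article. The file name, chunk size, and model name are illustrative assumptions.

# Preprocessing sketch: split text into chunks and embed each chunk.
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

text = open("./raptor_paper.txt").read()  # hypothetical plain-text export of the document
splitter = SentenceSplitter(chunk_size=400, chunk_overlap=50)
chunks = splitter.get_nodes_from_documents([Document(text=text)])

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
chunk_embeddings = [embed_model.get_text_embedding(node.get_content()) for node in chunks]
print(f"{len(chunks)} chunks, each embedded as a {len(chunk_embeddings[0])}-dimensional vector")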

2. Recursive processing

This iterative core process refines the document representation in three steps, sketched in code after the list:

  • Clustering: A clustering algorithm, typically based on Gaussian Mixture Models (GMMs), groups similar text chunks together. This helps organize related information for better summarization.

  • Model-based summarization: Each cluster is sent to an LLM like GPT-3. The LLM generates a concise and informative summary of the text within the cluster.

  • Re-embedding: The summaries created by the LLM are then converted back into numerical representations suitable for further processing.
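
To show the shape of one recursive round, here is a simplified sketch that clusters the chunk embeddings from the preprocessing sketch above with a Gaussian Mixture Model, summarizes each cluster with an LLM, and re-embeds the summaries. The RAPTOR paper uses a soft-clustering variant with dimensionality reduction; this flat, hard-clustering version is only an approximation of the idea.

# One simplified round of RAPTOR's cluster -> summarize -> re-embed loop.
import numpy as np
from sklearn.mixture import GaussianMixture
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

def one_raptor_round(chunk_texts, chunk_embeddings, embed_model, n_clusters=3):
    # Clustering: group similar chunks using a Gaussian Mixture Model.
    labels = GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(np.array(chunk_embeddings))
    summaries, summary_embeddings = [], []
    for cluster_id in range(n_clusters):
        members = [t for t, label in zip(chunk_texts, labels) if label == cluster_id]
        if not members:
            continue
        # Model-based summarization: ask the LLM to condense the cluster.
        prompt = "Summarize the key points of the following passages:\n\n" + "\n\n".join(members)
        summary = str(llm.complete(prompt))
        # Re-embedding: convert the summary back into a vector for the next round.
        summaries.append(summary)
        summary_embeddings.append(embed_model.get_text_embedding(summary))
    return summaries, summary_embeddings

# Example call, reusing `chunks`, `chunk_embeddings`, and `embed_model` from the preprocessing sketch:
# summaries, summary_embeddings = one_raptor_round([n.get_content() for n in chunks], chunk_embeddings, embed_model)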

3. Tree construction

After multiple rounds of clustering and summarization (controlled recursion depth), a hierarchical tree is built:

  • Leaf nodes: Original text chunks form the base of the tree.

  • Summary nodes: As you move up the tree, each node represents a concise summary of its children, capturing the essence of the sub-document it represents.

  • Hierarchical embeddings: Each node in the tree can also be associated with its own vector embedding, capturing the summarized meaning at that level.

This multi-layered representation, with both textual summaries and vector embeddings, allows for efficient retrieval at various levels of detail.
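
As an illustration, the resulting tree can be represented with a node type as simple as the one below; the field names are illustrative, not RAPTOR's actual implementation.

# A minimal tree node: leaves hold original chunks, internal nodes hold summaries.
from dataclasses import dataclass, field
from typing import List

@dataclass
class RaptorNode:
    text: str                                   # original chunk or LLM-generated summary
    embedding: List[float]                      # vector embedding of `text`
    children: List["RaptorNode"] = field(default_factory=list)  # empty for leaf nodes

leaf = RaptorNode(text="...an original text chunk...", embedding=[0.1, 0.2, 0.3])
root = RaptorNode(text="...a summary of its children...", embedding=[0.2, 0.1, 0.4], children=[leaf])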

4. Retrieval (Inference)

Given a query, RAPTOR employs two primary retrieval mechanisms for navigating the tree and retrieving relevant information:

  • Tree traversal retrieval: This approach systematically explores the tree structure, starting from the root node and progressing down the branches.

    • Beam search is a common technique for tree traversal. It is an approximate search method that keeps track of only the top k best candidates at each step, making it efficient in terms of time and memory. By considering only the most promising branches at each level, it focuses the search on the parts of the tree most relevant to the query and aims to find a high-quality result without exploring every possibility.

  • Collapsed tree retrieval: This simplified approach views the tree as a single layer, directly comparing the query embedding to the vector embeddings of all leaf nodes (original text chunks) and summary nodes. This is suitable for factual, keyword-based queries where specific details are needed.

Tree traversal and collapsed tree retrieval mechanisms (source: Parth Sarthi, Recursive Abstractive Processing for Tree Organized Retrieval)

RAPTOR’s ability to choose the appropriate retrieval mechanism based on query complexity and utilize both textual summaries and vector embeddings empowers it to retrieve information at the optimal level of abstraction, satisfying diverse query needs.
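
For intuition, here is a hedged sketch of the collapsed-tree mechanism using the RaptorNode type from the previous sketch: flatten the tree and rank every node, whether leaf or summary, by cosine similarity to the query embedding.

# Collapsed-tree retrieval sketch: compare the query against every node at once.
import numpy as np

def collect_nodes(node):
    """Flatten the tree into a single list of nodes (leaves and summaries)."""
    yield node
    for child in node.children:
        yield from collect_nodes(child)

def collapsed_tree_retrieve(root, query_embedding, top_k=2):
    query = np.array(query_embedding)
    def cosine(embedding):
        vec = np.array(embedding)
        return float(vec @ query / (np.linalg.norm(vec) * np.linalg.norm(query) + 1e-9))
    nodes = list(collect_nodes(root))
    return sorted(nodes, key=lambda n: cosine(n.embedding), reverse=True)[:top_k]

# Example: collapsed_tree_retrieve(root, embed_model.get_query_embedding("your question"), top_k=2)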

What is LlamaIndex?

LlamaIndex is a powerful toolkit designed to enhance the capabilities of LLMs by enabling efficient data retrieval and manipulation from various sources. It allows LLMs to perform tasks such as question answering, text summarization, and knowledge base construction more accurately and effectively.

Here are some key features of LlamaIndex, followed by a minimal usage example:

  • Data integration: Connects LLMs with diverse data sources like databases, APIs, and files.

  • Efficient retrieval: Optimizes data retrieval processes to ensure quick access to relevant information.

  • Custom indexing: Supports the creation of custom indexes tailored to specific tasks and datasets.

  • Scalability: Handles large volumes of data, making it suitable for extensive applications.

  • Flexible querying: Allows for complex queries to enhance LLMs’ understanding and response generation.

  • Ease of use: Provides user-friendly interfaces and tools for seamless integration with existing systems.
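
Outside of RAPTOR, the typical LlamaIndex workflow is load, index, and query. Here is a minimal example of that flow; it assumes OPENAI_API_KEY is set and that a local ./data folder with documents exists.

# Minimal LlamaIndex flow: load documents, build a vector index, and query it.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()   # load files from a local folder
index = VectorStoreIndex.from_documents(documents)        # embed and index the documents
query_engine = index.as_query_engine()                    # wrap the index in a query interface
print(query_engine.query("What are these documents about?"))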

Implementation of RAPTOR using LlamaIndex

This section walks through an implementation of RAPTOR using LlamaIndex. The steps are as follows:

1. Setup and configuration

Setting up our environment and configuring the necessary parameters is crucial before diving into the implementation. This step ensures that we have all the required tools and settings for a smooth implementation process.

Let’s start by creating a configuration file to store sensitive information and adjustable parameters:

openai_api_key: "your_api_key_here"
models:
  embedding: "text-embedding-3-small"
  llm: "gpt-3.5-turbo"
chunk_size: 400
chunk_overlap: 50
similarity_top_k: 2
mode: "tree_traversal"
temperature: 0.1

Code explanation:

  • Line 1: Specifies the API key for accessing OpenAI’s services.

  • Lines 2–4: Define the models used: "text-embedding-3-small" for embeddings and "gpt-3.5-turbo" for language tasks.

  • Line 5: Sets the size of text chunks to 400 tokens for processing.

  • Line 6: Indicates an overlap of 50 tokens between chunks to maintain context.

  • Line 7: Specifies retrieving the top 2 most similar chunks during searches.

  • Line 8: Uses a tree traversal method for navigating and processing data.

  • Line 9: Sets the language model’s randomness level, with a low value for more deterministic responses.

Now, let’s implement the setup code:

import yaml
import os
import logging
from typing import Dict, Any

# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

def load_config(config_path: str = 'config.yaml') -> Dict[str, Any]:
    """Load configuration from YAML file."""
    try:
        with open(config_path, 'r') as file:
            config = yaml.safe_load(file)
        return config
    except FileNotFoundError:
        logger.error(f"Configuration file not found: {config_path}")
        raise
    except yaml.YAMLError as e:
        logger.error(f"Error parsing YAML configuration: {e}")
        raise

# Load configuration
config = load_config("config.yaml")

# Set OpenAI API key
os.environ["OPENAI_API_KEY"] = config['openai_api_key']

# Install required packages
!pip install -q llama-index llama-index-packs-raptor llama-index-vector-stores-chroma

# Download the RAPTOR paper
!wget -q https://arxiv.org/pdf/2401.18059.pdf -O ./raptor_paper.pdf

logger.info("Setup completed successfully.")

Code explanation:

  • Lines 1–4: Import the necessary libraries. yaml is imported to read YAML configuration files, os to interact with the operating system (setting environment variables), logging to produce structured log messages, and Dict and Any from typing for type hints.

  • Lines 7–8: Set up basic logging configuration. logging.basicConfig() configures logging to display messages with an INFO level or higher, formatted with a timestamp, logging level, and message content. logger = logging.getLogger(__name__) creates a logger object specific to the current module.

  • Lines 10–21: Define the load_config function, which loads configuration from a YAML file. The function attempts to open and parse the YAML file specified by config_path. It uses a try...except block to handle FileNotFoundError if the file is not found or yaml.YAMLError for errors during parsing. Successful loading returns the configuration as a dictionary (Dict[str, Any]).

  • Line 24: Calls the load_config function with a specific path to load the configuration into the config variable.

  • Line 27: Sets the OpenAI API key by retrieving openai_api_key from the config dictionary. This sets an environment variable needed for interactions with OpenAI services.

  • Line 30: Silently installs required Python packages using pip install -q within a Jupyter Notebook environment.

  • Line 33: Silently downloads a PDF file from a specified URL and saves it locally as raptor_paper.pdf.

  • Line 35: Logs an informational message using the configured logger indicating that the setup process has been completed without errors. This helps track the procedure’s progress and status.

2. Document loading and vector store setup

We’ll load the document and set up the vector store in this step. This is crucial in preparing our data for efficient retrieval and processing.

import nest_asyncio
from llama_index.core import SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# Apply nest_asyncio to allow asynchronous operations in Jupyter notebooks
nest_asyncio.apply()

def load_document(file_path: str):
    """Load document from file."""
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    logger.info(f"Document loaded successfully: {file_path}")
    return documents

def setup_vector_store(db_path: str, collection_name: str):
    """Set up ChromaDB vector store."""
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection(collection_name)
    vector_store = ChromaVectorStore(chroma_collection=collection)
    logger.info(f"Vector store set up successfully: {collection_name}")
    return vector_store

# Load document
documents = load_document("./raptor_paper.pdf")

# Setup vector store
vector_store = setup_vector_store("./raptor_paper_db", "raptor")

Code explanation:

  • Lines 1–4: Import libraries essential for document processing and vector storage. nest_asyncio is imported to enable asynchronous operations in Jupyter Notebooks, SimpleDirectoryReader from llama_index.core is used to read documents from a directory, ChromaVectorStore from llama_index.vector_stores.chroma facilitates storing and retrieving document vectors via ChromaDB, and chromadb provides functionality for interacting with ChromaDB.

  • Line 7: Applies the nest_asyncio library to enable asynchronous operations in the current Jupyter Notebook environment. This setup allows concurrent task execution without blocking.

  • Lines 9–13: Define the load_document function, which loads a document from a specified file path. It uses SimpleDirectoryReader to read the document specified by file_path, logs a success message using the logger (logger.info()), and returns the loaded document (documents).

  • Lines 15–21: Define the setup_vector_store function, which sets up a ChromaDB vector store. It creates a PersistentClient object from chromadb to interact with the database located at db_path. It then retrieves or creates a collection named collection_name using client.get_or_create_collection(). A ChromaVectorStore object is instantiated using the obtained collection, logs a success message, and returns the created vector_store.

  • Line 24: Invokes the load_document function with the path ./raptor_paper.pdf to load the RAPTOR paper document content into the documents variable.

  • Line 27: Calls setup_vector_store with database path ./raptor_paper_db and collection name raptor to initialize a ChromaDB vector store named "raptor". The resulting vector_store object is set up for storing and retrieving document vectors in ChromaDB.
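
Before moving on, a quick sanity check can confirm that the PDF was parsed as expected. The snippet below only reads counts and metadata; the exact metadata keys depend on the reader.

# Sanity check: how many document objects were loaded, and what metadata came with them?
print(f"Loaded {len(documents)} document objects")
print(documents[0].metadata)   # e.g., file name and page label, depending on the reader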

3. RAPTOR pack configuration

Now that we have loaded our document and vector store, we’ll configure the RAPTOR pack. This step involves setting up the core components of the RAPTOR system.

from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.packs.raptor import RaptorPack

def create_raptor_pack(documents, vector_store, config):
    """Create and configure RAPTOR pack."""
    pack = RaptorPack(
        documents,
        embed_model=OpenAIEmbedding(model=config['models']['embedding']),
        llm=OpenAI(model=config['models']['llm'], temperature=config['temperature']),
        vector_store=vector_store,
        similarity_top_k=config['similarity_top_k'],
        mode=config['mode'],
        transformations=[SentenceSplitter(chunk_size=config['chunk_size'], chunk_overlap=config['chunk_overlap'])],
    )
    logger.info("RAPTOR pack created successfully.")
    return pack

# Create RAPTOR pack
raptor_pack = create_raptor_pack(documents, vector_store, config)

Code explanation:

  • Lines 1–4: Import components necessary for building and configuring a RAPTOR pack. SentenceSplitter from llama_index.core.node_parser segments documents into sentence-based chunks, OpenAI from llama_index.llms.openai provides an interface to OpenAI’s LLMs, OpenAIEmbedding from llama_index.embeddings.openai generates document embeddings using OpenAI’s models, and RaptorPack from llama_index.packs.raptor provides a pre-configured RAPTOR workflow.

  • Lines 6–18: Define the create_raptor_pack function. It takes three arguments:

    • documents: Represents the loaded documents intended for processing within the RAPTOR pack.

    • vector_store: Refers to the ChromaDB vector store object previously created, used for storing and retrieving document vectors.

    • config: A dictionary containing various configuration settings for customizing the RAPTOR pack.

    • Within this function:

      • RaptorPack(...) initializes a RaptorPack object using the following parameters:

RaptorPack parameters

| Parameter | Description |
| --- | --- |
| documents | Loaded documents for processing. |
| embed_model=OpenAIEmbedding(model=config['models']['embedding']) | Specifies the embedding model to use based on the configuration. |
| llm=OpenAI(model=config['models']['llm'], temperature=config['temperature']) | Sets up the OpenAI LLM with the model name and temperature from the configuration. |
| vector_store=vector_store | Assigns the ChromaDB vector store object for managing document vectors. |
| similarity_top_k=config['similarity_top_k'] | Determines the number of most similar documents considered during retrieval, based on the configuration. |
| mode=config['mode'] | Defines the operational mode of the RAPTOR pack, influencing its behavior. |
| transformations=[SentenceSplitter(chunk_size=config['chunk_size'], chunk_overlap=config['chunk_overlap'])] | Specifies transformations to apply to documents, here a SentenceSplitter configured with the chunk size and overlap from the configuration to segment documents into manageable parts. |

    • logger.info("RAPTOR pack created successfully."): Logs an informational message confirming the successful creation of the RAPTOR pack.

    • return pack: Returns the initialized RaptorPack object, ready for further use in specific tasks as defined by the configured mode.

  • Line 21: Calls the create_raptor_pack function with documents, vector_store, and config as arguments, resulting in instantiating a specific RAPTOR pack configured according to the provided documents, vector store, and configuration settings. This raptor_pack instance is now ready for performing tasks such as document retrieval and processing within the specified environment.
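
Before building a retriever and query engine, you can optionally query the pack directly to inspect which nodes it retrieves. The run call below follows the usage shown in the LlamaIndex RAPTOR pack examples; the exact signature and return type may vary across package versions.

# Optional: retrieve nodes directly from the pack to inspect intermediate results.
nodes = raptor_pack.run("What baselines was RAPTOR compared against?", mode="tree_traversal")
for node in nodes:
    print(node.text[:200], "...")   # preview the first 200 characters of each retrieved node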

4. Retrieval and query engine

In this final section, we’ll set up the retriever and query engine, which will allow us to perform queries on our processed document.

from llama_index.packs.raptor import RaptorRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from typing import List, Tuple

def create_raptor_retriever(vector_store, config):
    """Create RAPTOR retriever."""
    retriever = RaptorRetriever(
        [],
        embed_model=OpenAIEmbedding(model=config['models']['embedding']),
        llm=OpenAI(model=config['models']['llm'], temperature=config['temperature']),
        vector_store=vector_store,
        similarity_top_k=config['similarity_top_k'],
        mode=config['mode'],
    )
    logger.info("RAPTOR retriever created successfully.")
    return retriever

def create_query_engine(retriever, config):
    """Create query engine."""
    query_engine = RetrieverQueryEngine.from_args(
        retriever,
        llm=OpenAI(model=config['models']['llm'], temperature=config['temperature'])
    )
    logger.info("Query engine created successfully.")
    return query_engine

def run_multiple_queries(query_engine, queries: List[str]) -> List[Tuple[str, str]]:
    """Run multiple queries and return results."""
    results = []
    for query in queries:
        response = query_engine.query(query)
        results.append((query, str(response)))
        logger.info(f"Query processed successfully: {query}")
    return results

# Create retriever and query engine
retriever = create_raptor_retriever(vector_store, config)
query_engine = create_query_engine(retriever, config)

# Example usage
queries = [
    "What baselines was RAPTOR compared against and why?",
    "What are the main advantages of RAPTOR over traditional retrieval methods?",
    "How does RAPTOR handle long documents?"
]

results = run_multiple_queries(query_engine, queries)

# Print results
for query, response in results:
    print(f"Query: {query}")
    print(f"Response: {response}")
    print("-" * 50)

Code explanation:

  • Lines 1–3: Import the components needed for setting up a retrieval system and processing queries. RaptorRetriever from llama_index.packs.raptor implements document retrieval within the RAPTOR framework, RetrieverQueryEngine from llama_index.core.query_engine provides a query engine that pairs a retriever with a language model for query processing, and List and Tuple from typing are used for type hints.

  • Lines 5–16: Define the create_raptor_retriever function:

    • Takes vector_store, representing the ChromaDB vector store, and config, a dictionary holding configuration settings.

    • Creates a RaptorRetriever object (retriever) with parameters:

RaptorRetriever parameters

| Parameter | Description |
| --- | --- |
| [] | An empty list ([]) as a placeholder for documents. |
| embed_model=OpenAIEmbedding(model=config['models']['embedding']) | Specifies the embedding model for generating document embeddings based on the configuration. |
| llm=OpenAI(model=config['models']['llm'], temperature=config['temperature']) | Sets up the OpenAI language model using the model name and temperature from the configuration. |
| vector_store=vector_store | Assigns the ChromaDB vector store for document vector management. |
| similarity_top_k=config['similarity_top_k'] | Determines the number of similar documents considered during retrieval. |
| mode=config['mode'] | Defines the operational mode of the retriever based on the configuration. |

    • Logs successful creation of the retriever using logger.info.

    • Returns the initialized retriever object.

  • Lines 18–25: Define the create_query_engine function:

    • Takes retriever, the RaptorRetriever object, and config, the configuration dictionary.

    • Uses RetrieverQueryEngine.from_args() to instantiate a query engine (query_engine) with:

      • retriever: The previously created retriever object.

      • llm=OpenAI(model=config['models']['llm'], temperature=config['temperature']): Configures an OpenAI LLM for query processing.

    • Logs successful creation of the query engine using logger.info.

    • Returns the initialized query_engine object.

  • Lines 27–34: Define the run_multiple_queries function:

    • Takes query_engine, the query engine object, and queries, a list of strings representing user queries.

    • Initializes an empty list results to store query-response pairs.

    • Iterates through each query in queries:

      • Uses query_engine.query(query) to process each query, likely retrieving relevant documents and generating a response using the configured LLM.

      • Appends a tuple (query, str(response)) to results, where response is the generated response converted to a string.

      • Logs successful query processing using logger.info.

    • Returns results, a list of tuples containing the original queries and corresponding responses.

  • Lines 37–38: Call create_raptor_retriever with vector_store and config to instantiate a specific RaptorRetriever for use in the subsequent steps.

  • Lines 41–45: Define example queries related to the RAPTOR research paper stored in queries, setting up different inquiries about RAPTOR’s capabilities or comparisons.

  • Line 47: Calls run_multiple_queries with query_engine and queries to process each query using the configured retrieval and query engine setup.

  • Lines 50–53: Iterate through results, printing each query and its corresponding response:

    • Prints the original query with print(f"Query: {query}").

    • Prints the generated response from the query engine with print(f"Response: {response}").

    • Separates each query-response pair with a line of dashes print("-" * 50) for clarity.

Here is the output generated by the RAPTOR code above:

Query: What baselines was RAPTOR compared against and why?
Response: RAPTOR was compared against BM25 and DPR as baselines. This comparison was conducted to showcase RAPTOR's superior performance in information retrieval tasks, particularly on datasets like QASPER. The reason for comparing against these baselines was to highlight RAPTOR's ability to outperform methods that can only extract top similar raw text chunks, as RAPTOR's hierarchical summarization approach allows it to capture a broader range of information, from general themes to specific details, leading to better overall performance.
--------------------------------------------------
Query: What are the main advantages of RAPTOR over traditional retrieval methods?
Response: RAPTOR's main advantages over traditional retrieval methods include its hierarchical tree structure that allows for synthesizing information across different sections of retrieval corpora, its ability to handle a wider range of questions by providing both original text and higher-level summaries for retrieval, and its effectiveness in leveraging the full tree structure for more efficient retrieval during the query phase. Additionally, RAPTOR outperforms traditional retrieval methods and sets new performance benchmarks on various question-answering tasks based on controlled experiments.
--------------------------------------------------
Query: How does RAPTOR handle long documents?
Response: RAPTOR handles long documents by segmenting the retrieval corpus into short, contiguous texts of a specific length, typically 100 tokens. If a sentence exceeds this limit, the entire sentence is moved to the next chunk to maintain contextual and semantic coherence. These chunks are then embedded using SBERT, forming the leaf nodes of a tree structure. RAPTOR employs a clustering algorithm to group similar text chunks, followed by summarization using a Language Model. This cycle of embedding, clustering, and summarization continues until further clustering becomes infeasible, resulting in a structured, multi-layered tree representation of the original documents.

Check out the official RAPTOR GitHub repository for more information and resources.

Here’s a comparison table highlighting why RAPTOR is considered superior to traditional RAG methods:

RAG with RAPTOR vs. Traditional RAG

| Aspect | RAG with RAPTOR | Traditional RAG |
| --- | --- | --- |
| Retrieval structure | Hierarchical tree structure for synthesizing information across sections. | Linear retrieval of top similar chunks without hierarchical context. |
| Information synthesis | Combines original text and high-level summaries, providing a deeper understanding. | Focuses primarily on extracting top similar raw text chunks. |
| Handling long documents | Segments texts into manageable chunks, clusters, and summarizes them, creating a layered tree. | Processes documents linearly, often struggling with lengthy texts. |
| Performance on QA tasks | Consistently outperforms traditional methods, setting new benchmarks on various datasets. | Relies on linear retrieval of top similar text chunks, which can yield less comprehensive answers. |
| Scalability | Scales linearly with document size in terms of build time and token use. | May struggle with scalability due to the lack of hierarchical organization. |
| Flexibility | Adapts to different query complexities by selecting appropriate tree nodes. | Limited flexibility; retrieves based on direct text similarity. |
| Integration with retrievers | Enhances performance when combined with models like SBERT, outperforming standalone retrievers. | Does not inherently improve when combined with other retrievers. |

From RAGs to riches

(Sorry, we had to.)

Now that you know a bit about RAPTOR, we hope you feel better equipped to master RAG and its various techniques.

Are you ready to gain more hands-on skills with RAG?

If so, Educative offers a number of courses you may find interesting.

You can also start building with RAG through Educative Projects, which guide you through creating tangible outcomes for your portfolio (without the setup).

Happy learning!