We’ve explored the fundamentals of RAG and how it addresses the limitations of LLMs. Let’s dive deeper into RAG systems’ different architectures or paradigms one by one, starting by examining naive RAG in detail.

Overview

Naive RAG is a simplified approach to using LLMs in conjunction with document retrieval for improved information access and response generation. Naive RAG works in three basic steps:

  1. Indexing: Data from formats like PDF or HTML is cleaned and converted into plain text. The text is then split into smaller parts (chunks), and each chunk is passed through an embedding model to produce a vector representation that can be searched later.

  2. Retrieval: When someone asks a question, the RAG system turns that question into a vector embedding using the same method used in indexing. It then compares this vector to the vectors of the indexed chunks to find the k most similar chunks. These k chunks serve as the context in the next step.

  3. Generation: The system combines the retrieved chunks (context) with the original question to create a prompt. The language model uses this prompt to answer the question. Depending on the question, the model might rely on its own knowledge or focus on the retrieved information.
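The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and `build_prompt` simply shows how the retrieved context and question would be combined before being sent to an LLM.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts. A real naive RAG system
    # would call an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: split the source text into chunks and embed each chunk.
chunks = [
    "RAG combines retrieval with generation.",
    "Embeddings map text to vectors.",
    "Paris is the capital of France.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, k=2):
    # 2. Retrieval: embed the question the same way, rank indexed
    # chunks by similarity, and keep the top k as context.
    q_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, context):
    # 3. Generation: combine the retrieved context with the original
    # question into a single prompt for the language model.
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question

context = retrieve("What is the capital of France?", k=1)
prompt = build_prompt("What is the capital of France?", context)
print(prompt)
```

In a real system, the toy `embed` would be replaced by calls to an embedding model, the in-memory `index` list by a vector database, and the printed prompt would be sent to an LLM to generate the final answer.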
