
Retrieval Strategies: Data Chunks Retrievers Base

Learn how to retrieve relevant data chunks from vector stores.

LangChain retrievers

Retrievers are an essential component of information retrieval systems and chatbot development: they offer an interface that navigates the complexities of unstructured queries and returns relevant documents.

RAG workflow: Retrievers

Retrievers focus on the retrieval or sourcing of documents based on the query input. While vector stores can underpin a retriever, providing a robust backbone for document similarity search, the landscape of retrievers includes a variety of types, each tailored for specific use cases and functionalities.

Functionality

Retrievers work by accepting a string query and processing it to return a list of documents that best match the query’s intent. This operation is essential for applications ranging from content discovery platforms to automated customer service solutions, where accuracy and relevance in document retrieval are critical.
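This query-in, documents-out contract can be sketched with a minimal toy retriever. The class and scoring below are illustrative assumptions, not LangChain's implementation; real LangChain retrievers expose a similar `invoke` method but rank by embedding similarity rather than keyword overlap:

```python
from dataclasses import dataclass


@dataclass
class Document:
    page_content: str


class KeywordRetriever:
    """Toy retriever: ranks documents by word overlap with the query."""

    def __init__(self, texts):
        self.docs = [Document(page_content=t) for t in texts]

    def invoke(self, query: str, k: int = 2):
        # Score each document by how many query words it shares, best first.
        q = set(query.lower().split())
        scored = sorted(
            self.docs,
            key=lambda d: len(q & set(d.page_content.lower().split())),
            reverse=True,
        )
        return scored[:k]


retriever = KeywordRetriever([
    "Returns are accepted within 30 days of purchase.",
    "Our headphones feature active noise cancellation.",
    "Shipping is free on orders over $50.",
])
docs = retriever.invoke("how do returns work", k=1)
```

The essential shape is the same across all the retriever types discussed below: a string goes in, a ranked list of `Document` objects comes out.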

LangChain introduces an array of advanced retrieval types, each designed to cater to distinct needs and scenarios. These retrieval mechanisms are distinguished by their unique attributes, ranging from the underlying index types they rely on to their use of large language models for enhanced query understanding and document retrieval. LangChain’s suite of retrievers offers versatile solutions.

Advanced retrieval types

| Name | Uses an LLM | When to Use |
|------|-------------|-------------|
| Vectorstore | No | If we are just getting started and looking for something quick and easy. |
| ParentDocument | No | If our pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together. |
| Multi Vector | Sometimes | If we are able to extract information from documents that we think is more relevant to index than the text itself. |
| Self Query | Yes | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text. |
| Contextual Compression | Sometimes | If we are finding that our retrieved documents contain too much irrelevant information and are distracting the LLM. |
| Time-Weighted Vectorstore | No | If we have timestamps associated with our documents, and we want to retrieve the most recent ones. |
| Multi-Query Retriever | Yes | If users are asking questions that are complex and require multiple pieces of distinct information to respond. |
| Ensemble | No | If we have multiple retrieval methods and want to try combining them. |
| Long-Context Reorder | No | If we are working with a long-context model and noticing that it's not paying attention to information in the middle of retrieved documents. |
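To make one of these concrete, the Ensemble row can be sketched with reciprocal rank fusion, a common way to combine ranked lists from multiple retrieval methods (LangChain's `EnsembleRetriever` uses this idea; the function below is a standalone illustration with hypothetical document ids):

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists into one combined ranking.

    Each ranking is a list of document ids ordered best-first; a document's
    fused score is the sum of 1 / (k + rank) over every list it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# A keyword retriever and a vector retriever partially agree;
# fusion rewards documents that both methods rank highly.
keyword_hits = ["doc_a", "doc_c", "doc_b"]
vector_hits = ["doc_a", "doc_c", "doc_d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Because `doc_a` is ranked first by both methods, it tops the fused list even though neither individual ranking is trusted on its own.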

Each retriever serves a unique purpose:

  • Vectorstore: This retriever creates embeddings for each text, offering a simple and accessible starting point for embedding-based retrieval. For example, a customer service chatbot for a retail website can use a vector store to quickly pull up product information and FAQs. The embeddings are created from the product descriptions and customer reviews to facilitate instant and relevant responses to user queries about product features.

  • ParentDocument: This indexes documents by chunks but retrieves the entire document based on the similarity of those chunks, optimizing for contexts where complete documents are more ...
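The ParentDocument pattern above, index small chunks but return the whole document a matching chunk belongs to, can be sketched as follows. The class, chunking scheme, and keyword scoring are simplified assumptions for illustration; LangChain's `ParentDocumentRetriever` does the chunk-to-parent bookkeeping with a document store and embedding-based search:

```python
class ParentDocumentIndex:
    """Toy parent-document retrieval: match on chunks, return whole documents."""

    def __init__(self, documents, chunk_size=5):
        self.documents = documents
        self.chunks = []  # (chunk_word_set, parent_index) pairs
        for i, doc in enumerate(documents):
            words = doc.lower().split()
            # Split each document into small fixed-size word chunks,
            # remembering which parent document each chunk came from.
            for start in range(0, len(words), chunk_size):
                self.chunks.append((set(words[start:start + chunk_size]), i))

    def retrieve(self, query: str) -> str:
        q = set(query.lower().split())
        # Find the single best-matching chunk, then return its full parent.
        _, parent = max(self.chunks, key=lambda cp: len(q & cp[0]))
        return self.documents[parent]


docs = [
    "The warranty covers defects for two years. Batteries are excluded.",
    "Shipping takes three to five business days within the country.",
]
index = ParentDocumentIndex(docs)
result = index.retrieve("is the battery covered by warranty")
```

The query matches only a five-word chunk of the first document, but the retriever hands back the entire document, giving the LLM the surrounding context the chunk alone would lack.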
