Retrieval Strategies: Common Document Loaders

Learn how to use LangChain offline document loaders.

Document loaders

Document loaders in LangChain play an essential role in integrating diverse data sources into chatbot frameworks and other AI applications.

Figure: RAG workflow: Document loaders

Loaders import structured and unstructured data from a range of document types, including CSV, JSON, text, Microsoft Office formats, and PDFs. Each loader returns its content as document objects with text and metadata, so the rest of a pipeline can treat data from different sources the same way.

Types of document loaders

We will now experiment with a few of the many LangChain document loaders.

CSV loader

The code below demonstrates how to load and display data from a comma-separated values (CSV) file using CSVLoader from the langchain_community library. CSV is a common format for storing tabular data: each line represents a row in the table, and fields are separated by commas. CSVLoader turns each row into its own document, which lets a chatbot retrieve individual rows and answer natural-language questions about the table.

Run the code below to try the loader:

# Import libraries
from langchain_community.document_loaders import CSVLoader

# Define the path to the file and load the file
file_path = '/usercode/GDP.csv'
loader = CSVLoader(file_path=file_path)
data = loader.load()  # Returns a list with one document per CSV row

# Print the data
for content in data:
    print(content.page_content)
    print('-'*80)

In this code, we perform the following steps:

  • Lines 1–2: We import CSVLoader from the langchain_community.document_loaders module.

  • Lines 4–7: We define the path to the required file, and we use the CSVLoader to load the file. We use loader.load() to return the loaded data, which is then stored in the variable data.

  • Lines 9–12: We iterate through the loaded documents and print the page content of each one, followed by a separator line.
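
To get a feel for what each printed document contains without installing LangChain, here is a minimal sketch that approximates the row-to-document step using only the Python standard library. It is not the library's implementation: SimpleDocument and rows_to_documents are hypothetical names invented for this illustration, and the sample data is made up because the columns of GDP.csv are not shown in this lesson. The sketch assumes that each row becomes one document whose text is a list of "column: value" lines, with the source and row number kept as metadata.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document (not the LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(csv_text: str, source: str = "in-memory") -> List[SimpleDocument]:
    """Turn each CSV row into one document made of 'column: value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        # Join the fields of this row into a small block of readable text
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            SimpleDocument(
                page_content=content,
                metadata={"source": source, "row": row_number},
            )
        )
    return documents


# Made-up sample data (assumption: the real file has different columns)
sample_csv = "country,gdp\nA,100\nB,250\n"
docs = rows_to_documents(sample_csv)

# Print the data in the same style as the lesson code
for doc in docs:
    print(doc.page_content)
    print("-" * 80)
```

Keeping one document per row means a retriever can later return just the rows relevant to a question instead of the whole table.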

JSON loader

The code below demonstrates how to load and display data from JSON (JavaScript Object Notation) files, specifically using the JSONLines format. JSON is a widely used format for storing and ...
