Document Loaders
Use the langchaingo loader components to handle HTML, text, and CSV data.
Document loaders extract data from a configured source and convert it into a slice of schema.Document in langchaingo. Many loader implementations are supported, including HTML, text, PDF, CSV, and more.
A splitter works alongside a document loader to divide the document into manageable chunks. langchaingo also defines a number of splitting strategies, including:
Token-based: Splits text by tokens.
Recursive: Splits text recursively, trying a list of separator characters in turn.
Markdown: Parses and chunks Markdown files.
Let's take a look at how to use langchaingo to load documents from different sources and split them for further processing.
HTML document loader
The HTML loader in langchaingo makes it possible to load an arbitrary HTML document and prepare it for further processing by other components, such as chains. Let's walk through an ...