...
Embeddings and Vector Stores in LangChain
Discover different ways of storing documents and using them in your applications with LangChain
Let’s dive right into one of the most critical pieces of building intelligent applications with LangChain: vector stores. By now, you’ve already experimented with language models to generate content or answer questions. But how do we store and retrieve text data in a way that actually captures its meaning? That’s where vector stores shine.
What are embeddings?
First, let’s talk about embeddings, because vector stores and embeddings go hand in hand. An embedding is a numerical representation of text. Take any piece of text, whether it’s a word, a sentence, or an entire document, and an embedding model converts it into a list of numbers (a vector) that captures its semantic meaning.
An easy mental image is to think of a giant three-dimensional space (though, in practice, the space often has hundreds or thousands of dimensions). Words or sentences that are related in meaning appear “close” to each other, while unrelated text drifts farther away. For example, “kitten” would be near “cat,” but both would be quite distant from “car.”
This is powerful because machines don’t speak English or any other human language; they speak numbers. By encoding text as vectors that capture its semantic relationships, we bridge the gap between human language and the mathematical operations computers excel at.
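To make that idea of “closeness” concrete, here is a minimal sketch using NumPy and three made-up, three-dimensional vectors. The numbers are invented purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means the vectors point in the same direction,
    # while values near 0 mean the vectors are unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented three-dimensional "embeddings", for intuition only.
kitten = np.array([0.9, 0.8, 0.1])
cat = np.array([0.85, 0.75, 0.15])
car = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(kitten, cat))  # high score: related meanings sit close together
print(cosine_similarity(kitten, car))  # low score: unrelated meanings sit far apart

Vector stores rely on exactly this kind of similarity measure to decide which stored documents best match a query.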
LangChain offers integrations with multiple embedding providers. One standout option is OpenAI, which provides state-of-the-art models like text-embedding-3-large. Here’s a quick look at how you might use it:
from langchain_openai import OpenAIEmbeddings

# Load OpenAI's text-embedding-3-large model through LangChain
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
Previously, we used Groq to access our LLM of choice; however, at the time of writing this course, Groq did not offer an embedding model, so we use OpenAI and its library of models instead.
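With the model loaded, producing vectors takes a single method call. The sketch below shows the two calls LangChain embedding classes expose, embed_query for a single string and embed_documents for a batch; the example strings are made up, and the code assumes a valid OPENAI_API_KEY is set in your environment.

from langchain_openai import OpenAIEmbeddings

# Assumes the OPENAI_API_KEY environment variable is set.
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Embed a single piece of text, such as a user's question.
query_vector = embeddings.embed_query("What do kittens eat?")
print(len(query_vector))  # vector length: 3072 dimensions for text-embedding-3-large

# Embed several documents in one call; you get one vector back per document.
doc_vectors = embeddings.embed_documents(["Kittens love to nap.", "Cars need fuel."])
print(len(doc_vectors))  # 2

These vectors are what a vector store will index and search over in the steps that follow.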
That’s all it takes to get started. This code loads a highly capable model that ...