Generating and Storing Embeddings in ChromaDB Using BERT
Learn how to use BERT to generate and store embeddings of words in ChromaDB.
Dataset
In our complete example at the end of the chapter, we will use a
Generating text embeddings
To understand how we generate word embeddings with BERT, let’s start with two short text sequences as an example before working with large datasets.
# Sample data: two separate text sequences (the comma keeps them as two list items)
movie_info = [
    "Titanic is a 1997 American epic romantic disaster film directed, written, co-produced, and co-edited by James Cameron.",
    "Incorporating both historical and fictionalized aspects, it is based on accounts of the sinking of RMS Titanic in 1912.",
]
Our task is to generate embeddings for each word in both sequences.
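BERT does not work on whole words directly. Its WordPiece tokenizer may split a word into several subword tokens, and the model returns one vector per token. To get one vector per word, the token vectors belonging to the same word have to be combined. The sketch below averages them. This is one common strategy, assumed here rather than taken from this lesson, which may instead keep only the first subword. The helper name `pool_subwords_to_words` is hypothetical. It assumes the token vectors are rows of BERT's last hidden state and that `word_ids` has the shape returned by a Hugging Face fast tokenizer's `word_ids()` method, where `None` marks special tokens such as `[CLS]` and `[SEP]`.

```python
from typing import Dict, List, Optional, Sequence


def pool_subwords_to_words(
    token_vectors: Sequence[Sequence[float]],
    word_ids: Sequence[Optional[int]],
) -> List[List[float]]:
    """Average the vectors of all subword tokens that belong to the same word.

    token_vectors: one vector per token (assumed: a row of BERT's last hidden state).
    word_ids: for each token, the index of the word it came from, or None for
              special tokens such as [CLS] and [SEP] (assumed word_ids()-style input).
    """
    if len(token_vectors) != len(word_ids):
        raise ValueError("token_vectors and word_ids must have the same length")

    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vec, wid in zip(token_vectors, word_ids):
        if wid is None:  # special tokens do not belong to any word
            continue
        if wid not in sums:
            sums[wid] = [0.0] * len(vec)
            counts[wid] = 0
        # Accumulate this subword's vector into its word's running sum.
        sums[wid] = [s + v for s, v in zip(sums[wid], vec)]
        counts[wid] += 1

    # One averaged vector per word, ordered by word index.
    return [[s / counts[w] for s in sums[w]] for w in sorted(sums)]


# Toy usage with 2-dimensional vectors and a hypothetical split of "Titanic is"
# into [CLS], "titan", "##ic", "is", [SEP]:
vectors = [[0.0, 0.0], [1.0, 3.0], [3.0, 5.0], [2.0, 2.0], [9.0, 9.0]]
ids = [None, 0, 0, 1, None]
print(pool_subwords_to_words(vectors, ids))  # [[2.0, 4.0], [2.0, 2.0]]
```

Real BERT vectors have 768 dimensions for the base model rather than 2, but the pooling logic is the same. Each resulting word vector is what would then be stored in ChromaDB.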
Step 1: Data preprocessing
The first step is to preprocess the text. Preprocessing involves