Dataset

In our complete example at the end of the chapter, we will use a moviehttps://www.kaggle.com/datasets/rounakbanik/the-movies-dataset dataset to generate text embeddings and perform a semantic search to find movies matching a given query. We are only creating embeddings for the first fifty movie descriptions in the dataset with columns "genre," and "title" to facilitate faster code execution.

Generating text embeddings

To understand how we generate word embeddings with BERT, let’s start with two short text sequences as an example before working with large datasets.

Get hands-on with 1200+ tech skills courses.