HNSW Indexing in Vector Databases for Performance Optimization

Learn about HNSW, a popular indexing method used in vector databases for efficient search.

The challenge of searching through large datasets

Imagine we need to find the embeddings most similar to a query in a massive collection of embeddings stored locally or in a database. Without an index, the search must compare the query embedding with every stored embedding, so the search time grows linearly with the number of embeddings. For large datasets, such as collections at the scale of the World Wide Web, this exhaustive comparison is too slow to be practical.
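To make the cost of an unindexed search concrete, here is a minimal sketch of a brute-force nearest neighbor search in plain Python. The function names, the choice of Euclidean distance, and the toy 2-D vectors are illustrative assumptions, not part of the lesson's own code.

```python
import heapq
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query, embeddings, k=1):
    """Return the indices of the k stored embeddings closest to the query.

    One distance is computed per stored embedding, so the number of
    comparisons grows linearly with len(embeddings).
    """
    scored = ((euclidean(query, emb), idx) for idx, emb in enumerate(embeddings))
    return [idx for _, idx in heapq.nsmallest(k, scored)]

# Toy 2-D "embeddings" (assumed values for illustration only)
embeddings = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
nearest = brute_force_search((1.0, 1.0), embeddings, k=2)
print(nearest)
```

Doubling the number of stored embeddings doubles the number of distance computations per query, which is the cost an index is meant to avoid.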

Indexing for faster searches

Databases use indexing to speed up the search process. Indexing is the process of organizing data to improve the speed and efficiency of retrieval operations. An index acts like a roadmap or a pointer that helps quickly locate and access the required data without searching the entire dataset sequentially. Vector databases store data in the form of vectors, where each vector represents a point in a high-dimensional space. The goal of indexing in vector databases is to quickly find vectors that are similar to or nearest to a given query vector.

Traditional databases use indexing methods like B-trees and hash tables, which are well suited to scalar data types. These methods are designed for efficient exact-match searches and, in the case of B-trees, range queries. Vector databases, on the other hand, employ specialized indexing methods built for vector data, like HNSW (hierarchical navigable small world) graphs (Malkov, Yu A., and Dmitry A. Yashunin. "Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs." IEEE Transactions on Pattern Analysis and Machine Intelligence 42, no. 4 (2018): 824-836) and KD-trees. Keep in mind that KD-trees work best with low- to moderate-dimensional data (up to about ten dimensions). These methods are designed for similarity searches, where the goal is to find the data points closest to a given query point in the vector space. In this lesson, we'll dive deep into HNSW, a state-of-the-art indexing method employed by various vector databases.
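The difference between exact-match lookup and similarity search can be sketched in a few lines of Python. The dictionary below stands in for a hash index; the document labels and vectors are made-up values, and the linear scan at the end is only a placeholder for what a vector index would do more efficiently.

```python
# A Python dict is a hash table: it finds a key only if it is identical.
records = {
    (0.10, 0.90): "doc-a",
    (0.80, 0.20): "doc-b",
}

query = (0.11, 0.89)  # very close to doc-a's vector, but not identical

# Exact-match lookup misses, because no stored key equals the query.
exact_hit = records.get(query)

def squared_distance(a, b):
    """Squared Euclidean distance; enough for ranking by closeness."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Similarity search instead asks for the closest stored vector.
closest_key = min(records, key=lambda vec: squared_distance(vec, query))
similar_hit = records[closest_key]

print(exact_hit, similar_hit)
```

The exact-match lookup returns nothing for a query that is almost identical to a stored vector, which is why hash tables and B-trees alone do not serve similarity queries.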

Hierarchical navigable small world (HNSW) graph

HNSW graphs are advanced data structures designed for approximate nearest neighbor (ANN) search. They combine the concepts of navigable small-world graphs and skip lists to efficiently search and navigate large datasets.
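As a rough illustration of graph-based ANN search, the sketch below performs greedy routing on a small hand-built proximity graph: starting from an entry node, it keeps moving to whichever neighbor is closest to the query and stops when no neighbor is closer. The graph, the coordinates, and the function name are hypothetical, and the sketch deliberately omits the layered structure that HNSW adds on top of this idea.

```python
import math

# Hypothetical toy data: node id -> 2-D vector
vectors = {
    "A": (0.0, 0.0),
    "B": (2.0, 0.0),
    "C": (4.0, 1.0),
    "D": (6.0, 1.0),
    "E": (8.0, 0.0),
}

# Hand-built undirected graph: node id -> neighbor ids.
# The A-E edge acts as a long-range shortcut.
graph = {
    "A": ["B", "E"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D", "A"],
}

def greedy_search(query, entry, graph, vectors):
    """Walk the graph greedily toward the query; return (node, path)."""
    current = entry
    current_dist = math.dist(query, vectors[current])
    path = [current]
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = math.dist(query, vectors[neighbor])
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:
            # No neighbor is closer: a local minimum, possibly not the
            # true nearest neighbor, hence "approximate".
            return current, path
        current, current_dist = best, best_dist
        path.append(current)

result, path = greedy_search((5.8, 1.2), "A", graph, vectors)
print(result, path)
```

The search visits only the nodes along its path rather than every vector. The price is that it can stop at a local minimum, which is what makes the result approximate rather than exact.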

Navigable small world graph (NSW)

Navigable small world (NSW) graphs are data structures that facilitate efficient search and navigation in large datasets, particularly for ANN search. The "small world" in their name originates from the "small-world phenomenon" in social network theory: the observation that most nodes in a large network can be reached from any other node through a relatively small number of intermediate connections. (In a social network, a node is a person. In a vector database, a node is a vector.) This concept was popularized by the famous "six degrees of separation" theory, which suggests that any two people in the world are, on average, only six social connections apart. Watts and Strogatz (Watts, Duncan J., and Steven H. Strogatz. "Collective dynamics of 'small-world' networks." Nature 393, no. 6684 (1998): 440-442) ...
