Quantized Low-Rank Adaptation (QLoRA)
Learn about the components of the Quantized Low-Rank Adaptation (QLoRA) technique and how it works.
Quantized Low-Rank Adaptation (QLoRA), as the name suggests, combines two widely used techniques for efficient fine-tuning: LoRA and quantization. Whereas LoRA reduces the number of trainable parameters by learning low-rank update matrices, QLoRA extends it by also quantizing the frozen base model's weights, further shrinking the memory footprint of fine-tuning.
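The combination can be sketched in a few lines of NumPy. This is a minimal, illustrative simulation: it uses simple 4-bit absmax integer quantization for the frozen base weight, whereas real QLoRA uses the NF4 data type with blockwise quantization (e.g., via the `bitsandbytes` library); the shapes, rank, and initialization here are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight, stored quantized.
# (Simulated 4-bit absmax quantization; real QLoRA uses NF4.)
W = rng.standard_normal((16, 16)).astype(np.float32)
scale = np.abs(W).max() / 7.0  # signed 4-bit range is [-8, 7]
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)

# Trainable low-rank adapter kept in full precision, rank r << 16.
# B is zero-initialized so the adapter's update starts at zero,
# as in LoRA.
r = 4
A = rng.standard_normal((r, 16)).astype(np.float32) * 0.01
B = np.zeros((16, r), dtype=np.float32)

def forward(x):
    # Dequantize the frozen base weight on the fly,
    # then add the low-rank update B @ A.
    W_deq = W_q.astype(np.float32) * scale
    return x @ (W_deq + B @ A).T

x = rng.standard_normal((2, 16)).astype(np.float32)
y = forward(x)
print(y.shape)  # (2, 16)

# Only A and B receive gradients during fine-tuning;
# W_q stays fixed, so memory for optimizer states is tiny.
quant_err = np.abs(W - W_q.astype(np.float32) * scale).max()
print(quant_err <= scale / 2 + 1e-6)  # absmax rounding error is at most half a step
```

During training, gradients flow only into `A` and `B`; the quantized weight `W_q` is dequantized for each forward pass but never updated, which is what lets QLoRA fine-tune large models on modest hardware.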