Model Selection
Learn about the process of selecting appropriate LLMs for fine-tuning.
Choosing the right LLM for fine-tuning
When fine-tuning an LLM for our specific task, it’s crucial to consider various factors before selecting a model. Let’s look at the initial considerations.
Model size
LLMs are available in various sizes, and the size of a model directly affects its computational demands. Larger models typically offer better performance but require substantially more compute to train and serve. Depending on our requirements, we might opt for a smaller, more lightweight model like GPT-2, whose smallest variant has 124 million parameters, or choose a more powerful option like Llama 2 70B, the largest model in the Llama 2 family, which has 70 billion parameters and delivers considerably stronger performance.
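Before committing to a model, it helps to check its parameter count and estimate the memory it will need. The snippet below is a minimal sketch, assuming the Hugging Face `transformers` library and the public "gpt2" checkpoint; the memory figure is only a rough rule of thumb for the weights themselves, not optimizer state or activations.

```python
# Minimal sketch: inspect a candidate model's size before fine-tuning.
# Assumes the Hugging Face `transformers` library and the "gpt2" checkpoint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # GPT-2 small, ~124M parameters

num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.0f}M")

# Rough rule of thumb: fp32 weights take about 4 bytes per parameter;
# full fine-tuning also needs gradients and optimizer state on top of this.
print(f"Approx. weight memory (fp32): {num_params * 4 / 1e9:.2f} GB")
```

Repeating the same check with a larger checkpoint makes the trade-off concrete: a 70-billion-parameter model needs hundreds of gigabytes for its weights alone, which usually means multiple GPUs or parameter-efficient fine-tuning techniques.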
Pretraining
The pretraining dataset forms the foundation of the model's initial knowledge and greatly influences how well it understands user prompts and produces outputs. An LLM trained on a diverse and extensive dataset, such as internet text, will have a broad knowledge base, making it versatile across various topics. One such dataset is Common Crawl, a large archive of publicly available web pages. Common Crawl data, often in filtered or cleaned form, was used to train GPT-3, LLaMA, and T5, all of which are known for their versatility.
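To get a feel for the kind of web text these models are pretrained on, we can stream a few documents from a Common Crawl-derived corpus without downloading it in full. The sketch below assumes the Hugging Face `datasets` library and the "allenai/c4" dataset (a cleaned Common Crawl snapshot used for T5); the dataset identifier and field names are those published on the Hugging Face Hub.

```python
# Minimal sketch: peek at a Common Crawl-derived pretraining corpus.
# Assumes the Hugging Face `datasets` library and the "allenai/c4" dataset.
from datasets import load_dataset

# Stream the English split so nothing large is downloaded up front.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

for i, example in enumerate(c4):
    # Each record contains the raw page text plus its source URL and timestamp.
    print(example["url"])
    print(example["text"][:200].replace("\n", " "), "...\n")
    if i >= 2:  # show just a few documents
        break
```

Skimming even a handful of documents makes it clear why such corpora give models broad, general-purpose knowledge, and also why domain-specific fine-tuning data is still needed for specialized tasks.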