Model Selection
Learn how to select appropriate LLMs for fine-tuning.
Choosing the right LLM for fine-tuning
Before fine-tuning an LLM for our specific task, we need to weigh several factors when choosing the base model. Let's start with the initial considerations.
Model size
LLMs come in a range of sizes, and a model's size (its number of parameters) directly determines its computational demands. Larger models typically perform better but need substantially more memory and compute to run and fine-tune. Depending on our requirements, we might opt for a lightweight model like GPT-2, whose smallest version has 124 million parameters, or a more powerful option like the largest Llama 2 variant, which has 70 billion parameters and delivers a higher level of performance.
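To make the size trade-off concrete, here is a minimal back-of-the-envelope sketch that estimates memory from parameter count. It assumes common rules of thumb rather than measured numbers: 2 bytes per parameter to hold fp16/bf16 weights, and roughly 16 bytes per parameter for mixed-precision full fine-tuning with the Adam optimizer. It ignores activations and framework overhead. The function name `estimate_gb` and the constants are illustrative, not part of any library.

```python
# Back-of-the-envelope memory estimate for LLM weights and full fine-tuning.
# Assumptions (rules of thumb, not measurements):
#   - 2 bytes per parameter for fp16/bf16 weights (just loading the model)
#   - ~16 bytes per parameter for mixed-precision full fine-tuning with Adam:
#     fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
#     + fp32 Adam first and second moments (4 + 4)
#   - Activations, KV cache, and framework overhead are ignored.

BYTES_PER_PARAM_FP16 = 2
BYTES_PER_PARAM_FULL_FINETUNE = 16


def estimate_gb(num_params: int, bytes_per_param: int) -> float:
    """Return estimated memory in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


models = {
    "GPT-2 (124M)": 124_000_000,
    "Llama 2 (70B)": 70_000_000_000,
}

for name, params in models.items():
    weights_gb = estimate_gb(params, BYTES_PER_PARAM_FP16)
    finetune_gb = estimate_gb(params, BYTES_PER_PARAM_FULL_FINETUNE)
    print(f"{name}: ~{weights_gb:,.1f} GB to load in fp16, "
          f"~{finetune_gb:,.1f} GB for full fine-tuning with Adam")
```

Under these assumptions, the 70B Llama 2 model needs about 140 GB just to hold its weights in fp16, while the 124M GPT-2 model needs about a quarter of a gigabyte. Full fine-tuning multiplies those figures several times over, which is why the larger model typically requires multiple high-memory GPUs.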
Pretraining
The pretraining dataset forms the foundation of the model's initial knowledge and strongly shapes how it interprets user prompts and what kinds of outputs it produces. An LLM pretrained on a large, diverse dataset, such as text collected from the internet, has a broad knowledge base that makes it versatile across many topics. One widely used source is Common Crawl, a nonprofit archive built by crawling publicly available websites. Filtered versions of Common Crawl were used to pretrain GPT-3, LLaMA, and T5 (whose C4 corpus, the Colossal Clean Crawled Corpus, is derived from Common Crawl), models known for their versatility.