What Are Foundation Models?
Understand what foundation models are, what they can do, how they are made, and their implications.
Have you ever watched a skilled chef at work? With just a few essential ingredients, they can whip up hundreds of delicious dishes. In contrast, someone who only knows how to cook from one specific recipe book will struggle whenever they’re faced with a dish that isn’t listed in it. This is precisely the difference between traditional AI models and what we now call foundation models.
Models like GPT can generate fluent text by understanding context and intent. What if we told you that GPT is itself a foundation model? That’s right! GPT isn’t just another AI system; it’s a powerful, adaptable foundation model capable of handling a wide range of tasks with minimal extra training. But what exactly does that mean? And why is this concept so groundbreaking in AI?
In this lesson, we’ll briefly introduce foundation models, explain their core principles, highlight what makes them so special, and touch on the various foundation models you’ll encounter later in this chapter.
What exactly are foundation models?
Traditionally, AI models were carefully designed and trained from scratch to accomplish a single, specialized task—such as distinguishing between images of cats and dogs, translating languages, or detecting spam emails. While these traditional models performed their jobs well, their skills were limited. Every time you needed something new, you had to start from scratch.
The term “foundation model” was coined by Stanford researchers in 2021 to capture the idea of a single model serving as a base, or foundation, for many downstream applications. It deliberately reaches beyond language alone, spanning domains like vision, audio, and multimodal applications. The diversity of these models underscores their expansive capabilities and potential.
Foundation models completely flip this approach. Instead of starting from zero each time, foundation models are trained once on vast and diverse datasets—including enormous amounts of text, images, audio, and code. They capture general knowledge and patterns within data, allowing them to rapidly adapt to multiple tasks.
We saw this earlier with GPT: this wasn’t a narrow model built for one job; it was a powerful general-purpose system capable of adapting to new tasks on the fly. That adaptability is precisely what defines foundation models.
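To make the “train once, adapt many times” idea concrete, here is a deliberately tiny numpy sketch. It is not a real foundation model; the random `backbone` projection merely stands in for an expensive one-time pretraining step, and the synthetic tasks are invented for illustration. The point is the shape of the workflow: the shared backbone is frozen, and adapting to each new task means fitting only a small task-specific head.

```python
import numpy as np

# Toy sketch of the foundation-model pattern (NOT a real foundation
# model): one shared "backbone" is trained once, and each new task
# only needs a tiny, cheap task-specific "head" on top of it.
rng = np.random.default_rng(0)

# Stand-in for expensive one-time pretraining: a fixed projection that
# maps raw 10-dimensional inputs to a shared 6-dimensional feature space.
backbone = rng.normal(size=(10, 6))

def features(x):
    """Frozen shared representation, reused by every downstream task."""
    return np.tanh(x @ backbone * 0.3)

def fit_head(X, y):
    """Cheap per-task adaptation: a least-squares linear head on frozen features."""
    w, *_ = np.linalg.lstsq(features(X), y, rcond=None)
    return w

def predict(X, w):
    return features(X) @ w

# Two unrelated "tasks" reuse the same backbone; only the small head differs.
X = rng.normal(size=(200, 10))
y_a = X[:, 0] + 0.5 * X[:, 1]   # task A: one synthetic target
y_b = X[:, 2] - X[:, 3]         # task B: a different synthetic target

head_a = fit_head(X, y_a)       # "adapting" = fitting 6 numbers, not retraining
head_b = fit_head(X, y_b)
```

Real foundation models follow the same logic at a vastly larger scale: the backbone has billions of parameters and is pretrained once at enormous cost, while adaptation (prompting, fine-tuning a small head, or light SFT) is comparatively cheap.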
Now, researchers and companies alike have shifted their focus toward creating better foundation models rather than traditional single-task ones. Large-scale pretraining gives these models their broad knowledge, while techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) shape that raw capability into the helpful, instruction-following behavior we see today.
Educative byte: As AI becomes more prevalent, governments and industry bodies are working to define and regulate foundation models due to their power and potential impacts; the EU AI Act, for example, sets specific obligations for providers of general-purpose AI models. These models are important enough that regulators are paying attention!
Why do foundation models matter?
You might be thinking, “Okay, these models sound versatile and impressive, but what’s so revolutionary about them?” The excitement around foundation models boils down to three main factors: scale, emergent abilities, and general-purpose nature. Let’s take a look at them one by one:
What does scale mean in the context of foundation models?
We often hear about models measured by their parameter count; for instance, GPT-4.5 is reported to have trillions of parameters. While a larger parameter count can improve performance, it’s not the only factor that matters. When we talk about scale in foundation models, we’re talking about two main things:
Model size: Foundation models like GPT-4.5 can have billions or even trillions of parameters (those adjustable “knobs” inside the model). More parameters allow models to capture richer, deeper patterns and subtleties.
Training data size: The training data is enormous—think of training on the entirety of Wikipedia, ...