What Are Foundation Models?
Understand what foundation models are, what they can do, how they are made, and their implications.
Today’s most powerful AI systems—like ChatGPT, DeepSeek, and Gemini—have one thing in common: they’re built on foundation models. Unlike traditional AI models designed for specific tasks—like identifying cats in photos or translating English to Spanish—foundation models take a much broader approach.
Defining foundation models
While there is no consensus on exactly how to define a foundation model, the most widely accepted definition is that a foundation model is a large AI model trained on massive amounts of diverse data, often using self-supervised or unsupervised learning techniques. Because they’re trained on such large datasets, these models learn to recognize patterns, structures, and relationships in the data that more specialized (or smaller) models might miss.
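To make “self-supervised” concrete, here is a minimal sketch of next-token prediction, the objective behind most large language models: the text itself supplies the training targets, so no human labels are needed. The tiny model, random “corpus,” and hyperparameters below are illustrative placeholders, assuming PyTorch is available; this is not how any particular foundation model is actually built.

```python
# Minimal sketch of self-supervised (next-token) training, assuming PyTorch.
# The tiny vocabulary, model, and random "corpus" are illustrative placeholders.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
corpus = torch.randint(0, vocab_size, (64, 16))  # 64 "sentences" of 16 token IDs

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),   # stand-in for a real transformer stack
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for batch in corpus.split(8):                      # mini-batches of 8 sequences
    inputs, targets = batch[:, :-1], batch[:, 1:]  # predict each next token
    logits = model(inputs)                         # (batch, seq_len - 1, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```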
In the past, developers would train separate models from scratch for each task—one model to classify images and another to detect spam emails. Now, a single large foundation model provides a broad knowledge base. That model can then be adapted (or fine-tuned) to tackle many different tasks with minimal additional training. This shift represents a significant leap: instead of building narrow, task-specific tools, we now have a flexible AI foundation to build upon.
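As a rough sketch of that adaptation step, the snippet below loads a pretrained checkpoint and attaches a small classification head for a new task, assuming the Hugging Face transformers library is installed; the checkpoint name, label count, and example input are illustrative choices, not a prescribed recipe.

```python
# Sketch of adapting a pretrained model to a new task, assuming the Hugging Face
# transformers library; the checkpoint and label count are illustrative only.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

checkpoint = "distilbert-base-uncased"   # hypothetical choice of pretrained base
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Freeze the pretrained "foundation" and train only the small classification head.
for param in model.base_model.parameters():
    param.requires_grad = False

inputs = tokenizer(["This spam filter example is great!"], return_tensors="pt")
labels = torch.tensor([1])
outputs = model(**inputs, labels=labels)   # the library computes the loss for us
outputs.loss.backward()                    # only the new head receives gradients
```

Because the pretrained base is frozen here, only the new head’s comparatively few weights are updated, which is why adapting a foundation model typically needs far less data and compute than training from scratch.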
As AI becomes more prevalent, governments and industry bodies are working to define and regulate foundation models because of their power and potential impact. These models are important enough that people far beyond the AI community are paying attention!
Why do foundation models matter?
You might think, “Okay, they’re big and versatile... but why does that matter? Why are people so excited about foundation models?” That’s exactly what we’re going to explore in this lesson. The real power and excitement around foundation models come from three key things: scale, emergent abilities, and their general-purpose nature. Let’s break each of these down.
Scale
We often hear about models measured by their parameter count—GPT-4, for instance, is rumored to have trillions. While a larger parameter count can improve performance, it’s not the only factor that matters. When we talk about scale in foundation models, we’re talking about two main things:
Model size: Foundation models are huge. They contain billions, even trillions, of learnable values known as parameters. Think of parameters as the knobs and dials inside the model that it adjusts during training to understand and process information. The more parameters, the more complex the patterns the model can potentially learn (see the short sketch after this list).
Training data size: You need equally massive data to train these massive models. Foundation models are trained on mind-bogglingly large datasets—we’re talking about scraping almost the entire public internet, vast collections of images, huge libraries of code, and more!
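To put a number on “parameters,” the short sketch below counts them for a toy network, assuming PyTorch; real foundation models apply the same idea to billions or trillions of such values.

```python
# Sketch of what "parameters" are, assuming PyTorch; the tiny network is illustrative.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 1024),   # weights: 512 * 1024, biases: 1024
    nn.ReLU(),
    nn.Linear(1024, 512),   # weights: 1024 * 512, biases: 512
)

# Each parameter is one learnable "knob" that training adjusts.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")   # 1,050,112 for this toy model
```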
Why does scale matter? It turns out that scale is a game-changer in foundation models. It’s not just about being a little better; it unlocks new capabilities.
Larger models trained on more data are simply better at many tasks. They can understand language more deeply, generate more coherent text, recognize images more accurately, etc. It’s like giving students more textbooks and time to study—they’ll likely learn more! The sheer scale allows these models to ...