Foundation Model
A foundation model is a large-scale machine learning model trained on broad data that can be adapted to a wide range of downstream tasks.
The term “foundation model” was introduced by the Stanford Center for Research on Foundation Models (CRFM) in its 2021 report “On the Opportunities and Risks of Foundation Models.” These models are characterized by their scale, ranging from hundreds of millions of parameters (BERT) to hundreds of billions (GPT-3), and are trained on vast, diverse datasets often sourced from the internet. This pre-training phase uses self-supervised or semi-supervised learning objectives, such as language modeling or image reconstruction, to capture general patterns, structures, and knowledge from the data.
Once pre-trained, a foundation model serves as a common base that can be fine-tuned or adapted for numerous specific applications with relatively little additional data or computation. For example, a single language foundation model like GPT-3 can be adapted for translation, question answering, summarization, or code generation. This paradigm contrasts with earlier approaches that required training separate models from scratch for each task. The adaptability of foundation models stems from their learned representations, which encode general features that are useful across many domains.
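One common form of adaptation keeps the pre-trained model frozen and trains only a small task-specific "head" on its output features. The sketch below is a toy illustration of that idea, not how any particular model is adapted: `frozen_encoder` is a hypothetical, hand-written stand-in for a real pre-trained encoder, and the word lists, example sentences, and hyperparameters are assumptions chosen to keep the example small.

```python
import math

# Hypothetical stand-in for a pre-trained encoder. A real foundation model
# would return learned embeddings; here the "features" are hand-written.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def frozen_encoder(text):
    """Map text to a fixed feature vector. Never updated during adaptation."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return [float(pos), float(neg)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=200, lr=0.5):
    """Fit a logistic-regression head on top of the frozen features."""
    features = [(frozen_encoder(text), label) for text, label in examples]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in features:
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            # Only the head's parameters change; the encoder stays as is.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(head, text):
    """Return True if the head classifies the text as positive."""
    weights, bias = head
    x = frozen_encoder(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5

# Four labeled examples are enough for this toy task (1 = positive).
labeled = [
    ("great product love it", 1),
    ("terrible and bad", 0),
    ("excellent", 1),
    ("poor quality", 0),
]
head = train_head(labeled)
```

Calling `predict(head, "I love this")` then classifies text the head never saw during training. The point of the design is that the task-specific part has only three trainable numbers, which is why little labeled data is needed once the features are already useful.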
Foundation models have been developed across various modalities, including text (e.g., GPT, BERT) and combined image and text data (e.g., CLIP, DALL-E). Their emergence has significantly influenced the field of artificial intelligence, enabling breakthroughs in natural language processing, computer vision, and other areas. However, they also raise concerns regarding biases, safety, environmental impact, and the concentration of power due to the substantial resources required for their development.
Why it matters
Foundation models matter because they represent a shift from task-specific AI to general-purpose AI systems that can be reused and adapted efficiently. This reduces the cost and time required to develop high-performing models for new applications and lowers the barrier to using advanced AI capabilities, even though pre-training such a model remains out of reach for most organizations. At the same time, their widespread deployment introduces systemic risks: because many applications build on the same base model, its biases and flaws are inherited by all of them, and its capabilities can be misused. This makes their responsible development and governance a critical concern for researchers, policymakers, and industry.
First appeared
Stanford CRFM, 2021 (“On the Opportunities and Risks of Foundation Models”).
FAQ
How does a foundation model work?
A foundation model is first pre-trained on a large, diverse dataset using self-supervised learning, such as predicting the next word in a sentence or filling in masked parts of an image. This process teaches the model general patterns and representations. The pre-trained model can then be fine-tuned on a smaller, task-specific dataset to adapt it for particular applications, such as sentiment analysis or object detection.
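What makes next-word prediction self-supervised is that the training targets come from the text itself, so no human labeling is needed. The sketch below illustrates this with a deliberately tiny stand-in: a bigram counter instead of a neural network. The function names and the three-sentence corpus are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def next_word_pairs(text):
    """Turn raw text into (context word, next word) training pairs."""
    tokens = text.lower().split()
    return list(zip(tokens[:-1], tokens[1:]))

def train_bigram(corpus):
    """Count next-word frequencies: a toy stand-in for pre-training."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for context, target in next_word_pairs(sentence):
            counts[context][target] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent continuation seen in training, or None."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Unlabeled text is the only input; the "labels" are the following words.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "a dog sat on the rug",
]
model = train_bigram(corpus)
print(next_word_pairs("the cat sat"))  # [('the', 'cat'), ('cat', 'sat')]
print(predict_next(model, "sat"))      # on
```

A real foundation model replaces the counting step with a neural network conditioned on a long context, but the way training examples are derived from raw text is the same in spirit.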
What is the difference between a foundation model and a traditional machine learning model?
Traditional machine learning models are typically designed and trained for a single, specific task using a labeled dataset for that task. In contrast, a foundation model is pre-trained on broad data without a specific task in mind, and then adapted to multiple downstream tasks. This makes foundation models more flexible and reusable, but also far more resource-intensive to pre-train and often more expensive to run.
When should I use a foundation model instead of training my own model?
A foundation model is a good choice when you have limited labeled data for your task, as it can leverage knowledge from its pre-training. It is also beneficial when you need to perform multiple related tasks, since a single foundation model can be adapted for each. However, if your task is highly specialized or requires real-time inference on low-resource devices, a smaller, task-specific model may be more efficient.
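A rough way to judge whether a model fits on a low-resource device is to estimate the memory its weights occupy: the parameter count times the bytes used per parameter. The helper below is a back-of-the-envelope sketch under stated assumptions: it counts weights only (no activations or runtime overhead), `weight_memory_gib` is a name introduced here for illustration, and the parameter counts in the usage lines are example values.

```python
def weight_memory_gib(num_parameters, bytes_per_parameter=2):
    """Approximate memory needed to hold model weights only, in GiB.

    bytes_per_parameter is 4 for 32-bit floats and 2 for 16-bit floats.
    """
    return num_parameters * bytes_per_parameter / 2**30

# Example values: a 7-billion-parameter model stored in 16-bit precision,
# and a 110-million-parameter model stored in 32-bit precision.
large = weight_memory_gib(7_000_000_000, bytes_per_parameter=2)
small = weight_memory_gib(110_000_000, bytes_per_parameter=4)
print(f"{large:.1f} GiB")  # 13.0 GiB
print(f"{small:.2f} GiB")  # 0.41 GiB
```

Under these assumptions the larger model needs roughly thirty times the memory of the smaller one for its weights alone, which is the kind of gap that can rule it out for on-device or real-time use.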