Comparison

Data Scale Disparity: Pre-training vs. Fine-tuning

A fundamental distinction between pre-training and fine-tuning lies in the scale of data required. While more fine-tuning data is generally beneficial, the amount needed is orders of magnitude smaller than what pre-training requires. For instance, fine-tuning can be performed effectively with tens or hundreds of thousands of samples, or even fewer if the data is of high quality. In contrast, pre-training demands billions or even trillions of tokens, which results in significantly larger computational requirements and longer training times.
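To make "orders of magnitude" concrete, here is a back-of-envelope sketch. The specific figures (100,000 fine-tuning samples at ~512 tokens each, a 1-trillion-token pre-training corpus) are illustrative assumptions, not numbers from this text:

```python
import math

# Illustrative assumptions, not measured values:
# - fine-tuning set: 100,000 samples averaging ~512 tokens each
# - pre-training corpus: 1 trillion tokens
finetune_tokens = 100_000 * 512        # ~5.1e7 tokens
pretrain_tokens = 1_000_000_000_000    # 1e12 tokens

ratio = pretrain_tokens / finetune_tokens
magnitude_gap = math.log10(ratio)

print(f"Pre-training consumes ~{ratio:,.0f}x more tokens than fine-tuning")
print(f"(a gap of about {magnitude_gap:.1f} orders of magnitude)")
```

Even with generous fine-tuning assumptions, the token budget differs by roughly four orders of magnitude, which is why pre-training dominates overall compute cost and training time.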


Updated 2026-04-19

