1Cademy - Formula for Generalization Across Tasks

Learn Before

Two Levels of Generalization in Instruction-Tuned LLMs

Formula

Formula for Generalization Across Tasks

Generalization across tasks occurs when an instruction-fine-tuned model's average performance over all new instruction-input pairs exceeds a predefined threshold $\epsilon$. This condition is expressed mathematically as:

$\frac{1}{|\mathcal{D}|} \sum_{(\mathbf{c}',\mathbf{z}') \in \mathcal{D}} \mathrm{P}(\mathbf{c}',\mathbf{z}',\mathbf{y}') > \epsilon$

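where $\mathcal{D}$ is the set of new instruction-input pairs and $|\mathcal{D}|$ is the number of pairs in it, $(\mathbf{c}',\mathbf{z}')$ is a new instruction $\mathbf{c}'$ paired with an input $\mathbf{z}'$ from that set, $\mathbf{y}'$ is the model's output for that pair, and $\mathrm{P}(\mathbf{c}',\mathbf{z}',\mathbf{y}')$ is the performance score of that output. The inequality is strict, so an average exactly equal to $\epsilon$ does not count as generalization.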
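As a minimal sketch, assuming both the model and the performance measure $\mathrm{P}$ are available as Python callables, the condition can be checked by generating $\mathbf{y}'$ for every pair in $\mathcal{D}$, averaging the scores, and comparing the mean with $\epsilon$ using a strict inequality. The names `generalizes_across_tasks`, `generate`, `score`, `toy_model`, and `toy_score` are hypothetical illustrations, not part of any library, and the toy exact-match metric stands in for whatever $\mathrm{P}$ a real evaluation uses.

```python
from typing import Callable, Iterable, Tuple


def generalizes_across_tasks(
    pairs: Iterable[Tuple[str, str]],
    generate: Callable[[str, str], str],
    score: Callable[[str, str, str], float],
    epsilon: float,
) -> Tuple[float, bool]:
    """Return (mean score over D, whether the mean strictly exceeds epsilon).

    Hypothetical helper: `generate` plays the role of the model, mapping
    (c', z') to y'; `score` plays the role of P(c', z', y').
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("D must contain at least one new instruction-input pair")
    total = 0.0
    for c_new, z_new in pairs:
        y_new = generate(c_new, z_new)          # model output y'
        total += score(c_new, z_new, y_new)     # P(c', z', y')
    mean = total / len(pairs)                   # (1/|D|) * sum over D
    return mean, mean > epsilon                 # strict inequality, as in the formula


# Hypothetical set D of new instruction-input pairs.
D = [
    ("Reverse the string", "abc"),
    ("Uppercase the string", "abc"),
    ("Reverse the string", "xyz"),
]


def toy_model(c: str, z: str) -> str:
    # Hypothetical model that always reverses its input, whatever the instruction.
    return z[::-1]


def toy_score(c: str, z: str, y: str) -> float:
    # Hypothetical exact-match metric: 1.0 if the output is correct, else 0.0.
    expected = z[::-1] if c.startswith("Reverse") else z.upper()
    return 1.0 if y == expected else 0.0


mean, ok = generalizes_across_tasks(D, toy_model, toy_score, epsilon=0.5)
print(round(mean, 3), ok)  # 0.667 True
```

Here the toy model solves both reversal pairs but fails the uppercase pair, so its average is 2/3. That clears a threshold of 0.5, but would not clear a threshold of exactly 2/3 because the comparison is strict.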

Updated 2026-05-01


References

Reference of Foundations of Large Language Models Course

Learn After

Evaluating Inter-Task Generalization
A language model's ability to generalize to new tasks is evaluated on a set of 5 new instruction-input pairs. The model's performance on each pair is scored on a scale from 0 to 1, giving the scores [0.9, 0.8, 0.3, 0.2, 0.7]. Under the formal condition for inter-task generalization, which requires the average performance over the new set to exceed a threshold ε, does this model demonstrate the capability when ε = 0.6?
A model's capability to perform well across a variety of tasks is formally assessed using the condition $\frac{1}{|\mathcal{D}|} \sum_{(\mathbf{c}',\mathbf{z}') \in \mathcal{D}} \mathrm{P}(\mathbf{c}',\mathbf{z}',\mathbf{y}') > \epsilon$. In this expression, what is the most critical characteristic of the set of new instruction-input pairs, denoted by $\mathcal{D}$, for a valid evaluation?
