Learn Before
A model's capability to perform well across a variety of different tasks is formally assessed by requiring its average performance over a set of new instruction-input pairs to exceed a threshold ε. What is the most critical characteristic of that set of new instruction-input pairs for a valid evaluation?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Inter-Task Generalization
A language model's ability to generalize to new tasks is evaluated using a set of 5 new instruction-input pairs. The model's performance on each pair is scored on a scale of 0 to 1, yielding the scores: [0.9, 0.8, 0.3, 0.2, 0.7]. According to the formal condition for inter-task generalization, which is defined as the average performance over the new set exceeding a threshold (ε), does this model demonstrate this capability if the threshold is set at ε = 0.6?
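The threshold check described in the related question on the five scores [0.9, 0.8, 0.3, 0.2, 0.7] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `meets_generalization_threshold` is hypothetical, and "exceeding a threshold" is read as a strict inequality (mean > ε).

```python
from statistics import fmean


def meets_generalization_threshold(scores, epsilon):
    """Return True if the average score strictly exceeds epsilon.

    Hypothetical helper: `scores` holds one performance value in [0, 1]
    per new instruction-input pair; `epsilon` is the chosen threshold.
    """
    if not scores:
        raise ValueError("at least one score is required")
    return fmean(scores) > epsilon


# Scores and threshold taken from the question above.
scores = [0.9, 0.8, 0.3, 0.2, 0.7]
epsilon = 0.6

mean_score = fmean(scores)  # 2.9 / 5, about 0.58
passes = meets_generalization_threshold(scores, epsilon)

print(f"mean = {mean_score:.2f}, threshold = {epsilon}, passes = {passes}")
```

Under these assumptions the mean is about 0.58, which falls below ε = 0.6, so the condition is not met for this set of scores.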