Definition of Intra-Task Generalization
Intra-task generalization refers to a model's ability to perform well on new, unseen inputs for a specific task, where the task is indicated by a particular instruction. A model is considered to generalize within this task if its average performance on these new inputs surpasses a predetermined threshold, denoted ε.
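As a minimal sketch of this check (the function name and the per-input score values here are illustrative assumptions, not from the source), the definition reduces to comparing a mean score against ε:

```python
def generalizes_within_task(scores, epsilon):
    """Intra-task generalization check: the average performance
    over new, unseen inputs for one task must exceed epsilon."""
    return sum(scores) / len(scores) > epsilon

# Hypothetical per-input scores on unseen inputs for the same task
print(generalizes_within_task([0.9, 0.8, 1.0, 0.95], 0.85))  # True (mean 0.9125 > 0.85)
```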

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
LLM Generalization Evaluation
Definition of Intra-Task Generalization
Formal Definition of Intra-Task Generalization
An AI team fine-tunes a language model exclusively on a dataset for a single task: translating English legal documents into French. The model is then evaluated on two test sets.
- Test Set A: A new, unseen collection of English legal documents to be translated into French.
- Test Set B: A collection of diverse tasks, such as writing Python code, composing poetry, and summarizing news articles.
The model performs very well on Test Set A but performs poorly on Test Set B. What does this evaluation reveal about the model's generalization abilities?
Analyzing LLM Performance
Formula for Generalization Across Tasks
Learn After
An instruction-tuned model is evaluated on a specific task: summarizing legal documents. The goal is to achieve intra-task generalization, which is formally defined as the average performance on a set of new inputs (Z) exceeding a predefined threshold (ε).
The evaluation uses a set of 100 new legal documents the model has never seen before. The performance threshold (ε) is set to 0.85.
- Model A correctly summarizes 92 of the 100 new documents.
- Model B correctly summarizes 81 of the 100 new documents.
Based on the formal definition, which of the following conclusions is correct?
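The arithmetic can be checked directly (a minimal sketch; the 100-document counts and the threshold ε = 0.85 come from the scenario above):

```python
EPSILON = 0.85  # performance threshold from the scenario
TOTAL_DOCS = 100

results = {"Model A": 92, "Model B": 81}  # correct summaries per model

for name, correct in results.items():
    avg = correct / TOTAL_DOCS  # average performance on the new inputs
    verdict = "generalizes" if avg > EPSILON else "does not generalize"
    print(f"{name}: average performance {avg:.2f} -> {verdict}")
# Model A: 0.92 > 0.85, so it generalizes; Model B: 0.81 <= 0.85, so it does not.
```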
Match each variable from the formal definition of intra-task generalization with its correct description. The formula is given as (reconstructed from the definition above): (1/|Z|) · Σ_{z ∈ Z} perf(z) > ε
Evaluating Model Generalization in Customer Service