Two Levels of Generalization in Instruction-Tuned LLMs
The generalization capability of an instruction-fine-tuned LLM can be assessed at two distinct levels. The first is intra-task generalization: the model's ability to generate correct outputs for new, unseen inputs within a single task it was trained on. The second, more demanding level is inter-task generalization: the model's capacity to perform accurately across a variety of different tasks, defined by diverse instructions, including tasks it was not explicitly fine-tuned on.
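The distinction can be made concrete by how each level would be scored. Below is a minimal sketch (all helper names and toy data are hypothetical, not from the source): intra-task generalization is accuracy on held-out inputs of one task, while inter-task generalization averages accuracy over several distinct instruction-defined tasks.

```python
from statistics import mean

def accuracy(predictions, references):
    # Exact-match accuracy over paired (prediction, reference) outputs.
    return mean(1.0 if p == r else 0.0 for p, r in zip(predictions, references))

def inter_task_score(task_results):
    # Inter-task generalization: mean accuracy across several distinct tasks,
    # each defined by a different instruction.
    return mean(accuracy(preds, refs) for preds, refs in task_results.values())

# Toy results: strong within the summarization task, weaker across tasks.
summarization = (["s1", "s2", "s3", "s4"], ["s1", "s2", "s3", "x"])
tasks = {
    "summarize": summarization,
    "extract_dates": (["d1", "d2"], ["d1", "y"]),
    "translate_fr": (["t1"], ["z"]),
}

print(accuracy(*summarization))  # intra-task score: 0.75
print(inter_task_score(tasks))   # inter-task score: mean of 0.75, 0.5, 0.0
```

A model can score highly on the first number while scoring poorly on the second, which is exactly the failure mode described in the scenarios below.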
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Complexity of Generalization due to Instruction and Input Variation
A development team fine-tunes a large language model to be a helpful assistant for summarizing legal documents. They use a large dataset of legal texts and their corresponding summaries. After deployment, they observe the following:
- The model performs exceptionally well when asked to summarize new, unseen legal documents (e.g., contracts, court rulings).
- However, when users give slightly different instructions, such as 'Explain this legal clause in simple terms,' 'Extract the key dates from this document,' or 'Translate this legal paragraph into French,' the model performs poorly and unreliably.
Based on this scenario, which statement best analyzes the model's generalization capabilities?
Evaluating Fine-Tuning Strategies for Generalization
Performance Metric for Instruction-Tuned LLMs
Formal Representation of an Instruction-Tuned LLM
A large language model has been fine-tuned on a variety of instructional tasks. Match each of the following performance observations with the specific type of generalization challenge it represents.
Learn After
LLM Generalization Evaluation
Definition of Intra-Task Generalization
Formal Definition of Intra-Task Generalization
An AI team fine-tunes a language model exclusively on a dataset for a single task: translating English legal documents into French. The model is then evaluated on two test sets.
- Test Set A: A new, unseen collection of English legal documents to be translated into French.
- Test Set B: A collection of diverse tasks, such as writing Python code, composing poetry, and summarizing news articles.
The model performs very well on Test Set A but performs poorly on Test Set B. What does this evaluation reveal about the model's generalization abilities?
Analyzing LLM Performance
Formula for Generalization Across Tasks