Learn Before
Evaluating Evidence of Generalization
A language model is trained extensively on two distinct types of commands: (1) navigation commands like 'go to the blue circle' and (2) action commands like 'get the triangle'. After training, it is given the novel command 'go to the green star and get the pyramid' and executes it perfectly. A researcher claims this single successful execution is definitive proof of the model's strong ability to generalize by combining known concepts. Briefly explain why this conclusion might be premature.
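One way to see why a single success is weak evidence: evaluate over the full grid of novel combinations rather than one example. The sketch below is a hypothetical illustration (the `toy_model`, vocabularies, and its deliberate flaw are all invented for this example, not taken from the question): a toy command interpreter is built so that it happens to handle one compound command correctly while failing systematically on most others, so single-example accuracy and held-out-set accuracy disagree.

```python
import itertools

NAV_COLORS = ["blue", "green", "red"]
NAV_SHAPES = ["circle", "star", "square"]
OBJECTS = ["triangle", "pyramid", "cube"]

def expected(command):
    """Ground-truth interpretation: split on 'and', parse each clause."""
    out = []
    for clause in command.split(" and "):
        if clause.startswith("go to the"):
            out.append(("goto", clause.removeprefix("go to the ").strip()))
        elif clause.startswith("get the"):
            out.append(("get", clause.removeprefix("get the ").strip()))
    return out

def toy_model(command):
    """Hypothetical flawed model: in compound commands it mishandles
    every object except 'pyramid', simulating a systematic gap that a
    single lucky test case would not reveal."""
    clauses = command.split(" and ")
    actions = []
    for clause in clauses:
        if clause.startswith("go to the"):
            actions.append(("goto", clause.removeprefix("go to the ").strip()))
        elif clause.startswith("get the"):
            obj = clause.removeprefix("get the ").strip()
            if len(clauses) > 1 and obj != "pyramid":
                obj = "pyramid"  # simulated compositional failure
            actions.append(("get", obj))
    return actions

# Every novel compound command over the grid (assumed unseen in training).
test_set = [
    f"go to the {c} {s} and get the {o}"
    for c, s, o in itertools.product(NAV_COLORS, NAV_SHAPES, OBJECTS)
]

single = "go to the green star and get the pyramid"
print("single-example correct:", toy_model(single) == expected(single))

correct = sum(toy_model(cmd) == expected(cmd) for cmd in test_set)
print(f"held-out grid accuracy: {correct}/{len(test_set)}")
```

Here the single showcased command succeeds, yet accuracy over the full 27-command grid is only 9/27, which is exactly the distinction the question asks about: one execution samples a single point from a large combinatorial space.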
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
SCAN Tasks for Evaluating Compositional Generalization
Analyzing a Model's Command Interpretation Failure
A language model is trained on a dataset of simple commands. It successfully learns to execute individual actions like 'walk', 'run', and 'jump'. It also learns to apply the modifier 'twice' to the command 'run', correctly executing 'run twice'. However, when presented with the novel command 'jump twice', the model fails to produce the correct sequence of actions. This failure demonstrates a specific weakness in the model's capacity for:
Evaluating Evidence of Generalization
Analyzing Model Performance on Novel Instructions