Learn Before
SCAN Tasks for Evaluating Compositional Generalization
The SCAN (Simplified versions of the CommAI Navigation tasks) benchmark is a set of tasks used to measure an LLM's ability for compositional generalization. These tasks require the model to translate natural language instructions into corresponding sequences of actions.
0
1
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
SCAN Tasks for Evaluating Compositional Generalization
Analyzing a Model's Command Interpretation Failure
A language model is trained on a dataset of simple commands. It successfully learns to execute individual actions like 'walk', 'run', and 'jump'. It also learns to apply the modifier 'twice' to the command 'run', correctly executing 'run twice'. However, when presented with the novel command 'jump twice', the model fails to produce the correct sequence of actions. This failure demonstrates a specific weakness in the model's ability for:
Evaluating Evidence of Generalization
Analyzing Model Performance on Novel Instructions