Learn Before
SCAN Benchmark
The SCAN (Simplified versions of the CommAI Navigation tasks) benchmark is a set of tasks designed to test a Large Language Model's capacity for compositional generalization, that is, its ability to handle novel combinations of familiar components. Each task requires the model to translate a natural language instruction into the corresponding sequence of actions, for example mapping "jump twice" to two consecutive jump actions.
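To make the instruction-to-action mapping concrete, the sketch below interprets a small, simplified subset of SCAN-style commands. It is an illustration, not the benchmark's full grammar: it assumes only the primitives walk, look, run, and jump, the directions left and right, the repeaters twice and thrice, and at most one "and" or "after" per command. The function names `interpret` and `interpret_phrase` are hypothetical.

```python
# Illustrative subset of SCAN-style commands (an assumption, not the full grammar).
PRIMITIVES = {"walk": "I_WALK", "look": "I_LOOK", "run": "I_RUN", "jump": "I_JUMP"}
TURNS = {"left": "I_TURN_LEFT", "right": "I_TURN_RIGHT"}
REPEATS = {"twice": 2, "thrice": 3}


def interpret_phrase(words):
    """Translate one phrase, e.g. ['jump', 'left', 'twice'], into a list of actions."""
    words = list(words)
    repeat = 1
    # A trailing 'twice' or 'thrice' repeats the whole phrase.
    if words and words[-1] in REPEATS:
        repeat = REPEATS[words.pop()]
    if not words:
        raise ValueError("empty phrase")

    verb, rest = words[0], words[1:]
    actions = []
    # An optional direction contributes a turn before the primitive action.
    if rest:
        if len(rest) != 1 or rest[0] not in TURNS:
            raise ValueError(f"unsupported modifier: {' '.join(rest)}")
        actions.append(TURNS[rest[0]])

    if verb == "turn":
        # 'turn' has no action of its own, so it needs a direction.
        if not rest:
            raise ValueError("'turn' requires a direction")
    elif verb in PRIMITIVES:
        actions.append(PRIMITIVES[verb])
    else:
        raise ValueError(f"unknown verb: {verb}")

    return actions * repeat


def interpret(command):
    """Translate a full command, handling at most one 'and' or 'after'."""
    words = command.split()
    for conjunction in ("and", "after"):
        if conjunction in words:
            i = words.index(conjunction)
            first = interpret_phrase(words[:i])
            second = interpret_phrase(words[i + 1:])
            # 'x and y' runs x then y; 'x after y' runs y then x.
            return first + second if conjunction == "and" else second + first
    return interpret_phrase(words)


if __name__ == "__main__":
    for command in ("jump twice", "walk left and jump", "run after look thrice"):
        print(command, "->", " ".join(interpret(command)))
```

A model that generalizes compositionally should behave like this interpreter on combinations it has never seen, such as a primitive learned in isolation that is then paired with "twice" or "left" at test time.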
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Related
SCAN Benchmark
A research team is designing a task to evaluate a language model's ability to understand and execute novel combinations of familiar instructions. The model will be trained on a set of commands and their corresponding action sequences. Which of the following training and testing splits would provide the most rigorous and direct assessment of the model's compositional reasoning capabilities?
Diagnosing LLM Generalization Failure
Evaluating a Language Model's True Understanding