Learn Before
Compositional Reasoning Tasks for LLMs
In the study of Large Language Models (LLMs), compositional reasoning tasks such as the SCAN benchmark are designed to evaluate a model's capacity for compositional generalization: the ability to understand and execute novel combinations of familiar components. These tasks are crucial for probing the depth of an LLM's language understanding and reasoning, and they also help drive the development of more advanced problem decomposition methods.
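For concreteness, here is a minimal Python sketch of the idea behind such a task, using a hypothetical command grammar (the real SCAN grammar is larger): commands combine familiar primitives and modifiers, and generalization is tested on a combination like 'jump left' whose parts all appear in training but which never appears as a whole.

```python
# Minimal sketch of a SCAN-style compositional task. The command
# grammar and action names here are hypothetical; the real SCAN
# benchmark uses a larger grammar with the same flavor.

PRIMITIVES = {"walk": "WALK", "run": "RUN", "jump": "JUMP"}
MODIFIERS = {
    "left": lambda act: f"LTURN {act}",   # turn left, then act
    "right": lambda act: f"RTURN {act}",  # turn right, then act
    "twice": lambda act: f"{act} {act}",  # repeat the action
}

def interpret(command: str) -> str:
    """Compositionally map a command to its action sequence."""
    verb, *mods = command.split()
    actions = PRIMITIVES[verb]
    for mod in mods:
        actions = MODIFIERS[mod](actions)
    return actions

# Every word in 'jump left' appears in training, but the combination
# itself is held out: solving it requires composing known parts.
train = ["walk left", "walk right", "run left", "jump twice"]
test = ["jump left"]

for cmd in train + test:
    print(cmd, "->", interpret(cmd))
```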
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Diagnosing a Model's Language Limitation
An NLP model is trained on a dataset of commands. The training data includes 'walk left', 'walk right', 'run left', 'run right', 'jump twice', and 'jump three times'. The model performs perfectly on these commands, yet it fails when tested on the new, unseen command 'jump left'. What does this failure most likely indicate about the model? (See the sketch after this list.)
The Challenge of Novelty in Language Models
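The failure most plausibly indicates that the model memorized whole command-to-action mappings rather than learning the rules that compose them. The toy sketch below (a hypothetical lookup-table "model" with illustrative action names, in the spirit of SCAN's outputs) makes that failure mode concrete:

```python
# Hypothetical memorizing "model": a pure lookup table over training
# pairs, with no notion of the internal structure of commands.

train_pairs = {
    "walk left": "LTURN WALK",
    "walk right": "RTURN WALK",
    "run left": "LTURN RUN",
    "run right": "RTURN RUN",
    "jump twice": "JUMP JUMP",
    "jump three times": "JUMP JUMP JUMP",
}

def memorizer(command: str) -> str:
    """Perfect recall on seen commands, no ability to generalize."""
    if command not in train_pairs:
        # 'jump' and 'left' each appear in training, but the lookup
        # table has no way to combine them into a new answer.
        raise KeyError(f"unseen command: {command!r}")
    return train_pairs[command]

for command in ["walk left", "jump left"]:
    try:
        print(command, "->", memorizer(command))
    except KeyError:
        print(command, "-> FAILURE: memorized mappings, not rules")
```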
Learn After
SCAN Benchmark
A research team is designing a task to evaluate a language model's ability to understand and execute novel combinations of familiar instructions. The model will be trained on a set of commands and their corresponding action sequences. Which of the following training and testing splits would provide the most rigorous and direct assessment of the model's compositional reasoning capabilities? (One such split is sketched after this list.)
Diagnosing LLM Generalization Failure
Evaluating a Language Model's True Understanding
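For the split-design question above, the most rigorous assessment follows the spirit of SCAN's "add primitive" split: expose a primitive in isolation during training and reserve every composition involving it for testing. A minimal sketch, with an illustrative vocabulary rather than SCAN's actual word list:

```python
from itertools import product

# Sketch of a compositional "add primitive" split in the spirit of
# SCAN: the held-out primitive is seen in training only in isolation,
# while all of its compositions are reserved for the test set.

primitives = ["walk", "run", "look", "jump"]
modifiers = ["left", "right", "twice", "around left"]
held_out = "jump"

train, test = [], []
for verb, mod in product(primitives, modifiers):
    command = f"{verb} {mod}"
    (test if verb == held_out else train).append(command)
train.append(held_out)  # the bare primitive still appears in training

print(f"{len(train)} training commands, e.g. {train[:3]}")
print(f"{len(test)} test commands, e.g. {test[:2]}")
```

Under this split, success requires the model to transfer how the modifiers behave with other primitives onto the held-out one, rather than recalling any pairing it has already seen.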