Learn Before
Diagnosing LLM Generalization Failure
Read the following scenario and answer the questions that follow.
0
1
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
SCAN Benchmark
A research team is designing a task to evaluate a language model's ability to understand and execute novel combinations of familiar instructions. The model will be trained on a set of commands and their corresponding action sequences. Which of the following training and testing splits would provide the most rigorous and direct assessment of the model's compositional reasoning capabilities?
Diagnosing LLM Generalization Failure
Evaluating a Language Model's True Understanding