Learn Before
Compositional Generalization in LLMs
Compositional generalization refers to the capacity of a large language model to understand and generate novel combinations of previously learned components, for example handling 'jump twice' correctly after learning 'jump' and 'run twice' separately. Specialized benchmarks, such as the SCAN tasks, have been developed specifically to assess this crucial ability in LLMs.
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Direct Conclusion Generation with Hidden Reasoning
Single-Run Multi-Step Reasoning
Multi-Run Problem Decomposition for Complex Reasoning
Self-Refinement in LLMs
Predict-then-Verify Approaches in LLM Reasoning
Principle of Generating Longer Reasoning Paths
Modifying Decoding for Longer Reasoning Paths
Multi-Stage Generation for Incremental Reasoning
An engineer is building a system to solve complex logic puzzles. When a puzzle is submitted, the system sends a single, carefully crafted prompt to a large language model. The model's output is a complete, step-by-step explanation of how it solved the puzzle, followed by the final answer, all generated in one response. Which approach to multi-step reasoning does this system exemplify?
Prompting for a Reasoning Process to Mitigate Errors in Complex Tasks
Compositional Generalization in LLMs
Choosing a Reasoning Strategy for a Financial AI
You are designing systems that use a large language model to solve complex problems. Match each system description with the reasoning approach it employs.
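The single-run approach described in the logic-puzzle scenario above can be sketched as one prompt whose single response is expected to contain both the step-by-step reasoning and the final answer. This is a minimal sketch: `call_llm` is a hypothetical stand-in for any chat-completion API, and the prompt wording is an assumption for illustration.

```python
def build_single_run_prompt(puzzle):
    """Build one prompt whose single response should contain both the
    step-by-step reasoning and the final answer."""
    return (
        "Solve the following logic puzzle. Think through it step by step, "
        "then state the final answer on the last line.\n\n"
        f"Puzzle: {puzzle}"
    )

def solve_puzzle_single_run(puzzle, call_llm):
    # call_llm is a hypothetical stand-in for a real model API.
    # Exactly one call is made, so the entire reasoning chain and the
    # answer are generated in a single run.
    return call_llm(build_single_run_prompt(puzzle))
```

The defining feature is that no intermediate outputs are fed back into further calls; decomposition into multiple runs would be a different strategy.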
Learn After
SCAN Tasks for Evaluating Compositional Generalization
Analyzing a Model's Command Interpretation Failure
A language model is trained on a dataset of simple commands. It successfully learns to execute individual actions like 'walk', 'run', and 'jump'. It also learns to apply the modifier 'twice' to the command 'run', correctly executing 'run twice'. However, when presented with the novel command 'jump twice', the model fails to produce the correct sequence of actions. This failure demonstrates a specific weakness in the model's ability for:
Evaluating Evidence of Generalization
Analyzing Model Performance on Novel Instructions
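The 'jump twice' failure described in the command-interpretation question above can be sketched as a model that memorizes seen command-action pairs instead of composing their parts (an illustrative sketch; the lookup-table "model" below is an assumption standing in for the trained network's behavior):

```python
# Illustrative failure mode: a "model" that only recalls training pairs
# and has no rule for combining a primitive with a modifier.
train_pairs = {
    "walk": ["WALK"],
    "run": ["RUN"],
    "jump": ["JUMP"],
    "run twice": ["RUN", "RUN"],
}

def memorizing_model(command):
    # Pure recall: no mechanism to combine 'jump' with 'twice'.
    return train_pairs.get(command)

print(memorizing_model("run twice"))   # ['RUN', 'RUN']  (seen in training)
print(memorizing_model("jump twice"))  # None            (novel combination fails)
```

This contrasts with a compositional system, which would derive the correct sequence for 'jump twice' from the learned meanings of 'jump' and 'twice'.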