Choosing a Positional Bias Strategy for a Low-Resource Task
A research team is building a language model for a niche, low-resource task: analyzing 18th-century legal documents. The dataset is small and the team has a very limited computational budget for training. They are considering two approaches for incorporating relative word positions into the model's attention mechanism.
- Approach A: Learn the positional biases as part of the model's parameters directly from the small dataset.
- Approach B: Use a pre-defined, fixed set of positional biases based on a general rule about word distance, which does not require learning additional parameters.
Which approach would you recommend for this project? Justify your decision by evaluating the primary trade-off between these two approaches in the context of the project's specific constraints.
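To make the two options concrete, here is a minimal NumPy sketch of how each bias could enter the attention scores. It rests on assumptions not stated in the question: the "general rule about word distance" is taken to be a linear penalty on |i - j| (similar in spirit to ALiBi), the learned variant is taken to be one scalar per clipped relative offset, and the names `fixed_distance_bias`, `learned_bias`, `slope`, and `max_distance` are hypothetical, chosen only for illustration.

```python
import numpy as np

def fixed_distance_bias(seq_len, slope=0.5):
    """Approach B (assumed rule): bias = -slope * |i - j|, nothing to train."""
    pos = np.arange(seq_len)
    distance = np.abs(pos[:, None] - pos[None, :])
    return -slope * distance

def learned_bias(seq_len, table, max_distance):
    """Approach A (assumed form): look up one trainable scalar per clipped offset."""
    pos = np.arange(seq_len)
    # offset[i, j] = j - i, clipped so distant pairs share a table entry
    offset = np.clip(pos[None, :] - pos[:, None], -max_distance, max_distance)
    return table[offset + max_distance]

def attention_weights(scores, bias):
    """Add the positional bias to the raw scores, then apply a row-wise softmax."""
    logits = scores + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

seq_len, max_distance = 6, 4
rng = np.random.default_rng(0)
scores = rng.normal(size=(seq_len, seq_len))  # stand-in for query-key scores

# Approach A: 2 * max_distance + 1 extra values that must be fit from the data.
table = np.zeros(2 * max_distance + 1)
bias_a = learned_bias(seq_len, table, max_distance)

# Approach B: fully determined by the rule, no extra parameters.
bias_b = fixed_distance_bias(seq_len)

weights_a = attention_weights(scores, bias_a)
weights_b = attention_weights(scores, bias_b)
```

In this sketch, approach A starts from an uninformative table and has to estimate every entry from the available examples, while approach B imposes a fixed preference for nearby words from the start. That is the flexibility versus data-and-compute trade-off the question asks you to weigh.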
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Selecting a Positional Bias Strategy for a Low-Data Scenario
A research team is developing a language model for a highly specialized domain with a very large, domain-specific training dataset. They hypothesize that the relationships between words in this domain follow unique, non-linear patterns that are not captured by simple distance metrics. Which implementation of relative positional biases would be most suitable for this project, and what is the primary reason?