Short Answer

Choosing a Positional Information Strategy

A development team is building a new language model with a very large, diverse dataset. They have a strict budget for computation, limiting the total training time and the number of trainable parameters. The model must also be able to generalize well to input sequences longer than any seen during training. Would a fixed, rule-based method for incorporating relative positional information be a more suitable choice for this project than a method that learns this information from the data? Justify your answer by explaining one key advantage of the fixed method in this specific context.

0

1

Updated 2025-10-03

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science