Learn Before
When training a model that groups various token-to-token offsets into a limited number of 'buckets' to learn relative positional information, continually increasing the number of buckets is a reliable strategy for improving the model's generalization performance on unseen data.
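To make the bucketing in the statement above concrete, here is a minimal sketch of how token-to-token offsets might be mapped to a fixed number of buckets. It is a simplified illustration in the spirit of T5-style relative position buckets, not the reference implementation; the function name `bucket_for_offset` and the default values for `num_buckets` and `max_distance` are assumptions chosen for this example.

```python
import math

def bucket_for_offset(offset: int, num_buckets: int = 32, max_distance: int = 128) -> int:
    """Map a token-to-token offset to a bucket id (hypothetical, simplified scheme)."""
    if num_buckets < 4 or num_buckets % 2 != 0:
        raise ValueError("this sketch assumes an even num_buckets >= 4")
    half = num_buckets // 2              # buckets available per direction
    base = half if offset > 0 else 0     # positive offsets use the upper half
    dist = abs(offset)
    max_exact = half // 2                # small distances get one bucket each
    if dist < max_exact:
        return base + dist
    # Larger distances share logarithmically spaced buckets.
    scaled = math.log(dist / max_exact) / math.log(max_distance / max_exact)
    log_bucket = max_exact + int(scaled * (half - max_exact))
    # Every distance at or beyond max_distance falls into the last bucket.
    return base + min(log_bucket, half - 1)

# Compare how many distinct buckets a range of offsets occupies
# under two different bucket budgets.
for n in (8, 32):
    used = {bucket_for_offset(o, num_buckets=n) for o in range(-200, 201)}
    print(n, len(used))
```

Each bucket id would index one learned bias parameter, so `num_buckets` sets how many offsets share a parameter and how many parameters the positional bias has in total.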
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analyzing the Impact of Positional Bucket Size on Model Behavior
A machine learning engineer is training a T5-style model and observes that its performance on the training dataset is excellent, but its performance on a held-out validation dataset is poor. This suggests the model is overfitting. Based on the role of positional bias buckets as a regularization technique, which of the following actions would be the most appropriate first step to address this issue?