Case Study

Impact of ALiBi Bias Scalar on Model Performance

A research team is fine-tuning a language model for a text summarization task. The model's attention mechanism adds a bias term, β * (j - i), to the attention scores, where (j - i) is the signed offset between key position j and query position i, and the scalar β sets how strongly distance is penalized. The team trains two versions of the model with different values of β and observes distinct behaviors on the validation set. Analyze the likely cause of each model's performance issues.
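To make the role of β concrete, the following is a minimal sketch rather than the team's actual implementation. It assumes causal attention (each query position i sees only key positions j <= i) and uniform content scores, so that any difference in the output comes from the bias term alone. The function name alibi_attention_weights and the example values of β are hypothetical, chosen only for illustration.

```python
import math

def alibi_attention_weights(scores, beta):
    """Causal softmax over scores[i][j] + beta * (j - i) for j <= i.

    `scores` is a square list of lists of raw attention scores.
    Returns one list of weights per query position i, of length i + 1.
    """
    weights = []
    for i, row in enumerate(scores):
        # Assumption: causal masking, so only positions j <= i are visible.
        # For j <= i the offset (j - i) is <= 0, so a positive beta
        # subtracts more from the scores of more distant tokens.
        biased = [row[j] + beta * (j - i) for j in range(i + 1)]
        # Subtract the row maximum before exponentiating for stability.
        m = max(biased)
        exps = [math.exp(b - m) for b in biased]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Uniform content scores isolate the effect of the bias term.
n = 6
scores = [[0.0] * n for _ in range(n)]
for beta in (0.0, 0.5, 4.0):  # illustrative values only
    last_row = alibi_attention_weights(scores, beta)[-1]
    print(beta, [round(w, 3) for w in last_row])
```

With β = 0 the bias vanishes and the last query attends uniformly to every visible position, so the model receives no distance signal from this term. As β grows, the weights concentrate on the nearest tokens. That suggests one way to frame the analysis: a β that is too small may leave attention nearly position-agnostic, while a β that is too large may confine it to a short local window, which would likely hurt a summarization task that depends on long-range context.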

Updated 2025-10-05

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science