Diagnosing a Language Model's Performance Issue
A language model is being developed for a machine translation task. In testing, the model generates grammatically plausible sentences but frequently scrambles the word order, producing nonsensical translations, especially for longer input sentences. The engineering team has confirmed that the attention weights are calculated using the formula shown in the case details. Based on an analysis of this formula, identify the most probable cause of the model's failure to preserve correct word order and explain your reasoning.
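One way to probe a question like this is to check whether an attention score formula can "see" word order at all. The sketch below is only an illustration under stated assumptions: it assumes the score has the general form Score(i, j) = (q_i ⋅ k_j + PE(i, j)) / √d that appears in the related item on this page, it uses a hypothetical distance penalty PE(i, j) = -|i - j|, and the names attention_output and max_abs_diff are made up for this example. It compares the attention output for one query before and after reversing the order of the key/value pairs, with and without the positional term.

```python
import math
import random


def attention_output(q, keys, values, bias=None):
    """Softmax attention for a single query vector.

    bias[j], if given, plays the role of a positional term PE(i, j)
    added to the dot product before scaling by sqrt(d).
    """
    d = len(q)
    scores = []
    for j, k in enumerate(keys):
        s = sum(a * b for a, b in zip(q, k))
        if bias is not None:
            s += bias[j]
        scores.append(s / math.sqrt(d))
    # Numerically stable softmax over the scores.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[t] for w, v in zip(weights, values)) for t in range(dim)]


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


random.seed(0)
d, n = 4, 6
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# Reverse the sequence: the same tokens, in a different order.
keys_rev = keys[::-1]
values_rev = values[::-1]

# Hypothetical distance penalty for a query at the last position i = n - 1.
i = n - 1
bias = [-abs(i - j) for j in range(n)]

# Without a positional term, reordering the tokens leaves the output unchanged.
order_blind = max_abs_diff(
    attention_output(q, keys, values),
    attention_output(q, keys_rev, values_rev),
) < 1e-9

# With a positional term, the same reordering changes the output.
order_aware = max_abs_diff(
    attention_output(q, keys, values, bias),
    attention_output(q, keys_rev, values_rev, bias),
) > 1e-9

print("no positional term, output unchanged by reordering:", order_blind)
print("with positional term, output changed by reordering:", order_aware)
```

If the positional term is missing or contributes nothing, the content-only score q_i ⋅ k_j treats the input as an unordered set, which is one plausible line of reasoning to examine when analyzing the formula in the case details.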
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for Causal Attention
In a sequence processing model, the unnormalized attention score between a query at position i and a key at position j is calculated using the formula: Score(i, j) = (q_i ⋅ k_j + PE(i, j)) / √d. What is the primary function of the PE(i, j) term in this calculation?
Analyzing Components of an Attention Score Formula
Interpretation of Positional Bias as a Distance Penalty