Learn Before
Debugging a Chatbot Training Process
Based on the common practice of dividing sequences for model training, what is the fundamental error in the engineer's approach to calculating the loss? Explain how this error leads to the observed poor performance.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained on the sequence:
⟨s⟩ Translate to Spanish: The cat sat. El gato se sentó. ⟨/s⟩. To effectively teach the model how to perform the translation, on which part of the sequence should the training loss be calculated?
Debugging a Chatbot Training Process
Rationale for Sub-sequence Loss Calculation
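As a minimal sketch of sub-sequence loss calculation, the snippet below averages the negative log-likelihood only over the response tokens of the translation example above. The whitespace-level tokenization, the probability values, and the helper name `masked_nll` are all assumptions made for illustration; they do not come from the course material or from any particular library.

```python
import math

def masked_nll(token_log_probs, loss_mask):
    """Average negative log-likelihood over positions where loss_mask is 1."""
    if len(token_log_probs) != len(loss_mask):
        raise ValueError("token_log_probs and loss_mask must have equal length")
    selected = [-lp for lp, m in zip(token_log_probs, loss_mask) if m]
    if not selected:
        raise ValueError("loss_mask selects no tokens")
    return sum(selected) / len(selected)

# Hypothetical whitespace-level tokenization of the example sequence.
prompt = ["⟨s⟩", "Translate", "to", "Spanish:", "The", "cat", "sat."]
response = ["El", "gato", "se", "sentó.", "⟨/s⟩"]
tokens = prompt + response

# 0 = prompt token (excluded from the loss), 1 = response token (included).
loss_mask = [0] * len(prompt) + [1] * len(response)

# Made-up model probabilities for each token, for illustration only.
probs = [0.1] * len(prompt) + [0.5] * len(response)
log_probs = [math.log(p) for p in probs]

# Loss restricted to the response versus loss over the whole sequence.
response_loss = masked_nll(log_probs, loss_mask)
full_loss = masked_nll(log_probs, [1] * len(tokens))

print(f"response-only loss: {response_loss:.3f}")
print(f"full-sequence loss: {full_loss:.3f}")
```

With these made-up numbers, the full-sequence loss is dominated by the prompt tokens, which the model is never asked to generate at inference time. Masking them out keeps the training signal focused on the part of the sequence the model is expected to produce.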