Learn Before
A machine learning team has just finished pre-training a language model using a two-part system. The first, smaller model corrupted text by replacing some words with plausible alternatives. The second, larger model was then trained to identify which words in the text were original and which were replacements. The team's ultimate goal is to use this work to build a system for classifying the sentiment of customer reviews. What is the most effective and standard next step for the team to take?
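The setup described is ELECTRA-style pre-training: a small generator corrupts text and a larger discriminator learns replaced-token detection. The standard next step is to discard the generator, keep the discriminator's encoder, attach a fresh classification head, and fine-tune on labeled sentiment data. A toy numpy sketch of that handoff, with all class and variable names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

class Encoder:
    """Stand-in for the pre-trained discriminator's transformer body
    (toy class; a real setup would reuse the ELECTRA encoder weights)."""
    def __init__(self, dim=8):
        self.W = rng.normal(size=(dim, dim))
    def __call__(self, x):
        return np.tanh(x @ self.W)

# Pre-training pairs the encoder with a per-token binary head
# ("original vs. replaced"). For downstream sentiment classification,
# that head is discarded and a new classification head is fine-tuned.
encoder = Encoder()                       # weights learned in pre-training
token_head = rng.normal(size=(8, 1))      # used only during pre-training
sentiment_head = rng.normal(size=(8, 2))  # new head, trained on reviews

x = rng.normal(size=(1, 8))               # toy "review" representation
logits = encoder(x) @ sentiment_head      # fine-tuning forward pass
print(logits.shape)                       # (1, 2): positive vs. negative
```

The generator exists only to make pre-training hard; the transferable asset is the discriminator's encoder.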
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Impact of ALiBi Bias Scalar on Model Performance
A research team is fine-tuning a language model for a text summarization task. The model uses a positional encoding scheme where a scalar hyperparameter, β, adjusts the strength of a distance-based bias in the attention mechanism. The team experiments with different values for β and records the model's performance on a validation set using the ROUGE score (higher is better). The results are as follows:
β Value | ROUGE Score
0.01    | 0.35
0.1     | 0.42
1.0     | 0.38
10.0    | 0.29

Based on this data, what is the most reasonable conclusion?
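A minimal sketch of how an ALiBi-style distance bias can be computed, assuming per-head slopes follow a geometric progression and β scales the overall bias strength; function names here are illustrative:

```python
import numpy as np

def alibi_slopes(n_heads):
    # Geometric progression of per-head slopes: the first head's slope
    # is 2^(-8/n_heads) and each subsequent head multiplies by the same
    # ratio, so later heads attend more uniformly over distance.
    ratio = 2.0 ** (-8.0 / n_heads)
    return np.array([ratio ** (h + 1) for h in range(n_heads)])

def alibi_bias(seq_len, slope, beta=1.0):
    # Bias added to attention logits: -beta * slope * distance for
    # causal (past) positions, -inf for future positions. Larger beta
    # penalizes distant tokens more strongly.
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]                        # j - i
    causal = np.where(dist <= 0, dist.astype(float), -np.inf)
    return beta * slope * causal
```

With 8 heads the slopes are 1/2, 1/4, ..., 1/256; the bias for a query attending 3 positions back with slope 1/2 and β = 1 is -1.5, and scaling β up makes the model increasingly local, which matches the pattern of degraded scores at large β in the table above.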
Geometric Progression for ALiBi's Scalar per Head