Learn Before
Probability Computation with Pre-trained Language Models
Once a pre-trained language model's parameters are optimized (denoted as $\hat{\theta}$), the decoder model $\Pr(\cdot \mid \cdot; \hat{\theta})$ can be used to calculate the probability of a token appearing at any given position within a sequence. Specifically, it computes the conditional probability of the next token given the preceding context, $\Pr(x_i \mid x_0, \ldots, x_{i-1}; \hat{\theta})$.
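To make this concrete, here is a minimal sketch of querying a pre-trained decoder for next-token probabilities. It assumes the Hugging Face transformers library and the public gpt2 checkpoint as a stand-in for the optimized parameters $\hat{\theta}$; the course material does not prescribe any particular model or library.

```python
# A minimal sketch: querying a pre-trained decoder for next-token probabilities.
# Assumes the Hugging Face `transformers` library and the public GPT-2 checkpoint
# as a stand-in for the optimized parameters (theta-hat).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = "The capital of France is"
input_ids = tokenizer(context, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# Conditional distribution over the *next* token given the preceding context.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Probability of one particular continuation, e.g. " Paris".
paris_id = tokenizer.encode(" Paris")[0]
print(f"P(' Paris' | context) = {next_token_probs[paris_id]:.4f}")
```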
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Probability Computation with Pre-trained Language Models
A language model is being trained on a large dataset of text. After an initial training iteration, the model's performance is measured on three distinct sequences from the dataset, yielding the following loss values:
- Sequence 1: Loss = 8.4
- Sequence 2: Loss = 2.1
- Sequence 3: Loss = 5.5
Based on the fundamental objective of this training process, which of the following statements most accurately describes the model's overall goal?
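As a concrete illustration of the quoted loss values, the sketch below treats each number as the negative log-likelihood of one sequence under the current shared parameters (an assumption made purely for illustration) and shows that training targets the aggregate loss over the whole dataset with a single set of parameters.

```python
# A minimal sketch of how per-sequence losses roll up into one training objective.
# Assumption: each value is the negative log-likelihood (cross-entropy) of a
# sequence under the current, shared model parameters.
sequence_losses = {
    "Sequence 1": 8.4,
    "Sequence 2": 2.1,
    "Sequence 3": 5.5,
}

# Pre-training minimizes the total (equivalently, the mean) loss over the dataset
# with respect to ONE shared set of parameters.
total_loss = sum(sequence_losses.values())      # 16.0
mean_loss = total_loss / len(sequence_losses)   # ~5.33

print(f"total loss = {total_loss:.1f}, mean loss = {mean_loss:.2f}")
# A gradient step would update the shared parameters to reduce this aggregate,
# not to find a distinct optimum for each individual sequence.
```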
Evaluating Model Training Progress
From Single Sequence to Full Dataset
The primary objective of pre-training a language model on a dataset is to find a unique, optimal set of model parameters for each individual text sequence within that dataset.
Pre-training Objective Formula
Learn After
Inference Process with a Fine-Tuned Model
Probability Distribution Formula for an Encoder-Softmax Language Model
A language model has been trained on a large corpus of English text. When given the sentence 'The chef carefully seasoned the soup with a pinch of ____.', which of the following best represents the direct output the model calculates for the blank position?
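For illustration, the sketch below shows the kind of object the model actually produces for the blank position: a normalized probability distribution over its vocabulary, not a single word. The tiny vocabulary and logit values are invented purely for this example.

```python
# A minimal sketch of the model's direct output for the blank: a probability
# distribution over the vocabulary. The vocabulary and logits are made up.
import math

logits = {"salt": 4.1, "pepper": 2.8, "saffron": 1.9, "cement": -3.0}

# Softmax turns raw scores into a distribution that sums to 1.
denom = sum(math.exp(v) for v in logits.values())
distribution = {tok: math.exp(v) / denom for tok, v in logits.items()}

for tok, p in sorted(distribution.items(), key=lambda kv: -kv[1]):
    print(f"P('{tok}' | context) = {p:.4f}")
```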
Evaluating Sentence Probability
Impact of Training Data on Probability