Learn Before
Generating Sequence Representations with a Pre-trained Encoder
A pre-trained sequence encoding model, with its parameters optimized to θ̂ during pre-training, transforms an input sequence of tokens, x = {x_0, x_1, ..., x_m}, into a numerical representation, H. This output, H, is a sequence of real-valued vectors, {h_0, h_1, ..., h_m}, where each vector h_i corresponds to the input token x_i; H can be viewed as a matrix with each h_i as a row. The specific equation for this transformation is defined separately.
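For concreteness, here is a minimal sketch of that transformation in Python, assuming a BERT-style encoder loaded through the Hugging Face transformers library; the checkpoint name is an illustrative choice, not one specified by this card.

    # Obtain H = {h_0, ..., h_m} from a pre-trained encoder (parameters fixed).
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")  # pre-trained weights

    inputs = tokenizer("The quick fox", return_tensors="pt")
    with torch.no_grad():  # no training here; we only read off the representations
        outputs = encoder(**inputs)

    H = outputs.last_hidden_state[0]  # matrix of shape (sequence_length, hidden_size)
    # Row i of H is the real-valued vector h_i representing token x_i
    # (special tokens like [CLS]/[SEP] also receive rows in BERT-style models).
    print(H.shape)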

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Fine-Tuning LLMs for Context Representation Tasks
Applying a Pre-trained Encoder to Downstream Tasks
Adapting a General Model for a Specific Task
Layer-wise Transformation of Hidden States
A data science team is tasked with creating a model to detect sarcastic sentiment in short online reviews. They start with a large, general-purpose sequence encoding model that was pre-trained on a vast collection of books and web articles. The team then further trains this model using a smaller, labeled dataset of sarcastic and non-sarcastic reviews. What is the most critical change that occurs within the model during this second training phase?
A machine learning engineer wants to adapt a large, pre-trained sequence encoding model to perform a specific text classification task (e.g., identifying spam emails). Arrange the following steps in the correct logical order to describe this adaptation process.
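As a point of reference for the two questions above, the sketch below shows one common form this adaptation takes: a small, randomly initialized classification head is attached to the pre-trained encoder, and both are trained further on the labeled task data. The class name, checkpoint, and learning rate are hypothetical illustrations, not details taken from this card.

    # Hypothetical fine-tuning setup: pre-trained encoder + new classification head.
    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class SpamClassifier(nn.Module):
        def __init__(self, checkpoint="bert-base-uncased", num_labels=2):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(checkpoint)  # pre-trained weights
            self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)  # new, random

        def forward(self, input_ids, attention_mask):
            H = self.encoder(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state
            return self.head(H[:, 0])  # classify from the first token's vector

    # Fine-tuning updates the encoder and head jointly on the labeled dataset,
    # which is what shifts the general-purpose representations toward the task.
    model = SpamClassifier()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)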
Learn After
Equation for Generating Sequence Representations
Probability Distribution Formula for an Encoder-Softmax Language Model
A pre-trained sequence encoding model processes the input sentence 'The quick fox'. After tokenization, the input is a sequence of 3 tokens: {'The', 'quick', 'fox'}. The model then generates a numerical representation, H, which is a matrix of real-valued vectors. Based on the typical function of such a model, which statement best describes the output matrix H?
Contextual Representation Analysis
Consider a pre-trained sequence encoding model that generates a numerical representation H = {h_0, h_1, ..., h_m} for an input sequence of tokens x = {x_0, x_1, ..., x_m}. The vector h_i representing the token x_i will be the same regardless of the other tokens that appear alongside it in the input sequence.
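One way to probe this claim empirically, assuming a BERT-style contextual encoder (the checkpoint and helper function below are illustrative), is to encode the same token in two different sentences and compare the resulting vectors:

    # Compare h_i for the same token appearing in two different contexts.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    def vector_for(sentence, word):
        enc = tokenizer(sentence, return_tensors="pt")
        idx = enc.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
        with torch.no_grad():
            return encoder(**enc).last_hidden_state[0, idx]

    v1 = vector_for("He sat on the river bank", "bank")
    v2 = vector_for("She deposited cash at the bank", "bank")
    print(torch.cosine_similarity(v1, v2, dim=0))  # below 1.0 for contextual encoders

For a contextual encoder such as a Transformer, h_i is computed from the whole sequence rather than from the token in isolation, so the two vectors differ and the similarity falls below 1.0.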