Learn Before
Schematic Example of a Sentence-Pair Classification Pipeline
The pipeline for a sentence-pair classification task, such as determining if one sentence follows another, is illustrated by the following schematic. The process begins with an input sequence formatted with special tokens, such as [CLS] It is raining . [SEP] I need an umbrella . [SEP]. Here, [CLS] serves as the start symbol (), and [SEP] separates the two sentences. Following the standard Transformer encoding procedure, each token is first converted into a corresponding embedding (). This sequence of embeddings, {e_0, ..., e_m}, is then fed into an encoder, which produces a sequence of contextualized output vectors, {h_0, ..., h_m}. Because the hidden state is generally used to represent the entire sequence, a Softmax layer is placed on top of it to perform the final binary classification.
token: [CLS] It is raining . [SEP] I need an umbrella . [SEP] embedding: e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 e10 e11 ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ Encoder ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ encoding: h0 h1 h2 h3 h4 e5 h6 h7 h8 h9 h10 h11 ↓ Softmax ↓ Is Next or Not?

0
1
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Schematic Example of a Sentence-Pair Classification Pipeline
In a text-pair classification pipeline, two texts are formatted into a single input sequence, which includes a special token at the beginning designated for classification. After a Transformer encoder processes this sequence and generates a contextualized hidden state for every token, a single vector must be selected to represent the entire text pair for the final prediction. Which of the following best explains the standard method for selecting this vector and the reasoning behind it?
A language model is tasked with determining if two sentences are semantically equivalent. Arrange the following steps to correctly represent the end-to-end computational pipeline, from preparing the input to generating the final prediction.
Applying the Text-Pair Classification Pipeline
Learn After
In a pipeline designed for sentence-pair classification, an input like
[CLS] sentence A [SEP] sentence B [SEP]is processed by an encoder to produce a sequence of contextualized encodings, one for each token. For the final classification, only the encoding corresponding to the[CLS]token is passed to a Softmax layer. What is the most accurate reason for selecting this specific encoding to represent the entire input?A language model is being fine-tuned for a sentence-pair classification task (e.g., determining if one sentence is an entailment of another). Arrange the following steps into the correct sequence that describes the data processing pipeline, from the initial input to the final prediction.
Debugging a Sentence-Pair Classification Model