A machine learning engineer is building a model to classify sentences as either 'question' or 'statement'. They add a special classification token to the beginning of each input sentence before passing it to an encoder. The encoder then produces a final hidden state vector for every token in the input. For the final classification step, which hidden state vector should be used as the representative summary of the entire sentence?
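The setup described above can be sketched in a few lines. This is a minimal illustration, not a real model: `toy_encoder` is a hypothetical stand-in that returns placeholder vectors, and the token name `[CLS]`, the hidden size, and the classifier weights are all assumptions made for the example. The point it shows is the selection step. The special token sits at position 0, so its final hidden state (the encoder's output at that position, not its initial embedding) is the vector passed to the classification layer. In a self-attention encoder, that position can attend to every other token, which is why its output can serve as a sentence-level summary once the model is trained for the task.

```python
import math
import random

CLS = "[CLS]"  # assumed name for the special classification token
LABELS = ["question", "statement"]


def toy_encoder(tokens, hidden_size=4, seed=0):
    """Hypothetical stand-in for a real encoder.

    Returns one final hidden state vector per input token. The values are
    random placeholders, not meaningful representations.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(hidden_size)] for _ in tokens]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence_tokens, weights, bias):
    """Prepend the special token, encode, and classify from position 0."""
    tokens = [CLS] + sentence_tokens      # special token goes first
    hidden_states = toy_encoder(tokens)   # one vector per token
    cls_vector = hidden_states[0]         # final hidden state of the special token
    # Linear layer: one score per label, computed from cls_vector only.
    logits = [
        sum(w * h for w, h in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    return cls_vector, softmax(logits)


# Illustrative, untrained classifier parameters (assumed values).
weights = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
bias = [0.0, 0.0]

cls_vector, probs = classify(["is", "this", "a", "question", "?"], weights, bias)
prediction = LABELS[probs.index(max(probs))]
```

Because the encoder and weights here are untrained placeholders, the predicted label carries no meaning. Only the data flow is of interest: the hidden states of the ordinary tokens are computed but never fed to the classifier.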
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sequence Classification Pipeline using the [CLS] Token Output
Evaluating a Sequence Representation Method
Debugging a Sequence Classification Model
In a sequence classification task, the special token prepended to the input is designed so that its initial vector representation, before being processed by the main model, contains a summary of the entire sequence's meaning.
Classification on Sequence Representation