Short Answer

Choosing a Fine-Tuning Strategy for Sequence Summarization

A data scientist has a pre-trained language model and wants to adapt it for a new task: classifying legal documents into one of several categories (e.g., 'Contract', 'Pleading', 'Motion'). They consider two different fine-tuning approaches:

  1. Approach A: They add a classification layer that takes the model's final hidden state (the one corresponding to the last token of the input text) and train the system to predict the document category.
  2. Approach B: They add a mechanism that averages all of the model's output hidden states across the entire document, feeding this single averaged representation into a classification layer to predict the document category.

Which of these two approaches is more fundamentally aligned with the goal of creating a single, comprehensive representation of the entire document? Justify your reasoning by explaining the potential limitation of the approach you did not choose.
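To make the contrast concrete, here is a minimal sketch (not part of the original question) of the two pooling strategies, using a randomly simulated matrix of hidden states and a hypothetical shared classification head; the array shapes and weight matrix `W` are illustrative assumptions, not anything from a specific model.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden_dim, num_classes = 6, 8, 3

# Simulated output hidden states, one row per token of the document.
H = rng.standard_normal((seq_len, hidden_dim))

# Approach A: last-token pooling -- use only the final hidden state.
last_state = H[-1]            # shape (hidden_dim,)

# Approach B: mean pooling -- average all hidden states so every
# token contributes directly to the document representation.
mean_state = H.mean(axis=0)   # shape (hidden_dim,)

# A hypothetical classification head applied to either summary vector.
W = rng.standard_normal((hidden_dim, num_classes))
logits_a = last_state @ W     # shape (num_classes,)
logits_b = mean_state @ W     # shape (num_classes,)
```

Note that in Approach A the earlier tokens influence `last_state` only indirectly, through whatever the model's attention has carried forward to the final position, whereas in Approach B every token's hidden state enters the summary vector by construction.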


Updated 2025-10-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science