Short Answer

Choosing a Fine-Tuning Strategy for Sequence Summarization

A data scientist has a pre-trained language model and wants to adapt it for a new task: classifying legal documents into one of several categories (e.g., 'Contract', 'Pleading', 'Motion'). They consider two different fine-tuning approaches:

  1. Approach A: They add a classification layer that takes the model's final hidden state (the one corresponding to the last token of the input text) and train the system to predict the document category.
  2. Approach B: They add a mechanism that averages all of the model's output hidden states across the entire document, feeding this single averaged representation into a classification layer to predict the document category.

Which of these two approaches is more fundamentally aligned with the goal of creating a single, comprehensive representation of the entire document? Justify your reasoning by explaining the potential limitation of the approach you did not choose.
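To make the contrast concrete, here is a minimal sketch (not part of the original question) of the two pooling strategies, using a randomly simulated matrix of hidden states and a hypothetical shared classification head; the array shapes and weight matrix `W` are illustrative assumptions, not anything from a specific model.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden_dim, num_classes = 6, 8, 3

# Simulated output hidden states, one row per token of the document.
H = rng.standard_normal((seq_len, hidden_dim))

# Approach A: last-token pooling -- use only the final hidden state.
last_state = H[-1]            # shape (hidden_dim,)

# Approach B: mean pooling -- average all hidden states so every
# token contributes directly to the document representation.
mean_state = H.mean(axis=0)   # shape (hidden_dim,)

# A hypothetical classification head applied to either summary vector.
W = rng.standard_normal((hidden_dim, num_classes))
logits_a = last_state @ W     # shape (num_classes,)
logits_b = mean_state @ W     # shape (num_classes,)
```

Note that in Approach A the earlier tokens influence `last_state` only indirectly, through whatever the model's attention has carried forward to the final position, whereas in Approach B every token's hidden state enters the summary vector by construction.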


Updated 2025-10-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science