Learn Before
Structuring a Sample from Input and Output Segments
During fine-tuning, each data sample from the tuning dataset (denoted as sample) is conceptually divided into two distinct components: an input segment, represented as x_sample, and an output segment, represented as y_sample. These segments are then structured as a single sequence for model processing, typically by concatenating them, which is formally expressed as: sample = [x_sample, y_sample].
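The concatenation above can be sketched in a few lines of Python. This is a minimal illustration, not any specific library's API: `build_sample`, `toy_tokenize`, and the `-100` ignore-label convention for input positions are assumptions for the example (the label masking anticipates the "Selective Gradient Propagation for Sub-sequence Loss" topic linked below).

```python
# Minimal sketch of building one SFT sample as sample = [x_sample, y_sample].
# The tokenizer is a hypothetical stand-in: any function mapping text to
# token IDs would work the same way.
def build_sample(tokenize, x_text, y_text, eos_id):
    x_ids = tokenize(x_text)             # input segment  (x_sample)
    y_ids = tokenize(y_text) + [eos_id]  # output segment (y_sample) + end marker
    sample = x_ids + y_ids               # sample = [x_sample, y_sample]
    # Loss is commonly computed only on the output segment, so input
    # positions are marked with -100 (a common "ignore" label convention).
    labels = [-100] * len(x_ids) + y_ids
    return sample, labels

# Toy whitespace "tokenizer" for illustration only.
vocab = {}
def toy_tokenize(text):
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]

sample, labels = build_sample(
    toy_tokenize, "Translate to Spanish: Hello.", "Hola.", eos_id=999
)
```

Here `sample` holds the full concatenated token sequence the model processes, while `labels` masks out the input segment so gradients flow only through the output tokens.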

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Dataset Composition for RL Fine-Tuning in RLHF
A machine learning engineer is creating a dataset to fine-tune a language model to act as a helpful assistant. The goal is to teach the model to follow instructions and provide complete, high-quality answers. Which of the following examples represents the most effective input-output pair for this supervised fine-tuning task?
Structuring a Sample from Input and Output Segments
Deconstructing an SFT Training Sample
Constructing an SFT Training Pair for Text Summarization
Learn After
Selective Gradient Propagation for Sub-sequence Loss
Sample-wise Negative Log-Likelihood Loss for a Sub-sequence
For a supervised fine-tuning task, a single training instance consists of an input segment (x_sample) and a corresponding output segment (y_sample). If x_sample is 'Instruction: Translate to Spanish. Input: Hello.' and y_sample is 'Response: Hola.', which of the following represents the correct structure for the final combined sample that the model will process?
Deconstructing a Fine-Tuning Sample
In preparing a data sample for supervised fine-tuning, a common practice is to structure the sample by concatenating the output segment (y_sample) and the input segment (x_sample) into a single sequence: sample = [y_sample, x_sample]. What is the primary reason for placing the output segment before the input segment in this structure?