Formula

General Formulation of a Sequence Model

Diverse NLP problems can be unified under a general sequence model structure, represented by the function $\mathbf{o} = g(x_0, x_1, \dots, x_m; \theta)$. In this formula, $x_0, x_1, \dots, x_m$ is the input token sequence, where $x_0$ is a special start-of-sequence symbol (such as $\langle s \rangle$ or $[\mathrm{CLS}]$). The function $g(\cdot; \theta)$ (also written as $g_{\theta}(\cdot)$) is a neural network defined by parameters $\theta$, and $\mathbf{o}$ is the model's output. A common shorthand for the output is $\mathbf{o} = g_{\theta}(x_0, x_1, \dots, x_m)$.
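To make the notation concrete, the following is a minimal sketch of the mapping from a token sequence to an output under parameters θ. Everything in it is an illustrative assumption rather than part of the formulation: the toy vocabulary `VOCAB`, the dimensions, the helper `init_params`, and the choice of mean pooling followed by a linear map as the body of `g`. A real sequence model would typically use a far richer network in place of this body.

```python
import random

# Hypothetical toy vocabulary; "<s>" plays the role of the start symbol x_0.
VOCAB = {"<s>": 0, "the": 1, "cat": 2, "sat": 3}
EMBED_DIM = 4   # size of each token embedding (assumed for illustration)
OUTPUT_DIM = 3  # size of the output vector o (assumed for illustration)


def init_params(seed=0):
    """Build theta: an embedding table and an output weight matrix."""
    rng = random.Random(seed)
    embed = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]
    weight = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
              for _ in range(OUTPUT_DIM)]
    return {"embed": embed, "weight": weight}


def g(tokens, theta):
    """Compute o = g(x_0, ..., x_m; theta) for a list of token strings."""
    ids = [VOCAB[t] for t in tokens]
    vectors = [theta["embed"][i] for i in ids]
    # Mean-pool the token embeddings into a single vector.
    pooled = [sum(col) / len(vectors) for col in zip(*vectors)]
    # Apply a linear map from the pooled vector to the output o.
    return [sum(w * p for w, p in zip(row, pooled)) for row in theta["weight"]]


theta = init_params()
x = ["<s>", "the", "cat", "sat"]  # x_0 is the start-of-sequence symbol
o = g(x, theta)
print(o)
```

Here `g(x, theta)` corresponds to $g(x_0, \dots, x_m; \theta)$. Fixing `theta` and treating the result as a function of the tokens alone corresponds to the shorthand $g_{\theta}(\cdot)$.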

Updated 2026-05-02

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Foundations of Large Language Models