1Cademy - Conditional Random Fields (CRFs)

Learn Before

Probabilistic models

Theory

Conditional Random Fields (CRFs)

Given an input sequence $X$, a CRF selects the output sequence $\hat{Y}$ with the highest conditional probability among all possible output sequences $\mathcal{Y}$:

$\hat{Y} = \operatorname{argmax}_{Y \in \mathcal{Y}} P(Y|X)$

However, the CRF does not compute a probability for each tag at each time step. Instead, at each time step the CRF computes log-linear functions over a set of relevant features, and these local features are aggregated and normalized to produce a global probability for the whole sequence.

In a CRF, the function $F$ maps an entire input sequence $X$ and an entire output sequence $Y$ to a feature vector. Let's assume we have $K$ features, with a weight $w_{k}$ for each feature $F_{k}$:

$P(Y|X) = \frac{\exp\left(\sum_{k=1}^{K} w_{k}F_{k}(X,Y)\right)}{\sum_{Y' \in \mathcal{Y}} \exp\left(\sum_{k=1}^{K} w_{k}F_{k}(X,Y')\right)}$

We'll call these $K$ functions $F_{k}(X,Y)$ global features, since each one is a property of the entire input sequence $X$ and output sequence $Y$. We compute them by decomposing into a sum of local features for each position $i$ in $Y$, where $n$ is the length of the sequence:

$F_{k}(X,Y) = \sum_{i=1}^{n} f_{k}(y_{i-1}, y_{i}, X, i)$

This constraint, that each local feature depends only on the current and previous output tokens $y_{i}$ and $y_{i-1}$, is what characterizes a linear chain CRF. A general CRF allows a feature to make use of any output token, and is thus necessary for tasks in which the decision depends on distant output tokens. General CRFs require more complex inference, and are less commonly used for language processing.
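To make the formulas concrete, here is a minimal Python sketch of a linear chain CRF on a toy example. Everything specific in it is an assumption made for illustration: the two-tag set, the start symbol standing in for $y_{0}$, the three local feature functions, and the weight values are all hypothetical and are not taken from any trained model. It sums local features into global features, scores each candidate sequence, and normalizes over all candidates by brute-force enumeration.

```python
import itertools
import math

# Hypothetical toy setup: a two-tag set and a start symbol standing in for y_0.
TAGS = ["N", "V"]
START = "<s>"

# Local features f_k(y_prev, y, X, i). Positions are 0-indexed here,
# whereas the formulas in the text count from 1.
def f_noun_then_verb(y_prev, y, X, i):
    """1 if the previous tag is N and the current tag is V."""
    return 1.0 if (y_prev == "N" and y == "V") else 0.0

def f_ends_in_s_tagged_verb(y_prev, y, X, i):
    """1 if the current word ends in 's' and is tagged V."""
    return 1.0 if (X[i].endswith("s") and y == "V") else 0.0

def f_first_word_noun(y_prev, y, X, i):
    """1 if the first word of the sequence is tagged N."""
    return 1.0 if (i == 0 and y == "N") else 0.0

LOCAL_FEATURES = [f_noun_then_verb, f_ends_in_s_tagged_verb, f_first_word_noun]
WEIGHTS = [1.5, 0.8, 1.0]  # hypothetical weights w_k, one per feature

def global_features(X, Y):
    """F_k(X, Y): each local feature f_k summed over all positions i."""
    feats = []
    for f in LOCAL_FEATURES:
        total = 0.0
        for i in range(len(X)):
            y_prev = Y[i - 1] if i > 0 else START
            total += f(y_prev, Y[i], X, i)
        feats.append(total)
    return feats

def score(X, Y):
    """Unnormalized log-score: sum over k of w_k * F_k(X, Y)."""
    return sum(w * F for w, F in zip(WEIGHTS, global_features(X, Y)))

def sequence_probabilities(X):
    """P(Y|X) for every candidate Y, normalized over all tag sequences."""
    candidates = list(itertools.product(TAGS, repeat=len(X)))
    unnormalized = {Y: math.exp(score(X, Y)) for Y in candidates}
    Z = sum(unnormalized.values())  # denominator: sum over all Y'
    return {Y: value / Z for Y, value in unnormalized.items()}

X = ["dog", "runs"]
probs = sequence_probabilities(X)
best = max(probs, key=probs.get)  # argmax over Y of P(Y|X)
print(best, round(probs[best], 3))
```

Enumerating every candidate sequence costs time exponential in the sequence length, so this is only workable for toy inputs. Because each local feature looks only at $y_{i-1}$ and $y_{i}$, practical linear chain CRFs compute the argmax and the normalizer with dynamic programming (the Viterbi and forward algorithms) instead.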


Updated 2026-05-10

Contributors are from: University of Michigan - Ann Arbor
