Learn Before
Conditional Random Fields (CRFs)
However, the CRF does not compute a probability for each tag at each time step. Instead, at each time step the CRF computes log-linear functions over a set of relevant features, and these local features are aggregated and normalized to produce a global probability for the whole sequence. In a CRF, the function F maps an entire input sequence X and an entire output sequence Y to a feature vector. Let’s assume we have K features, with a weight wk for each feature : We’ll call these K functions global features, since each one is a property of the entire input sequence X and output sequence Y. We compute them by decomposing into a sum of local features for each position i in Y: This constraint to only depend on the current and previous output tokens and are what characterizes a linear chain CRF. A general CRF allows a feature to make use of any output token, and are thus necessary for tasks in which the decision depend on distant output tokens. General CRFs require more complex inference, and are less commonly used for language processing.
0
1
Tags
Data Science