Conditional Random Fields (CRFs)
A CRF does not compute a probability for each tag at each time step. Instead, at each time step the CRF computes log-linear functions over a set of relevant features, and these local features are aggregated and normalized to produce a global probability for the whole sequence.

In a CRF, the function $F$ maps an entire input sequence $X$ and an entire output sequence $Y$ to a feature vector. Let’s assume we have $K$ features, with a weight $w_k$ for each feature $F_k$:

$$p(Y \mid X) = \frac{\exp\left(\sum_{k=1}^{K} w_k F_k(X, Y)\right)}{\sum_{Y'} \exp\left(\sum_{k=1}^{K} w_k F_k(X, Y')\right)}$$

where the sum in the denominator runs over all possible output sequences $Y'$. We’ll call these $K$ functions $F_k(X, Y)$ global features, since each one is a property of the entire input sequence $X$ and output sequence $Y$. We compute them by decomposing into a sum of local features for each position $i$ in $Y$, where $n$ is the length of the sequence:

$$F_k(X, Y) = \sum_{i=1}^{n} f_k(y_{i-1}, y_i, X, i)$$

This constraint, that each local feature depends only on the current and previous output tokens $y_i$ and $y_{i-1}$, is what characterizes a linear-chain CRF. A general CRF allows a feature to make use of any output token, and is thus necessary for tasks in which the decision depends on distant output tokens. General CRFs require more complex inference and are less commonly used for language processing.
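To make the decomposition concrete, here is a minimal Python sketch of a linear-chain CRF score. Each global feature is the sum of a local feature over positions, and the weighted sum is exponentiated and normalized over every possible tag sequence. The tag set, the three feature functions, the weights, and the `<s>` start symbol are all hypothetical choices made for illustration; they are not taken from the text above. The normalizer is computed by brute-force enumeration, which is only feasible for toy inputs; practical implementations use dynamic programming instead.

```python
import itertools
import math

# Hypothetical tag set and start symbol, chosen only for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START = "<s>"

# Local features f_k(y_prev, y, X, i): each may look at the previous tag,
# the current tag, the whole input X, and the position i, and nothing else.
def f_the_is_det(y_prev, y, X, i):
    return 1.0 if X[i] == "the" and y == "DET" else 0.0

def f_det_then_noun(y_prev, y, X, i):
    return 1.0 if y_prev == "DET" and y == "NOUN" else 0.0

def f_noun_then_verb(y_prev, y, X, i):
    return 1.0 if y_prev == "NOUN" and y == "VERB" else 0.0

LOCAL_FEATURES = [f_the_is_det, f_det_then_noun, f_noun_then_verb]
WEIGHTS = [2.0, 1.5, 1.0]  # made-up weights w_k, one per feature

def global_features(X, Y):
    """F_k(X, Y): sum each local feature f_k over all positions i."""
    totals = [0.0] * len(LOCAL_FEATURES)
    for i in range(len(X)):
        y_prev = Y[i - 1] if i > 0 else START
        for k, f in enumerate(LOCAL_FEATURES):
            totals[k] += f(y_prev, Y[i], X, i)
    return totals

def score(X, Y):
    """Unnormalized log-score: sum_k w_k * F_k(X, Y)."""
    return sum(w * F for w, F in zip(WEIGHTS, global_features(X, Y)))

def sequence_probability(X, Y):
    """p(Y | X), normalizing over every tag sequence of the same length."""
    Z = sum(
        math.exp(score(X, list(Y_alt)))
        for Y_alt in itertools.product(TAGS, repeat=len(X))
    )
    return math.exp(score(X, Y)) / Z

X = ["the", "dog", "barks"]
Y = ["DET", "NOUN", "VERB"]
print(global_features(X, Y))
print(sequence_probability(X, Y))
```

Note that the model never produces a per-position tag distribution: the only probability it defines is over whole sequences, and the local features interact solely through the shared normalizer.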
Tags
Data Science
Related
Applications of probabilistic models
Types of probabilistic models
The Deep Learning Approach to Structured Probabilistic Models
The Partition Function (introduction).
Graph Model Structure
Advantages of Structured Modeling
Training and Evaluation of Models with Intractable Partition Functions
Conditional Random Fields (CRFs)