Learn Before
Concept
Problem with Raw Frequency
Raw frequency is not the best measure of association between words, as it is very skewed and not very discriminative.
- We want to learn words that occur nearby frequently, such as "digital" and "information", as being more important than words that only appear once or twice.
- On the other hand, we also want to indicate words that are too frequent, such as "the" and "good", as unimportant.
Common solutions are the tf-idf algorithm and the PPMI algorithm.
0
1
Updated 2021-10-10
Tags
Data Science