Learn Before
Concept

Problem with Raw Frequency

Raw frequency is not the best measure of association between words, as it is very skewed and not very discriminative.

  • We want to learn words that occur nearby frequently, such as "digital" and "information", as being more important than words that only appear once or twice.
  • On the other hand, we also want to indicate words that are too frequent, such as "the" and "good", as unimportant.

Common solutions are the tf-idf algorithm and the PPMI algorithm.

0

1

Updated 2021-10-10

Tags

Data Science