Definition

TF-IDF Term Weighting

TF-IDF (term frequency–inverse document frequency) is a weighting scheme that assigns to each (term, document) pair the product of two factors. The term-frequency factor $\mathrm{tf}(t,d)$ counts (or sublinearly transforms) how often term $t$ occurs in document $d$. The inverse-document-frequency factor $\mathrm{idf}(t)=\log\dfrac{N}{\mathrm{df}(t)}$, where $\mathrm{df}(t)$ is the number of documents that contain $t$, down-weights terms that appear in many of the $N$ documents in the corpus, since such terms discriminate poorly. The resulting weight $w_{t,d}=\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)$ is high for terms that are frequent in a document but rare across the corpus.
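The definition above can be illustrated with a minimal sketch in plain Python. It assumes raw counts for $\mathrm{tf}$, the natural logarithm for $\mathrm{idf}$, and whitespace tokenization with lowercasing; the function name `tf_idf` and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    Assumptions (not fixed by the definition): tf is the raw count,
    idf uses the natural log, and tokens are lowercase whitespace splits.
    """
    n_docs = len(corpus)
    tokenized = [doc.lower().split() for doc in corpus]

    # df(t): number of documents that contain term t at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # tf(t, d): raw occurrence count
        weights.append(
            {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        )
    return weights

# Toy corpus, invented for this example
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(corpus)
```

In this toy corpus, "the" occurs in every document, so its idf is $\log(3/3)=0$ and its weight vanishes however often it occurs, while "mat" occurs in only one document and receives the largest weight in the first document. A sublinear variant would replace the raw count with, for example, $1+\log \mathrm{tf}(t,d)$.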


Updated 2026-05-16


Tags

Science

Research Paper: Advanced Prompting
