
TF-IDF Document Scoring Formula

The TF-IDF version of the document scoring formula computes the cosine similarity between the TF-IDF vector of a query $q$ and the TF-IDF vector of a document $d$:

$$\text{score}(q,d) = \sum_{t \in q} \frac{\text{tf-idf}(t,q)}{\sqrt{\sum_{q_i \in q} \text{tf-idf}^2(q_i,q)}} \cdot \frac{\text{tf-idf}(t,d)}{\sqrt{\sum_{d_i \in d} \text{tf-idf}^2(d_i,d)}}$$

Where:

  • $t$ is a term in the query $q$; only terms that also occur in the document $d$ contribute, because $\text{tf-idf}(t,d) = 0$ otherwise.
  • $\text{tf-idf}(t,q)$ is the TF-IDF weight of term $t$ in query $q$.
  • $\text{tf-idf}(t,d)$ is the TF-IDF weight of term $t$ in document $d$.
  • The denominators are the Euclidean ($L_2$) norms of the query and document vectors, respectively. Dividing by them normalizes the score so that it does not depend on the lengths of the query and the document.
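
As a rough illustration of the formula, the following Python sketch scores a query against a document. It assumes the TF-IDF weights have already been computed and are stored as dictionaries mapping each term to its weight; the function name `cosine_tfidf_score` and the sample weights are hypothetical and chosen only for this example.

```python
import math


def cosine_tfidf_score(query_weights, doc_weights):
    """Cosine similarity between two sparse TF-IDF vectors.

    Both arguments are assumed to be dicts of the form {term: tf-idf weight}.
    """
    # Euclidean (L2) norms of the query and document vectors.
    q_norm = math.sqrt(sum(w * w for w in query_weights.values()))
    d_norm = math.sqrt(sum(w * w for w in doc_weights.values()))

    # An all-zero vector has no direction; treat the score as 0 (a convention
    # chosen for this sketch).
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0

    # Sum over the query terms; a term missing from the document has weight 0.
    dot = sum(w * doc_weights.get(t, 0.0) for t, w in query_weights.items())
    return dot / (q_norm * d_norm)


# Hypothetical weights, chosen so the arithmetic is easy to check by hand.
query = {"sweet": 3.0, "love": 4.0}
doc_same_direction = {"sweet": 6.0, "love": 8.0}   # same proportions, twice as "long"
doc_no_overlap = {"sorrow": 2.0}                   # shares no terms with the query

score_same = cosine_tfidf_score(query, doc_same_direction)
score_none = cosine_tfidf_score(query, doc_no_overlap)
```

Here `doc_same_direction` has weights in the same proportions as the query but twice the magnitude, and it scores the same as an exact copy of the query would. This is the length normalization that the denominators provide. `doc_no_overlap` scores 0 because none of its terms appear in the query.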


Updated 2026-06-13

Tags

Data Science