Learn Before
Definition

TF-IDF Cosine Similarity

TF-IDF cosine similarity is the composite document-similarity score obtained by (1) representing each document as a vector whose components are TF-IDF weights $w_{t,d}=\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)$ and (2) computing the cosine of the angle between two such vectors, $\mathrm{sim}(d_1,d_2)=\dfrac{\sum_t w_{t,d_1}\,w_{t,d_2}}{\sqrt{\sum_t w_{t,d_1}^{2}}\,\sqrt{\sum_t w_{t,d_2}^{2}}}$. Because TF-IDF weights are non-negative, the score lies in $[0,1]$: it equals $1$ when the two TF-IDF vectors point in the same direction (for example, when the documents have identical TF-IDF profiles) and $0$ when the documents share no terms. Terms that are frequent within a document but rare across the collection receive the largest TF-IDF weights, and the denominator normalizes each vector by its length, so the metric quantifies the extent to which two texts share content-bearing vocabulary largely independently of how long either text is. It is commonly reported as a text-overlap baseline alongside ROUGE-L and n-gram Jaccard in stylometric and rewrite-evaluation studies.
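
The two steps in the definition can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the definition does not fix a particular tf or idf formula, so the code assumes raw term counts for tf, a smoothed idf of the form log((1 + N) / (1 + df)) + 1, and plain whitespace tokenization. The function names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} TF-IDF vector.

    Assumptions (not fixed by the definition): tf is the raw count and
    idf is the smoothed form log((1 + N) / (1 + df)) + 1.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in docs
    ]


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for empty documents
    return dot / (norm_u * norm_v)


# Whitespace tokenization is an assumption made for brevity.
corpus = [
    "the cat sat on the mat".split(),
    "the cat sat on the mat the cat sat on the mat".split(),
    "dogs bark loudly".split(),
]
vectors = tfidf_vectors(corpus)
print(cosine(vectors[0], vectors[1]))  # same text repeated twice
print(cosine(vectors[0], vectors[2]))  # no shared terms
```

In this toy corpus the second document is the first one repeated, so its vector is a scaled copy and the similarity is 1 up to floating-point rounding, which illustrates the length normalization. The third document shares no terms with the first, so the similarity is 0.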

Updated 2026-05-16

Tags

Science

Research Paper: Advanced Prompting
