Definition

Cosine Similarity Between Document Vectors

Cosine similarity measures the angular closeness of two vectors irrespective of their magnitudes. For document vectors $\vec{u}$ and $\vec{v}$ it is defined as

$$\cos(\vec{u},\vec{v})=\dfrac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|_2\,\|\vec{v}\|_2}=\sum_t \dfrac{u_t}{\|\vec{u}\|_2}\,\dfrac{v_t}{\|\vec{v}\|_2}.$$

The value lies in $[0,1]$ for non-negative term-weight vectors, with $1$ when the two documents have identical term-weight distributions (up to scale) and $0$ when they share no terms. Length normalization makes the score insensitive to absolute document length, so a short and a long document with the same relative term emphasis are scored as highly similar.
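The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes documents are represented as sparse dictionaries of raw term counts, and the function name `cosine_similarity`, the sample documents, and the choice to return 0.0 for an all-zero vector are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity of two sparse term-weight vectors given as dicts."""
    # Dot product: only terms present in both documents contribute.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    # Euclidean (L2) norms of each vector.
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: an empty document is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical documents, weighted by raw term counts.
short_doc = Counter("cat sat mat".split())
long_doc = Counter(("cat sat mat " * 10).split())  # same emphasis, 10x longer
disjoint_doc = Counter("dog ran far".split())      # no shared terms

same_emphasis = cosine_similarity(short_doc, long_doc)
no_overlap = cosine_similarity(short_doc, disjoint_doc)
```

Here `same_emphasis` comes out at 1 up to floating-point rounding, because the longer document is a scaled copy of the shorter one, while `no_overlap` is 0 because the two documents share no terms.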

Updated 2026-05-16

Tags

Science

Research Paper: Advanced Prompting