Learn Before
Definition
Bigram Jaccard Similarity
Bigram Jaccard similarity applies the Jaccard coefficient to the bigram (2-gram) representations of two texts. Given a text $d$, let $B(d)$ be the set of distinct ordered token bigrams obtained by sliding a window of length $2$ over its tokens; for tokens $t_1, \dots, t_n$, $B(d) = \{(t_i, t_{i+1}) : 1 \le i < n\}$. For two texts $a$ and $b$ the bigram Jaccard similarity is $J(a, b) = \frac{|B(a) \cap B(b)|}{|B(a) \cup B(b)|}$, taking values in $[0, 1]$. Compared with unigram (bag-of-words) Jaccard, the bigram variant rewards short-range word-order agreement: two texts must reuse the same adjacent-word pairs, not just the same vocabulary, to score highly. It is reported as a lexical-overlap baseline alongside TF-IDF cosine and ROUGE-L when comparing paired drafts.
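The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `bigrams` and `bigram_jaccard` are hypothetical, tokenization is assumed to be lowercased whitespace splitting, and the score for two texts with no bigrams at all (an empty union) is assumed to be 0.0 by convention, since the definition leaves that case open.

```python
def bigrams(text):
    """Return the set of distinct ordered token bigrams of a text.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    tokens = text.lower().split()
    # Pair each token with its successor; a text with fewer than
    # two tokens yields an empty set.
    return set(zip(tokens, tokens[1:]))


def bigram_jaccard(text_a, text_b):
    """Jaccard coefficient of the two texts' bigram sets."""
    set_a, set_b = bigrams(text_a), bigrams(text_b)
    union = set_a | set_b
    if not union:
        # Assumption: define the score as 0.0 when neither text has a bigram.
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Same vocabulary, different order: no shared adjacent pairs.
    print(bigram_jaccard("dog bites man", "man bites dog"))
    # One substituted word: 3 shared bigrams out of 7 distinct ones.
    print(bigram_jaccard("the cat sat on the mat", "the cat lay on the mat"))
```

The first pair shares every word but no adjacent-word pair, so it scores 0 here even though unigram Jaccard would give 1, which is the word-order sensitivity described above.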
Updated 2026-05-16
Tags
Science
Research Paper: Advanced Prompting