Learn Before
Bigram Jaccard Similarity
Bigram Jaccard similarity applies the Jaccard coefficient to the bigram (2-gram) representations of two texts. Given a text $X$, let $B(X)$ be the set of distinct ordered token bigrams obtained by sliding a window of length $2$ over its tokens; for tokens $x_1, \dots, x_n$, $B(X)=\{(x_i, x_{i+1}) : 1 \le i \le n-1\}$. For two texts $X$ and $Y$, the bigram Jaccard similarity is $J_{2}(X,Y)=\dfrac{|B(X)\cap B(Y)|}{|B(X)\cup B(Y)|}$, taking values in $[0,1]$. Compared with unigram (bag-of-words) Jaccard, the bigram variant rewards short-range word-order agreement: two texts must reuse the same adjacent-word pairs, not just the same vocabulary, to score highly. It is reported as a lexical-overlap baseline alongside TF-IDF cosine and ROUGE-L when comparing paired drafts.
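The definition can be illustrated with a minimal Python sketch. Several choices here are assumptions rather than part of the definition: tokens come from lowercasing and splitting on whitespace, the function names `bigrams` and `bigram_jaccard` are hypothetical, and the score is set to 0.0 when neither text yields a bigram (the ratio is undefined in that case).

```python
def bigrams(text):
    """Return the set of distinct ordered token bigrams of `text`.

    Tokenization (lowercase, whitespace split) is an assumption made for
    this sketch; any tokenizer could be substituted.
    """
    tokens = text.lower().split()
    # zip pairs each token with its successor, giving (x_i, x_{i+1}).
    return set(zip(tokens, tokens[1:]))


def bigram_jaccard(x, y):
    """Compute |B(X) & B(Y)| / |B(X) | B(Y)| for two texts."""
    bx, by = bigrams(x), bigrams(y)
    union = bx | by
    if not union:
        # Both texts have fewer than two tokens; returning 0.0 is an
        # assumed convention, since the ratio is undefined here.
        return 0.0
    return len(bx & by) / len(union)


if __name__ == "__main__":
    # Shares 3 of 7 distinct bigrams.
    print(bigram_jaccard("the cat sat on the mat", "the cat lay on the mat"))
    # Same vocabulary, reversed order: no shared bigrams.
    print(bigram_jaccard("dog bites man", "man bites dog"))
```

The second call shows the word-order sensitivity described above: "dog bites man" and "man bites dog" have identical vocabularies, so unigram Jaccard would be 1, yet they share no adjacent-word pair and the bigram score is 0.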
Tags
Science
Research Paper: Advanced Prompting