ColBERTv2 Late-Interaction Retriever
ColBERTv2 is a neural late-interaction retriever introduced by Santhanam et al. (NAACL 2022) that improves on ColBERT along two axes. First, it applies residual compression to the per-token document embeddings. Each embedding is approximated by its nearest centroid from a learned codebook plus a low-bit quantized residual, which shrinks the index footprint of late-interaction models with negligible loss in quality. Second, it trains the encoder with a denoised, distillation-based supervision pipeline that combines a cross-encoder teacher with hard negatives mined from a strong retriever. At query time, scoring still follows the MaxSim late-interaction rule over the (decompressed) multi-vector document representations. At the time of publication, this gave state-of-the-art retrieval quality both in-domain and zero-shot, on benchmarks such as MS MARCO and BEIR.
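The following is a minimal plain-Python sketch of the two query-time mechanisms described above: storing each token embedding as a centroid id plus low-bit residual codes, and computing MaxSim over the decompressed vectors. Several parts are assumptions for illustration and do not come from ColBERTv2 itself. The codebook is a hypothetical stand-in: a few document embeddings are reused as centroids instead of k-means. The residual quantizer is a simple uniform one with a data-chosen range, not ColBERTv2's own bucketing. The embeddings are random unit vectors, not model outputs. The names `compress`, `decompress` and `maxsim` are illustrative and are not the ColBERT library's API.

```python
import math
import random


def normalize(v):
    """Scale v to unit L2 norm (ColBERT-style token embeddings are normalised)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_embs, doc_embs):
    """Late interaction: each query token keeps its best dot product against
    any document token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)


def nearest_centroid(v, centroids):
    """Index of the closest centroid; for unit vectors, max dot == min L2 distance."""
    return max(range(len(centroids)), key=lambda i: dot(v, centroids[i]))


def compress(v, centroids, bits, bound):
    """Store v as (centroid id, one small integer code per residual dimension).
    Simplifying assumption: each residual coordinate is uniformly quantised
    on [-bound, bound] into 2**bits levels."""
    cid = nearest_centroid(v, centroids)
    step = 2 * bound / (2 ** bits - 1)
    codes = []
    for x, c in zip(v, centroids[cid]):
        r = min(max(x - c, -bound), bound)  # clip the residual into range
        codes.append(round((r + bound) / step))
    return cid, codes


def decompress(cid, codes, centroids, bits, bound):
    """Approximate embedding = centroid + dequantised residual."""
    step = 2 * bound / (2 ** bits - 1)
    return [c - bound + k * step for c, k in zip(centroids[cid], codes)]


# Toy demo with random unit vectors (no real model or index involved).
rng = random.Random(0)
DIM, BITS = 16, 2
doc = [normalize([rng.gauss(0, 1) for _ in range(DIM)]) for _ in range(20)]
query = [normalize([rng.gauss(0, 1) for _ in range(DIM)]) for _ in range(4)]

# Hypothetical stand-in for a learned k-means codebook: reuse a few doc embeddings.
centroids = doc[:4]
# Pick the quantiser range from the data so that no residual gets clipped.
bound = max(abs(x - c) for v in doc
            for x, c in zip(v, centroids[nearest_centroid(v, centroids)]))

packed = [compress(v, centroids, BITS, bound) for v in doc]
approx_doc = [decompress(cid, codes, centroids, BITS, bound) for cid, codes in packed]

print("exact MaxSim: ", round(maxsim(query, doc), 4))
print("approx MaxSim:", round(maxsim(query, approx_doc), 4))
```

In this sketch, each token costs one centroid id plus `BITS` bits per dimension instead of a full floating-point vector. When no residual is clipped, every reconstructed coordinate is within half a quantization step of the original. The MaxSim estimate therefore drifts by a bounded amount rather than failing abruptly.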
Tags
Science