all-MiniLM-L6-v2 Sentence Embedding Model
all-MiniLM-L6-v2 is a public sentence-embedding checkpoint released through the sentence-transformers library. It is built on the nreimers/MiniLM-L6-H384-uncased backbone, a 6-layer Transformer with hidden size 384 produced by MiniLM-style self-attention distillation, with roughly 22M parameters. The backbone is fine-tuned with an SBERT-style contrastive objective on more than 1 billion sentence pairs aggregated from Reddit comments, S2ORC citation pairs, WikiAnswers duplicates, PAQ, Stack Exchange, MS MARCO, and many additional sources. The model maps a sentence or short paragraph to a 384-dimensional dense vector suitable for semantic search, clustering, and retrieval; by default, inputs longer than 256 word pieces are truncated. The released checkpoint is widely used as a default encoder in downstream retrieval pipelines.
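As a rough illustration of how such embeddings are typically used for semantic search, the sketch below ranks documents by cosine similarity to a query. The helper names `cosine`, `rank_by_cosine`, and `embed` are hypothetical and introduced only for this example. `embed` assumes the sentence-transformers package is installed and that the checkpoint can be downloaded from the Hugging Face Hub under the identifier `sentence-transformers/all-MiniLM-L6-v2`. The ranking helpers are plain Python and work on any list of vectors.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec: Sequence[float],
                   doc_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical wrapper: encode texts with the public checkpoint.

    Assumes sentence-transformers is installed and the model is downloadable.
    The import is local so the ranking helpers work without the library.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    # encode() returns a NumPy array by default; convert to plain lists.
    return model.encode(texts).tolist()


if __name__ == "__main__":
    # Illustrative documents and query; not taken from any benchmark.
    docs = [
        "How do I reset my password?",
        "Transformers use self-attention over token sequences.",
        "The weather is sunny today.",
    ]
    query = "What is self-attention?"
    vectors = embed(docs + [query])
    order = rank_by_cosine(vectors[-1], vectors[:-1])
    for i in order:
        print(docs[i])
```

Cosine similarity is used here because it is a common choice for comparing sentence embeddings; for larger collections, an approximate nearest-neighbor index would normally replace the linear scan.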