Concept

all-MiniLM-L6-v2 Sentence Embedding Model

all-MiniLM-L6-v2 is a public sentence-embedding checkpoint released through the sentence-transformers library. It is built on the nreimers/MiniLM-L6-H384-uncased backbone, a 6-layer, hidden-size-384 Transformer produced by MiniLM-style self-attention distillation with roughly 22.7M parameters, and is fine-tuned with an SBERT-style contrastive objective on more than 1.17 billion sentence pairs aggregated from Reddit comments, S2ORC citation pairs, WikiAnswers duplicates, PAQ, Stack Exchange, MS MARCO, and many additional sources. The model maps a sentence or short paragraph into a 384-dimensional dense vector suitable for semantic search, clustering, and retrieval; inputs longer than 256 word pieces are truncated. The released checkpoint is the canonical default encoder used in many downstream retrieval pipelines.
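
As a rough illustration of how the checkpoint is typically used for semantic search, the sketch below encodes sentences and ranks documents by cosine similarity. It assumes the sentence-transformers and NumPy packages are installed and that the model is published under the Hugging Face identifier sentence-transformers/all-MiniLM-L6-v2. The helper names embed and rank_by_cosine are hypothetical, introduced only for this example.

```python
import numpy as np

# Assumed Hugging Face identifier for the checkpoint described above.
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"


def embed(sentences):
    """Encode a list of sentences (hypothetical helper).

    Downloads the weights on first use, so it needs network access.
    """
    # Imported lazily so rank_by_cosine works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    # encode() returns a NumPy array with one row per input sentence;
    # for this checkpoint each row should have 384 entries.
    return model.encode(sentences)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (indices sorted by descending cosine similarity, raw scores)."""
    q = np.asarray(query_vec, dtype=np.float64)
    d = np.asarray(doc_vecs, dtype=np.float64)
    # Normalize to unit length so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)
    return order, scores
```

A typical call would be vecs = embed(docs) and qvec = embed([query])[0], followed by rank_by_cosine(qvec, vecs). Keeping the ranking step separate from the encoder means it can be checked with small hand-written vectors before any model is downloaded.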

Updated 2026-05-18

Tags

Science

Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls