Learn Before
Needle-in-a-Haystack and Passkey Retrieval Tasks
The 'needle-in-a-haystack' and passkey retrieval tasks are synthetic evaluations that test an LLM's ability to retrieve information from long contexts. The model must identify and extract a small, relevant piece of information (the 'needle' or passkey) that has been deliberately hidden within a large volume of irrelevant filler text (the 'haystack'). The underlying assumption being tested is that a model with effective long-context memory can retain details introduced early in the text while processing subsequent content, allowing it to locate sparse, isolated facts regardless of where they appear.
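As a concrete illustration, here is a minimal sketch of how such a test prompt could be assembled, assuming a repeated filler sentence as the haystack and a numeric passkey as the needle (the function name and parameters are illustrative, not taken from any particular benchmark):

```python
import random

def build_passkey_prompt(context_len_words: int, depth: float, passkey: str) -> tuple[str, str]:
    """Build a synthetic passkey-retrieval prompt.

    depth: fraction (0.0-1.0) of the way through the filler text
           at which the 'needle' sentence is inserted.
    """
    filler_sentence = "The grass is green and the sky is blue. "
    needle = f"The secret passkey is {passkey}. Remember it. "
    question = "What is the secret passkey mentioned in the text above?"

    # Repeat the filler sentence until the haystack reaches roughly the target length.
    n_repeats = context_len_words // len(filler_sentence.split())
    filler = [filler_sentence] * n_repeats

    # Insert the needle at the requested relative depth.
    insert_at = int(len(filler) * depth)
    haystack = filler[:insert_at] + [needle] + filler[insert_at:]

    prompt = "".join(haystack) + "\n\n" + question
    return prompt, passkey

# Example: a roughly 2,000-word context with the passkey hidden halfway through.
prompt, answer = build_passkey_prompt(2000, depth=0.5, passkey=str(random.randint(10000, 99999)))
```

Varying the depth parameter makes it possible to probe whether retrieval degrades when the needle sits far from the beginning or end of the context.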
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Needle-in-a-Haystack and Passkey Retrieval Tasks
Copy Memory Tasks for LLM Evaluation
Critique of an Evaluation Strategy for Long-Document Models
A research team is evaluating a new large language model's ability to maintain coherence over extremely long texts. They decide to create an artificial document where the first paragraph introduces a unique, fictional rule, and the final paragraph, 50,000 words later, poses a question whose answer depends entirely on that rule. What is the primary analytical advantage of using this synthetic task design over using a naturally occurring long document (like a novel or a technical manual)?
Evaluating LLM Test Methodologies
Learn After
An AI research team is testing a new large language model's long-context capabilities. They create a test where a unique, non-obvious fact ('The most common color for a fire hydrant in Iceland is bright yellow') is inserted into different locations within a very long, unrelated document. The model is then prompted to retrieve this specific fact. The team observes that the model successfully retrieves the fact when it's placed near the beginning or the end of the document, but consistently fails to retrieve it when it's placed in the middle sections. What does this experimental result most strongly suggest about the model's performance?
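The setup in this scenario can be viewed as a depth sweep over the same prompt-construction idea sketched earlier: the fact is inserted at several relative positions and retrieval accuracy is recorded per position. The sketch below assumes the build_passkey_prompt helper from above and takes the model under test as a callable, since no specific API is implied by the scenario:

```python
def run_depth_sweep(query_model, passkey: str,
                    depths=(0.0, 0.25, 0.5, 0.75, 1.0),
                    context_len_words: int = 20_000,
                    trials: int = 20) -> dict[float, float]:
    """Measure retrieval accuracy as a function of where the needle is placed.

    query_model: a callable that sends a prompt string to the model under test
                 and returns its text response (a stand-in, not a real API).
    """
    results = {}
    for depth in depths:
        correct = 0
        for _ in range(trials):
            prompt, expected = build_passkey_prompt(context_len_words, depth, passkey)
            response = query_model(prompt)
            correct += int(expected in response)
        results[depth] = correct / trials
    return results
```

Accuracy that stays high at depths near 0.0 and 1.0 but drops sharply at middle depths corresponds to the position-dependent behavior described in the scenario above.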
Critique of a Synthetic Retrieval Task
Designing a Long-Context Retrieval Experiment
You are evaluating two candidate long-context LLMs...
You lead evaluation for an internal eDiscovery ass...
Your team is writing an internal evaluation checkl...
Your team is selecting an LLM for an internal "pol...
Selecting a Long-Context LLM for a Cost-Constrained Enterprise Document Assistant
Choosing Long-Context Evaluation Evidence for a High-Volume Contract Review Feature
Designing an Evaluation Plan for a Long-Context Compliance Copilot Under Latency and Cost Constraints
Reconciling Long-Context Retrieval Quality with Inference Efficiency for a Meeting-Transcript Copilot
Evaluating a Long-Context LLM for Audit-Ready Evidence Retrieval Under Throughput Constraints
Diagnosing Conflicting Long-Context Evaluation Signals for an Internal Knowledge Assistant