OpenMementos-228K

A 228,557-example dataset of reasoning traces segmented into blocks with iterative, compressed "memento" summaries so LLMs can learn to manage long context. Includes a training-ready subset and a `full` subset with sentence/block-level annotations for research and SFT.

Introduction

Long chains of thought frequently exceed model context windows, forcing a choice between ever-larger token windows and brittle heuristics. This dataset reframes the problem: it teaches models to compress completed reasoning blocks into compact "mementos" and to continue reasoning from those summaries alone, trading raw token retention for structured, reusable state.

What Sets It Apart
  • Practical scale and format: 228,557 traces across math (54%), science (27%), and code (19%), with a training-ready default subset and a `full` subset that exposes sentences, block boundaries, and block summaries for analysis.
  • Memento-first pipeline: sentence splitting → boundary scoring → block segmentation → summary generation → judge-guided refinement (up to two rounds). The released data reports ~6× trace-level compression (from ~10,900 block tokens to ~1,850 memento tokens per trace) and median block-level compression of ~4–6× depending on domain.
  • Research-friendly annotations: in the `full` subset, each example includes block indices and iteratively refined summaries, enabling re-segmentation, evaluation of summary quality, and experiments with block masking or context eviction during inference.
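
To make the block-eviction idea concrete, here is a minimal sketch of how a compacted context could be assembled: completed blocks are represented only by their mementos, while the block in progress keeps its raw text. The `Block` class, its field names, and the whitespace token counter are assumptions made for illustration; they are not the dataset's actual schema or tokenizer. The last line only re-derives the trace-level compression from the figures reported above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One reasoning block plus its compressed summary (hypothetical schema)."""
    text: str      # raw reasoning sentences of the block
    memento: str   # compact summary that stands in for the block


def count_tokens(text: str) -> int:
    # Whitespace splitting is a crude stand-in for a real tokenizer.
    return len(text.split())


def compacted_context(blocks: List[Block], current: int) -> str:
    """Context visible while writing block `current`.

    Completed blocks (index < current) are evicted and represented only by
    their mementos; the block in progress keeps its raw text.
    """
    parts = [b.memento for b in blocks[:current]]
    parts.append(blocks[current].text)
    return "\n".join(parts)


def compression_ratio(block_tokens: float, memento_tokens: float) -> float:
    # Ratio of raw block tokens to memento tokens.
    return block_tokens / memento_tokens


# Toy trace, invented for this example.
blocks = [
    Block("Let x be the unknown . Then 2 x + 3 = 11 so 2 x = 8 and x = 4 .", "x = 4"),
    Block("Check : 2 * 4 + 3 = 8 + 3 = 11 , which matches .", "check passes"),
    Block("Therefore the answer is 4 .", "answer : 4"),
]

print(compacted_context(blocks, current=2))
# Reported averages: ~10,900 block tokens vs ~1,850 memento tokens per trace.
print(round(compression_ratio(10_900, 1_850), 1))  # 5.9
```

With real data, `count_tokens` would be swapped for the tokenizer of the model being trained, since compression ratios depend on the tokenization used.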

Who It's For and Tradeoffs

Great fit if you want to fine-tune or evaluate models that must reason over very long multi-step traces: SFT for memento-style generation, experiments in context compression, or studies of summary quality and iterative refinement. The dataset is already formatted for `datasets.load_dataset` and common Python toolchains.
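
As a rough sketch of what loading might look like, the helper below wraps `datasets.load_dataset`. The Hub identifier is not stated here, so `repo_id` is left as a required argument rather than hard-coded. The function name `load_mementos`, the `"train"` split name, and the use of `"full"` as a configuration name are assumptions to check against the dataset card.

```python
from typing import Optional


def load_mementos(repo_id: str, config: Optional[str] = None, split: str = "train"):
    """Load one configuration of the dataset from the Hugging Face Hub.

    `repo_id` is deliberately required: the Hub identifier is not given in
    this listing, so nothing is hard-coded. The default split name "train"
    is an assumption.
    """
    # Imported lazily so the helper can be defined without `datasets` installed.
    from datasets import load_dataset

    if config is None:
        # No configuration name: load the default, training-ready subset.
        return load_dataset(repo_id, split=split)
    # Second positional argument of load_dataset is the configuration name.
    return load_dataset(repo_id, config, split=split)
```

Calling `load_mementos(repo_id)` would then fetch the default subset, and `load_mementos(repo_id, "full")` the annotated one, assuming the subsets are published as configurations under those names.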

Look elsewhere if you need raw, uninterpreted chain-of-thought tokens (this dataset intentionally compresses and evicts block content), if your domain is outside math/science/code, or if you require human-authored gold-standard proofs rather than LLM-judged iterative summaries.
