Why This Matters
Distilling reasoning behavior from a large model requires supervision that exposes the teacher's internal reasoning, not just a final summary. This dataset provides exactly that: 1,000 prompts paired with DeepSeek-V4-Pro's full reasoning traces (reasoning_content) and final responses, so a student model can be trained on reasoning→answer behavior rather than on summary-only outputs.
What Sets It Apart
- Full chain-of-thought as supervision: each sample includes the teacher's full reasoning trace and its final answer, which is the primary signal needed when distilling reasoning behavior into smaller models.
- Compact, reproducible subset: 1,000 JSONL samples drawn from an existing reasoning prompt pool (Jackrong/GLM-5.1-Reasoning-1M-Cleaned), making experiments inexpensive to iterate on (the author reports a collection cost of about $5.46).
- Clear schema and usage stats: records include prompt, reasoning, response, model tag (deepseek-v4-pro), and token-usage fields, enabling token-aware fine-tuning and diagnostic filtering.
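As a rough illustration of the token-aware filtering mentioned above, the sketch below parses JSONL lines and keeps only records under a completion-token budget. The exact key names (`usage`, `completion_tokens`) and the sample records are assumptions made for this example; check the actual files for the real field layout before relying on it.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def within_token_budget(record: dict, max_completion_tokens: int) -> bool:
    """Return True if the record reports a completion size within the budget."""
    # ASSUMPTION: token usage lives at record["usage"]["completion_tokens"].
    usage = record.get("usage") or {}
    tokens = usage.get("completion_tokens")
    return isinstance(tokens, int) and tokens <= max_completion_tokens


# Made-up lines standing in for the real JSONL file.
sample_lines = [
    '{"prompt": "2+2?", "reasoning": "Add 2 and 2.", "response": "4", '
    '"model": "deepseek-v4-pro", "usage": {"completion_tokens": 12}}',
    "",
    '{"prompt": "Prove X.", "reasoning": "Long trace", "response": "Done.", '
    '"model": "deepseek-v4-pro", "usage": {"completion_tokens": 9000}}',
]

kept = [r for r in iter_records(sample_lines) if within_token_budget(r, 4096)]
```

With a real file, the same functions accept an open file handle in place of `sample_lines`, since iterating a text file yields its lines. Records that lack usage information are dropped rather than kept, which is a conservative choice for budget-sensitive training runs.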
Who Should Use It (and Tradeoffs)
Great fit if you want to: fine-tune or distill a student LLM to replicate chain-of-thought behavior; run small-scale experiments that evaluate whether exposing CoT improves student correctness; or build analysis tools for reasoning traces.
Look elsewhere if you need: a large-scale, diverse corpus (this release is 1,000 samples), human-verified ground truth (teacher traces may include mistakes or heuristics), or guarantees of adversarial robustness. This release is explicitly a small, distillation-oriented dump for quick iteration.
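For the fine-tuning and ablation use cases above, one possible way to turn a record into a supervised training pair is sketched here. The key names (`prompt`, `reasoning`, `response`) follow the schema description but are assumed rather than confirmed, and the `<think>` delimiters are an illustrative convention, not something the dataset prescribes.

```python
def build_example(record: dict, include_reasoning: bool = True) -> dict:
    """Turn one record into an input/target pair for supervised fine-tuning."""
    # ASSUMPTION: records use the keys "prompt", "reasoning", and "response".
    # ASSUMPTION: <think>...</think> is a hypothetical delimiter for the trace.
    target = record["response"]
    if include_reasoning:
        target = f"<think>\n{record['reasoning']}\n</think>\n{record['response']}"
    return {"input": record["prompt"], "target": target}


# Made-up record for illustration only.
record = {
    "prompt": "What is 3 * 4?",
    "reasoning": "Multiply 3 by 4 to get 12.",
    "response": "12",
}

with_cot = build_example(record)                              # reasoning + answer
answer_only = build_example(record, include_reasoning=False)  # ablation baseline
```

Training one student on the `with_cot` targets and another on the `answer_only` targets gives a simple comparison for whether exposing the trace improves student correctness.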
Where It Fits
This dataset occupies the early-stage distillation and research niche. It is faster and cheaper than collecting proprietary teacher outputs at scale, and it exposes full reasoning traces, whereas many commercial LLM APIs return only a summary and hide the internal CoT. Use it for prototyping distillation pipelines, ablation studies on reasoning traces, or debugging student training behavior.
Notes on Quality and Provenance
Samples are generated by deepseek-v4-pro with reasoning_effort=max and thinking.enabled=true, and prompts were sampled from the train split of Jackrong/GLM-5.1-Reasoning-1M-Cleaned. The author provides basic metadata (created/modified timestamps, counts) and an Apache‑2.0 license. Because traces are model-generated, validate or filter samples before using them as high‑stakes supervision.
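A minimal structural check along these lines might look like the following sketch. It only flags empty fields and an unexpected model tag; it does not verify that a trace or answer is correct. The key names are assumed from the schema description, and the sample records are made up.

```python
from typing import List

EXPECTED_MODEL = "deepseek-v4-pro"  # model tag stated in the description


def validation_issues(record: dict) -> List[str]:
    """Return a list of structural problems found in one record."""
    # ASSUMPTION: records use the keys "prompt", "reasoning", "response", "model".
    issues = []
    for key in ("prompt", "reasoning", "response"):
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            issues.append(f"empty or missing {key}")
    if record.get("model") != EXPECTED_MODEL:
        issues.append("unexpected model tag")
    return issues


# Made-up records for illustration only.
good = {"prompt": "Q", "reasoning": "Steps", "response": "A",
        "model": "deepseek-v4-pro"}
bad = {"prompt": "Q", "reasoning": "  ", "response": "A",
       "model": "other-model"}

good_issues = validation_issues(good)
bad_issues = validation_issues(bad)
```

Records with a non-empty issue list can be dropped or routed to manual review. Checking factual correctness of the answers would need a separate step, such as comparison against reference solutions where those exist.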
