
MathNet v0 — Olympiad Math Reasoning & Retrieval

A multimodal, multilingual dataset of more than 30,000 Olympiad-level math problems with expert solutions, paired with a math-aware retrieval benchmark. It includes images, hierarchical topics, provenance from official booklets, and LLM-assisted metadata (v0, CC BY 4.0).

Introduction

MathNet addresses a concrete bottleneck in evaluating and training models for high-quality mathematical reasoning: existing benchmarks tend to be small, monolingual, or missing real problem provenance. By combining scale (≈30K expert-authored problems), multilingual coverage (17 languages), inline figures, and a curated retrieval benchmark, MathNet offers a single testbed for generative problem solving, math-aware retrieval, and retrieval-augmented reasoning.

What Sets It Apart
  • Scale + provenance: assembled from official competition booklets across 47 countries (1985–2025), yielding 30,676 problems and preserving booklet provenance so evaluations use expert-origin content rather than crowd-sourced copies.
  • Multimodal & multilingual: problems include 7,541 embedded images (5,148 problems with figures) and cover 17 languages, enabling evaluation of vision+language and non-English reasoning capabilities.
  • Retrieval-first benchmark: curated pairs of mathematically equivalent/structurally similar problems enable measuring embedding/retrieval quality (Math-Aware Retrieval) and studying how retrieval quality affects downstream generative solving (RAG). Typical embedding recall is low in v0, highlighting an open challenge.
  • Careful pipeline and release: extraction used an OCR + LLM pipeline with automated checks and human review; v0 metadata is LLM-assisted, and its known gaps are explicitly flagged for refinement in v1.
  • Reproducible evaluation artifacts: the dataset ships as Parquet files on Hugging Face with per-country configs, a dedicated train/test split used in the paper (MathNet-Solve), and an associated GitHub repo and website for browsing problems.
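
The retrieval benchmark described above scores how often an embedding model ranks a problem's paired equivalent near the top. As a rough illustration only, here is a minimal Recall@k sketch in plain Python. The function names, the toy 2-D vectors, and the one-gold-match-per-query setup are assumptions made for this example; they are not MathNet's official evaluation code or data format.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recall_at_k(query_vecs, corpus_vecs, gold, k):
    """Fraction of queries whose gold corpus index appears among the top-k neighbours.

    gold[i] is the corpus index of the problem paired with query i
    (assumption: exactly one gold match per query).
    """
    hits = 0
    for qi, q in enumerate(query_vecs):
        # Rank every corpus item by similarity to the query, best first.
        ranked = sorted(
            range(len(corpus_vecs)),
            key=lambda j: cosine(q, corpus_vecs[j]),
            reverse=True,
        )
        if gold[qi] in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy embeddings standing in for real model outputs (hypothetical values).
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.6, 0.5]]
gold = [0, 1]

print(recall_at_k(queries, corpus, gold, k=1))  # 0.5
print(recall_at_k(queries, corpus, gold, k=3))  # 1.0
```

In this toy case the second query sits closest to a structurally different item, so its gold match only appears at rank 3. This is the kind of failure a low Recall@1 would reflect.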

Who It's For and Tradeoffs

Great fit if you evaluate or fine-tune LLMs/VLMs on hard mathematical reasoning, develop embedding models for structure-aware retrieval (RAG), or build curricula from topic-stratified, figure-rich Olympiad problems. Look elsewhere if you need contamination-free corpora (v0 warns about possible leakage), fully human-audited metadata (problem_type and final_answer are LLM-assisted in v0), or a small, lightweight dataset: MathNet targets scale and provenance over minimalism.
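
Because final_answer is LLM-assisted in v0, one cautious way to work with it is to separate records that carry a usable answer string from those that need manual review before any automatic grading. The sketch below assumes each record is a Python dict and that a missing answer appears as None or an empty string. Both are assumptions for illustration, as are the helper name and the `id` field; only the problem_type and final_answer field names come from the description above.

```python
def split_by_answer(records):
    """Split records into (gradable, needs_review) by whether final_answer is a non-empty string."""
    gradable, needs_review = [], []
    for rec in records:
        answer = rec.get("final_answer")
        if isinstance(answer, str) and answer.strip():
            gradable.append(rec)
        else:
            # Missing, None, or whitespace-only answers go to manual review.
            needs_review.append(rec)
    return gradable, needs_review


# Hypothetical records; real MathNet rows may have a different schema.
records = [
    {"id": "a", "problem_type": "answer", "final_answer": "42"},
    {"id": "b", "problem_type": "proof", "final_answer": None},
    {"id": "c", "problem_type": None, "final_answer": "  "},
]

gradable, needs_review = split_by_answer(records)
print([r["id"] for r in gradable])      # ['a']
print([r["id"] for r in needs_review])  # ['b', 'c']
```

A non-empty answer is still LLM-assisted, so this split only identifies what can be graded automatically, not what has been verified.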

Where It Fits

MathNet complements smaller math benchmarks (GSM8K, MATH) and prior multimodal contest sets by combining Olympiad difficulty, image grounding, multilingual breadth, and an explicit retrieval benchmark. This makes it especially useful for research that bridges reasoning, multimodal understanding, and retrieval-augmented methods.

Information

  • Website: huggingface.co
  • Authors: Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba
  • Published date: 2026/04/23
