OpenAI/parameter-golf

A challenge repository for training the best language model that fits inside a 16,000,000-byte (16 MB) submission artifact. It provides baseline training code, bits-per-byte (bpb) evaluation on FineWeb, a public leaderboard, and compute-grant instructions for short 8×H100 runs.

Introduction

Why this matters

Optimizing under an extreme size cap (a 16 MB artifact) surfaces methods that rarely win in unconstrained settings: aggressive quantization, novel tokenizers, parameter tying, depth recurrence, and test-time tricks. The repo packages a reproducible baseline, an evaluation harness (tokenizer-agnostic bits-per-byte on FineWeb), and a public leaderboard aimed at encouraging work on parameter-efficient model design. The challenge window ran from March 18, 2026 to April 30, 2026, and the repo includes guidance for both leaderboard submissions (10 minutes on 8×H100) and non-record submissions.
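
To make the "tokenizer-agnostic" part concrete, here is a minimal sketch of how a bits-per-byte score is generally computed. It is not the repo's evaluation harness: the function name `bits_per_byte` and its inputs are hypothetical, and the sketch assumes the per-token losses are negative log-likelihoods in nats. Because the total loss is normalized by the UTF-8 byte count of the text rather than by the token count, models with different tokenizers can be compared on the same scale.

```python
import math


def bits_per_byte(token_nll_nats, text):
    """Convert per-token negative log-likelihoods (in nats) into bits per UTF-8 byte.

    Hypothetical helper for illustration; the repo's own harness may differ.
    """
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must be non-empty")
    # Nats to bits: divide by ln(2).
    total_bits = sum(token_nll_nats) / math.log(2)
    # Normalize by bytes, not tokens, so the tokenizer choice drops out.
    return total_bits / n_bytes


# Two tokenizations of the same 4-byte text with the same total loss
# receive the same score, even though the token counts differ.
coarse = bits_per_byte([2 * math.log(2)] * 2, "abcd")  # 2 tokens
fine = bits_per_byte([math.log(2)] * 4, "abcd")        # 4 tokens
print(coarse, fine)
```

A tokenizer with a larger vocabulary emits fewer tokens, each with a higher loss; dividing by bytes keeps that from looking like a win or a loss by itself.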

What Sets It Apart
  • Constraint-first objective: The artifact, counted as code bytes plus compressed model bytes, must fit under a 16,000,000-byte cap, and validation performance is measured in bits-per-byte on a fixed FineWeb validation split. Participants must therefore treat compression and model design jointly, not as afterthoughts.
  • Short, verifiable runs: Leaderboard records must reproduce within a 10-minute training limit on 8×H100 (SXM), which forces systems work and fast-convergence techniques rather than only scaling compute.
  • Practical reproducibility: The repo includes scripts for local Apple Silicon smoke tests and Runpod/H100 setup, plus explicit instructions for what counts toward the artifact (code bytes + compressed model bytes), which lowers the friction for iteration and submission.
  • Community leaderboard + compute support: A public leaderboard highlights diverse approaches (recurrence, quantization, GPTQ, test-time training), and OpenAI-sponsored compute credits help lower entry barriers.
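
The artifact accounting described in the list above (code bytes plus compressed model bytes, checked against the 16,000,000-byte cap) can be sketched as follows. This is an illustration under stated assumptions, not the repo's checker: the helper names `artifact_bytes` and `fits_cap` are hypothetical, and `zlib` stands in for whatever compressor the challenge rules actually specify.

```python
import zlib

CAP_BYTES = 16_000_000  # cap stated in the challenge description


def artifact_bytes(code_bytes: bytes, model_bytes: bytes, level: int = 9) -> int:
    """Return code size plus compressed model size, in bytes.

    Assumption: zlib is used only as a stand-in compressor.
    """
    compressed_model = zlib.compress(model_bytes, level=level)
    return len(code_bytes) + len(compressed_model)


def fits_cap(code_bytes: bytes, model_bytes: bytes, cap: int = CAP_BYTES) -> bool:
    """Check whether the combined artifact stays within the byte cap."""
    return artifact_bytes(code_bytes, model_bytes) <= cap


# Toy inputs: a tiny "training script" and a highly compressible "model".
code = b"print('train')\n"
model = bytes(1_000_000)  # one million zero bytes
print(artifact_bytes(code, model), fits_cap(code, model))
```

Because the code counts toward the same budget as the weights, a longer training script directly reduces the bytes left for the model, and weights that compress well (for example after quantization) leave more room.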

Who It's For & Trade-offs

Great fit if you want to explore parameter-efficient LLM design, compression, or model-architecture hacks under strict size and time constraints. It suits researchers and engineers interested in quantization, compact tokenizers, and fast-converging training recipes.

Look elsewhere if your goal is large-scale pretraining research, production-quality large models, or work that requires long multi-hour training runs. The challenge intentionally trades capacity and dataset scale for extreme compactness and run-time verifiability.

Where It Fits

This repo sits between ML systems engineering (fast convergence, kernel/precision tricks) and model architecture research (recurrence, parameter tying, tokenizer design). Results are most relevant to teams working on on-device LLMs, codec-style compression of models, or those studying neural scaling laws under tight parameter budgets.

Information

  • Website: github.com
  • Authors: OpenAI
  • Published date: 2026/02/09