Why This Matters
Optimizing for an extreme parameter cap (a 16MB compressed artifact) surfaces methods that rarely win in unconstrained settings: aggressive quantization, novel tokenizers, parameter tying, depth recurrence, and test‑time tricks. The repo packages a reproducible baseline, an evaluation harness (tokenizer‑agnostic bits-per-byte on FineWeb), and a public leaderboard aimed at encouraging work on parameter‑efficient model craft. The challenge window ran from March 18, 2026 to April 30, 2026, and the repo includes guidance for both leaderboard submissions (10 minutes on 8×H100) and non-record submissions.
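To make the "tokenizer‑agnostic bits-per-byte" metric concrete, here is a minimal sketch of the standard conversion from a summed token-level negative log-likelihood (in nats) to bits per byte of the underlying UTF-8 text. The function name `bits_per_byte`, the sample text, and the per-token loss values are illustrative assumptions, not taken from the repo's harness.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats to bits per byte.

    Dividing by ln(2) turns nats into bits; dividing by the byte count of
    the raw text (not the token count) removes the dependence on tokenizer.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


# Hypothetical example: made-up per-token losses for one short string.
text = "Parameter golf rewards small models."
n_bytes = len(text.encode("utf-8"))
token_nll_nats = [2.1, 1.7, 3.0, 0.9, 2.4, 1.2, 0.6]  # illustrative only

bpb = bits_per_byte(sum(token_nll_nats), n_bytes)
print(f"{n_bytes} bytes, {bpb:.3f} bits per byte")
```

Because the denominator counts bytes of text, a tokenizer that emits fewer tokens gains nothing unless the total loss over the same text also drops.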
What Sets It Apart
- Constraint-first objective: Submissions are capped at 16,000,000 bytes for code plus compressed model weights, and validation performance is measured in bits-per-byte on a fixed FineWeb validation split, so participants must treat compression and model design as one problem rather than handling compression as an afterthought.
- Short, verifiable runs: Leaderboard records must reproduce within a 10‑minute training limit on 8×H100 (SXM), which forces systems work and fast-convergence techniques rather than scaling compute alone.
- Practical reproducibility: The repo includes scripts for local Apple Silicon smoke tests and Runpod/H100 setup, plus explicit instructions for what counts toward the artifact (code bytes + compressed model bytes), which lowers the friction for iteration and submission.
- Community leaderboard + compute support: A public leaderboard highlights diverse approaches (recurrence, quantization, GPTQ, test‑time training), and OpenAI-sponsored compute credits help lower entry barriers.
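The artifact accounting described above (code bytes plus compressed model bytes, checked against the 16,000,000-byte cap) can be sketched as follows. This is a rough illustration under stated assumptions: `zlib` stands in for whatever compressor the challenge actually specifies, and `artifact_bytes`, `fits_cap`, and the placeholder inputs are hypothetical names and data, not the repo's own tooling.

```python
import zlib

CAP_BYTES = 16_000_000  # cap stated in the challenge description


def artifact_bytes(code: bytes, model_blob: bytes, level: int = 9) -> int:
    """Return code bytes plus compressed model bytes.

    Assumption: zlib is only a stand-in; the real submission rules may
    require a different compressor or serialization format.
    """
    return len(code) + len(zlib.compress(model_blob, level))


def fits_cap(code: bytes, model_blob: bytes) -> bool:
    """Check the combined artifact size against the byte cap."""
    return artifact_bytes(code, model_blob) <= CAP_BYTES


# Hypothetical inputs: a tiny script and all-zero placeholder "weights".
code = b"print('train')\n" * 100
model_blob = bytes(1_000_000)

total = artifact_bytes(code, model_blob)
print(f"artifact: {total} bytes, fits cap: {fits_cap(code, model_blob)}")
```

All-zero weights compress far better than trained ones would, so the printed size says nothing about a real model; the point is only that the code and the compressed weights draw on the same budget.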
Who It's For & Trade-offs
Great fit if you want to explore parameter-efficient LLM design, compression, or model-architecture hacks under strict size and time constraints. It is ideal for researchers and engineers interested in quantization, compact tokenizers, and fast-converging training recipes. Look elsewhere if your goal is large-scale pretraining research, production-quality large models, or research that requires multi-hour training runs: the challenge intentionally trades capacity and dataset scale for extreme compactness and run-time verifiability.
Where It Fits
This repo sits between ML systems engineering (fast convergence, kernel/precision tricks) and model architecture research (recurrence, parameter tying, tokenizer design). Results are most relevant to teams working on on-device LLMs or codec-style model compression, and to those studying neural scaling laws under tight parameter budgets.
