nanochat is a full-stack, minimal codebase for training, fine-tuning, evaluating, and deploying a ChatGPT-like large language model (LLM) from scratch on a single 8xH100 GPU node for under $100.
nanochat is an open-source project by Andrej Karpathy that democratizes the creation of large language models (LLMs) by providing a complete, end-to-end pipeline in a single, clean, hackable codebase. It lets users build their own ChatGPT-style chatbot without massive computational resources or complex frameworks, emphasizing simplicity and accessibility: everything runs on a single 8xH100 GPU node, with the entry-level speedrun costing roughly $100 and larger tiers scaling up to a ~1.9-billion-parameter model trained on 38 billion tokens.
The repository includes everything from data preparation to inference and user interaction:
- Tokenizer training: scripts/tok_train.py trains the tokenizer that ensures efficient text processing.
- Pretraining: scripts/base_train.py trains a base Transformer model (defined in nanochat/gpt.py) on datasets like FineWeb-Edu.
- Midtraining and SFT: scripts/mid_train.py and scripts/chat_sft.py adapt the model for chat interactions using synthetic or curated data.
- Evaluation: report.py covers benchmarks such as CORE, ARC-Challenge, GSM8K, HumanEval, and MMLU, with results written to a report card (report.md).
- Inference: an engine with KV caching (nanochat/engine.py), plus a web UI (ui.html) for ChatGPT-like conversations served by scripts/chat_web.py.

The flagship speedrun.sh script orchestrates a full run: downloading data, training the tokenizer, pretraining, midtraining, supervised fine-tuning (SFT), and launching the UI, all in about 4 hours on an 8xH100 node costing ~$24/hour. Users can monitor progress via screen sessions and, post-training, use the model to generate stories, answer questions, or observe the amusing hallucinations typical of small models.
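The KV caching mentioned above can be illustrated with a toy sketch (NumPy, and deliberately not the actual nanochat/engine.py code): during incremental decoding, keys and values for past tokens are cached, and each new step only appends its own key/value and attends over the cache rather than recomputing attention for the whole sequence.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(q, K, V):
    # single-query attention over all cached keys/values
    return softmax(q @ K.T / np.sqrt(K.shape[-1])) @ V

rng = np.random.default_rng(0)
d = 8
K_cache = np.empty((0, d))  # grows by one row per generated token
V_cache = np.empty((0, d))

outputs = []
for step in range(4):
    # stand-ins for the projections of the newest token
    q, k, v = rng.normal(size=(3, d))
    K_cache = np.vstack([K_cache, k])  # append instead of recomputing history
    V_cache = np.vstack([V_cache, v])
    outputs.append(attend(q, K_cache, V_cache))
```

The payoff is that decoding one token costs attention over the cache only, instead of re-running the full prefix through the model at every step.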
The baseline $100 model (depth=20) achieves modest scores, such as 0.2219 on CORE, outperforming 2019's GPT-2 but lagging far behind modern LLMs like GPT-4. It behaves like a 'kindergartener': naive, error-prone, and fun to interact with. For better results, users can scale to a $300 tier (depth=26, ~12 hours) or a $1000 tier (~41 hours) by adjusting depth, batch size, and data shards in the scripts. Memory management is key: on GPUs with less VRAM, reduce device_batch_size to avoid OOM errors; automatic gradient accumulation compensates for the smaller per-device batches.
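The gradient-accumulation compensation can be sketched as follows (a minimal illustration of the arithmetic, not nanochat's actual training code): when device_batch_size is lowered, the number of micro-steps per optimizer step rises so that the effective batch size, and therefore the optimization trajectory, stays the same.

```python
def grad_accum_steps(total_batch_size, device_batch_size, world_size):
    """Micro-steps needed so world_size * device_batch_size * steps == total_batch_size."""
    effective = device_batch_size * world_size
    assert total_batch_size % effective == 0, "total batch must divide evenly"
    return total_batch_size // effective

# Halving the per-device batch doubles the accumulation steps;
# the total tokens per optimizer step are unchanged.
print(grad_accum_steps(512, 32, 8))  # -> 2
print(grad_accum_steps(512, 16, 8))  # -> 4
```

The batch sizes here are hypothetical round numbers; the point is only the invariant that total_batch_size is held fixed while device_batch_size shrinks.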
nanochat supports CPU and MPS (Mac) inference for testing, though training remains GPU-centric. It draws inspiration from nanoGPT and modded-nanoGPT, focusing on metrics-driven development without bloated configurations.
Users can personalize their model, for example by infusing an identity via synthetic data during midtraining/SFT, or by adding abilities such as letter counting (the 'strawberry' task). The codebase is fork-friendly, with only ~8K lines across 45 files, making it ideal for experimentation. Tools like files-to-prompt can package the repo for querying with other LLMs, and DeepWiki integration allows natural-language questions about the code.
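Adding an ability like letter counting boils down to generating synthetic chat examples for midtraining/SFT. A minimal sketch of such a generator (a hypothetical helper, not nanochat's actual task code) might look like:

```python
import random

def make_letter_count_example(word: str, letter: str) -> dict:
    """Build one synthetic user/assistant pair teaching letter counting."""
    count = word.lower().count(letter.lower())
    return {
        "user": f"How many times does the letter '{letter}' appear in '{word}'?",
        "assistant": f"The letter '{letter}' appears {count} times in '{word}'.",
    }

random.seed(0)
words = ["strawberry", "banana", "mississippi"]
dataset = [make_letter_count_example(w, random.choice(w)) for w in words]
print(make_letter_count_example("strawberry", "r")["assistant"])
# -> The letter 'r' appears 3 times in 'strawberry'.
```

Because the answers are computed programmatically, the examples are guaranteed correct; scaling this to thousands of words gives the model enough signal to pick up the skill during fine-tuning.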
As the capstone for Eureka Labs' LLM101n course, nanochat aims to advance accessible micro-models trainable on budgets under $1000. It is MIT-licensed, and ongoing contributions are welcome, especially for repo management. Acknowledgements include Hugging Face for datasets and Lambda for compute. For research, cite the project as a 2025 GitHub release by Andrej Karpathy.
This project stands out for bridging theory and practice, empowering hobbyists and researchers to train capable LLMs affordably while understanding the full pipeline.