
nanochat

nanochat is a full-stack, minimal codebase for training, fine-tuning, evaluating, and deploying a ChatGPT-like large language model (LLM) from scratch on a single 8xH100 GPU node for under $100.

Introduction

nanochat: The Best ChatGPT That $100 Can Buy

nanochat is an open-source project by Andrej Karpathy that democratizes the creation of large language models (LLMs) by providing a complete, end-to-end pipeline in a single, clean, hackable codebase. It lets users build their own ChatGPT-style chatbot without massive computational resources or complex frameworks. The project emphasizes simplicity and accessibility: the entire pipeline runs on a single 8xH100 GPU node, with the entry-level "speedrun" training a roughly 560-million-parameter model for about $100, and larger tiers scaling up to a roughly 1.9-billion-parameter model trained on about 38 billion tokens.

Core Components and Workflow

The repository includes everything from data preparation to inference and user interaction:

  • Tokenization: A custom Rust-based BPE tokenizer, trained via scripts/tok_train.py, ensures efficient text processing.
  • Pretraining: scripts/base_train.py trains a base Transformer model (defined in nanochat/gpt.py) on datasets like FineWeb-Edu.
  • Midtraining and Fine-Tuning: scripts/mid_train.py and scripts/chat_sft.py adapt the model for chat interactions using synthetic or curated data.
  • Evaluation: Comprehensive metrics via report.py, covering benchmarks such as CORE, ARC-Challenge, GSM8K, HumanEval, and MMLU, summarized in a report card (report.md).
  • Inference and Serving: An efficient engine in nanochat/engine.py with KV caching, plus a web UI (ui.html) served by scripts/chat_web.py for ChatGPT-like conversations (a minimal KV-cache sketch follows this list).
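To make the KV-caching idea concrete, here is a minimal single-head decoding sketch in PyTorch. It is a toy illustration under simplified assumptions (single head, no batching, random weights), not code from nanochat/engine.py.

    import torch

    # Toy single-head KV cache (illustrative, not nanochat's engine): each decode
    # step computes only the new token's key/value and appends them to the cache,
    # so earlier tokens are never re-encoded.
    torch.manual_seed(0)
    d = 16
    Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
    k_cache = torch.empty(0, d)
    v_cache = torch.empty(0, d)

    def decode_step(x_new):
        """x_new: (1, d) embedding of the newest token; returns (1, d) attention output."""
        global k_cache, v_cache
        q = x_new @ Wq
        k_cache = torch.cat([k_cache, x_new @ Wk], dim=0)
        v_cache = torch.cat([v_cache, x_new @ Wv], dim=0)
        scores = (q @ k_cache.T) / d ** 0.5
        weights = scores.softmax(dim=-1)
        return weights @ v_cache

    for _ in range(5):          # pretend to decode five tokens
        out = decode_step(torch.randn(1, d))
    print(out.shape)            # torch.Size([1, 16])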

The flagship speedrun.sh script orchestrates a full run: downloading data, training the tokenizer, pretraining, midtraining, supervised fine-tuning (SFT), and launching the UI—all in about 4 hours on an 8xH100 node costing $24/hour. Users can monitor progress via screen sessions and access the model post-training to generate stories, answer questions, or observe amusing hallucinations typical of smaller models.
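For orientation, the stage ordering can be sketched in a few lines of Python. This is illustrative only: the real speedrun.sh is a bash script, and its actual invocations (multi-GPU launch, per-stage flags, data download) are not reproduced here.

    import subprocess

    # Illustrative only: runs the pipeline stages in the order described above,
    # assuming each stage is an executable module under scripts/.
    STAGES = [
        "scripts.tok_train",   # train the Rust BPE tokenizer
        "scripts.base_train",  # pretraining on FineWeb-Edu
        "scripts.mid_train",   # midtraining on chat-style data
        "scripts.chat_sft",    # supervised fine-tuning
        "scripts.chat_web",    # serve the web UI
    ]

    for stage in STAGES:
        subprocess.run(["python", "-m", stage], check=True)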

Model Performance and Scaling

The baseline $100 model (depth=20) achieves modest scores, such as 0.2219 on CORE, roughly approaching 2019's GPT-2 while lagging far behind modern LLMs like GPT-4. It behaves like a 'kindergartener': naive, error-prone, and fun to interact with. For better results, users can scale to a $300 tier (depth=26, ~12 hours, which slightly surpasses GPT-2's CORE score) or a $1000 tier (~41 hours) by adjusting depth, batch size, and data shards in the scripts. Memory management is key: on GPUs with less VRAM, reduce device_batch_size to avoid out-of-memory (OOM) errors; gradient accumulation kicks in automatically to compensate for the smaller per-device batch, as sketched below.
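A back-of-the-envelope sketch of that compensation, using illustrative numbers rather than nanochat's actual defaults:

    # Example values only: halving the per-device batch doubles the accumulation
    # steps, keeping the total number of tokens per optimizer step constant.
    total_batch_tokens = 524288      # target tokens per optimizer step (example)
    seq_len = 2048                   # sequence length (example)
    num_gpus = 8

    for device_batch_size in (32, 16, 8):
        tokens_per_micro_step = device_batch_size * seq_len * num_gpus
        grad_accum_steps = total_batch_tokens // tokens_per_micro_step
        print(device_batch_size, grad_accum_steps)   # 32 -> 1, 16 -> 2, 8 -> 4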

nanochat supports CPU and MPS (Mac) inference for testing, though training remains GPU-centric. It draws inspiration from nanoGPT and modded-nanoGPT, focusing on metrics-driven development without bloated configurations.
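For inference-only testing, backend selection in PyTorch can look like the following sketch; nanochat's own device handling may differ.

    import torch

    # Pick the best available backend for inference-only testing (illustrative).
    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"
    print(f"running inference on {device}")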

Customization and Extensions

Users can personalize their model, for example by infusing a custom identity via synthetic data during midtraining/SFT, or by adding abilities such as counting letters in words (the 'strawberry' task); a sketch of such synthetic identity data appears below. The codebase is fork-friendly, with only ~8K lines across 45 files, making it ideal for experimentation. Tools like files-to-prompt can package the repo for querying with other LLMs, and DeepWiki integration allows natural-language questions about the code.
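As a rough illustration of the identity-infusion idea, synthetic conversations could be generated along the following lines; the role/content schema here is an assumption for illustration, not nanochat's actual data format.

    # Hypothetical sketch: synthetic identity conversations to mix into
    # midtraining/SFT data. The schema is an assumption, not nanochat's format.
    BOT_NAME = "MyNanoBot"

    def make_identity_examples(questions):
        examples = []
        for q in questions:
            examples.append([
                {"role": "user", "content": q},
                {"role": "assistant",
                 "content": f"I'm {BOT_NAME}, a small LLM trained from scratch with nanochat."},
            ])
        return examples

    data = make_identity_examples([
        "What is your name?",
        "Who are you?",
        "Who made you?",
    ])
    print(len(data), data[0])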

Community and Future

As the capstone project for Eureka Labs' LLM101n course, nanochat aims to advance accessible micro-models trainable on budgets under $1000. It is MIT-licensed, and contributions are welcome, especially for repo management. Acknowledgements include Hugging Face for datasets and Lambda for compute. For research use, cite the project as a 2025 GitHub release by Andrej Karpathy.

This project stands out for bridging theory and practice, empowering hobbyists and researchers to train capable LLMs affordably while understanding the full pipeline.

Information

  • Website: github.com
  • Author: Andrej Karpathy
  • Published date: 2025/10/13
