nanoGPT: A Simple and Fast GPT Training Repository
nanoGPT is an open-source project by Andrej Karpathy that shows how to train GPT models with minimal, readable code. It is a streamlined rewrite of minGPT that trades some of its educational scaffolding for practical training speed, while keeping the focus on the core ideas behind GPT training.
The entire project is intentionally small. Most of the logic lives in just two files:
- train.py, which handles the training loop
- model.py, which defines the GPT model
Together, they implement the key components of a GPT, including the transformer blocks and multi-head self-attention, along with the full training loop. You can train a model from scratch on your own data or fine-tune pretrained GPT-2 weights with only small changes.
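As a rough sketch of how these two files fit together, the snippet below instantiates a small model and runs a forward pass, then notes how pretrained GPT-2 weights can be loaded instead. The GPTConfig and GPT names mirror the repository's model.py, but treat the exact fields and signatures here as assumptions rather than a definitive reference:

```python
import torch
from model import GPT, GPTConfig  # model.py from the nanoGPT repository

# Assumed config fields, following the upstream model.py; a deliberately tiny model.
config = GPTConfig(block_size=256, vocab_size=50304,
                   n_layer=6, n_head=6, n_embd=384, dropout=0.0)
model = GPT(config)

# Forward pass on a dummy batch of token ids; when targets are supplied the
# model also returns a cross-entropy loss, which train.py backpropagates.
idx = torch.randint(0, config.vocab_size, (4, config.block_size))
logits, loss = model(idx, targets=idx)

# Fine-tuning instead starts from pretrained GPT-2 weights, roughly:
# model = GPT.from_pretrained('gpt2')
```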
nanoGPT works on a wide range of hardware. Beginners can run small examples on a CPU, MacBook, or single GPU, while advanced users can scale up to multi-GPU and distributed training to reproduce GPT-2-level results. Built on PyTorch and able to take advantage of PyTorch 2.0 features such as torch.compile, nanoGPT is well suited to learning, experimentation, and rapid prototyping.
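To make use of PyTorch 2.0 and multiple GPUs, the usual pattern is to compile the model and wrap it in DistributedDataParallel when running under torchrun. The sketch below is illustrative of that pattern rather than a copy of train.py, and again assumes model.py's GPT and GPTConfig names:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from model import GPT, GPTConfig  # model.py from the nanoGPT repository

# torchrun sets WORLD_SIZE and LOCAL_RANK; fall back to a single device otherwise.
ddp = int(os.environ.get('WORLD_SIZE', '1')) > 1
if ddp:
    dist.init_process_group(backend='nccl')
    local_rank = int(os.environ['LOCAL_RANK'])
    device = f'cuda:{local_rank}'
    torch.cuda.set_device(device)
else:
    local_rank = 0
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = GPT(GPTConfig()).to(device)
model = torch.compile(model)   # PyTorch 2.0 graph compilation for a speedup
if ddp:
    model = DDP(model, device_ids=[local_rank])
```

A multi-GPU run is then typically launched with something like `torchrun --standalone --nproc_per_node=8 train.py`, while a CPU-only machine can run the same code on a single process.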
