nanoGPT

nanoGPT is the simplest, fastest repository for training/finetuning medium-sized GPTs. It is a rewrite of minGPT that prioritizes practicality over education. Still under active development, but currently the file train.py reproduces GPT-2 (124M) on OpenWebText, running on a single 8XA100 40GB node in about 4 days of training.

Introduction

nanoGPT: A Simple and Fast GPT Training Repository

nanoGPT is an open-source project by Andrej Karpathy that shows how to train GPT models with minimal, readable code. It is a streamlined successor to minGPT, removing unnecessary complexity and focusing on the core ideas behind GPT training.

The entire project is intentionally small. Most of the logic lives in just two files:

  • train.py, which handles the training loop
  • model.py, which defines the GPT model

Together, they implement the key components of a GPT, including the transformer blocks, multi-head self-attention, and token embeddings, while small data-preparation scripts handle tokenization. You can train a model from scratch on your own data or fine-tune pretrained GPT-2 weights with only small changes.
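For example, assuming the repository layout and class names in nanoGPT at the time of writing (a GPT class in model.py with from_pretrained and generate methods, and the transformers package available to download the weights), loading pretrained GPT-2 and sampling from it looks roughly like this sketch:

```python
# Minimal sketch: load GPT-2 (124M) weights through nanoGPT's model.py and sample a
# short continuation. Run from the repo root so `model.py` is importable; the class
# and method names follow the repository at the time of writing and may change.
import torch
import tiktoken          # GPT-2 BPE tokenizer, used by nanoGPT's data scripts
from model import GPT    # nanoGPT's model definition

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Download the pretrained GPT-2 weights (via the transformers package) and wrap
# them in nanoGPT's GPT class, disabling dropout for inference.
model = GPT.from_pretrained('gpt2', dict(dropout=0.0))
model.eval().to(device)

# Encode a prompt and autoregressively generate a few tokens.
enc = tiktoken.get_encoding('gpt2')
idx = torch.tensor([enc.encode("Hello, my name is")], dtype=torch.long, device=device)
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=20, temperature=0.8, top_k=200)
print(enc.decode(out[0].tolist()))
```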

nanoGPT works on a wide range of hardware. Beginners can run small examples on a CPU, MacBook, or single GPU, while advanced users can scale up to multi-GPU and distributed training to reproduce GPT-2-level results. Built on PyTorch and compatible with PyTorch 2.0 optimizations, nanoGPT is ideal for learning, experimentation, and rapid prototyping.
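As a rough illustration of that PyTorch 2.0 support, the sketch below mirrors what the repo's compile config flag does, assuming the GPT/GPTConfig names in model.py at the time of writing:

```python
# Sketch: build a small GPT from nanoGPT's model.py and JIT-compile it with
# torch.compile (a PyTorch 2.0 feature) for faster training steps.
import torch
from model import GPT, GPTConfig  # names follow the repo at the time of writing

# A deliberately tiny configuration suitable for a CPU or single small GPU.
model = GPT(GPTConfig(n_layer=4, n_head=4, n_embd=128, block_size=256))

if hasattr(torch, 'compile'):      # only available on PyTorch >= 2.0
    model = torch.compile(model)   # on older versions the model is simply left uncompiled
```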

Information

  • Website: github.com
  • Author: Andrej Karpathy
  • Published date: 2022/12/29
