nanoGPT

nanoGPT is the simplest, fastest repository for training/finetuning medium-sized GPTs. It is a rewrite of minGPT that prioritizes practicality over education. Still under active development, but currently the file train.py reproduces GPT-2 (124M) on OpenWebText, running on a single 8XA100 40GB node in about 4 days of training.

Introduction

nanoGPT: A Simple and Fast GPT Training Repository

nanoGPT is an open-source project by Andrej Karpathy that shows how to train GPT models with minimal, readable code. It is a streamlined successor to minGPT, removing unnecessary complexity and focusing on the core ideas behind GPT training.

The entire project is intentionally small. Most of the logic lives in just two files:

  • train.py, which handles the training loop
  • model.py, which defines the GPT model

Together, they implement the key components of a GPT, including the transformer blocks, multi-head self-attention, and token embeddings, while small data-preparation scripts handle tokenization. You can train a model from scratch on your own data or fine-tune pretrained GPT-2 weights with only small changes.
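For example, assuming the repository layout and class names in nanoGPT at the time of writing (a GPT class in model.py with from_pretrained and generate methods, and the transformers package available to download the weights), loading pretrained GPT-2 and sampling from it looks roughly like this sketch:

```python
# Minimal sketch: load GPT-2 (124M) weights through nanoGPT's model.py and sample a
# short continuation. Run from the repo root so `model.py` is importable; the class
# and method names follow the repository at the time of writing and may change.
import torch
import tiktoken          # GPT-2 BPE tokenizer, used by nanoGPT's data scripts
from model import GPT    # nanoGPT's model definition

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Download the pretrained GPT-2 weights (via the transformers package) and wrap
# them in nanoGPT's GPT class, disabling dropout for inference.
model = GPT.from_pretrained('gpt2', dict(dropout=0.0))
model.eval().to(device)

# Encode a prompt and autoregressively generate a few tokens.
enc = tiktoken.get_encoding('gpt2')
idx = torch.tensor([enc.encode("Hello, my name is")], dtype=torch.long, device=device)
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=20, temperature=0.8, top_k=200)
print(enc.decode(out[0].tolist()))
```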

nanoGPT works on a wide range of hardware. Beginners can run small examples on a CPU, MacBook, or single GPU, while advanced users can scale up to multi-GPU and distributed training to reproduce GPT-2-level results. Built on PyTorch and compatible with PyTorch 2.0 optimizations, nanoGPT is ideal for learning, experimentation, and rapid prototyping.
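As a rough illustration of that PyTorch 2.0 support, the sketch below mirrors what the repo's compile config flag does, assuming the GPT/GPTConfig names in model.py at the time of writing:

```python
# Sketch: build a small GPT from nanoGPT's model.py and JIT-compile it with
# torch.compile (a PyTorch 2.0 feature) for faster training steps.
import torch
from model import GPT, GPTConfig  # names follow the repo at the time of writing

# A deliberately tiny configuration suitable for a CPU or single small GPU.
model = GPT(GPTConfig(n_layer=4, n_head=4, n_embd=128, block_size=256))

if hasattr(torch, 'compile'):      # only available on PyTorch >= 2.0
    model = torch.compile(model)   # on older versions the model is simply left uncompiled
```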

Information

  • Website: github.com
  • Author: Andrej Karpathy
  • Published date: 2022/12/29
