LLMs-from-scratch

An open-source GitHub repository by Sebastian Raschka that contains the official code for the book "Build A Large Language Model (From Scratch)". It provides step-by-step PyTorch implementations to build, pretrain, and finetune a GPT-like LLM for educational purposes, along with exercises, bonus material, and companion video content.

Introduction

LLMs-from-scratch is the official code repository that accompanies the book "Build A Large Language Model (From Scratch)" by Sebastian Raschka. Its primary goal is educational: to teach how modern GPT-like large language models (LLMs) work by building small-but-functional implementations from the ground up using PyTorch. The repository walks readers through the entire pipeline — from basic data handling and tokenization to attention mechanisms, implementing a GPT model, pretraining on unlabeled data, and multiple finetuning strategies.
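
To give a flavor of what the chapter code builds toward, below is a minimal sketch of causal self-attention, the mechanism at the core of a GPT-style model. The class name and dimensions are illustrative, not taken from the repository:

    import torch
    import torch.nn as nn

    class CausalSelfAttention(nn.Module):
        # Minimal single-head causal self-attention (illustrative sketch)
        def __init__(self, d_in, d_out, context_length):
            super().__init__()
            self.W_q = nn.Linear(d_in, d_out, bias=False)
            self.W_k = nn.Linear(d_in, d_out, bias=False)
            self.W_v = nn.Linear(d_in, d_out, bias=False)
            # True above the diagonal marks future positions a token must not see
            mask = torch.triu(torch.ones(context_length, context_length), diagonal=1)
            self.register_buffer("mask", mask.bool())

        def forward(self, x):  # x: (batch, seq_len, d_in)
            t = x.shape[1]
            q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
            scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5
            scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
            return torch.softmax(scores, dim=-1) @ v  # (batch, seq_len, d_out)

    attn = CausalSelfAttention(d_in=32, d_out=32, context_length=128)
    out = attn(torch.randn(2, 10, 32))  # out.shape == (2, 10, 32)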

Key contents and features
  • Chapter-aligned code: Each book chapter has a dedicated folder with a main notebook and supplementary examples (e.g., ch02 for data and tokenizers, ch03 for attention, ch04 for GPT implementation, ch05 for pretraining, ch06–ch07 for various finetuning workflows).
  • Educational focus: Code is written for clarity and learning; many notebooks include exercises and solutions (Appendix C) to reinforce concepts.
  • Practical recipes: Examples include building BPE tokenizers from scratch, multi-head attention implementations, KV caching (a sketch follows this list), FLOPs analysis, memory-efficient weight loading, and performance tips for training LLMs in PyTorch.
  • Pretraining and finetuning: The repo contains scripts and notebooks for pretraining small models and finetuning them for classification, instruction following, and preference-based alignment, including DPO examples and LoRA-based parameter-efficient finetuning (a LoRA sketch also follows this list).
  • Bonus and extension material: Numerous optional notebooks and folders provide additional experiments and conversions (e.g., Llama/Qwen/Gemma/Olmo-from-scratch examples, dataset utilities, and user-interface code for interacting with trained models).
  • Companion resources: A 17-hour+ video course is provided as a code-along companion, and the author published a follow-up book/project "Build A Reasoning Model (From Scratch)" that focuses on reasoning, distillation, and reinforcement learning methods for improving model reasoning.
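
To illustrate the KV-cache recipe mentioned above: during autoregressive decoding, the keys and values of already-processed tokens are cached so only the new token's projections need to be computed. The sketch below shows the idea; the function name and cache layout are assumptions, not the repository's API.

    import torch

    def decode_step_with_kv_cache(x_new, W_q, W_k, W_v, cache):
        # x_new: (batch, 1, d_in) embedding of the newly generated token
        # W_q / W_k / W_v: (d_in, d_out) projection matrices
        # cache: dict holding "k" and "v" tensors of shape (batch, t_prev, d_out)
        q = x_new @ W_q
        k_new, v_new = x_new @ W_k, x_new @ W_v
        # Append this step's key/value instead of recomputing the whole prefix
        k = torch.cat([cache["k"], k_new], dim=1) if "k" in cache else k_new
        v = torch.cat([cache["v"], v_new], dim=1) if "v" in cache else v_new
        cache["k"], cache["v"] = k, v
        # The single query attends to every cached position; no causal mask needed
        scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ v, cache  # (batch, 1, d_out)

With the cache, each generation step's attention cost grows linearly with the sequence length instead of re-running the full quadratic computation over the prefix.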
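
The LoRA idea referenced above freezes the pretrained weights and trains only a small low-rank correction. A minimal sketch, with an illustrative class name and hyperparameter defaults:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Wraps a frozen linear layer with a trainable low-rank update
        def __init__(self, linear, rank=8, alpha=16):
            super().__init__()
            self.linear = linear
            for p in self.linear.parameters():
                p.requires_grad_(False)  # freeze the pretrained weights
            d_out, d_in = linear.weight.shape
            self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
            self.B = nn.Parameter(torch.zeros(rank, d_out))  # zero init: no change at start
            self.scaling = alpha / rank

        def forward(self, x):
            # Frozen base projection plus the scaled low-rank correction x @ A @ B
            return self.linear(x) + self.scaling * (x @ self.A @ self.B)

    layer = LoRALinear(nn.Linear(768, 768))
    print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 12288

Only A and B receive gradients, so the trainable parameter count drops from d_in * d_out to rank * (d_in + d_out).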

Intended audience and use cases
  • Learners who want an in-depth, hands-on understanding of how LLMs are built and trained.
  • Researchers or engineers who want minimal, from-scratch PyTorch reference implementations of key LLM components.
  • Educators using the book and course for teaching concepts in model architecture, tokenization, pretraining, and finetuning.

Practical notes
  • Hardware: Examples are designed to run on conventional laptops for small models and use a GPU automatically when one is available (a device-selection sketch follows this list); some bonus material covers multi-GPU and DDP setups.
  • Get started: Clone the repository, follow the setup/README to configure the environment, then open the chapter notebooks and work through the step-by-step implementations.
  • Citation & provenance: The repository is explicitly tied to the Manning-published book (2024) and provides citation information in the README.

Why it matters

By offering clear, well-documented from-scratch implementations, LLMs-from-scratch lowers the barrier to understanding complex transformer-based models. Rather than relying on opaque libraries, readers can inspect and modify core components, which aids learning, experimentation, and teaching.

Information

  • Website: github.com
  • Authors: Sebastian Raschka
  • Published date: 2023/07/23