Backpropagation From Scratch: Chain Rule, Computation Graphs, and Topological Sort
How microgpt.py's 15-line backward() works. From high school calculus to chain rule, computation graphs, topological sort, and backpropagation.

The backward() function in microgpt.py is 15 lines long. But these 15 lines are a complete implementation of the core algorithm that underpins all of deep learning -- backpropagation.
This post builds from high school calculus all the way up to the backward() function in microgpt.py, answering two questions along the way: "what is the chain rule?" and "why do we need topological sort?"
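To set the stage, here is a minimal sketch of the idea in the spirit of a micrograd-style Value class. The class name, operator set, and attribute names here are illustrative assumptions, not the exact code from microgpt.py, but the backward() method shows the same structure: build a topological order of the computation graph, then apply the chain rule in reverse.

```python
# A minimal sketch of reverse-mode autograd, assuming a Value node that
# records its inputs (_children) and a per-op _backward closure.
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._children = _children
        self._backward = lambda: None  # set by the op that created this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # chain rule: d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # chain rule: d(out)/d(self) = other.data, and vice versa
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort: visit children before parents, so that
        # reversing the order visits each node after all its consumers
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._children:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0  # d(loss)/d(loss) = 1
        for node in reversed(topo):
            node._backward()

# usage: gradients of loss = a * b + a with respect to a and b
a, b = Value(2.0), Value(3.0)
loss = a * b + a
loss.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Note that a appears twice in the graph (once in the product, once in the sum), which is exactly why _backward accumulates with += and why the topological order matters: a's gradient is only correct after every node that consumes it has run.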
The Central Question of Deep Learning
Training a neural network means this: