Diffusion LLM Part 1: Diffusion Fundamentals -- From DDPM to Score Matching
Forward/Reverse Process, ELBO, Simplified Loss, Score Function -- the mathematical principles of diffusion models explained intuitively.

To understand Diffusion-based language models, you first need to understand Diffusion models themselves. In this post, we cover the core principles of Diffusion that were first proven out in image generation. There is some math involved, but intuitive explanations accompany the formulas, so you can follow the flow even if the equations feel unfamiliar.
This is the first installment of the Diffusion LLM series. See the Hub post for a series overview.
The Core Idea Behind Diffusion
The idea behind Diffusion models is surprisingly simple.
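As a quick preview before the math, here is a minimal PyTorch sketch of that idea under the standard DDPM setup (Ho et al., 2020): corrupt data with a fixed noise schedule, then train a network to predict the noise that was added. The names `linear_beta_schedule` and `q_sample` are illustrative, not from any particular library.

```python
import torch

def linear_beta_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Noise schedule beta_1..beta_T used by the forward process."""
    return torch.linspace(beta_start, beta_end, T)

T = 1000
betas = linear_beta_schedule(T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # alpha_bar_t = prod_{s<=t} alpha_s

def q_sample(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Forward process: jump straight from clean data x0 to the noisy x_t.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    """
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over x0's shape
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

# Reverse process (training): a network eps_theta(x_t, t) learns to predict the
# added noise, which is all that is needed to denoise step by step at sampling time.
# loss = ((noise - eps_theta(x_t, t)) ** 2).mean()   # the "simplified loss"
```

Everything in this post builds toward justifying these few lines: why the forward process has this closed form, where the noise-prediction loss comes from (the ELBO and its simplification), and how it connects to the score function.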