Diffusion LLM Part 2: Discrete Diffusion -- How to Add Noise to Text
D3PM, Transition Matrices, Absorbing States, MDLM -- how to bring diffusion from continuous space to discrete tokens.

In Part 1, we explored the principles of Diffusion operating in continuous space. Adding Gaussian noise to image pixels is natural, but text tokens are discrete data: what would it even mean to add noise of 0.3 to the token "hello"?
In this post, we cover how to bring Diffusion into discrete space, starting from D3PM's Transition Matrix and arriving at MDLM's Masked Diffusion -- the direct ancestors of LLaDA.
D3PM: Diffusion in Discrete Space
Austin et al. (2021) raise a fundamental question in D3PM (Discrete Denoising Diffusion Probabilistic Models): how do you define a forward process for discrete data where you can't add Gaussian noise?
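To build intuition for D3PM's answer before going through the math, here is a minimal sketch (my own illustration, not code from the paper) of what "adding noise" can mean for discrete tokens: instead of a Gaussian perturbation, each token id hops to another id according to a K x K transition matrix Q_t, here the uniform variant where a token keeps its value with probability 1 - beta_t and is otherwise resampled uniformly over the vocabulary. The function names and the toy vocabulary are hypothetical.

```python
import numpy as np

def uniform_transition_matrix(beta_t: float, K: int) -> np.ndarray:
    """Uniform-noise transition matrix Q_t: a token stays itself with
    probability (1 - beta_t) and jumps to a uniformly random token
    with probability beta_t. (An absorbing-state variant would instead
    send that beta_t mass to a single [MASK] token.)"""
    return (1.0 - beta_t) * np.eye(K) + beta_t * np.ones((K, K)) / K

def forward_step(x_prev: np.ndarray, Q_t: np.ndarray,
                 rng: np.random.Generator) -> np.ndarray:
    """One forward (noising) step: for each position, sample
    x_t ~ Categorical(p = row of Q_t indexed by x_{t-1})."""
    probs = Q_t[x_prev]                      # (seq_len, K) categorical params
    return np.array([rng.choice(len(p), p=p) for p in probs])

# Toy example: 6-token vocabulary, one noising step with beta_t = 0.3.
rng = np.random.default_rng(0)
K = 6
x0 = np.array([2, 5, 1, 0])                  # a "sentence" of token ids
Q1 = uniform_transition_matrix(beta_t=0.3, K=K)
x1 = forward_step(x0, Q1, rng)
print(x0, "->", x1)                          # most tokens survive, a few get resampled
```

Running this repeatedly with increasing t drives the token distribution toward the matrix's stationary distribution (uniform here), which is the discrete analogue of a Gaussian sample in the continuous case.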