
SDE vs ODE: Mathematical Foundations of Score-based Diffusion

Stochastic vs Deterministic. A deep dive into Score-based SDEs and Probability Flow ODEs, the theoretical foundations of DDPM and DDIM.


TL;DR

  • SDE (Stochastic DE): Probabilistic paths with noise, theoretical basis of DDPM
  • ODE (Ordinary DE): Deterministic paths, basis of DDIM and Flow Matching
  • Probability Flow ODE: An ODE with the same marginal distribution as SDE
  • Key Difference: SDE = more diversity, less speed; ODE = less diversity, more speed

1. Why Differential Equations?

The Essence of Diffusion

Diffusion models are transformations between two distributions:

  • Forward: Data $p_{\text{data}}$ → Noise $\mathcal{N}(0, I)$
  • Reverse: Noise $\mathcal{N}(0, I)$ → Data $p_{\text{data}}$

Modeling this transformation in continuous time gives us differential equations.

Discrete vs Continuous

DDPM (discrete):

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\epsilon_\theta(x_t, t)\right) + \sigma_t z$$

Continuous-time SDE:

$$dx = f(x, t)\,dt + g(t)\,dw$$

The continuous-time view is more flexible and enables various sampler designs.

2. Forward SDE: From Data to Noise

Variance Preserving SDE (VP-SDE)

The continuous SDE corresponding to DDPM:

$$dx = -\frac{1}{2}\beta(t)x \, dt + \sqrt{\beta(t)} \, dw$$

Where:

  • $\beta(t)$: noise schedule (noise intensity over time)
  • $dw$: Wiener process (Brownian motion)
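As a quick numerical sanity check, the forward VP-SDE can be simulated with Euler-Maruyama: starting from unit variance, the variance stays close to 1 throughout. The linear `beta_min`/`beta_max` schedule below is an illustrative assumption, not something fixed by the SDE itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def beta(t, beta_min=0.1, beta_max=20.0):
    # Illustrative linear noise schedule (assumed values)
    return beta_min + t * (beta_max - beta_min)

# Euler-Maruyama simulation of dx = -1/2 beta(t) x dt + sqrt(beta(t)) dw
n_particles, n_steps = 100_000, 1000
dt = 1.0 / n_steps
x = rng.standard_normal(n_particles)  # start at unit variance
for i in range(n_steps):
    t = i * dt
    x += -0.5 * beta(t) * x * dt + np.sqrt(beta(t) * dt) * rng.standard_normal(n_particles)

print(round(float(x.var()), 2))  # stays close to 1: variance preserving
```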

Variance Exploding SDE (VE-SDE)

The SDE corresponding to SMLD/NCSN:

$$dx = \sqrt{\frac{d[\sigma^2(t)]}{dt}} \, dw$$

Where $\sigma(t)$ is the noise scale increasing over time.

Solution of Forward Process

For VP-SDE, the distribution at time $t$ is:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$$

Where $\bar{\alpha}_t = e^{-\int_0^t \beta(s)ds}$

This exactly matches DDPM's forward process!
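A minimal sketch of sampling from this closed-form marginal, again assuming a linear $\beta(t)$ schedule (the `perturb` helper and the schedule constants are illustrative, not from the original papers):

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_bar(t, beta_min=0.1, beta_max=20.0):
    # bar_alpha(t) = exp(-int_0^t beta(s) ds) for an assumed linear schedule
    return np.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t**2))

def perturb(x0, t):
    # One-shot forward jump: x_t = sqrt(bar_a) x0 + sqrt(1 - bar_a) eps
    a = alpha_bar(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

x0 = np.full(100_000, 2.0)   # toy data: a point mass at 2.0
xt = perturb(x0, t=0.5)
a = alpha_bar(0.5)
# Empirical mean/std match the closed-form N(sqrt(a) * 2, (1 - a) I)
print(round(float(xt.mean()), 2), round(float(xt.std()), 2))
```

No step-by-step simulation is needed: any noise level is reachable in a single jump, which is what makes diffusion training efficient.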

3. Reverse SDE: From Noise to Data

Anderson's Theorem

A remarkable fact (Anderson, 1982): the time reversal of the forward SDE is itself an SDE!

Forward:

$$dx = f(x, t)\,dt + g(t)\,dw$$

Reverse:

$$dx = \left[f(x, t) - g(t)^2 \nabla_x \log p_t(x)\right]dt + g(t)\,d\bar{w}$$

Where:

  • $\nabla_x \log p_t(x)$: Score function (the key!)
  • $d\bar{w}$: Reverse-time Wiener process

What is the Score Function?

For a noised sample $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$, the conditional score is:

$$\nabla_{x_t} \log p_t(x_t \mid x_0) = -\frac{\epsilon}{\sqrt{1-\bar{\alpha}_t}}$$

The score is the "gradient pointing toward data from current position."

Relationship between DDPM's noise prediction $\epsilon_\theta$ and score:

$$s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1-\bar{\alpha}_t}}$$
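This identity is easy to verify numerically: the score of the conditional Gaussian and the rescaled noise agree exactly (a toy check with an arbitrarily chosen `a_bar`):

```python
import numpy as np

rng = np.random.default_rng(1)

a_bar = 0.3                              # an example value of bar_alpha_t
x0 = rng.standard_normal(5)
eps = rng.standard_normal(5)
xt = np.sqrt(a_bar) * x0 + np.sqrt(1 - a_bar) * eps

# Score of the conditional Gaussian p_t(x | x0) = N(sqrt(a) x0, (1 - a) I)
score_analytic = -(xt - np.sqrt(a_bar) * x0) / (1 - a_bar)

# Score recovered from the noise via s = -eps / sqrt(1 - bar_alpha_t)
score_from_eps = -eps / np.sqrt(1 - a_bar)

print(np.allclose(score_analytic, score_from_eps))  # True
```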

4. Probability Flow ODE

The Key Discovery

A crucial finding by Song et al. (2021):

There exists a **deterministic ODE** with the **same marginal distribution** $p_t(x)$ as the SDE!

$$dx = \left[f(x, t) - \frac{1}{2}g(t)^2 \nabla_x \log p_t(x)\right]dt$$

The noise term $g(t)dw$ disappears, only the drift is modified.

Probability Flow ODE for VP-SDE

$$dx = \left[-\frac{1}{2}\beta(t)x - \frac{1}{2}\beta(t)\nabla_x \log p_t(x)\right]dt$$

Substituting score with $\epsilon_\theta$:

$$dx = \left[-\frac{1}{2}\beta(t)x + \frac{\beta(t)}{2\sqrt{1-\bar{\alpha}_t}}\epsilon_\theta(x, t)\right]dt$$

This is identical to DDIM with $\eta=0$!
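A sketch of this drift as a NumPy function, with a sanity check: if the data distribution is already $\mathcal{N}(0, I)$, every marginal $p_t$ is also $\mathcal{N}(0, I)$, the optimal noise prediction is $\epsilon^*(x, t) = \sqrt{1-\bar{\alpha}_t}\,x$, and the probability flow drift vanishes. The `pf_ode_drift` helper and the schedule are illustrative assumptions:

```python
import numpy as np

def beta(t, beta_min=0.1, beta_max=20.0):
    # Assumed linear schedule
    return beta_min + t * (beta_max - beta_min)

def alpha_bar(t, beta_min=0.1, beta_max=20.0):
    return np.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t**2))

def pf_ode_drift(x, t, eps_model):
    # dx/dt = -1/2 beta(t) x + beta(t) / (2 sqrt(1 - bar_a)) * eps_theta(x, t)
    b, a = beta(t), alpha_bar(t)
    return -0.5 * b * x + 0.5 * b / np.sqrt(1 - a) * eps_model(x, t)

# Optimal eps prediction when p_data = N(0, I): eps*(x, t) = sqrt(1 - bar_a) x
eps_opt = lambda x, t: np.sqrt(1 - alpha_bar(t)) * x
x = np.array([1.5, -0.3, 0.7])
print(np.allclose(pf_ode_drift(x, 0.5, eps_opt), 0.0))  # True
```

Zero drift means the ODE leaves $\mathcal{N}(0, I)$ untouched, exactly as the "same marginals" property demands.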

5. SDE vs ODE: Characteristic Comparison

Sampling Paths

| Property | SDE (Reverse) | ODE (Probability Flow) |
|---|---|---|
| Path | Stochastic (different each time) | Deterministic (always the same) |
| Noise | Added at each step | None |
| Diversity | High | Low (same $z$ → same $x$) |
| Speed | Slow (small steps needed) | Fast (large steps possible) |

Mathematical Relationship

```text
           SDE                          ODE
      ┌───────────┐              ┌─────────────┐
 z ~  │  Reverse  │         z ~  │ Probability │
N(0,I)│    SDE    │        N(0,I)│  Flow ODE   │
      └─────┬─────┘              └──────┬──────┘
            │                           │
            ▼                           ▼
        x ~ p_data                  x ~ p_data

      Same marginal distribution, different paths!
```

DDPM vs DDIM

| Model | Basis | Characteristics |
|---|---|---|
| DDPM | Reverse SDE | $\eta=1$, stochastic |
| DDIM | Probability Flow ODE | $\eta=0$, deterministic |
| DDIM (general) | Mixture of both | $0 \leq \eta \leq 1$ |

DDIM's $\eta$ parameter:

  • $\eta = 0$: Pure ODE (deterministic)
  • $\eta = 1$: Pure SDE (same as DDPM)
  • $0 < \eta < 1$: Interpolation
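The generalized DDIM update with the $\eta$ knob can be sketched as follows (the `ddim_step` helper is an illustrative NumPy transcription of the update in Song et al., 2021, not library code):

```python
import numpy as np

rng = np.random.default_rng(0)

def ddim_step(xt, eps, a_t, a_prev, eta=0.0):
    # Generalized DDIM update: eta = 0 gives the deterministic ODE step,
    # eta = 1 recovers DDPM-like stochastic sampling
    sigma = eta * np.sqrt((1 - a_prev) / (1 - a_t)) * np.sqrt(1 - a_t / a_prev)
    x0_pred = (xt - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)   # predicted x0
    direction = np.sqrt(1 - a_prev - sigma**2) * eps         # direction to x_t
    noise = sigma * rng.standard_normal(xt.shape)
    return np.sqrt(a_prev) * x0_pred + direction + noise

xt = rng.standard_normal(4)
eps = rng.standard_normal(4)
# eta = 0 is deterministic: repeated calls agree exactly
s1 = ddim_step(xt, eps, a_t=0.5, a_prev=0.8, eta=0.0)
s2 = ddim_step(xt, eps, a_t=0.5, a_prev=0.8, eta=0.0)
print(np.allclose(s1, s2))  # True
```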

6. Score Matching: Learning the Score

Denoising Score Matching

Learning the score function directly is difficult. Instead, we use Denoising Score Matching:

$$\mathcal{L} = \mathbb{E}_{t, x_0, \epsilon}\left[\|\epsilon - \epsilon_\theta(x_t, t)\|^2\right]$$

This is identical to DDPM's training objective!
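The objective can be estimated in a few lines of NumPy. As a sanity check, a model that always predicts zero noise should score $\mathbb{E}\|\epsilon\|^2 = 1$ per dimension (the schedule and the `dsm_loss` helper are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_bar(t):
    # Assumed linear-beta schedule with beta in [0.1, 20]
    return np.exp(-(0.1 * t + 0.5 * 19.9 * t**2))

def dsm_loss(eps_model, x0, n=200_000):
    # Monte Carlo estimate of L = E_{t, x0, eps} || eps - eps_theta(x_t, t) ||^2
    t = rng.uniform(1e-3, 1.0, size=n)
    a = alpha_bar(t)
    eps = rng.standard_normal(n)
    xt = np.sqrt(a) * x0 + np.sqrt(1 - a) * eps
    return float(np.mean((eps - eps_model(xt, t)) ** 2))

zero_model = lambda xt, t: np.zeros_like(xt)
loss = dsm_loss(zero_model, x0=rng.standard_normal(200_000))
print(round(loss, 1))  # ~1.0
```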

Equivalence of Score and Noise Prediction

$$\text{Score: } s_\theta(x, t) \approx \nabla_x \log p_t(x)$$

$$\text{Noise: } \epsilon_\theta(x, t) \approx \epsilon$$

Relationship:

$$s_\theta = -\frac{\epsilon_\theta}{\sigma_t}, \qquad \sigma_t = \sqrt{1-\bar{\alpha}_t}$$

Thus noise prediction and score prediction are equivalent up to a scale factor.

7. Numerical Solvers

SDE Solvers

Euler-Maruyama (most basic):

$$x_{t-\Delta t} = x_t + f(x_t, t)\Delta t + g(t)\sqrt{\Delta t} \cdot z$$

Predictor-Corrector (Song et al.):

  1. Predictor: Euler step
  2. Corrector: Refine with Langevin dynamics

ODE Solvers

Euler (1st order):

$$x_{t-\Delta t} = x_t + f(x_t, t)\Delta t$$

Heun (2nd order):

$$\tilde{x} = x_t + f(x_t, t)\Delta t$$

$$x_{t-\Delta t} = x_t + \frac{1}{2}\left[f(x_t, t) + f(\tilde{x}, t-\Delta t)\right]\Delta t$$
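The accuracy gap between the two is easy to see on a toy ODE (not the diffusion ODE itself; `euler_step` and `heun_step` below are generic illustrative helpers, written with increasing time for simplicity):

```python
import numpy as np

def euler_step(x, t, dt, f):
    return x + f(x, t) * dt

def heun_step(x, t, dt, f):
    # Predictor (Euler step), then trapezoidal correction
    x_pred = x + f(x, t) * dt
    return x + 0.5 * (f(x, t) + f(x_pred, t + dt)) * dt

# Toy ODE dx/dt = -x with exact solution x(1) = e^{-1}
f = lambda x, t: -x
x_euler = x_heun = 1.0
dt, n_steps = 0.1, 10
for i in range(n_steps):
    t = i * dt
    x_euler = euler_step(x_euler, t, dt, f)
    x_heun = heun_step(x_heun, t, dt, f)

exact = np.exp(-1.0)
print(abs(x_heun - exact) < abs(x_euler - exact))  # True
```

With the same step count, the second-order Heun step lands much closer to the exact solution, which is why higher-order solvers allow far fewer sampling steps.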

DPM-Solver (specialized higher-order solver):

  • Exploits structure of diffusion ODE
  • High quality with 10-20 steps

Solver Comparison

| Solver | Order | Steps | Characteristics |
|---|---|---|---|
| Euler-Maruyama | 1 | 1000+ | Basic SDE |
| DDPM | 1 | 1000 | Discrete SDE |
| DDIM | 1 | 50-100 | ODE |
| DPM-Solver | 2-3 | 10-25 | Higher-order ODE |
| DPM-Solver++ | 2-3 | 10-20 | Improved version |

8. Connection to Flow Matching

Conditional Flow Matching

Flow Matching is also ODE-based:

$$dx = v_\theta(x, t)\,dt$$

Differences:

  • Diffusion ODE: Drift derived from score
  • Flow Matching: Directly learn velocity
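A tiny numerical illustration of the straight-path construction used in conditional flow matching (the samples and variable names below are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Straight (optimal-transport style) conditional path:
#   x_t = (1 - t) x0 + t x1, with target velocity v = x1 - x0
x0 = rng.standard_normal(1000)          # noise samples
x1 = rng.standard_normal(1000) + 3.0    # toy "data" samples
t = 0.25
xt = (1 - t) * x0 + t * x1
v_target = x1 - x0                      # what v_theta(x_t, t) regresses onto

# Integrating dx = v dt from t to 1 along the straight path recovers x1 exactly
print(np.allclose(xt + (1 - t) * v_target, x1))  # True
```

Because the target path is a straight line, large integration steps stay on it, which is the intuition behind the few-step samplers built on flow matching.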

Same Result, Different Paths

Both transform $p_{\text{noise}} \to p_{\text{data}}$ but:

| Property | Diffusion ODE | Flow Matching |
|---|---|---|
| Path | Curved (score-based) | Straight (optimal transport) |
| Derivation | Derived from SDE | Directly defined |
| Training target | $\epsilon$ prediction | $v$ prediction |

9. Practical Selection Guide

When to Use SDE?

  • When diversity is important
  • When sufficient compute is available
  • When stochastic refinement is needed (e.g., inpainting)

When to Use ODE?

  • When speed is important
  • When deterministic results are needed (reproducibility)
  • When latent interpolation is needed

Choices of Modern Models

| Model | Choice | Reason |
|---|---|---|
| DALL-E 2 | SDE (DDPM) | Quality priority |
| Stable Diffusion | ODE (DDIM/DPM) | Speed-quality balance |
| SD3/FLUX | Flow ODE | Fast generation with straight paths |

10. Advanced Topics

Continuous Normalizing Flows (CNF)

From the ODE perspective, diffusion is a type of Normalizing Flow:

$$\log p_0(x_0) = \log p_T(x_T) + \int_0^T \text{div}\big(f(x_t, t)\big)\, dt$$

This enables likelihood computation as well.

Optimal Transport Perspective

Probability Flow ODE connects to Optimal Transport:

  • "Shortest path" between two distributions
  • Related to Wasserstein distance

Guidance in SDE vs ODE

Classifier-Free Guidance applies to both SDE and ODE:

$$\tilde{s}(x, t) = s(x, t) + w \cdot \left(s(x, t \mid c) - s(x, t)\right)$$
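The guidance rule itself is a one-liner; the `guided_score` helper below is an illustrative sketch:

```python
import numpy as np

def guided_score(s_uncond, s_cond, w):
    # Classifier-free guidance: extrapolate from the unconditional score
    # toward the conditional one with guidance weight w
    return s_uncond + w * (s_cond - s_uncond)

s_u = np.array([0.1, -0.2])
s_c = np.array([0.5, 0.0])
# w = 0 ignores the condition; w = 1 is the plain conditional score;
# w > 1 over-emphasizes the condition
print(np.allclose(guided_score(s_u, s_c, 0.0), s_u),
      np.allclose(guided_score(s_u, s_c, 1.0), s_c))  # True True
```

The same combination applies whether the result is fed into a reverse-SDE step or a probability-flow ODE step, since both consume the (modified) score.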

Conclusion

| Concept | SDE | ODE |
|---|---|---|
| Formula | $dx = f\,dt + g\,dw$ | $dx = f\,dt$ |
| Representative models | DDPM | DDIM, Flow Matching |
| Path | Stochastic | Deterministic |
| Advantages | Diversity, theoretical foundation | Speed, reproducibility |
| Disadvantages | Slow | Reduced diversity |

Key Insight: SDE and ODE solve the same problem in different ways. Thanks to Probability Flow ODE, we can maintain the theoretical advantages of SDE while gaining the practical benefits of ODE.

References

  1. Song, Y., et al. "Score-Based Generative Modeling through Stochastic Differential Equations" (ICLR 2021)
  2. Ho, J., et al. "Denoising Diffusion Probabilistic Models" (NeurIPS 2020)
  3. Song, J., et al. "Denoising Diffusion Implicit Models" (ICLR 2021)
  4. Lipman, Y., et al. "Flow Matching for Generative Modeling" (ICLR 2023)
  5. Lu, C., et al. "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling" (NeurIPS 2022)
