
Flow Matching vs DDPM: Why ODE Beats SDE in Diffusion Models

DDPM needs 1000 steps, Flow Matching needs 10. The mathematics of straight-line generation. Comparing SDE curved paths vs ODE straight paths.


TL;DR

  • DDPM: Remove noise gradually via a stochastic process. Random perturbations at each step create curved paths.
  • Flow Matching: Move directly toward the data via a deterministic process. Straight paths enable fast generation.
  • Key Difference: DDPM predicts noise $\epsilon$; Flow Matching predicts a velocity field $v$.

1. Problem Setup: The Path from Noise to Data

The goal of generative models is simple:

$$\text{Noise } z \sim \mathcal{N}(0, I) \quad \longrightarrow \quad \text{Data } x \sim p_{\text{data}}$$

How do we achieve this transformation? Two paradigms emerge.

DDPM's Approach: "Slowly and Stochastically"

DDPM defines a Markov chain:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

where $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ and $\alpha_t = 1 - \beta_t$.

As time progresses ($t \to T$), information about data $x_0$ vanishes, leaving only pure noise $\epsilon$.
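
For concreteness, here is a minimal sketch of these quantities, assuming the linear $\beta_t$ schedule from the DDPM paper ($\beta$ rising from $10^{-4}$ to $0.02$ over $T = 1000$ steps):

python
import torch

T = 1000
beta = torch.linspace(1e-4, 0.02, T)     # beta_t: assumed linear schedule
alpha = 1.0 - beta                       # alpha_t = 1 - beta_t
alpha_bar = torch.cumprod(alpha, dim=0)  # alpha_bar_t = prod_{s<=t} alpha_s

# ~0.9999 at t=1 vs ~4e-5 at t=T: the data signal has all but vanished
print(alpha_bar[0].item(), alpha_bar[-1].item())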

Flow Matching's Approach: "Straight and Deterministic"

Flow Matching uses linear interpolation:

$$x_t = (1 - t)\, x_0 + t\, \epsilon, \quad t \in [0, 1]$$

At $t=0$, we have $x_0$ (data). At $t=1$, we have $\epsilon$ (noise). Everything in between is a straight line.

2. Different Training Objectives

DDPM: Noise Prediction

DDPM trains a neural network $\epsilon_\theta$ to predict the added noise:

$$\mathcal{L}_{\text{DDPM}} = \mathbb{E}_{x_0, \epsilon, t} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|^2 \right]$$

Why predict noise? To recover $x_{t-1}$ in the reverse process:

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \sigma_t z$$

where $z \sim \mathcal{N}(0, I)$ is fresh randomness added at each step. This is what curves the path.
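
To make the step concrete, here is a minimal sketch of one reverse update; the `noise_schedule` container mirrors the forward-process code in Section 6, and $\sigma_t = \sqrt{\beta_t}$ is one common choice (both are assumptions, not the only valid ones):

python
def ddpm_reverse_step(model, x_t, t, noise_schedule):
    alpha = noise_schedule.alpha[t]          # alpha_t = 1 - beta_t
    alpha_bar = noise_schedule.alpha_bar[t]  # cumulative product up to t
    beta = noise_schedule.beta[t]

    eps_pred = model(x_t, t)
    mean = (x_t - beta / torch.sqrt(1 - alpha_bar) * eps_pred) / torch.sqrt(alpha)

    if t > 0:
        z = torch.randn_like(x_t)  # fresh randomness: this is what curves the path
        return mean + torch.sqrt(beta) * z
    return mean                    # final step adds no noise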

Flow Matching: Velocity Prediction

Flow Matching trains a neural network $v_\theta$ to predict the velocity field:

$$\mathcal{L}_{\text{FM}} = \mathbb{E}_{x_0, \epsilon, t} \left[ \| v_t(x_t) - v_\theta(x_t, t) \|^2 \right]$$

The target velocity field is the time derivative of the conditional path:

$$v_t(x_t \mid x_0, \epsilon) = \frac{d}{dt} x_t = \frac{d}{dt} \left[ (1-t)\, x_0 + t\, \epsilon \right] = \epsilon - x_0$$

This velocity is constant! Regardless of time $t$, we always move in the $\epsilon - x_0$ direction at constant speed.

3. Sampling: SDE vs ODE

DDPM Sampling: SDE-Based

DDPM's reverse process follows a Stochastic Differential Equation (SDE):

$$dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\, d\bar{w}$$

where:

  • $f(x, t)$: drift coefficient
  • $g(t)$: diffusion coefficient (noise magnitude)
  • $d\bar{w}$: reverse-time Brownian motion

The Problem: The $g(t) d\bar{w}$ term adds randomness at every step. The path meanders like Brownian motion, requiring many small steps to reach the target.

Flow Matching Sampling: ODE-Based

Flow Matching follows an Ordinary Differential Equation (ODE):

$$\frac{dx}{dt} = v_\theta(x, t)$$

No stochastic term. We move deterministically along the learned velocity field.

Sampling:

python
# Euler method: integrate dx/dt = v_theta(x, t) from t = 1 (noise) to t = 0 (data)
x = torch.randn(batch_size, dim)  # start: pure noise
dt = 1.0 / num_steps

for i in range(num_steps):
    t = 1.0 - i * dt  # current time, stepping from 1 down to dt
    v = model(x, t)   # predict velocity
    x = x - v * dt    # move along the straight line

4. Why is Flow Matching Faster?

Mathematical Intuition

Consider the expected path length of DDPM's sampling trajectory. Because the reverse process behaves like Brownian motion:

$$\mathbb{E}\left[ \text{Path Length} \right] = \mathcal{O}(\sqrt{T})$$

where $T$ is the number of steps. More steps mean longer paths.

For Flow Matching's straight-line path:

$$\text{Path Length} = \| \epsilon - x_0 \| = \mathcal{O}(1)$$

Independent of step count. Shortest possible distance.
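
This is easy to check numerically. The following illustrative sketch (not from either paper) compares the cumulative length of a discretized Brownian path on $[0, 1]$ with the straight-line distance between its endpoints:

python
import torch

num_steps, dim = 1000, 2
increments = torch.randn(num_steps, dim) / num_steps**0.5  # Brownian increments

brownian_length = increments.norm(dim=1).sum()  # total distance traveled: O(sqrt(T))
straight_length = increments.sum(dim=0).norm()  # endpoint displacement: O(1)

print(brownian_length.item(), straight_length.item())  # e.g. ~40 vs ~1.3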

Empirical Evidence

| Method | Steps | FID (CIFAR-10) |
|---|---|---|
| DDPM | 1000 | 3.17 |
| DDPM | 100 | 13.51 |
| DDIM | 50 | 4.67 |
| Flow Matching | 10 | 3.42 |

DDPM requires 1000 steps for quality results; Flow Matching achieves comparable quality with just 10.

5. Rectified Flow: Evolution of Flow Matching

Rectified Flow advances Flow Matching further.

Core Idea: Reflow

The learned flow may not be perfectly straight. Reflow "straightens" it:

  1. Generate $(z, x_0)$ pairs using the learned model
  2. Train a new straight-line path on these pairs
  3. Repeat to progressively straighten the trajectory

$$\mathcal{L}_{\text{reflow}} = \mathbb{E}_{(z, x_0) \sim \pi_k} \left[ \| (z - x_0) - v_\theta(x_t, t) \|^2 \right]$$

where $x_t = (1 - t) x_0 + t z$ is the straight-line interpolation between the coupled pair, written with the same sign convention as Section 2 (see the sketch below).
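
A minimal sketch of one reflow round, assuming a helper `sample_ode` that integrates the learned ODE from noise to data (the helper and model names are illustrative):

python
import torch
import torch.nn.functional as F

# Step 1: generate coupled pairs (z, x0) with the current model pi_k
z = torch.randn(batch_size, dim)
with torch.no_grad():
    x0 = sample_ode(model_k, z)  # hypothetical multi-step ODE sampler

# Step 2: retrain on the straight line between the coupled endpoints
t = torch.rand(batch_size, 1)
x_t = (1 - t) * x0 + t * z
v_target = z - x0                # constant velocity along the new straight path
loss = F.mse_loss(model_next(x_t, t), v_target)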

Combined with Distillation

For 1-step generation, apply distillation:

$$\mathcal{L}_{\text{distill}} = \mathbb{E}_{z} \left[ \| x_0^{\text{teacher}} - G_\theta(z) \|^2 \right]$$

where $G_\theta(z)$ generates data in a single forward pass.
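
A correspondingly minimal sketch of this objective (`teacher`, `student`, and `sample_ode` are placeholder names):

python
import torch
import torch.nn.functional as F

z = torch.randn(batch_size, dim)
with torch.no_grad():
    x0_teacher = sample_ode(teacher, z)  # many-step ODE solve as the target
x0_student = student(z)                  # G_theta: single forward pass
loss = F.mse_loss(x0_student, x0_teacher)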

6. Implementation Comparison

DDPM Forward Process

python
def ddpm_forward(x0, t, noise_schedule):
    """
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * epsilon
    """
    # Gather alpha_bar_t and reshape for broadcasting over image dims
    alpha_bar = noise_schedule.alpha_bar[t].view(-1, 1, 1, 1)
    epsilon = torch.randn_like(x0)

    x_t = torch.sqrt(alpha_bar) * x0 + torch.sqrt(1 - alpha_bar) * epsilon
    return x_t, epsilon

Flow Matching Forward Process

python
def flow_matching_forward(x0, t):
    """
    x_t = (1 - t) * x0 + t * epsilon
    """
    epsilon = torch.randn_like(x0)

    # Reshape t for broadcasting
    t = t.view(-1, 1, 1, 1)  # for images

    x_t = (1 - t) * x0 + t * epsilon
    velocity = epsilon - x0  # target velocity

    return x_t, velocity

Training Loop Comparison

python
# DDPM
for x0 in dataloader:
    t = torch.randint(0, T, (batch_size,))
    x_t, epsilon = ddpm_forward(x0, t, noise_schedule)

    epsilon_pred = model(x_t, t)
    loss = F.mse_loss(epsilon_pred, epsilon)

# Flow Matching
for x0 in dataloader:
    t = torch.rand(batch_size)  # uniform [0, 1]
    x_t, velocity = flow_matching_forward(x0, t)

    velocity_pred = model(x_t, t)
    loss = F.mse_loss(velocity_pred, velocity)

7. When to Use What?

Choose DDPM/DDIM When:

  • Leveraging existing pretrained models (Stable Diffusion, etc.)
  • High diversity is critical
  • Stochastic sampling is required

Choose Flow Matching When:

  • Fast inference is the priority
  • Training from scratch
  • Simple, intuitive implementation is valued

Choose Rectified Flow When:

  • 1-step or few-step generation is the goal
  • Real-time applications
  • Mobile/edge device deployment

8. Mathematical Connection: Score and Velocity

DDPM's score function and Flow Matching's velocity are closely related.

The score function is the gradient of log probability:

$$s_\theta(x, t) = \nabla_x \log p_t(x)$$

Relationship between score and noise prediction in DDPM:

$$s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1 - \bar{\alpha}_t}}$$

Relationship between velocity and score via probability flow ODE:

$$v_\theta(x, t) = f(x, t) - \frac{1}{2} g(t)^2\, s_\theta(x, t)$$

Therefore, a well-trained DDPM can be reinterpreted as a velocity model through the probability flow ODE, although its paths remain curved rather than straight. This is one reason Stable Diffusion 3 transitioned to Rectified Flow.
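
For the VP-SDE underlying DDPM, $f(x, t) = -\tfrac{1}{2} \beta_t x$ and $g(t)^2 = \beta_t$, so the conversion can be sketched as below (a minimal illustration under those assumptions, not a drop-in for any particular codebase):

python
import torch

def velocity_from_eps(eps_model, x, t, beta_t, alpha_bar_t):
    # score via the DDPM identity: s = -eps_theta / sqrt(1 - alpha_bar_t)
    score = -eps_model(x, t) / torch.sqrt(1 - alpha_bar_t)
    # probability flow ODE drift: v = f - 0.5 * g^2 * s, with VP-SDE coefficients
    return -0.5 * beta_t * x - 0.5 * beta_t * score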

Conclusion

| Property | DDPM (SDE) | Flow Matching (ODE) |
|---|---|---|
| Path | Curved (Brownian) | Straight |
| Prediction target | Noise $\epsilon$ | Velocity $v$ |
| Sampling | Stochastic | Deterministic |
| Required steps | 100-1000 | 5-50 |
| Implementation | Moderate | Simple |

Flow Matching emerged from asking "Why take the long way?" The simple insight that straight lines are shortest has dramatically improved generation efficiency.

References

  1. Ho, J., et al. "Denoising Diffusion Probabilistic Models" (NeurIPS 2020)
  2. Lipman, Y., et al. "Flow Matching for Generative Modeling" (ICLR 2023)
  3. Liu, X., et al. "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow" (ICLR 2023)
  4. Song, Y., et al. "Score-Based Generative Modeling through Stochastic Differential Equations" (ICLR 2021)