Evolution of Diffusion, Generative AI (Part 1)

Generated by my flux LoRA model.
Since 2020, diffusion has become the de facto standard in generative AI, capable of producing various outputs such as images, videos, sounds, and even animations. One of the most notable implementations is Stable Diffusion by Stability AI. Its widespread impact can be seen through open-source platforms like Automatic1111 and ComfyUI. However, the field is evolving so rapidly that the term "diffusion" can seem outdated to researchers.
This article will be helpful if you are in one of these situations:
- You are an aspiring beginner who wants to study diffusion but is unsure where to start or in what order to study.
- You have studied various diffusion-related papers and want to organize your knowledge.
I will walk you through seven landmark papers that have shaped diffusion-based generative AI and highlight the key characteristics of each. You might notice a recurring theme: more recent papers (let's call them A) often argue that the theories in previous papers (B) are special cases of A.

Fig. Drawn by Felipe Jeon
Summary of Diffusion in One Sentence:
Finding a bidirectional route between the data distribution and a simpler distribution from which we can efficiently sample (like a Gaussian).
- Route forward: data → Gaussian. This is called the diffusion process, forward process, SDE, or ODE.
- Route backward: Gaussian → data. This is called the denoising process, reverse process, or solving SDE/ODE. This is the generative process.
The differences among the seven papers lie in how they construct this route within various frameworks (diffusion, Langevin dynamics, SDE, optimal transport, flow). All figures are taken from the corresponding papers.
Paper 1: Generative Modeling by Estimating Gradients of the Data Distribution (SMLD)
- Conference: NeurIPS 2019
- Author: Yang Song
- Links: paper, blog
- Keywords: Langevin Dynamics, score matching, data manifold

Fig. Taken from the author's blog.
If you can estimate the direction in which the data probability density increases (the gradient, or score), a particle starting from a random point can eventually reach a high-density region of the data distribution using Langevin Dynamics. This paper presents how to effectively learn this direction (or score) despite the sparse distribution of training images (referred to as a low-dimensional manifold in the paper) in pixel space. The method, called annealing, fills the sparse regions of the data distribution by perturbing the data with various noise levels. The paper then shows how to generate an image via the Langevin Dynamics update rule, starting from pure noise, once the score is learned.
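Roughly, the sampling loop looks like the sketch below. This is a minimal illustration only: score_model is a hypothetical interface for the learned score network, and the step-size schedule is a simplified version of the paper's annealing idea, not the exact algorithm.

```python
import torch

def annealed_langevin_sampling(score_model, sigmas, shape, n_steps=100, eps=2e-5):
    """Minimal sketch of annealed Langevin dynamics sampling (SMLD-style)."""
    x = torch.rand(shape)                      # start from random noise
    for sigma in sigmas:                       # noise levels, from large to small (annealing)
        step_size = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(n_steps):
            z = torch.randn_like(x)
            grad = score_model(x, sigma)       # estimated direction of increasing density
            # Langevin update: climb along the score, plus a bit of injected noise.
            x = x + 0.5 * step_size * grad + (step_size ** 0.5) * z
    return x
```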
Paper 2: Denoising Diffusion Probabilistic Models (DDPM)
- Conference: NeurIPS 2020
- Author: Jonathan Ho
- Link: paper
- Keywords: noise prediction, diffusion process, scheduling
This well-known diffusion paper focuses on transforming an image into pure noise by gradually injecting Gaussian noise at each time step, a process known as the diffusion or forward process, which looks like:
image → noised image → more noised image → ... → pure noise.

Fig. Taken from the paper.
If you can predict the noise added at each step, you can recover the image from pure noise in the denoising or backward process. Training a complex UNet to predict this noise is called noise-prediction training. The way the forward process is shaped (e.g., how quickly the image should be corrupted) is called scheduling.
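In code, the forward corruption and the noise-prediction objective can be sketched roughly as follows. Here, model is a hypothetical UNet interface and alphas_cumprod a hypothetical precomputed schedule; this is a simplified illustration, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(model, x0, alphas_cumprod):
    """Minimal sketch of DDPM noise-prediction training on images (B, C, H, W)."""
    batch = x0.shape[0]
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (batch,))                  # a random timestep per sample
    a_bar = alphas_cumprod[t].view(batch, 1, 1, 1)     # cumulative schedule value at step t

    noise = torch.randn_like(x0)
    # Forward (diffusion) process in closed form:
    #   x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise

    # Noise prediction: the UNet tries to recover the injected noise.
    predicted_noise = model(x_t, t)
    return F.mse_loss(predicted_noise, noise)
```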
This paper formulates the probability distribution of the intermediate noised images, making the back-and-forth random process more stable than in SMLD (paper 1). DDPM operates in pixel space, which demands high memory and computation and led to methods like latent diffusion that work in more compressed spaces. Stable Diffusion 1.5, backed by Stability AI, was developed on top of latent diffusion.
Paper 3: Score-Based Generative Modeling through Stochastic Differential Equations (SDE)
- Author: Yang Song
- Date: November 26, 2020
- Conference: ICLR 2021
- Link: paper
- Keywords: SDE, VE-SDE, VP-SDE, Probability Flow ODE

Fig. The data x(0) moves to x(T) by ODE or SDE and arrives at a simple Gaussian. Reversing the flow leads back to the original distribution.
Yang Song, the author of SMLD, introduced this paper as a successor to the previous ones, attempting to combine them within a single paradigm: stochastic differential equations (SDEs), which extend the discrete steps into a continuous domain. The data x(0) moves to x(T) by ODE or SDE and arrives at a simple Gaussian. Reversing the flow leads back to the original distribution.
Conceptually, you can consider this as a movement described by dx = v dt, where dx is displacement and v is velocity, forming an ordinary differential equation (ODE). If randomness is added to the movement, then dx = v dt + g dw, where dw is a small random (Brownian) perturbation, forming an SDE. Reversing time to recover the trajectory is referred to as solving the SDE.
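As a toy illustration (velocity and noise_scale are placeholder names, not the paper's notation), one discrete step of each kind of movement looks like:

```python
import torch

def euler_ode_step(x, velocity, dt):
    # Deterministic step: dx = v(x) * dt  (an ODE).
    return x + velocity(x) * dt

def euler_maruyama_step(x, velocity, noise_scale, dt):
    # Stochastic step: dx = v(x) * dt + g * sqrt(dt) * z  (an SDE).
    z = torch.randn_like(x)
    return x + velocity(x) * dt + noise_scale * (dt ** 0.5) * z
```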
In this paper, the images in the data distribution flow along some SDE until they become pure noise at time T; solving the SDE in the reverse direction then corresponds to sample generation. The authors demonstrated that generation can be improved by using existing SDE solvers instead of relying on the denoising process from the DDPM paper.
They also introduced the probability flow ODE to establish a deterministic route between the data and the Gaussian by removing randomness from the generation process. This will be discussed in the next paper.
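A rough sketch of what "solving the SDE backward" means is shown below. Here score_model, f, and g are assumed interfaces for the learned score and the chosen SDE's drift and diffusion coefficients, and the loop is a plain Euler–Maruyama discretization rather than the solvers used in the paper.

```python
import torch

def reverse_sde_sample(score_model, f, g, shape, T=1.0, n_steps=1000):
    """Minimal sketch of reverse-time SDE sampling via Euler-Maruyama."""
    dt = T / n_steps
    x = torch.randn(shape)                     # sample from the simple Gaussian
    for i in range(n_steps):
        t = T - i * dt                         # walk backward in time from T to 0
        score = score_model(x, t)
        # Reverse-time SDE drift: f(x, t) - g(t)^2 * score.
        drift = f(x, t) - (g(t) ** 2) * score
        x = x - drift * dt + g(t) * (dt ** 0.5) * torch.randn_like(x)
        # For the deterministic probability flow ODE, one would instead use:
        #   x = x - (f(x, t) - 0.5 * (g(t) ** 2) * score) * dt
    return x
```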
All for What?
So far, we have observed three papers, all of which have the following in common:
- Transforming the real data distribution into a simple distribution through rules like annealing, diffusion, and SDEs (or ODEs).
- Sampling from the simple distribution and reversing the path via Langevin Dynamics, the denoising process, or solving the SDE (or ODE).
Although these concepts were referred to by different terms and within different frameworks, they share a common goal. From the next paper onward, we discuss how researchers have worked to make these processes faster and more accurate.