Evolution of Diffusion, Generative AI (Part 1)

Generated by my flux LoRA model.
Since 2020, diffusion has become the de facto standard in generative AI, capable of producing various outputs such as images, videos, sounds, and even animations. One of the most notable implementations is Stable Diffusion by Stability AI. Its widespread impact can be seen through open-source platforms like Automatic1111 and ComfyUI. However, the field is evolving so rapidly that the term "diffusion" can seem outdated to researchers.
This article will be helpful if you are in one of these situations:
- You are an aspiring beginner who wants to study diffusion but is unsure where to start or in what order to study.
- You have studied various diffusion-related papers and want to organize your knowledge.
I will walk you through seven landmark papers that have shaped diffusion-based generative AI and highlight the key characteristics of each. You might notice a recurring theme: more recent papers (let's call them A) often argue that the theories in previous papers (B) are special cases of A.

Fig. Drawn by Felipe Jeon
Summary of Diffusion in One Sentence:
Finding a bidirectional route between the data distribution and a simpler distribution from which we can efficiently sample (like a Gaussian).
- Route forward: data → Gaussian. This is called the diffusion process, forward process, SDE, or ODE.
- Route backward: Gaussian → data. This is called the denoising process, reverse process, or solving SDE/ODE. This is the generative process.
The differences among the seven papers lie in how they construct this route within various frameworks (diffusion, Langevin dynamics, SDE, optimal transport, flow). All figures are taken from the corresponding papers.
Paper 1: Generative Modeling by Estimating Gradients of the Data Distribution (SMLD)
- Conference: NeurIPS 2019
- Author: Yang Song
- Links: paper, blog
- Keywords: Langevin Dynamics, score matching, data manifold

Fig. Taken from the author's blog.
If you can estimate the direction in which the data probability density increases (the gradient, or score), a particle starting from a random point can eventually reach a high-density region of the data distribution using Langevin Dynamics. This paper presents how to effectively learn this direction (or score) despite the sparse distribution of training images (referred to as a low-dimensional manifold in the paper) in pixel space. The method, called annealing, fills the sparse regions of the data distribution by perturbing the data with various noise levels. The paper then shows how to generate an image via the Langevin Dynamics update rule, starting from pure noise, once the score is learned.
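Roughly, the sampling loop looks like the sketch below. This is a minimal illustration only: score_model is a hypothetical interface for the learned score network, and the step-size schedule is a simplified version of the paper's annealing idea, not the exact algorithm.

```python
import torch

def annealed_langevin_sampling(score_model, sigmas, shape, n_steps=100, eps=2e-5):
    """Minimal sketch of annealed Langevin dynamics sampling (SMLD-style)."""
    x = torch.rand(shape)                      # start from random noise
    for sigma in sigmas:                       # noise levels, from large to small (annealing)
        step_size = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(n_steps):
            z = torch.randn_like(x)
            grad = score_model(x, sigma)       # estimated direction of increasing density
            # Langevin update: climb along the score, plus a bit of injected noise.
            x = x + 0.5 * step_size * grad + (step_size ** 0.5) * z
    return x
```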
Paper 2: Denoising Diffusion Probabilistic Models (DDPM)
- Conference: NeurIPS 2020
- Author: Jonathan Ho
- Link: paper
- Keywords: noise prediction, diffusion process, scheduling
This well-known diffusion paper focuses on transforming an image into pure noise by gradually injecting Gaussian noise at each time step, a process known as the diffusion or forward process, which looks like:
image → noised image → more noised image → ... → pure noise.

Fig. Taken from the paper.
If you can predict the noise added at each step, you can recover the image from pure noise in the denoising or backward process. Training a complex UNet to predict this noise is called noise-prediction training. The way the forward process is shaped (e.g., how quickly the image should be corrupted) is called scheduling.
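In code, the forward corruption and the noise-prediction objective can be sketched roughly as follows. Here, model is a hypothetical UNet interface and alphas_cumprod a hypothetical precomputed schedule; this is a simplified illustration, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(model, x0, alphas_cumprod):
    """Minimal sketch of DDPM noise-prediction training on images (B, C, H, W)."""
    batch = x0.shape[0]
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (batch,))                  # a random timestep per sample
    a_bar = alphas_cumprod[t].view(batch, 1, 1, 1)     # cumulative schedule value at step t

    noise = torch.randn_like(x0)
    # Forward (diffusion) process in closed form:
    #   x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise

    # Noise prediction: the UNet tries to recover the injected noise.
    predicted_noise = model(x_t, t)
    return F.mse_loss(predicted_noise, noise)
```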
This paper formulates the probability distribution of the intermediate noised images, making the back-and-forth random process more stable than in SMLD (paper 1). DDPM operates in pixel space, which demands high memory and computation and led to methods like latent diffusion that work in more compressed spaces. Stable Diffusion 1.5, backed by Stability AI, was developed on top of latent diffusion.
Paper 3: Score-Based Generative Modeling through Stochastic Differential Equations (SDE)
- Author: Yang Song
- Date: November 26, 2020
- Conference: ICLR 2021
- Link: paper
- Keywords: SDE, VE-SDE, VP-SDE, Probability Flow ODE

Fig. The data x(0) moves to x(T) by ODE or SDE and arrives at a simple Gaussian. Reversing the flow leads back to the original distribution.
Yang Song, the author of SMLD, introduced this paper as a successor to the previous ones, attempting to combine them within a single paradigm: stochastic differential equations (SDEs), which extend the discrete steps into a continuous domain. The data x(0) moves to x(T) by ODE or SDE and arrives at a simple Gaussian. Reversing the flow leads back to the original distribution.
Conceptually, you can consider this as a movement described by dx = v dt, where dx is displacement and v is velocity, forming an ordinary differential equation (ODE). If randomness is added to the movement, then dx = v dt + g dw, where dw is a small random (Brownian) perturbation, forming an SDE. Reversing time to recover the trajectory is referred to as solving the SDE.
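As a toy illustration (velocity and noise_scale are placeholder names, not the paper's notation), one discrete step of each kind of movement looks like:

```python
import torch

def euler_ode_step(x, velocity, dt):
    # Deterministic step: dx = v(x) * dt  (an ODE).
    return x + velocity(x) * dt

def euler_maruyama_step(x, velocity, noise_scale, dt):
    # Stochastic step: dx = v(x) * dt + g * sqrt(dt) * z  (an SDE).
    z = torch.randn_like(x)
    return x + velocity(x) * dt + noise_scale * (dt ** 0.5) * z
```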
In this paper, the images in the data distribution flow along some SDE until they become pure noise at time T; solving the SDE in the reverse direction then corresponds to sample generation. The authors demonstrated that generation can be improved by using existing SDE solvers instead of relying on the denoising process from the DDPM paper.
They also introduced the probability flow ODE to establish a deterministic route between the data and the Gaussian by removing randomness from the generation process. This will be discussed in the next paper.
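A rough sketch of what "solving the SDE backward" means is shown below. Here score_model, f, and g are assumed interfaces for the learned score and the chosen SDE's drift and diffusion coefficients, and the loop is a plain Euler–Maruyama discretization rather than the solvers used in the paper.

```python
import torch

def reverse_sde_sample(score_model, f, g, shape, T=1.0, n_steps=1000):
    """Minimal sketch of reverse-time SDE sampling via Euler-Maruyama."""
    dt = T / n_steps
    x = torch.randn(shape)                     # sample from the simple Gaussian
    for i in range(n_steps):
        t = T - i * dt                         # walk backward in time from T to 0
        score = score_model(x, t)
        # Reverse-time SDE drift: f(x, t) - g(t)^2 * score.
        drift = f(x, t) - (g(t) ** 2) * score
        x = x - drift * dt + g(t) * (dt ** 0.5) * torch.randn_like(x)
        # For the deterministic probability flow ODE, one would instead use:
        #   x = x - (f(x, t) - 0.5 * (g(t) ** 2) * score) * dt
    return x
```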
All for What?
So far, we have observed three papers, all of which have the following in common:
- Transforming the real data distribution into a simple distribution through rules like annealing, diffusion, and SDEs (or ODEs).
- Sampling from the simple distribution and reversing the path via Langevin Dynamics, the denoising process, or solving the SDE (or ODE).
Although these concepts were referred to by different terms and within different frameworks, they share a common goal. From the next paper onward, we discuss how researchers have worked to make these processes faster and more accurate.