Giung Nam, 2026-03-20
Continuous Diffusion
For continuous data , the most basic choice for the forward process is a Gaussian distribution. In DDPM, the forward process allows us to sample any latent state directly from the data , or transition between two intermediate steps , as follows:
where is a strictly decreasing function in with and , and . The reverse posterior is then given by:
Denoising Diffusion Implicit Model (DDIM)
DDIM introduces a hyperparameter for the variance of the reverse posterior, generalizing the stochastic process while maintaining the same marginals . The reverse transition is defined by:
While setting recovers the standard reverse posterior, DDIM generally defines a non-Markovian forward process; the forward transition now depends on :
Using a neural network , parameterized by , the approximated reverse process can be written in the same form. The discrete-time and continuous-time diffusion losses are given by:
Continuous-time formulation (SDE, ODE)
DDIM injects noise on a discrete-time grid:
Transitioning to continuous-time involves reducing the time interval to an infinitesimal , turning difference equations into differential equations. By introducing , , and , the forward step and its approximation are given by:
Using the score-matching identity, , we obtain the increment:
which corresponds to the continuous-time SDE evolving forward and reverse in time, respectively:
where is the drift coefficient, is the diffusion coefficient, and and are standard Wiener processes in forward and reverse time, respectively.
Remark. Here, corresponds to the deterministic DDIM setup where ; the forward and reverse SDEs collapse into a single PF-ODE, establishing a unique, invertible path between the noise and the data. On the other hand, corresponds to the stochastic DDPM setup. It is worth noting that even in this stochastic regime, the reverse-time SDE does not inject arbitrary random noise; for , the diffusion term remains strictly coupled with the score-driven drift term.
Remark. Introducing a neural network for -prediction, i.e., , is more algebraically concise than -prediction in this formulation, as the -prediction network directly yields the score function via the score-matching identity. Indeed, it is standard practice for diffusion models to adopt an -prediction strategy.
Discrete Diffusion
For discrete data represented as a one-hot vector, the most basic choice for the forward process is a Categorical distribution. The forward process allows us to sample any latent state directly from the data , or transition between two intermediate steps , as follows:
where is a strictly decreasing function in with and , and . The reverse posterior is then given by:
Masked Diffusion Model (MDM)
MDM defines the category as a special [MASK] token, which serves as an absorbing state with a stationary distribution .
In the forward process, tokens either remain themselves or transition to the mask.
For , the reverse posterior is given by:
Using a neural network , parameterized by and ensured to be , the approximated reverse process can be written in the same form. The discrete-time and continuous-time diffusion losses are given by:
ReMasking Diffusion Model (ReMDM)
ReMDM introduces a hyperparameter to MDM, enabling transitions to the mask during the reverse process and generalizing the stochastic process while maintaining the same marginals . The reverse transition is defined by:
While setting recovers the reverse posterior of MDM, ReMDM generally defines a non-Markovian forward process; the forward transition now depends on :
The discrete-time diffusion loss is given by:
Uniform Diffusion Model (UDM)
UDM defines the stationary distribution as a uniform across all categories, . In this regime, the forward process allows a token to transition to any other category (including itself), eventually erasing all signal as it converges to uniform noise. For , the reverse posterior is given by:
Using a neural network , parameterized by , the approximated reverse process can be written in the same form. The continuous-time diffusion loss is given by:
where , , and .
Continuous-time formulation (CTMC)
Markovian discrete diffusion models randomly transit states on a discrete-time grid:
Transitioning to continuous-time involves reducing the time interval to an infinitesimal , turning difference equations into differential equations. By introducing , the transition probability matrix is:
As the state remains unchanged over zero time, i.e., , we obtain the increment:
which corresponds to the continuous-time CTMC evolving forward in time with the transition-rate matrix:
The forward evolution of the marginal probability distribution is governed by the Kolmogorov forward equation:
The relationship between the forward rate matrix and the reverse rate matrix is governed by the entry-wise probability balance:
Reward-tilted Diffusion
The ultimate goal of a diffusion model is to generate samples following the data distribution . However, in many applications, we seek samples under a specific reward function . To this end, we define the reward-tilted distribution by reweighting the original density with an exponential reward term:
Local search via guidance
In continuous diffusion, steering toward high reward can be achieved by modifying the score function. By defining an intermediate reward-tilted density , involving an intermediate reward function , the corresponding score is:
where the second term acts as a steering gradient. Since the reward is typically only defined at , one can define the intermediate reward as the expected tilted reward conditioned on the current noisy state:
For discrete diffusion, where gradients are unavailable, we tilt the transition rates of the CTMC. The reverse evolution is governed by a modified rate matrix via Doob's -transform:
Global search via SMC
While local guidance provides point-wise steering, SMC methods offer a robust global search strategy. Using the reverse transition of the original diffusion process as the proposal distribution, SMC evolves a population of particles via:
Here, the unnormalized and normalized incremental weights are:
Continuous Diffusion
For continuous data $\bm{x} \in \mathbb{R}^{K}$, the most basic choice for the forward process is a Gaussian distribution. In DDPM, the forward process allows us to sample any latent state $\bm{z}{t}$ directly from the data $\bm{z}{0} = \bm{x}$, or transition between two intermediate steps $s<t$, as follows: $$ \begin{align} q(\bm{z}{t} \mid \bm{z}{0}) &= \mathcal{N}\left( \bm{z}{t} ; \sqrt{\alpha{t}} \bm{z}{0}, (1-\alpha{t})\bm{I} \right), \ q(\bm{z}{t} \mid \bm{z}{s}) &= \mathcal{N}\left( \bm{z}{t} ; \sqrt{\alpha{t \mid s}} \bm{z}{s}, (1-\alpha{t \mid s})\bm{I} \right). \end{align} $$ where $\alpha_{t} \in [0, 1]$ is a strictly decreasing function in $t$ with $\alpha_{0} \approx 1$ and $\alpha_{1} \approx 0$, and $\alpha_{t \mid s} = \alpha_{t} / \alpha_{s}$. The reverse posterior is then given by: $$ \begin{align}\textstyle q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) = \mathcal{N}\left( \bm{z}{s} ; \frac{ \sqrt{\alpha_{t \mid s}}(1-\alpha_{s})\bm{z}{t} + \sqrt{\alpha{s}}(1-\alpha_{t \mid s})\bm{z}{0} }{1-\alpha{t}}, \frac{(1-\alpha_{s})(1-\alpha_{t \mid s})}{1-\alpha_{t}}\bm{I} \right). \end{align} $$
Denoising Diffusion Implicit Model (DDIM)
DDIM introduces a hyperparameter $\sigma_{s}$ for the variance of the reverse posterior, generalizing the stochastic process while maintaining the same marginals $q(\bm{z}{t} \mid \bm{z}{0})$. The reverse transition is defined by: $$ \begin{align}\textstyle q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) = \mathcal{N}\left( \bm{z}{s} ; \sqrt{\alpha_{s}}\bm{z}{0} + \sqrt{1 - \alpha{s} - \sigma_{s}^{2}} \frac{ \bm{z}{t} - \sqrt{\alpha{t}}\bm{z}{0} }{\sqrt{1 - \alpha{t}}}, \sigma_{s}^{2}\bm{I} \right). \end{align} $$ While setting $\sigma_{s}^{2} = \frac{(1-\alpha_{s})(1-\alpha_{t \mid s})}{(1-\alpha_{t})}$ recovers the standard reverse posterior, DDIM generally defines a non-Markovian forward process; the forward transition now depends on $\bm{z}{0}$: $$ \begin{align}\textstyle q(\bm{z}{t} \mid \bm{z}{s}, \bm{z}{0}) = \mathcal{N}\left( \bm{z}{t} ; \sqrt{\alpha{t}}\bm{z}{0} + \sqrt{1-\alpha{t}}\sqrt{1-\alpha_{s}-\sigma_{s}^{2}}\frac{\bm{z}{s} - \sqrt{\alpha{s}}\bm{z}{0}}{1-\alpha{s}}, \frac{\sigma_{s}^{2}(1-\alpha_{t})}{1-\alpha_{s}} \bm{I} \right). \end{align} $$ Using a neural network $\bm{x}{\phi}(\bm{z}{t}, t) \approx \bm{z}{0}$, parameterized by $\phi$, the approximated reverse process $p{\phi}(\bm{z}{s} \mid \bm{z}{t})$ can be written in the same form. The discrete-time and continuous-time diffusion losses are given by: $$ \begin{align} \mathcal{L}(\phi ; \bm{z}{0}, T) &= \textstyle \mathbb{E}{t \sim {\frac{1}{T},\frac{2}{T},\dots,1}} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[ T D_{\text{KL}}\left( q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) \parallel p{\phi}(\bm{z}{s} \mid \bm{z}{t}) \right) \right] \ &= \textstyle \mathbb{E}{t \sim {\frac{1}{T},\frac{2}{T},\dots,1}} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[ \frac{T}{2 \sigma{s}^{2}} \left( \sqrt{\alpha_{s}} - \sqrt{\alpha_{t}}\sqrt{\frac{1 - \alpha_{s} - \sigma_{s}^{2}}{1-\alpha_{t}}} \right)^{2} \left\lVert \bm{z}{0} - \bm{x}{\phi}(\bm{z}{t}, t) \right\rVert^{2} \right], \ \lim{T \to \infty} \mathcal{L}(\phi ; \bm{z}{0}, T) &= \textstyle \mathbb{E}{t \sim \mathcal{U}[0, 1]} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[ \frac{1}{2\sigma_{t}^{2}} \left( \frac{\alpha_{t}^{\prime}}{2 \sqrt{\alpha_{t}} (1-\alpha_{t})} \left( \sqrt{\alpha_{t}} \sigma_{t}^{2} - \frac{1}{\sqrt{\alpha_{t}}} \right) \right)^{2} \left\lVert \bm{z}{0} - \bm{x}{\phi}(\bm{z}_{t}, t) \right\rVert^{2} \right]. \end{align} $$
Continuous-time formulation (SDE, ODE)
DDIM injects noise on a discrete-time grid: $$ \begin{align}\textstyle \bm{z}{t} = \sqrt{\alpha{t}} \bm{z}{0} + \sqrt{1 - \alpha{t} - \sigma_{s}^{2}} \frac{\bm{z}{s} - \sqrt{\alpha{s}}\bm{z}{0}}{\sqrt{1 - \alpha{s}}} + \sigma_{s}\bm{\varepsilon}, \text{ where } \bm{\varepsilon} \sim \mathcal{N}(\bm{0}, \bm{I}). \end{align} $$ Transitioning to continuous-time involves reducing the time interval $\Delta t = t - s$ to an infinitesimal $\mathrm{d}t$, turning difference equations into differential equations. By introducing $\bm{\varepsilon}{s} = \frac{\bm{z}{s} - \sqrt{\alpha_{s}} \bm{z}{0}}{\sqrt{1 - \alpha{s}}}$, $\beta_t = \frac{1 - \alpha_{t \mid s}}{\Delta t}$, and $\sigma_{s}^{2} = \frac{(1-\alpha_{s}) (1-\alpha_{t \mid s}) \eta^{2}}{(1-\alpha_{t})}$, the forward step and its approximation are given by: $$ \begin{align} \bm{z}{t} &\textstyle = \sqrt{1 - \beta{t} \Delta t} \bm{z}{s} + \left( \sqrt{1 - \alpha{t} - \sigma_{s}^{2}} - \sqrt{1 - \beta_{t} \Delta t} \sqrt{1 - \alpha_{s} - \sigma_{s}^{2}} \right) \bm{\varepsilon}{s} + \eta \sqrt{\beta{t} \Delta t} \bm{\varepsilon} \ &\textstyle \approx \left( 1 - \frac{1}{2} \beta_{t} \Delta t \right) \bm{z}{s} + \left( \frac{1 - \eta^{2}}{2 \sqrt{1 - \alpha{s}}} \beta_{t} \Delta t \right) \bm{\varepsilon}{s} + \eta \sqrt{\beta{t} \Delta t} \bm{\varepsilon}. \end{align} $$ Using the score-matching identity, $\bm{\varepsilon}{s} = -\sqrt{1 - \alpha{s}} \nabla_{\bm{z} = \bm{z}{s}} \log{q(\bm{z} \mid \bm{z}{0})}$, we obtain the increment: $$ \begin{align}\textstyle \bm{z}{t} - \bm{z}{t - \Delta t} \approx -\frac{1}{2} \beta_{t} \Delta t \bm{z}{t - \Delta t} -\frac{1 - \eta^{2}}{2} \beta{t} \Delta t \nabla_{\bm{z} = \bm{z}{t - \Delta t}} \log{q(\bm{z}{t - \Delta t} \mid \bm{z}{0})} +\eta \sqrt{\beta{t} \Delta t} \bm{\varepsilon}, \end{align} $$ which corresponds to the continuous-time SDE evolving forward and reverse in time, respectively: $$ \begin{align} \mathrm{d}\bm{z}(t) &\textstyle = \left[ \bm{f}(\bm{z}(t), t) - \frac{1 - \eta^{2}}{2}g^{2}(t) \nabla_{\bm{z}{t} = \bm{z}(t)} \log{q(\bm{z}{t} \mid \bm{z}{0})} \right] \mathrm{d}t + \eta g(t) \mathrm{d}\bm{w}(t), \ \mathrm{d}\bar{\bm{z}}(t) &\textstyle = \left[ \bm{f}(\bar{\bm{z}}(t), t) - \frac{1 + \eta^{2}}{2} g^{2}(t) \nabla{\bm{z}{t} = \bar{\bm{z}}(t)} \log{q(\bm{z}{t})} \right] \mathrm{d}t + \eta g(t) \mathrm{d}\bar{\bm{w}}(t), \end{align} $$ where $\bm{f}(\bm{z}(t), t) = -\frac{1}{2} \beta_{t} \bm{z}(t)$is the drift coefficient, $g(t) = \sqrt{\beta_{t}}$ is the diffusion coefficient, and $\bm{w}(t)$ and $\bar{\bm{w}}(t) = \bm{w}(1-t) - \bm{w}(1)$ are standard Wiener processes in forward and reverse time, respectively.
Remark. Here, $\eta = 0$ corresponds to the deterministic DDIM setup where $\sigma_{s} = 0$; the forward and reverse SDEs collapse into a single PF-ODE, establishing a unique, invertible path between the noise and the data. On the other hand, $\eta=1$ corresponds to the stochastic DDPM setup. It is worth noting that even in this stochastic regime, the reverse-time SDE does not inject arbitrary random noise; for $g(t) \neq 0$, the diffusion term remains strictly coupled with the score-driven drift term.
Remark. Introducing a neural network for $\bm{\varepsilon}$-prediction, i.e., $\bm{\varepsilon}{\phi}(\bm{z}{t}, t) \approx \bm{\varepsilon}_{0}$, is more algebraically concise than $\bm{x}$-prediction in this formulation, as the $\bm{\varepsilon}$-prediction network directly yields the score function via the score-matching identity. Indeed, it is standard practice for diffusion models to adopt an $\bm{\varepsilon}$-prediction strategy.
Discrete Diffusion
For discrete data $\bm{x} \in {0, 1}^{K}$ represented as a one-hot vector, the most basic choice for the forward process is a Categorical distribution. The forward process allows us to sample any latent state $\bm{z}{t}$ directly from the data $\bm{z}{0} = \bm{x}$, or transition between two intermediate steps $s<t$, as follows: $$ \begin{align} q(\bm{z}{t} \mid \bm{z}{0}) &= \mathrm{Cat}\left( \bm{z}{t} ; \alpha{t} \bm{z}{0} + (1-\alpha{t}) \bm{\pi} \right), \ q(\bm{z}{t} \mid \bm{z}{s}) &= \mathrm{Cat}\left( \bm{z}{t} ; \alpha{t \mid s} \bm{z}{s} + (1-\alpha{t \mid s}) \bm{\pi} \right). \end{align} $$ where $\alpha_{t} \in [0, 1]$ is a strictly decreasing function in $t$ with $\alpha_{0} \approx 1$ and $\alpha_{1} \approx 0$, and $\alpha_{t \mid s} = \alpha_{t} / \alpha_{s}$. The reverse posterior is then given by: $$ \begin{align}\textstyle q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) = \mathrm{Cat}\left( \bm{z}{s} ; \frac{ \left[ \alpha_{t \mid s} \bm{z}{t} + (1-\alpha{t \mid s}) \langle \bm{z}{t}, \bm{\pi} \rangle \bm{1} \right] \odot \left[ \alpha{s} \bm{z}{0} + (1-\alpha{s}) \bm{\pi} \right] }{ \alpha_{t} \langle \bm{z}{t}, \bm{z}{0} \rangle + (1-\alpha_{t}) \langle \bm{z}_{t}, \bm{\pi} \rangle } \right). \end{align} $$
Masked Diffusion Model (MDM)
MDM defines the $K^\text{th}$ category as a special [MASK] token, which serves as an absorbing state with a stationary distribution $\bm{\pi} = \bm{m} = [0,\dots,0,1]^\top$.
In the forward process, tokens either remain themselves or transition to the mask.
For $s < t$, the reverse posterior is given by:
$$
\begin{align}
q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) =
\begin{cases}
\mathrm{Cat}\left( \bm{z}{s} ; \bm{z}{0} \right) & \bm{z}{t} \neq \bm{m}, \
\mathrm{Cat}\left( \bm{z}{s} ; \frac{(1-\alpha{s})\bm{m} + (\alpha_{s} - \alpha_{t})\bm{z}{0}}{1-\alpha{t}} \right) & \bm{z}{t} = \bm{m}.
\end{cases}
\end{align}
$$
Using a neural network $\bm{x}{\phi}(\bm{z}{t}, t) \approx \bm{z}{0}$, parameterized by $\phi$ and ensured to be $\langle \bm{x}{\phi}, \bm{m} \rangle = 0$, the approximated reverse process $p{\phi}(\bm{z}{s} \mid \bm{z}{t})$ can be written in the same form.
The discrete-time and continuous-time diffusion losses are given by:
$$
\begin{align}
\mathcal{L}(\phi ; \bm{z}{0}, T)
&=\textstyle \mathbb{E}{t \sim {\frac{1}{T},\frac{2}{T}\dots,1}} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[
T D_{\text{KL}}\left( q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) \parallel p{\phi}(\bm{z}{s} \mid \bm{z}{t}) \right)
\right] \
&=\textstyle \mathbb{E}{t \sim {\frac{1}{T},\frac{2}{T}\dots,1}} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[
\frac{T(\alpha{t} - \alpha_{s})}{1 - \alpha_{t}} \log{\langle \bm{x}{\phi}(\bm{z}{t}, t), \bm{z}{0} \rangle}
\right], \
\lim{T \to \infty} \mathcal{L}(\phi ; \bm{z}{0}, T)
&=\textstyle \mathbb{E}{t \sim \mathcal{U}[0, 1]} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[
\frac{\alpha_{t}^\prime}{1 - \alpha_{t}} \log{\langle \bm{x}{\phi}(\bm{z}{t}, t), \bm{z}_{0} \rangle}
\right].
\end{align}
$$
ReMasking Diffusion Model (ReMDM)
ReMDM introduces a hyperparameter $\sigma_{t}$ to MDM, enabling transitions to the mask during the reverse process and generalizing the stochastic process while maintaining the same marginals $q(\bm{z}{t} \mid \bm{z}{0})$. The reverse transition is defined by: $$ \begin{align} q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) = \begin{cases} \mathrm{Cat}\left( \bm{z}{s} ; \sigma_{t} \bm{m} + (1 - \sigma_{t}) \bm{z}{0} \right) & \bm{z}{t} \neq \bm{m}, \ \mathrm{Cat}\left( \bm{z}{s} ; \frac{(1 - \alpha{s} - \sigma_{t}\alpha_{t}) \bm{m} + (\alpha_{s} - \alpha_{t} + \sigma_{t}\alpha_{t}) \bm{z}{0}}{1-\alpha{t}} \right) & \bm{z}{t} = \bm{m}. \end{cases} \end{align} $$ While setting $\sigma{t} = 0$ recovers the reverse posterior of MDM, ReMDM generally defines a non-Markovian forward process; the forward transition now depends on $\bm{z}{0}$: $$ \begin{align} q(\bm{z}{t} \mid \bm{z}{s}, \bm{z}{0}) = \begin{cases} \mathrm{Cat}\left( \bm{z}{t} ; \frac{ (1 - \sigma{t}) \alpha_{t} \bm{z}{0} + (\alpha{s} - \alpha_{t} + \sigma_{t}\alpha_{t}) \bm{m} }{\alpha_{s}} \right) & \bm{z}{s} \neq \bm{m}, \ \mathrm{Cat}\left( \bm{z}{t} ; \frac{ \sigma_{t} \alpha_{t} \bm{z}{0} + (1 - \alpha{s} - \sigma_{t} \alpha_{t})\bm{m} }{1-\alpha_{s}} \right) & \bm{z}{s} = \bm{m}. \end{cases} \end{align} $$ The discrete-time diffusion loss is given by: $$ \begin{align} \mathcal{L}(\phi ; \bm{z}{0}, T) &=\textstyle \mathbb{E}{t \sim {\frac{1}{T},\frac{2}{T}\dots,1}} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[ T D{\text{KL}}\left( q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) \parallel p{\phi}(\bm{z}{s} \mid \bm{z}{t}) \right) \right] \ &=\textstyle \mathbb{E}{t \sim {\frac{1}{T},\frac{2}{T}\dots,1}} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[ \frac{T(\alpha{t} - \alpha_{s} - \sigma_{t}\alpha_{t})}{1 - \alpha_{t}} \log{\langle \bm{x}{\phi}(\bm{z}{t}, t), \bm{z}_{0} \rangle} \right]. \end{align} $$
Uniform Diffusion Model (UDM)
UDM defines the stationary distribution as a uniform across all categories, $\bm{\pi} = \bm{1} / K$. In this regime, the forward process allows a token to transition to any other category (including itself), eventually erasing all signal as it converges to uniform noise. For $s < t$, the reverse posterior is given by: $$ \begin{align}\textstyle q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) = \mathrm{Cat}\left( \bm{z}{s} ; \frac{ \left[ K \alpha_{t \mid s} \bm{z}{t} + (1 - \alpha{t \mid s}) \bm{1} \right] \odot \left[ K \alpha_{s} \bm{z}{0} + (1 - \alpha{s}) \bm{1} \right] }{K \alpha_{t} \langle \bm{z}{t}, \bm{z}{0} \rangle + (1-\alpha_{t})} \right). \end{align} $$ Using a neural network $\bm{x}{\phi}(\bm{z}{t}, t) \approx \bm{z}{0}$, parameterized by $\phi$, the approximated reverse process can be written in the same form. The continuous-time diffusion loss is given by: $$ \begin{align} \lim{T \to \infty} \mathcal{L}(\phi ; \bm{z}{0}, T) &=\textstyle \lim{T \to \infty} \mathbb{E}{t \sim {\frac{1}{T},\frac{2}{T}\dots,1}} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[ T D{\text{KL}}\left( q(\bm{z}{s} \mid \bm{z}{t}, \bm{z}{0}) \parallel p{\phi}(\bm{z}{s} \mid \bm{z}{t}) \right) \right] \ &=\textstyle \mathbb{E}{t \sim \mathcal{U}[0, 1]} \mathbb{E}{\bm{z}{t} \sim q(\bm{z}{t} \mid \bm{z}{0})} \left[ \frac{\alpha{t}^{\prime}}{K \alpha_{t}} \left[ \frac{K}{\bar{\bm{x}}^{(i)}} - \frac{K}{\bar{\bm{x}}{\phi}^{(i)}} - \sum{j \text{ s.t. } \bm{z}{t}^{(j)} = 0} \left( \frac{\bar{\bm{x}}^{(j)}}{\bar{\bm{x}}^{(i)}} \log{\frac{\bar{\bm{x}}{\phi}^{(i)} \bar{\bm{x}}^{(j)}}{\bar{\bm{x}}{\phi}^{(j)} \bar{\bm{x}}^{(i)}}} \right) \right] \right], \end{align} $$ where $\bar{\bm{x}}^{(i)} = K \alpha{t} \bm{z}{0}^{(i)} + (1 - \alpha{t})$, $\bar{\bm{x}}{\phi}^{(i)} = K \alpha{t} \bm{x}{\phi}^{(i)}(\bm{z}{t}, t) + (1 - \alpha_{t})$, and $i = \argmax_{j \in [K]} \bm{z}_{t}^{(j)}$.
Continuous-time formulation (CTMC)
Markovian discrete diffusion models randomly transit states on a discrete-time grid: $$ \begin{align} \bm{z}{t} = (1 - b{t}) \bm{z}{s} + b{t} \bm{v}{t}, \text{ where } b{t} \sim \mathrm{Bern}(1 - \alpha_{t \mid s}) \text{ and } \bm{v}{t} \sim \mathrm{Cat}(\bm{\pi}). \end{align} $$ Transitioning to continuous-time involves reducing the time interval $\Delta t = t - s$ to an infinitesimal $\mathrm{d}t$, turning difference equations into differential equations. By introducing $\beta{t} = \frac{1 - \alpha_{t \mid s}}{\Delta t}$, the transition probability matrix is: $$ \begin{align} \bm{P}(s, t) = (1 - \beta_{t} \Delta t) \bm{I} + \beta_{t} \Delta t \bm{\pi} \bm{1}^\top. \end{align} $$ As the state remains unchanged over zero time, i.e., $\bm{P}(s, s) = \bm{I}$, we obtain the increment: $$ \begin{align} \bm{P}(s, t) - \bm{P}(s, s) = \beta_{t} \Delta t \left( \bm{\pi}\bm{1}^\top - \bm{I} \right), \end{align} $$ which corresponds to the continuous-time CTMC evolving forward in time with the transition-rate matrix: $$ \begin{align} \bm{Q}(t) = \beta_{t} (\bm{\pi}\bm{1}^\top - \bm{I}). \end{align} $$ The forward evolution of the marginal probability distribution $\bm{q}(t) = \alpha_{t} \bm{z}{0} + (1 - \alpha{t}) \bm{\pi}$ is governed by the Kolmogorov forward equation: $$ \begin{align}\textstyle \dot{\bm{q}}(t) = \bm{Q}(t) \bm{q}(t). \end{align} $$
The relationship between the forward rate matrix $\bm{Q}(t)$ and the reverse rate matrix $\bar{\bm{Q}}(t)$ is governed by the entry-wise probability balance: $$ \begin{align} \bar{Q}{ij}(t) = Q{ji}(t) \frac{q_{j}(t)}{q_{i}(t)}. \end{align} $$
Reward-tilted Diffusion
The ultimate goal of a diffusion model is to generate samples following the data distribution $q(\bm{z}{0})$. However, in many applications, we seek samples under a specific reward function $r$. To this end, we define the reward-tilted distribution by reweighting the original density with an exponential reward term: $$ \begin{align}\textstyle q^{r}(\bm{z}{0}) \propto q(\bm{z}{0}) \exp{\left( \beta r(\bm{z}{0}) \right)}. \end{align} $$
Local search via guidance
In continuous diffusion, steering toward high reward can be achieved by modifying the score function. By defining an intermediate reward-tilted density $q^{r}(\bm{z}{t}) \propto q(\bm{z}{t}) \exp{\left( \beta r_{t}(\bm{z}{t}) \right)}$, involving an intermediate reward function $r{t}$, the corresponding score is: $$ \begin{align}\textstyle \nabla_{\bm{z}{t}} \log{q^{r}(\bm{z}{t})} = \nabla_{\bm{z}{t}} \log{q(\bm{z}{t})} + \beta \nabla_{\bm{z}{t}} r{t}(\bm{z}{t}), \end{align} $$ where the second term acts as a steering gradient. Since the reward is typically only defined at $t=0$, one can define the intermediate reward $r{t}$ as the expected tilted reward conditioned on the current noisy state: $$ \begin{align}\textstyle r_{t}(\bm{z}{t}) := \frac{1}{\beta} \log{\mathbb{E}{\bm{z}{0} \sim q(\cdot \mid \bm{z}{t})} \left[ \exp{\left( \beta r(\bm{z}_{0}) \right)} \right]}. \end{align} $$
For discrete diffusion, where gradients are unavailable, we tilt the transition rates of the CTMC. The reverse evolution is governed by a modified rate matrix $\bar{\bm{Q}}^{r}(t)$ via Doob's $h$-transform: $$ \begin{align}\textstyle \bar{Q}^{r}{ij}(t) = \bar{Q}{ij}(t) \frac{h(j,t)}{h(i,t)}, \text{ where } h(\bm{z}{t},t) := \exp{\left( \beta r{t}(\bm{z}_{t}) \right)}. \end{align} $$
Global search via SMC
While local guidance provides point-wise steering, SMC methods offer a robust global search strategy. Using the reverse transition of the original diffusion process as the proposal distribution, SMC evolves a population of $M$ particles via: $$ \begin{align}\textstyle \text{(Sampling)} &\quad \forall m \in [M] : \bm{z}{s}^{m} \sim p(\bm{z}{s} \mid \bm{z}{t}^{m}), \ \text{(Resampling)} &\quad \forall m \in [M] : \bm{z}{s}^{m} \gets \bm{z}{s}^{m^{\prime}}, \text{ where } m^\prime \sim \mathrm{Cat}\left( \left{ \bar{G}(\bm{z}{t}^{i}, \bm{z}{s}^{i}) \right}{i=1}^{M} \right). \end{align} $$ Here, the unnormalized and normalized incremental weights are: $$ \begin{align}\textstyle G(\bm{z}{t}, \bm{z}{s}) = \frac{h(\bm{z}{s}, s)}{h(\bm{z}{t}, t)}, \quad \bar{G}(\bm{z}{t}^{i}, \bm{z}{s}^{i}) = \frac{G(\bm{z}{t}^{i}, \bm{z}{s}^{i})}{\sum_{j=1}^{M}G(\bm{z}{t}^{j}, \bm{z}{s}^{j}) }. \end{align} $$