Diffusion
Gaussian Noise Basics
\[\epsilon \sim \mathcal{N}(0, \mathbf{I})\]
\[\mathbf{x} = \mu + \sigma \epsilon\]
\[\mathcal{N}(\mathbf{x}; \mu, \Sigma)\]
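The reparameterization \(\mathbf{x} = \mu + \sigma \epsilon\) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `sample_gaussian`, the seed, and the example values of \(\mu\) and \(\sigma\) are illustrative assumptions, not part of the formulas above.

```python
import random
import statistics

def sample_gaussian(mu, sigma, rng):
    """Draw x = mu + sigma * eps with eps ~ N(0, I), elementwise over the list mu."""
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mu]

# Illustrative check: many 1-D draws should have mean near mu and std near sigma.
rng = random.Random(0)
draws = [sample_gaussian([2.0], 0.5, rng)[0] for _ in range(20000)]
print(statistics.mean(draws), statistics.stdev(draws))
```

The empirical mean and standard deviation should land close to 2.0 and 0.5, which is what sampling from \(\mathcal{N}(\mathbf{x}; \mu, \sigma^2 \mathbf{I})\) predicts.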
Forward Diffusion Process
\[q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\left(\mathbf{x}_t; \sqrt{1-\beta_t}\mathbf{x}_{t-1}, \beta_t \mathbf{I}\right)\]
\[\alpha_t = 1 - \beta_t\]
\[\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s\]
\[q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\left(\mathbf{x}_t; \sqrt{\bar{\alpha}_t}\mathbf{x}_0, (1-\bar{\alpha}_t)\mathbf{I}\right)\]
\[\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\epsilon\]
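The closed form means \(\mathbf{x}_t\) can be drawn directly from \(\mathbf{x}_0\) without iterating over intermediate steps. Below is a rough sketch in plain Python. The linear \(\beta_t\) schedule and its endpoints (`1e-4` to `0.02` over 1000 steps) are an assumption chosen for illustration, since the formulas above do not fix a schedule; the function names are hypothetical as well.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule; betas[t - 1] holds beta_t for t = 1..T."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + step * i for i in range(T)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s=1}^{t} (1 - beta_s), returned for t = 1..T."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot; also return the noise that was used."""
    a_bar = alpha_bars[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return xt, eps

betas = linear_betas(1000)
alpha_bars = cumulative_alphas(betas)
xt, eps = q_sample([1.0, -1.0], 500, alpha_bars, random.Random(0))
```

Under this assumed schedule \(\bar{\alpha}_t\) decays from nearly 1 toward 0, so late-step samples are dominated by noise.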
Reverse Denoising Process
\[p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}(\mathbf{x}_{t-1}; \mu_\theta(\mathbf{x}_t, t), \Sigma_\theta(\mathbf{x}_t, t))\]
\[\mu_\theta(\mathbf{x}_t, t) =
\frac{1}{\sqrt{\alpha_t}}
\left(
\mathbf{x}_t -
\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(\mathbf{x}_t, t)
\right)\]
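As a sketch of how the mean \(\mu_\theta\) is computed from a noise prediction, the helper below takes \(\epsilon_\theta(\mathbf{x}_t, t)\) as a plain list. The function name `reverse_mean` and all numeric values are hypothetical and do not come from a trained model.

```python
import math

def reverse_mean(xt, eps_pred, beta_t, alpha_bar_t):
    """mu_theta(x_t, t), given eps_pred standing in for eps_theta(x_t, t)."""
    alpha_t = 1.0 - beta_t
    scale = beta_t / math.sqrt(1.0 - alpha_bar_t)
    return [(x - scale * e) / math.sqrt(alpha_t) for x, e in zip(xt, eps_pred)]

# Illustrative numbers only.
mu = reverse_mean(xt=[0.8, -0.3], eps_pred=[0.1, -0.2], beta_t=0.01, alpha_bar_t=0.5)
```

With a zero noise prediction the mean reduces to \(\mathbf{x}_t / \sqrt{\alpha_t}\), which is a quick sanity check on the scaling.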
Training Objective
\[\mathcal{L}_{\mathrm{simple}} =
\mathbb{E}_{\mathbf{x}_0, \epsilon, t}
\left[
\left\lVert
\epsilon - \epsilon_\theta(\mathbf{x}_t, t)
\right\rVert_2^2
\right]\]
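The expectation in \(\mathcal{L}_{\mathrm{simple}}\) is typically estimated by sampling \(\mathbf{x}_0\), \(t\), and \(\epsilon\). The sketch below does this by Monte Carlo for an arbitrary callable `eps_model(xt, t)`. The uniform choice of \(t\), the toy dataset, the toy \(\bar{\alpha}_t\) values, and the zero-output predictor are all assumptions for illustration; a real setup would use a neural network and gradient-based optimization, which are not shown.

```python
import math
import random

def simple_loss(eps_model, dataset, alpha_bars, rng, num_samples=256):
    """Monte Carlo estimate of E[ || eps - eps_theta(x_t, t) ||^2 ]."""
    T = len(alpha_bars)
    total = 0.0
    for _ in range(num_samples):
        x0 = rng.choice(dataset)
        t = rng.randint(1, T)  # assumed: t uniform over 1..T inclusive
        a_bar = alpha_bars[t - 1]
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
        pred = eps_model(xt, t)
        total += sum((e - p) ** 2 for e, p in zip(eps, pred))
    return total / num_samples

# Toy setup: constant beta_t = 0.1 over 10 steps, two 2-D data points.
toy_alpha_bars = [0.9 ** t for t in range(1, 11)]
toy_data = [[1.0, 0.0], [0.0, 1.0]]

def zero_model(xt, t):
    """Hypothetical placeholder predictor that always outputs zero noise."""
    return [0.0 for _ in xt]

loss = simple_loss(zero_model, toy_data, toy_alpha_bars, random.Random(0), num_samples=4000)
```

A predictor that always outputs zero should score close to the data dimension (here 2), since \(\mathbb{E}\lVert \epsilon \rVert_2^2\) equals the number of components.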
Sampling Step
\[\mathbf{x}_{t-1} =
\frac{1}{\sqrt{\alpha_t}}
\left(
\mathbf{x}_t -
\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(\mathbf{x}_t, t)
\right)
+
\sigma_t \mathbf{z}\]
\[\mathbf{z} \sim \mathcal{N}(0, \mathbf{I}) \ \text{for } t > 1, \qquad \mathbf{z} = \mathbf{0} \ \text{for } t = 1\]
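Applying the sampling step repeatedly from \(t = T\) down to \(t = 1\) gives a full sampler. The sketch below assumes \(\sigma_t = \sqrt{\beta_t}\), which is one common choice; the formulas above leave \(\sigma_t\) unspecified. The names `sampling_step` and `sample`, and the callable `eps_model(x, t)` standing in for \(\epsilon_\theta\), are likewise hypothetical.

```python
import math
import random

def sampling_step(xt, t, eps_pred, betas, alpha_bars, rng):
    """One reverse step x_t -> x_{t-1}; betas[t - 1] holds beta_t."""
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    scale = beta_t / math.sqrt(1.0 - alpha_bars[t - 1])
    sigma_t = math.sqrt(beta_t)  # assumption: sigma_t^2 = beta_t
    out = []
    for x, e in zip(xt, eps_pred):
        z = rng.gauss(0.0, 1.0) if t > 1 else 0.0  # no noise is added at t = 1
        out.append((x - scale * e) / math.sqrt(alpha_t) + sigma_t * z)
    return out

def sample(eps_model, dim, betas, rng):
    """Run the reverse chain from x_T ~ N(0, I) down to x_0."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(betas), 0, -1):
        x = sampling_step(x, t, eps_model(x, t), betas, alpha_bars, rng)
    return x
```

For example, `sample(lambda x, t: [0.0] * len(x), 3, [0.02] * 20, random.Random(0))` runs the chain with a placeholder predictor; with a trained \(\epsilon_\theta\) the same loop would produce data-like samples.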
Classifier-Free Guidance
\[\hat{\epsilon}_{\mathrm{cfg}}(\mathbf{x}_t, t, c) =
\epsilon_\theta(\mathbf{x}_t, t, \varnothing)
+
w \left(
\epsilon_\theta(\mathbf{x}_t, t, c) -
\epsilon_\theta(\mathbf{x}_t, t, \varnothing)
\right)\]
\[w \geq 1\]
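The guided prediction is a linear extrapolation from the unconditional output toward the conditional one. A minimal sketch follows; the function name `cfg_noise` and the example vectors are illustrative assumptions.

```python
def cfg_noise(eps_cond, eps_uncond, w):
    """eps_hat = eps_uncond + w * (eps_cond - eps_uncond), elementwise."""
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Illustrative values: w = 1 reproduces the conditional prediction,
# larger w pushes further along the (conditional - unconditional) direction.
print(cfg_noise([1.0, 0.0], [0.5, 0.5], 1.0))  # [1.0, 0.0]
print(cfg_noise([1.0, 0.0], [0.5, 0.5], 2.0))  # [1.5, -0.5]
```

In practice both predictions usually come from the same network evaluated with and without the condition \(c\), so each guided step costs two forward passes.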