Paper Notes -- Consistency Models

Notes on the paper *Consistency Models* (Song et al., 2023).

Consistency Training

In typical setups, we rely on a pre-trained score model, $\boldsymbol{s}_\phi(\mathbf{x}, t)$, to approximate the true score function $\nabla \log p_t(\mathbf{x})$. However, we can bypass this pre-trained model by using the following unbiased estimator:

$$
\nabla \log p_t\left(\mathbf{x}_t\right) = -\mathbb{E}\left[\left.\frac{\mathbf{x}_t - \mathbf{x}}{t^2} \,\right|\, \mathbf{x}_t\right]
$$

where $\mathbf{x} \sim p_{\text{data}}$ and $\mathbf{x}_t \sim \mathcal{N}(\mathbf{x}; t^2 \boldsymbol{I})$. This implies that, given $\mathbf{x}$ and $\mathbf{x}_t$, we can estimate $\nabla \log p_t(\mathbf{x}_t)$ as $-\left(\mathbf{x}_t - \mathbf{x}\right) / t^2$.
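
As a concrete illustration, here is a minimal PyTorch sketch of this estimator. The tensor shapes and the stand-in Gaussian data distribution are placeholders, not from the paper:

```python
import torch

def score_estimate(x: torch.Tensor, x_t: torch.Tensor, t: float) -> torch.Tensor:
    """Unbiased per-sample estimate of grad log p_t(x_t), i.e. -(x_t - x) / t^2."""
    return -(x_t - x) / t**2

# Usage: sample a batch (a stand-in Gaussian here in place of p_data),
# perturb it with noise of scale t, and form the estimate.
x = torch.randn(64, 3, 32, 32)        # stand-in for x ~ p_data
t = 2.0
x_t = x + t * torch.randn_like(x)     # x_t ~ N(x, t^2 I)
score = score_estimate(x, x_t, t)     # approximates grad log p_t(x_t)
```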

This unbiased estimate is a sufficient replacement for the pre-trained diffusion model in consistency distillation when the Euler method is used as the ODE solver and the number of discretization steps $N \rightarrow \infty$.
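
To see why the Euler solver pairs so naturally with this estimator, the sketch below (hypothetical names, and assuming the paper's probability-flow ODE $\mathrm{d}\mathbf{x}_t/\mathrm{d}t = -t\,\nabla \log p_t(\mathbf{x}_t)$) shows that an Euler step from $t_{n+1}$ to $t_n$, with the score replaced by the estimate, collapses to $\mathbf{x} + t_n \mathbf{z}$ for shared noise $\mathbf{z}$:

```python
import torch

def ct_pair(x: torch.Tensor, z: torch.Tensor, t_n: float, t_np1: float):
    """Build the adjacent-time pair used in consistency training.

    Euler step for dx/dt = -t * grad log p_t(x), from t_{n+1} down to t_n:
        x_hat_{t_n} = x_{t_{n+1}} - (t_n - t_{n+1}) * t_{n+1} * score
    With score = -(x_{t_{n+1}} - x) / t_{n+1}^2 and x_{t_{n+1}} = x + t_{n+1} * z,
    this collapses to x + t_n * z.
    """
    x_tnp1 = x + t_np1 * z
    score = -(x_tnp1 - x) / t_np1**2                  # unbiased score estimate
    x_hat_tn = x_tnp1 - (t_n - t_np1) * t_np1 * score
    assert torch.allclose(x_hat_tn, x + t_n * z, atol=1e-5)  # closed form
    return x_tnp1, x_hat_tn

# Usage: the consistency-training loss would then compare the model's outputs
# on this pair, d(f_theta(x_tnp1, t_np1), f_theta_minus(x_hat_tn, t_n)), where
# f_theta_minus is an EMA/stopgrad copy of the model (names hypothetical).
x = torch.randn(16, 2)
z = torch.randn_like(x)
x_tnp1, x_hat_tn = ct_pair(x, z, t_n=1.0, t_np1=1.5)
```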


A key trick commonly used, which follows directly from the chain rule, is:

$$
\nabla \log f = \frac{\nabla f}{f}
$$

or equivalently,

$$
\nabla f = f \cdot \nabla \log f
$$
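
This identity is easy to verify numerically with autograd; the snippet below checks it on an arbitrary positive test function (chosen only for illustration):

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    # Arbitrary positive scalar function, so log f is well defined.
    return torch.exp(-x.pow(2).sum()) + 0.1

x = torch.randn(5, requires_grad=True)

grad_f = torch.autograd.grad(f(x), x)[0]              # grad f
grad_log_f = torch.autograd.grad(torch.log(f(x)), x)[0]  # grad log f

# Check grad f == f * grad log f up to floating-point error.
assert torch.allclose(grad_f, f(x).detach() * grad_log_f, atol=1e-6)
```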