11 Markov Chain Monte Carlo (January 8 2001)

11.4 A little theory

The chains designed for MCMC in previous sections are reversible, and therefore the theory of reversible chains developed in this book is available. Unfortunately there is very little extra to say; in that sense, there is no “theory of MCMC”. What follows is a collection of rather fragmentary observations.

11.4.1 Comparison methods

Consider the Metropolis chain

\[
p^{\rm Metro}_{xy}=k_{xy}\min\left(1,\frac{\pi_y k_{yx}}{\pi_x k_{xy}}\right),\qquad y\neq x.
\]
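To make the construction concrete, here is a minimal sketch in Python (not from the text; the finite state space, the NumPy proposal matrix K and target vector pi, and the name metropolis_step are all illustrative assumptions) of a single proposal/accept-reject step with these rates:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def metropolis_step(x, K, pi):
    # Propose y from the row k_{x,.}, then accept with probability
    # min(1, pi_y k_yx / (pi_x k_xy)); otherwise hold at x.
    y = rng.choice(len(pi), p=K[x])
    accept = min(1.0, (pi[y] * K[y, x]) / (pi[x] * K[x, y]))
    return y if rng.random() < accept else x
\end{verbatim}

Note that K[x, y] > 0 whenever y is actually proposed, so the ratio is well defined.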

The requirement that a step of a chain be constructible as a proposal from $K$ followed by acceptance/rejection is the requirement that $p_{xy}\leq k_{xy}$, $y\neq x$. Recall the asymptotic variance rate

\[
\sigma^2(P,f):=\lim_t t^{-1}\,{\rm var}\sum_{s=1}^t f(X_s).
\]
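As an illustration of this definition (again a sketch; the two-state example is arbitrary and not from the text), one can estimate $\sigma^2(P,f)$ by simulating many independent stationary runs and computing ${\rm var}\sum_{s=1}^t f(X_s)/t$ for moderately large $t$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def estimate_sigma2(P, pi, f, t=1000, reps=200):
    # Empirical version of sigma^2(P,f): variance of the partial sum over
    # independent replicate runs started from pi, divided by t.
    sums = np.empty(reps)
    for r in range(reps):
        x = rng.choice(len(pi), p=pi)        # stationary start
        total = 0.0
        for _ in range(t):
            x = rng.choice(len(pi), p=P[x])  # one step of the chain
            total += f[x]
        sums[r] = total
    return sums.var() / t

# Two-state example: here the exact value works out to
# 0.25 * (1 + 0.8)/(1 - 0.8) = 2.25, and the estimate should be close.
P = np.array([[0.9, 0.1], [0.1, 0.9]])
print(estimate_sigma2(P, np.array([0.5, 0.5]), np.array([0.0, 1.0])))
\end{verbatim}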
Lemma 11.1 (Peskun’s Theorem [280])

Given $K$ and $\pi$, let $P$ be a reversible chain with $p_{xy}\leq k_{xy}$, $y\neq x$, and with stationary distribution $\pi$. Then $\sigma^2(P,f)\geq\sigma^2(P^{\rm Metro},f)$ for all $f$.

Proof. Reversibility of $P$ implies

\[
p_{xy}=\frac{\pi_y p_{yx}}{\pi_x}\leq\frac{\pi_y k_{yx}}{\pi_x}=k_{xy}\,\frac{\pi_y k_{yx}}{\pi_x k_{xy}}
\]

and hence

\[
p_{xy}=p^{\rm Metro}_{xy}\,\beta_{xy}
\]

where $\beta_{xy}=\beta_{yx}\leq 1$, $y\neq x$. So the result follows directly from Peskun’s lemma (yyy Lemma 11.5, to be moved elsewhere). $\Box$

This result can be interpreted as saying that the Metropolis rates (11.5) are the optimal way of implementing a proposal-rejection scheme. Loosely speaking, a similar result holds in any natural Metropolis-like construction of a reversible chain using a $\min(1,\cdot)$ acceptance probability.
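As a concrete check on Lemma 11.1 (a sketch; the randomly generated K, pi, and f are arbitrary and not from the text), one can compare the Metropolis chain with the chain obtained from Barker's acceptance rule $w/(1+w)$, whose off-diagonal rates also satisfy $p_{xy}\leq k_{xy}$; Peskun's ordering predicts the Metropolis chain has the smaller asymptotic variance rate.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def build_chain(K, pi, accept):
    # Off-diagonal rates k_xy * accept(w_xy), w_xy = pi_y k_yx / (pi_x k_xy);
    # the diagonal holds the leftover probability.  K has zero diagonal here.
    num = pi[None, :] * K.T                    # pi_y k_yx
    den = pi[:, None] * K                      # pi_x k_xy
    np.fill_diagonal(num, 1.0)                 # dummy values on the diagonal
    np.fill_diagonal(den, 1.0)
    P = K * accept(num / den)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

def sigma2(P, pi, f):
    # Exact asymptotic variance rate via the fundamental matrix
    # Z = (I - P + 1 pi^T)^{-1}: sigma^2 = <fbar, (2Z - I) fbar>_pi.
    n = len(pi)
    fbar = f - pi @ f
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    return fbar @ np.diag(pi) @ (2 * Z - np.eye(n)) @ fbar

n = 6
K = rng.random((n, n)); np.fill_diagonal(K, 0.0); K /= K.sum(axis=1, keepdims=True)
pi = rng.random(n); pi /= pi.sum()
f = rng.random(n)

s_metro  = sigma2(build_chain(K, pi, lambda w: np.minimum(1.0, w)), pi, f)
s_barker = sigma2(build_chain(K, pi, lambda w: w / (1.0 + w)), pi, f)
print(s_metro <= s_barker)                     # expected: True
\end{verbatim}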

It is important to notice that Lemma 11.1 does not answer the following question, which (except for highly symmetric graphs) seems intractable.

Question. Given a connected graph and a probability distribution $\pi$ on its vertices, consider the class of reversible chains with stationary distribution $\pi$ and with transitions only across edges of the graph. Within that class, which chain has smallest relaxation time?

Unfortunately, standard comparison theorems don’t take us much further in comparing MCMC methods. To see why, consider Metropolis on $R^d$ with isotropic Normal$(0,\sigma^2{\bf I}_d)$ proposal steps. This has some relaxation time $\tau_2(f,\sigma)$, where $f$ is the target density. For $\sigma_1<\sigma_2$, the normal densities $g_\sigma(x)$ satisfy $g_{\sigma_2}(x)/g_{\sigma_1}(x)\geq(\sigma_1/\sigma_2)^d$. So the comparison theorem (Chapter 3 Lemma 29) (yyy 9/2/94 version) shows

\[
\tau_2(f,\sigma_1)\geq(\sigma_1/\sigma_2)^d\,\tau_2(f,\sigma_2),\qquad \sigma_1<\sigma_2.
\]

But this is no help in determining the optimal $\sigma$.
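For the record, the density-ratio bound invoked above is just the elementary calculation (not spelled out in the text): writing $g_\sigma(x)=(2\pi\sigma^2)^{-d/2}\exp(-|x|^2/2\sigma^2)$,
\[
\frac{g_{\sigma_2}(x)}{g_{\sigma_1}(x)}
=\left(\frac{\sigma_1}{\sigma_2}\right)^d
\exp\!\left(\frac{|x|^2}{2}\left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_2^2}\right)\right)
\geq\left(\frac{\sigma_1}{\sigma_2}\right)^d,
\qquad \sigma_1<\sigma_2,
\]
since the exponent is nonnegative.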

11.4.2 Metropolis with independent proposals

Though unrealistic in practical settings, the specialization of the Metropolis chain to the case where the proposal chain is i.i.d., that is where $k_{xy}=k_y$, is mathematically a natural object of study. In this setting the transition matrix (11.5) becomes

\[
p_{xy}=k_y\min\left(1,w_y/w_x\right),\qquad y\neq x
\]

where $w_x:=\pi_x/k_x$. It turns out there is a simple and sharp coupling analysis, based on the trick of labeling states as $1,2,\ldots,n$ so that $w_1\geq w_2\geq\ldots\geq w_n$ (Liu [236] used this trick to give an eigenvalue analysis, extending part (b) below). Let $\rho$ be the chance that a proposed step from state $1$ is rejected (count a proposed step from state $1$ to $1$ as always accepted). So

\[
\rho=\sum_{i=1}^n k_i\left(1-\frac{w_i}{w_1}\right)<1.
\]
Proposition 11.2

For the Metropolis chain over independent proposals, with states ordered as above,

(a) $\bar{d}(t)\leq\rho^t$.

(b) The relaxation time $\tau_2=(1-\rho)^{-1}$.

Proof. For the chain started at state $1$, the time $T$ of the first acceptance of a proposed step satisfies

\[
P(T>t)=\rho^t.
\]

Recall from (yyy Chapter 4-3 section 1; 10/11/99 version) the notion of coupling. For this chain a natural coupling is obtained by using the same $U(0,1)$ random variable to implement the accept/reject step (accept if $U<P(\mbox{accept})$) in two versions of the chain. It is easy to check that this coupling $(X_t,X^\prime_t)$ respects the ordering: if $X_0\leq X^\prime_0$ then $X_t\leq X^\prime_t$. At time $T$ the fact that a proposed jump from $1$ is accepted implies that a jump from any other state must be accepted. So $T$ is a coupling time, and the coupling inequality (yyy Chapter 4-3 section 1.1; 10/11/99 version) implies $\bar{d}(t)\leq P(T>t)$. This establishes (a), and the general inequality $\bar{d}(t)=\Omega(\lambda_2^t)$ implies $\lambda_2\leq\rho$. On the other hand, for the chain started at state $1$, on $\{T=1\}$ the time-$1$ distribution is $\pi$; in other words

\[
P_1(X_1\in\cdot)=\rho\,\delta_1(\cdot)+(1-\rho)\,\pi(\cdot).
\]

But this says that $\rho$ is an eigenvalue of $P$: since $\pi P=\pi$, we get $(\delta_1-\pi)P=\rho\delta_1+(1-\rho)\pi-\pi=\rho(\delta_1-\pi)$, so $\delta_1-\pi$ is a (left) eigenvector with eigenvalue $\rho$. Combined with $\lambda_2\leq\rho$ this gives $\lambda_2=\rho$, establishing (b). $\Box$
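Proposition 11.2(b) is easy to check numerically; here is a sketch (the random choice of k and pi is arbitrary, not from the text) that builds the transition matrix, computes $\rho$, and compares it with the second-largest eigenvalue:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 8
k = rng.random(n); k /= k.sum()          # i.i.d. proposal distribution k
pi = rng.random(n); pi /= pi.sum()       # target distribution pi
w = pi / k

# Off-diagonal rates k_y min(1, w_y/w_x); diagonal holds the leftover probability.
P = k[None, :] * np.minimum(1.0, w[None, :] / w[:, None])
np.fill_diagonal(P, 0.0)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

rho = np.sum(k * (1.0 - w / w.max()))
lam = np.sort(np.linalg.eigvals(P).real)[::-1]
print(np.isclose(lam[1], rho))           # expected: True, so tau_2 = 1/(1 - rho)
\end{verbatim}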

In the continuous-space setting, with a proposal distribution uniform on $[0,1]$ and target density $f$ with $f^*:=\max_x f(x)$, part (b) implies the relaxation time $\tau_2$ equals $f^*$. So (unsurprisingly) Metropolis-over-independent is comparable to the basic rejection sampling scheme (section 11.1.2), which gives an exact sample in mean $f^*$ steps.
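For comparison, here is a minimal sketch of that rejection sampling scheme (the density $f(x)=3x^2$ on $[0,1]$ is an arbitrary example, not from the text): proposals are uniform on $[0,1]$ and accepted with probability $f(x)/f^*$, so the mean number of proposals per exact sample is $f^*$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 3.0 * x ** 2        # example target density on [0,1]
f_star = 3.0                      # its maximum f*

def rejection_sample():
    # Repeat uniform proposals until one is accepted; return the sample
    # together with the number of proposals used.
    trials = 0
    while True:
        trials += 1
        x = rng.random()
        if rng.random() < f(x) / f_star:
            return x, trials

counts = [rejection_sample()[1] for _ in range(20000)]
print(np.mean(counts))            # roughly f_star = 3
\end{verbatim}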