In any Metropolis scheme for sampling from a target distribution on $R^d$, there arises the question of how large to take the steps of the proposal chain. One can answer this for isotropic-proposal schemes in high dimensions, in the setting where the target is a product distribution, and the result in this (very artificial) setting provides a heuristic for more realistic settings exemplified by (11.1).
Fix a probability density function $f$ on $R^1$. For large $d$ consider the i.i.d. product distribution $\pi(x) = \prod_{i=1}^d f(x_i)$ for $x = (x_1,\ldots,x_d) \in R^d$. Suppose we want to sample from $\pi$ using Metropolis or Gibbs; what is the optimal scaling (as a function of $d$) for the step size of the proposal chain, and how does the relaxation time scale?
For the Gibbs sampler this question is straightforward. Consider the one-dimensional case, and take the proposal step increments to be Normal$(0,\sigma^2)$. Then (under technical conditions on $f$ -- we omit technical conditions here and in Theorem 11.3) the chain will have some finite relaxation time depending on $f$ and $\sigma$, and choosing the optimal $\sigma$ gives a relaxation time $\tau^*(f)$, say. The Gibbs sampler chain in which we choose a random coordinate and propose changing only that coordinate (using the optimal $\sigma$ above) is a product chain in the sense of Chapter 4 section 6.2 (yyy 10/11/94 version), and so the relaxation time of this product chain is $d\,\tau^*(f)$.
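The product-chain fact used here can be checked in a toy discrete example: for a random-scan product chain built from a component kernel with spectral gap $1-\lambda$, the product chain on $d$ coordinates has gap $(1-\lambda)/d$, so its relaxation time is $d$ times the component relaxation time. A minimal sketch, assuming a two-state component chain of our own choosing (all names and parameters here are illustrative, not from the text):

```python
import itertools

# Component: 2-state kernel K with second eigenvalue lam = 1 - a - b,
# so its relaxation time is tau = 1/(a+b).
a, b = 0.3, 0.2
K = [[1 - a, a], [b, 1 - b]]
lam = 1 - a - b

# Random-scan product chain on {0,1}^d: pick a uniform random coordinate,
# update it by K, leave the others alone.
d = 3
states = list(itertools.product([0, 1], repeat=d))

def P(x, y):
    diff = [i for i in range(d) if x[i] != y[i]]
    if len(diff) > 1:
        return 0.0
    if len(diff) == 1:
        i = diff[0]
        return K[x[i]][y[i]] / d
    # y == x: each chosen coordinate may also stay where it is
    return sum(K[x[i]][x[i]] for i in range(d)) / d

# v(x) = u(x_0), with u a second eigenvector of K (K u = lam u), is an
# eigenvector of the product chain with eigenvalue 1 - (1 - lam)/d.
u = [a, -b]
v = {x: u[x[0]] for x in states}
Pv = {x: sum(P(x, y) * v[y] for y in states) for x in states}
eig = 1 - (1 - lam) / d
assert all(abs(Pv[x] - eig * v[x]) < 1e-9 for x in states)

tau_component = 1 / (1 - lam)   # relaxation time of K
tau_product = 1 / (1 - eig)     # = d * tau_component
print(tau_component, tau_product)
```

With $a+b = 0.5$ the component relaxation time is $2$ and the $d=3$ product chain's is $6$, matching the $d\,\tau$ scaling claimed above.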
Though the argument above is very simple, it is unsatisfactory because there is no simple expression for relaxation time as a function of $\sigma$ or for the optimal $\sigma$. It turns out that this difficulty is eliminated in the isotropic-proposal Metropolis chain. In the Gibbs sampler above, the variance of the length of a proposed step is $\sigma^2$, so we retain this property by specifying the steps of the proposal chain to have Normal$(0, \frac{\sigma^2}{d} I_d)$ distribution. One expects the relaxation time to grow linearly in $d$ in this setting also. The following result of Roberts et al [292] almost proves this, and has other useful corollaries.
Theorem 11.3. Fix $\sigma > 0$. Let $X^d$ be the Metropolis chain for sampling from product measure $\pi(x) = \prod_{i=1}^d f(x_i)$ on $R^d$ based on a proposal random walk with step distribution Normal$(0, \frac{\sigma^2}{d} I_d)$. Write $X^{d,1}$ for the first coordinate of $X^d$, and let $\tilde X^{d,1}(t) := X^{d,1}(\lfloor dt \rfloor)$ be this coordinate process speeded up by a factor $d$, for continuous $t \geq 0$. Suppose $X^d(0)$ has the stationary distribution $\pi$. Then
(11.9)  $\tilde X^{d,1} \to X$ in distribution as $d \to \infty$
where the limit process $X$ is the stationary one-dimensional diffusion
(11.10)  $dX(t) = \theta^{1/2}\, dB(t) + \tfrac{1}{2}\theta\, \frac{f'(X(t))}{f(X(t))}\, dt$
for standard Brownian motion $B$, where

$\theta = \theta(\sigma, f) := 2\sigma^2\, \Phi\!\left(-\tfrac{1}{2}\sigma\sqrt{I}\right), \qquad I = I(f) := \int \frac{(f'(x))^2}{f(x)}\, dx,$

and where $\Phi$ denotes the standard Normal distribution function.
Moreover, as $d \to \infty$ the proportion of accepted proposals in the stationary chain tends to $2\Phi(-\tfrac{1}{2}\sigma\sqrt{I})$.
We outline the proof in section 11.5.3. The result may look complicated, so one piece of background may be helpful. Given a probability distribution on the integers, there is a Metropolis chain for sampling from it based on the simple random walk proposal chain. As a continuous-space analog, given a density $f$ on $R^1$ there is a “Metropolis diffusion” with stationary density $f$ based on the speed-$\theta$ Brownian motion $\theta^{1/2}B$ (for an arbitrary constant $\theta > 0$) as “proposal diffusion”, and this Metropolis diffusion is exactly the diffusion (11.10): see Notes to (yyy final Chapter).
Thus the appearance of the limit diffusion is not unexpected; what is important is the explicit formula for $\theta$ in terms of $\sigma$ and $f$. Note that the parameter $\theta$ affects the process only as a speed parameter. That is, if $X^{(1)}$ is the process (11.10) with $\theta = 1$ then the general process can be represented as $X(t) = X^{(1)}(\theta t)$. In particular, the relaxation time of $X$ scales as $\theta^{-1}$. Thus we seek to maximize $\theta(\sigma, f) = 2\sigma^2\Phi(-\tfrac{1}{2}\sigma\sqrt{I})$ as a function of the underlying step variance $\sigma^2$, and a simple numerical calculation shows this is maximized by taking $\sigma = \hat\sigma := 2.38/\sqrt{I}$, giving $\theta(\hat\sigma, f) \approx 1.3/I$.
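The “simple numerical calculation” can be sketched as follows, assuming only the formula $\theta(\sigma,f) = 2\sigma^2\Phi(-\tfrac{1}{2}\sigma\sqrt{I})$ above (the function names and the crude grid search are our illustrative choices; since $\sigma$ enters only through $\sigma\sqrt{I}$, we may take $I = 1$ without loss of generality):

```python
import math

# standard Normal distribution function, built from math.erf
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

I = 1.0  # WLOG: sigma enters only through sigma * sqrt(I)

def theta(sigma):
    # speed parameter of the limit diffusion (11.10)
    return 2.0 * sigma ** 2 * Phi(-sigma * math.sqrt(I) / 2.0)

def accept(sigma):
    # limiting acceptance proportion in Theorem 11.3
    return 2.0 * Phi(-sigma * math.sqrt(I) / 2.0)

# crude grid search for the maximizing sigma
sigma_hat = max((k / 1000.0 for k in range(1, 6000)), key=theta)
print(sigma_hat, theta(sigma_hat), accept(sigma_hat))
```

The search returns $\hat\sigma \approx 2.38$, $\theta(\hat\sigma) \approx 1.3$, and acceptance proportion $\approx 0.234$, in agreement with the values quoted in the text.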
Thus Theorem 11.3 suggests that for the Metropolis chain $X^d$, the optimal proposal-step variance is $\hat\sigma^2/d = (2.38)^2/(Id)$ per coordinate, and suggests that the relaxation time $\tau_d$ scales as
(11.11)  $\tau_d \sim d\, \tau_X$, where $\tau_X$ is the relaxation time of the diffusion (11.10) with $\sigma = \hat\sigma$.
In writing (11.11) we are pretending that the Metropolis chain is a product chain (so that its relaxation time is the relaxation time of its individual components) and that relaxation time can be passed to the limit in (11.9). Giving a rigorous proof of (11.11) seems hard.
Continuing the discussion above, Theorem 11.3 says that the long-run proportion of proposed moves which are accepted is $2\Phi(-\tfrac{1}{2}\sigma\sqrt{I})$. At the optimal value $\hat\sigma$ we find this proportion is a “pure number” $0.234$, which does not depend on $f$. To quote [292]:
This result gives rise to the useful heuristic for random walk Metropolis in practice:
Tune the proposal variance so that the average acceptance rate is roughly $1/4$.
We call this the diffusion heuristic for proposal-step scaling. Intuitively one might hope that the heuristic would be effective for fairly general unimodal target densities on $R^d$, though it clearly has nothing to say about the problem of passage between modes in a multimodal target. Note also that to invoke the diffusion heuristic in a combinatorial setting, where the proposal chain is random walk on a graph, one needs to assume that the target distribution $\pi$ is “smooth” in the sense that $\pi(w) \approx \pi(v)$ for a typical edge $(v,w)$. In this case one can make a Metropolis chain in which the proposal chain jumps $L$ edges in one step, and seek to optimize $L$. See Roberts [294] for some analysis in the context of smooth distributions on the $d$-cube. However, such smoothness assumptions seem inapplicable to most practical combinatorial MCMC problems.
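The diffusion heuristic can be watched in action in a small simulation. The sketch below (our own illustrative code, not from the text) runs random-walk Metropolis on the product of $d$ standard Normal densities, for which $I(f) = 1$ and hence the suggested scaling is $\sigma = 2.38$; for moderately large $d$ the empirical acceptance rate should come out near $0.234$:

```python
import math
import random

random.seed(1)

def rwm_acceptance(d, sigma, steps):
    """Random-walk Metropolis on the product of d standard Normals,
    with proposal increments Normal(0, sigma^2/d) per coordinate.
    Returns the fraction of accepted proposals, started in stationarity."""
    s = sigma / math.sqrt(d)
    x = [random.gauss(0.0, 1.0) for _ in range(d)]  # stationary start
    lp = -0.5 * sum(t * t for t in x)               # log density up to a constant
    accepted = 0
    for _ in range(steps):
        y = [t + random.gauss(0.0, s) for t in x]
        lq = -0.5 * sum(t * t for t in y)
        if math.log(random.random()) < lq - lp:     # accept w.p. min(1, e^{lq-lp})
            x, lp = y, lq
            accepted += 1
    return accepted / steps

# For f standard Normal, I(f) = 1, so the suggested sigma is 2.38.
rate = rwm_acceptance(d=30, sigma=2.38, steps=20000)
print(rate)
```

At $d = 30$ the acceptance fraction is already close to the asymptotic $0.234$; in practice one would tune $\sigma$ up or down until the observed rate lands near this value.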
Write a typical step of the proposal chain as

$x = (x_1, \ldots, x_d) \to (x_1 + \sigma d^{-1/2} Z_1, \ldots, x_d + \sigma d^{-1/2} Z_d)$

where the $Z_i$ are independent with Normal$(0,1)$ distribution.
Write

$W = W(x, Z) := \sum_{i=1}^d \log \frac{f(x_i + \sigma d^{-1/2} Z_i)}{f(x_i)}.$
The step is accepted with probability $\min(1, e^W)$. So the increment of the first coordinate of the Metropolis chain has mean $E\left[\sigma d^{-1/2} Z_1 \min(1, e^W)\right]$ and mean-square $E\left[\sigma^2 d^{-1} Z_1^2 \min(1, e^W)\right]$. The essential issue in the proof is to show that, for “typical” values of $x$,
(11.12)  $d\, E\left[\sigma d^{-1/2} Z_1 \min(1, e^W)\right] \to \tfrac{1}{2}\theta\, \frac{f'(x_1)}{f(x_1)}$

(11.13)  $d\, E\left[\sigma^2 d^{-1} Z_1^2 \min(1, e^W)\right] \to \theta$
This identifies the asymptotic drift and variance rates of $\tilde X^{d,1}$ with those of $X$.
Write $W = \sigma d^{-1/2} Z_1 \frac{f'(x_1)}{f(x_1)} + W^* + \mbox{error}$, where $W^* := \sum_{i=2}^d \log \frac{f(x_i + \sigma d^{-1/2} Z_i)}{f(x_i)}$ is independent of $Z_1$. Since $\frac{d}{dw}\min(1, e^w) = e^w 1_{(w < 0)}$ for $w \neq 0$,
the desired estimates (11.12,11.13) can be rewritten as
(11.14)  $E\left[e^{W^*} 1_{(W^* < 0)}\right] \to \Phi(-\tfrac{1}{2}\sigma\sqrt{I})$

(11.15)  $E\left[\min(1, e^{W^*})\right] \to 2\Phi(-\tfrac{1}{2}\sigma\sqrt{I})$
Now if $V$ has Normal$(-\tfrac{1}{2}m^2, m^2)$ distribution then a standard Gaussian computation gives

$E\left[e^V 1_{(V < 0)}\right] = \Phi(-m/2), \qquad E\left[\min(1, e^V)\right] = 2\Phi(-m/2),$

and since the functions $v \mapsto e^v 1_{(v<0)}$ and $v \mapsto \min(1, e^v)$ are bounded and a.e. continuous, these identities persist under convergence in distribution.
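As a quick sanity check on this Gaussian identity -- if $V \sim$ Normal$(-\tfrac{1}{2}m^2, m^2)$ then $E[e^V 1_{(V<0)}] = \Phi(-m/2)$ and $E[\min(1, e^V)] = 2\Phi(-m/2)$ -- here is a Monte Carlo sketch (the parameter $m$ and the sample size are arbitrary choices of ours):

```python
import math
import random

random.seed(2)

# standard Normal distribution function
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

m = 1.7
n = 200000
vs = [random.gauss(-m * m / 2.0, m) for _ in range(n)]

est1 = sum(math.exp(v) for v in vs if v < 0) / n    # E[e^V 1{V<0}]
est2 = sum(min(1.0, math.exp(v)) for v in vs) / n   # E[min(1, e^V)]
print(est1, Phi(-m / 2.0))
print(est2, 2.0 * Phi(-m / 2.0))
```

Both Monte Carlo averages agree with $\Phi(-m/2)$ and $2\Phi(-m/2)$ to within sampling error.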
Since $W^*$ has approximately Normal distribution, proving (11.14,11.15) reduces to proving
(11.16)  $E\, W^* \to -\tfrac{1}{2}\sigma^2 I$

(11.17)  $\mathrm{var}\, W^* \to \sigma^2 I$
We shall argue
(11.18)  $W^* \approx \mathrm{Normal}\left(-\tfrac{1}{2}\sigma^2 I,\ \sigma^2 I\right)$ in distribution, for typical $x$, as $d \to \infty$.
Taking the first two terms in the expansion of $\log f(x_i + \sigma d^{-1/2} Z_i)$ gives

$\log \frac{f(x_i + \sigma d^{-1/2} Z_i)}{f(x_i)} \approx \sigma d^{-1/2} Z_i\, (\log f)'(x_i) + \tfrac{1}{2}\sigma^2 d^{-1} Z_i^2\, (\log f)''(x_i).$
Write $g := (\log f)' = f'/f$. Summing the previous approximation over $2 \leq i \leq d$, the first sum on the right has approximately Normal$\left(0,\ \sigma^2 d^{-1}\sum_i g^2(x_i)\right)$ distribution, and (using the weighted law of large numbers, replacing $Z_i^2$ by its mean $1$) the second term is approximately $\tfrac{1}{2}\sigma^2 d^{-1}\sum_i g'(x_i)$. So the distribution of $W^*$ is approximately Normal. But by the law of large numbers, for a typical $x$ drawn from the product distribution we have $d^{-1}\sum_i g^2(x_i) \approx E_f\, g^2 = I$ and $d^{-1}\sum_i g'(x_i) \approx E_f\, g' = -I$ (the latter by integration by parts), giving (11.18).
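The Normal approximation for the log acceptance ratio can be checked by simulation in the one case where everything is explicit: $f$ the standard Normal density, so $g(x) = -x$, $I(f) = 1$, and the claimed limit is Normal$(-\tfrac{1}{2}\sigma^2, \sigma^2)$. A sketch under these assumptions ($d$, the sample size, and $\sigma$ are our illustrative choices):

```python
import math
import random

random.seed(3)

sigma, d, n = 2.38, 300, 2500
h = sigma / math.sqrt(d)

ws = []
for _ in range(n):
    # x drawn from the product target; z the scaled proposal increment
    w = 0.0
    for _ in range(d):
        x = random.gauss(0.0, 1.0)
        z = random.gauss(0.0, 1.0)
        # log f(x + h z) - log f(x) for f the standard Normal density
        w += -0.5 * ((x + h * z) ** 2 - x ** 2)
    ws.append(w)

mean_w = sum(ws) / n
var_w = sum((w - mean_w) ** 2 for w in ws) / n
print(mean_w, var_w)  # compare with -sigma^2/2 and sigma^2 (here I = 1)
```

The empirical mean and variance of the simulated log ratios match $-\sigma^2/2$ and $\sigma^2$ up to sampling error and an $O(1/d)$ correction, consistent with (11.18).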