11 Markov Chain Monte Carlo (January 8 2001)

11.7 Notes on Chapter MCMC

Liu [237] provides a nice combination of examples and carefully described methodology in MCMC, emphasizing statistical applications but also covering some statistical physics. Other statistically oriented books include [91, 165, 290]. We should reiterate that most MCMC “design” ideas originated in statistical physics; see the extensive discussion by Sokal [311]. Neal [267] focuses on neural nets but contains useful discussion of MCMC variants.

Section 11.1.1. In the single-run setting, the variance of sample means (11.3) could be estimated by classical methods of time series [63].
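To make the single-run approach concrete, here is a sketch (our own illustration, not taken from [63]; the function name and tuning constants are assumptions) of the batch-means estimator, one classical time-series method for estimating the variance of a sample mean computed from a correlated run:

```python
import numpy as np

def batch_means_variance(x, num_batches=20):
    """Estimate Var(sample mean) of a correlated sequence by batch means.

    Split the run into contiguous batches; if batches are long relative to
    the autocorrelation time, the batch means are nearly independent, so
    their empirical variance (divided by the number of batches) estimates
    the variance of the overall sample mean.
    """
    x = np.asarray(x, dtype=float)
    batch_len = len(x) // num_batches
    x = x[: batch_len * num_batches]              # drop any remainder
    means = x.reshape(num_batches, batch_len).mean(axis=1)
    return means.var(ddof=1) / num_batches

# Example: an AR(1) sequence, whose autocorrelation structure is known.
rng = np.random.default_rng(0)
n, rho = 200_000, 0.9
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.standard_normal()
est = batch_means_variance(x)
# For AR(1), Var(mean) is roughly (1/(1-rho^2)) * ((1+rho)/(1-rho)) / n,
# about 5e-4 for these parameters.
```

With only 20 batches the estimate is itself noisy; longer runs and more batches trade off bias (batches too short) against variability (too few batches).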

The phrase metastability error is our coinage; though the idea is standard, there seems to be no standard phrase for it.

Elaborations of the multiple-runs method are discussed by Gelman and Rubin [161]. The applied literature has paid much attention to diagnostics: for reviews see Cowles and Carlin [102] or Robert [291].

Section 11.1.2. Devroye [111] gives the classical theory of sampling from one-dimensional and other specific distributions.

Section 11.2.1. The phrase “Metropolis algorithm” is useful shorthand for “MCMC sampling, where the Markov chain is based on a proposal-acceptance scheme like those in section 11.2.1”. The idea comes from the 1953 paper by Metropolis, Rosenbluth, Rosenbluth, Teller and Teller [262] in the context of statistical physics, and the variant with a general proposal matrix is from the 1970 paper of Hastings [180]. Of course the word “algorithm” means a definite rule for attaining some goal; the arbitrariness of the proposal matrix, and the vagueness about when to stop, make it an extreme stretch to use the word for the Metropolis scheme.
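As a minimal sketch of the proposal-acceptance scheme (our own illustrative code; the target, step size, and function names are assumptions, not from the text), a random-walk Metropolis sampler for a one-dimensional target known only up to normalization might look like:

```python
import numpy as np

def metropolis(log_pi, x0, step, n_steps, rng):
    """Random-walk Metropolis with symmetric Gaussian proposals.

    A proposed move x -> y is accepted with probability
    min(1, pi(y)/pi(x)), which leaves pi invariant for any
    symmetric proposal.
    """
    x = x0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        y = x + step * rng.standard_normal()
        if np.log(rng.random()) < log_pi(y) - log_pi(x):
            x = y                       # accept the proposal
        samples[i] = x                  # on rejection, repeat current state
    return samples

# Target: standard normal; only the unnormalized log-density is needed.
rng = np.random.default_rng(1)
s = metropolis(lambda x: -0.5 * x * x, x0=0.0, step=2.4,
               n_steps=50_000, rng=rng)
```

Note that the rejected-move case still records the current state; forgetting this repeat is a common implementation error.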

The map $K \to P$ in the Metropolis-Hastings construction (11.5) has an interpretation as a minimum-length projection in a certain $L^1$ space of matrices – see Billera and Diaconis [50].

Section 11.2.2. The Gibbs sampler was popularized in 1984 by Geman and Geman [162] in the context of Bayesian image analysis. The idea is older in statistical physics, under the name heat bath. Hit-and-run was introduced in 1984 by Smith [310]. General line-sampling schemes go back to Goodman and Sokal [170].
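A minimal illustration of the Gibbs sampler (heat bath) idea, using a bivariate normal target where the full conditionals are available exactly; this is our own toy example, not from the cited papers:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_steps, rng):
    """Two-component Gibbs sampler for a standard bivariate normal
    with correlation rho.

    Each update resamples one coordinate from its exact conditional
    distribution given the other (the 'heat bath' move):
        X | Y=y  ~  Normal(rho * y, 1 - rho**2)
    """
    x = y = 0.0
    out = np.empty((n_steps, 2))
    sd = np.sqrt(1.0 - rho * rho)
    for i in range(n_steps):
        x = rho * y + sd * rng.standard_normal()   # resample X | Y
        y = rho * x + sd * rng.standard_normal()   # resample Y | X
        out[i] = x, y
    return out

rng = np.random.default_rng(2)
xy = gibbs_bivariate_normal(rho=0.5, n_steps=50_000, rng=rng)
```

Unlike Metropolis, every move is accepted; the price is that one must be able to sample the conditionals exactly.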

Section 11.3.1. Terminology for this type of construction is not standard. What we call “Metropolized line sampling” is what Besag and Greene [46] call an auxiliary variable construction, and this type of construction goes back to Edwards and Sokal [139] in statistical physics.

Section 11.3.2. One can also define MTM using a general proposal matrix $K$ [235], though (in contrast to Metropolis) the specialization of the general case to symmetric $K$ differs from the symmetric scheme described in the text. Liu et al. [235] discuss the use of MTM as an ingredient in other variations of MCMC.

Other MCMC variations. In statistical physics, it is natural to think of particles in $R^d$ as having position and velocity. This suggests MCMC schemes in which velocity is introduced as an auxiliary variable. In particular, one can use deterministic equations of motion to generate proposal steps for Metropolis, an idea called hybrid Monte Carlo – see Neal [268].
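A bare-bones sketch along the lines described in Neal [268] (our own illustrative code, using a leapfrog discretization; all names and tuning values here are assumptions): the auxiliary velocity is refreshed from a Gaussian, the deterministic dynamics for $H = -\log\pi + |v|^2/2$ generate the proposal, and a Metropolis test on the change in $H$ corrects the discretization error.

```python
import numpy as np

def hmc_step(log_pi, grad_log_pi, x, step, n_leapfrog, rng):
    """One hybrid (Hamiltonian) Monte Carlo update.

    Draw an auxiliary Gaussian 'velocity' v, follow the leapfrog
    discretization of Hamiltonian dynamics, and accept the endpoint
    by a Metropolis test on the change in H = -log pi + |v|^2 / 2.
    """
    v = rng.standard_normal(x.shape)
    x_new, v_new = x.copy(), v.copy()
    v_new += 0.5 * step * grad_log_pi(x_new)      # half step in velocity
    for _ in range(n_leapfrog - 1):
        x_new += step * v_new                     # full step in position
        v_new += step * grad_log_pi(x_new)        # full step in velocity
    x_new += step * v_new
    v_new += 0.5 * step * grad_log_pi(x_new)      # final half step
    h_old = -log_pi(x) + 0.5 * (v @ v)
    h_new = -log_pi(x_new) + 0.5 * (v_new @ v_new)
    return x_new if np.log(rng.random()) < h_old - h_new else x

# Target: standard 2-d normal, so grad log pi(z) = -z.
rng = np.random.default_rng(3)
x = np.zeros(2)
samples = np.empty((5_000, 2))
for i in range(5_000):
    x = hmc_step(lambda z: -0.5 * (z @ z), lambda z: -z,
                 x, step=0.3, n_leapfrog=10, rng=rng)
    samples[i] = x
```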

Section 11.4. The survey by Diaconis and Saloff-Coste [119] has further pieces of theory, emphasizing the low-dimensional discrete setting. For target densities on $R^d$ one needs some regularity conditions to ensure $\tau_2$ is finite; see Roberts and Tweedie [293] for results of this type.

Section 11.4.1. As background to Peskun’s theorem, one might think (by vague physical analogy) that it would be desirable to have acceptance probabilities behave as some “smooth” function; e.g. in the symmetric-proposal case, instead of $\min(1, \pi_y/\pi_x)$ take $\pi_y/(\pi_x + \pi_y)$. Lemma 11.1 shows this intuition is wrong, at least using asymptotic variance rate or relaxation time as a criterion. Liu [237] section 12.3 gives further instances where Peskun’s theorem can be applied. As usual, it is hard to do such comparison arguments for $\tau_1$.
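To see the comparison numerically, one can build both chains on a small state space and compare spectral gaps. The code below is our own toy example (the “smooth” rule is the one often attributed to Barker); since the Metropolis acceptance $\min(1,r)$ dominates $r/(1+r)$ for every ratio $r$, its off-diagonal entries, and hence its spectral gap, are at least as large.

```python
import numpy as np

def chain(pi, accept):
    """Reversible chain on {0,...,n-1}: propose a nearest neighbor
    uniformly (staying put if the proposal falls off an end, keeping
    the proposal matrix symmetric), then accept x -> y with
    probability accept(pi[y] / pi[x])."""
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x, y] = 0.5 * accept(pi[y] / pi[x])
        P[x, x] = 1.0 - P[x].sum()      # remaining mass stays put
    return P

def spectral_gap(P, pi):
    """1 - lambda_2, computed from the symmetrized matrix
    D^{1/2} P D^{-1/2}, which shares P's eigenvalues by reversibility."""
    d = np.sqrt(pi)
    S = (d[:, None] * P) / d[None, :]
    lam = np.sort(np.linalg.eigvalsh(S))
    return 1.0 - lam[-2]

pi = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
pi /= pi.sum()
gap_metropolis = spectral_gap(chain(pi, lambda r: min(1.0, r)), pi)
gap_barker = spectral_gap(chain(pi, lambda r: r / (1.0 + r)), pi)
# Peskun ordering: the Metropolis chain's gap is at least as large.
```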

Section 11.4.2. The coupling here is an instance of a one-dimensional monotone coupling, which exists for any stochastically monotone chain.
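A sketch of such a monotone coupling for a reflecting birth-death chain (our own toy example, not from the text): driving both copies with the same uniform variable at each step preserves their order, because the chain is stochastically monotone.

```python
import numpy as np

def monotone_step(x, u, p, n):
    """One step of a birth-death chain on {0,...,n}, driven by a single
    uniform u: move up if u < p, otherwise down, reflecting at the ends.
    Feeding the same u to every copy yields a monotone coupling."""
    if u < p:
        return min(x + 1, n)
    return max(x - 1, 0)

rng = np.random.default_rng(4)
n = 10
lo, hi = 0, n                      # two copies started at the extremes
for _ in range(1000):
    u = rng.random()
    lo = monotone_step(lo, u, 0.5, n)
    hi = monotone_step(hi, u, 0.5, n)
    assert lo <= hi                # ordering preserved at every step
```

Once the two extreme copies meet, every intermediate starting point has coupled as well, which is the idea exploited by such couplings.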

Section 11.5.2. Discussion of practical aspects of the diffusion heuristic can be found in Roberts et al. [160], and discussion in the more complicated setting of Gibbs distributions of $(X_v; v \in Z^d)$ is in Breyer and Roberts [60].