9 A Second Look at General Markov Chains (April 21, 1995)

9.7 Notes on Chapter 9

Section 9.1. The idea of a maximal coupling goes back to Goldstein [169]; see Lindvall [234] for further history. Strong stationary times were studied in detail by Diaconis - Fill [113, 112] and Fill [150, 151], with particular attention to the case of one-dimensional stochastically monotone chains, where there is some interesting "duality" theory. The special case of random walks on groups had previously been studied in Aldous - Diaconis [7, 8], and the idea is implicit in the regenerative approach to time-asymptotics for general state space chains, discussed at xxx. The theory surrounding Theorem 9.4 goes back to Rost [301]. This is normally regarded as part of the potential theory of Markov chains, which emphasizes analogous results in the transient setting; the recurrent case is rather a sideline there. See Revuz [289] sec. 2.5 or Dellacherie - Meyer [108] Chapter 9 sec. 3 for textbook treatments in the general-space setting. The observation that the theory applies to simple finite examples such as those in section 9.1.3 was made in Lovasz - Winkler [240], from whom we borrowed the phrase "halting state". Monotonicity properties like that in the statement of Corollary 9.5 were studied in detail by Brown [73], from the viewpoint of approximate exponentiality of hitting times.

Section 9.2. A slightly more sophisticated and extensive textbook treatment of these topics is in Lyons [249]. The nomenclature reflects my taste: Theorem 9.10 is "the underlying theorem" which implies "the formula" for the stationary distribution in terms of weighted spanning trees. Different textbooks (e.g. [178] p. 340 xxx more refs) give rather different historical citations for the Markov chain tree formula, and in talks I often call it "the most often rediscovered result in probability theory": it would be an interesting project to track down the earliest explicit statement. Of course it can be viewed as part of a circle of ideas (including the matrix-tree theorem for the number of spanning trees in a graph) which is often traced back to Kirchhoff. The fact that Theorem 9.10 underlies the formula was undoubtedly folklore for many years (Diaconis attributes it to Peter Doyle, and indeed it appears in an undergraduate thesis [317] of one of his students), but was apparently not published until the paper of Anantharam and Tsoucas [30]. The fact that the Markov chain tree theorem can be interpreted as an algorithm for generating uniform random spanning trees was observed by Aldous [19] and Broder [70], both deriving from conversations with Diaconis; a sketch of the algorithm is given after (9.24) below. [19] initiated the study of theoretical properties of uniform random spanning trees, proving e.g. the following bounds on the diameter $\Delta$ of the random tree in a regular $n$-vertex graph:

$$\frac{n^{1/2}}{K_{1}\tau_{2}\log n}\ \leq\ E\Delta\ \leq\ K_{2}\tau_{2}^{1/2}\,n^{1/2}\log n \qquad (9.24)$$

where $K_{1}$ and $K_{2}$ are absolute constants. Loosely, "in an expander, a random spanning tree has diameter $n^{1/2\pm o(1)}$". Results on asymptotic Poisson distribution for the degrees in a random spanning tree are given in Aldous [19], Pemantle [279] and Burton and Pemantle [83]. Pemantle [278] discusses the analog of uniform random spanning trees on the infinite $d$-dimensional lattice, and Aldous and Larget [9] give simulation results on quantitative behavior on the $d$-dimensional torus.
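Since the text only asserts that the tree theorem yields a sampling algorithm, here is a minimal sketch of the random-walk construction of [19] and [70] (commonly called the Aldous-Broder algorithm): run the walk until it has visited every vertex, and for each vertex other than the start keep the edge along which the walk first entered it. The function name and adjacency-list representation are illustrative.

```python
import random

def aldous_broder(adj, root=0, rng=random):
    """Sample a uniform random spanning tree of a connected graph.

    adj: dict mapping each vertex to a list of its neighbours.
    Returns the tree as a set of (parent, child) edges: for each
    vertex other than the root, the edge of first entrance.
    """
    tree = set()
    visited = {root}
    current = root
    while len(visited) < len(adj):
        nxt = rng.choice(adj[current])  # one step of the random walk
        if nxt not in visited:          # first visit: keep this edge
            visited.add(nxt)
            tree.add((current, nxt))
        current = nxt
    return tree

# Example: a uniform spanning tree of the 4-cycle.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(aldous_broder(adj))
```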

Section 9.2.2. As described in Pemantle [279] and Burton and Pemantle [83], the key to a deeper study of random spanning trees is the following.

Theorem 9.28 (Transfer-impedance theorem)

Fix a graph $G$. There is a symmetric function $H(e_{1},e_{2})$ on pairs of edges in $G$ such that for any edges $(e_{1},\ldots,e_{r})$

$$P(e_{i}\in{\bf T}\mbox{ for all }1\leq i\leq r)=\det M(e_{1},\ldots,e_{r})$$

where $M(e_{1},\ldots,e_{r})$ is the $r\times r$ matrix with entries $H(e_{i},e_{j})$, $1\leq i,j\leq r$.
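The statement does not say how $H$ is computed; one standard realization (an assumption here, following the development in Burton and Pemantle [83] rather than anything stated above) is $H=BL^{+}B^{T}$, where $B$ is the oriented edge-vertex incidence matrix and $L^{+}$ is the pseudoinverse of the graph Laplacian. The sketch below checks the determinant formula on the triangle, whose three spanning trees each contain two of the three edges, so each single edge should get probability 2/3 and each pair probability 1/3.

```python
import numpy as np
from itertools import combinations

def transfer_impedances(n, edges):
    """H = B L^+ B^T for a graph with unit edge conductances.

    B is the oriented edge-vertex incidence matrix, so L = B^T B is
    the graph Laplacian; H is symmetric in the edges, as required.
    """
    B = np.zeros((len(edges), n))
    for k, (i, j) in enumerate(edges):
        B[k, i], B[k, j] = 1.0, -1.0
    L = B.T @ B
    return B @ np.linalg.pinv(L) @ B.T

# Triangle: expect 2/3 for each single edge, 1/3 for each pair.
edges = [(0, 1), (1, 2), (2, 0)]
H = transfer_impedances(3, edges)
for r in (1, 2):
    for S in combinations(range(len(edges)), r):
        # P(e_i in T for all i in S) = det of the submatrix of H
        print(S, np.linalg.det(H[np.ix_(S, S)]))
```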

Section 9.3. The first "pure simulation" algorithm for sampling exactly from the stationary distribution was given by Asmussen et al. [34], using a quite different idea, and lacking explicit time bounds.

Section 9.3.1. In our discussion of these algorithms, we are assuming that we have a list of all states. Lovasz - Winkler [241] gave the argument in a slightly different setting, where the algorithm can only "address" a single state, and their bound involved $\max_{ij}E_{i}T_{j}$ in place of $\tau_{1}^{*}$.

Section 9.3.3. Letac [229] gives a survey of the "backwards coupling" method for establishing convergence of continuous-space chains: writing $X^{(x,s)}$ for the chain started in state $x$ at time $s$, it suffices to show there exists a r.v. $X^{-\infty}$ such that $X^{(x,s)}_{0}\rightarrow X^{-\infty}$ a.s. as $s\rightarrow-\infty$, for each state $x$. This method is especially useful in treating matrix-valued chains of the form $X_{t}=A_{t}X_{t-1}+B_{t}$, where $(A_{t},B_{t}),\ t\geq 1$ are i.i.d. random matrices. See Barnsley and Elton [43] for a popular application.
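To make the backwards-coupling recipe concrete, here is a minimal scalar sketch (the parameters and function name are illustrative, and the scalar case stands in for the matrix-valued one): fix the i.i.d. coefficients for times $-1,-2,\ldots,-n$ and compose the maps $f_{s}(x)=A_{s}x+B_{s}$ from the far past forward to time $0$. Because $E\log|A|<0$ here, the value at time $0$ stabilizes as $n$ grows and forgets the starting point $x$.

```python
import random

def backward_value(n, starts=(-10.0, 0.0, 10.0), seed=0):
    """Evaluate X_0^{(x,-n)} for several starting points x.

    Draws the same i.i.d. pairs (A_s, B_s) for s = -1, ..., -n (the
    seed fixes the randomness across calls) and composes the affine
    maps from time -n forward to time 0.
    """
    rng = random.Random(seed)
    # coeffs[0] is the map at time -1, coeffs[-1] the map at time -n.
    coeffs = [(rng.uniform(-0.9, 0.9), rng.gauss(0.0, 1.0))
              for _ in range(n)]
    values = []
    for x in starts:
        for a, b in reversed(coeffs):  # apply f_{-n} first, f_{-1} last
            x = a * x + b
        values.append(x)
    return values

# The three values merge as n grows: convergence to X^{-infinity}.
for n in (5, 10, 20, 40):
    print(n, backward_value(n))
```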

Section 9.4.1. One result on spectra and reversibilizations is the following. For a transition matrix ${\bf P}$ write

$$\tau({\bf P})=\sup\left\{\frac{1}{1-|\lambda|}:\ \lambda\neq 1\mbox{ an eigenvalue of }{\bf P}\right\}.$$

Then for the additive reversibilization ${\bf Q}^{(1)}=\frac{1}{2}({\bf P}+{\bf P}^{*})$ we have (e.g. [316] Proposition 1)

$$\tau({\bf P})\leq 2\tau({\bf Q}^{(1)}).$$
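As a quick numerical sanity check (an illustrative sketch; the helper names are ours, not from [316]), one can compare $\tau({\bf P})$ with $2\tau({\bf Q}^{(1)})$ for a random transition matrix:

```python
import numpy as np

def relax_time(P):
    """tau(P) = sup of 1/(1 - |lambda|) over eigenvalues lambda != 1."""
    lam = np.linalg.eigvals(P)
    lam = lam[np.abs(lam - 1.0) > 1e-9]   # discard the eigenvalue 1
    return 1.0 / (1.0 - np.abs(lam).max())

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

rng = np.random.default_rng(0)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)           # random transition matrix
pi = stationary(P)
Pstar = (P * pi[:, None]).T / pi[:, None]   # reversal: pi_j p_ji / pi_i
Q1 = 0.5 * (P + Pstar)                      # additive reversibilization
print(relax_time(P), "<=", 2 * relax_time(Q1))
```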