3 Reversible Markov Chains (September 10, 2002)

3.8 Notes on Chapter 3

Textbooks. Almost all the results have long been known to (different groups of) experts, but it has not been easy to find accessible textbook treatments. Of the three books on reversible chains at roughly the same level of sophistication as ours, Kelly [213] emphasizes stationary distributions of stochastic networks; Keilson [212] emphasizes mathematical properties such as complete monotonicity; and Chen [88] discusses those aspects useful in the study of interacting particle systems.

Section 3.1. In abstract settings reversible chains are called symmetrizable, but that’s a much less evocative term. Elementary textbooks often give Kolmogorov’s criterion ([213] Thm 1.7) for reversibility, but we have never found it to be useful.

The following figure may be helpful in seeing why $\pi_i E_i T_j \neq \pi_j E_j T_i$ for a general reversible chain, even if $\pi$ is uniform. Run such a chain $(X_t)$ for $-\infty < t < \infty$ and record only the times when the chain is in state $i$ or state $j$. Then $E_i T_j$ is the long-run empirical average of the passage times from $i$ to $j$, indicated by arrows $\longrightarrow$; and $E_j T_i$ is the long-run empirical average of the passage times from $j$ to $i$ in the reverse time direction, indicated by arrows $\longleftarrow$. One might think these two quantities were averaging the same empirical intervals, but a glance at the figure shows they are not.


Section 3.1.1. Though probabilists would regard the “cyclic tour” Lemma 3.2 as obvious, László Lovász pointed out a complication: with a careful definition of starts and ends of tours, these times are not invariant under time-reversal. The sophisticated fix is to use doubly-infinite stationary chains and observe that tours in reversed time interleave tours in forward time, so by ergodicity their asymptotic rates are equal. Tetali [326] shows that the cyclic tour property implies reversibility. Tanushev and Arratia [321] show that the distributions of forward and reverse tour times are equal.

Cat-and-mouse game 1 is treated more opaquely in Coppersmith et al [99], whose deeper results are discussed in Chapter 9 (9/1/99 version), Section 4.4. Underlying the use of the optional sampling theorem in game 2 is a general result about optimal stopping, but it is much easier to prove what we want here directly than to appeal to general theory. Several algorithmic variations on Proposition 3.3 are discussed in Coppersmith et al [101] and Tetali and Winkler [324].

Section 3.2. Many textbooks on Markov chains note the simple explicit form of the stationary distribution for random walks on graphs. An historical note (taken from [107]) is that the first explicit treatment of random walk on a general finite graph was apparently given in 1935 by Bottema [58], who proved the convergence theorem. Amongst subsequent papers specializing Markov theory to random walks on graphs let us mention Gobel and Jagers [168], which contains a variety of the more elementary facts given in this book, for instance the unweighted version of Lemma 3.9. Another observation from [168] is that for a reversible chain the quantity

$$\beta_{ijl} \equiv \pi_j^{-1} E_i(\mbox{number of visits to $j$ before time $T_l$})$$

satisfies $\beta_{ijl} = \beta_{jil}$. Indeed, by Chapter 2 (9/10/99 version) Lemma 9 we have

$$\beta_{ijl} = E_i T_l + E_l T_j - E_i T_j$$

and so the result follows from the cyclic tour property.
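The symmetry $\beta_{ijl} = \beta_{jil}$ is easy to check numerically. The following NumPy sketch (our own illustration, not from the text) computes the expected-visit counts via the fundamental matrix $(I-Q)^{-1}$ of the chain watched before hitting $l$:

```python
import numpy as np

# Random walk on a small symmetric-weighted graph, hence a reversible chain.
rng = np.random.default_rng(0)
n = 5
W = rng.random((n, n)); W = W + W.T; np.fill_diagonal(W, 0)
P = W / W.sum(axis=1, keepdims=True)       # transition matrix
pi = W.sum(axis=1) / W.sum()               # stationary distribution

def expected_visits_before(l):
    """N[i, j] = E_i(number of visits to j before T_l), counting time 0."""
    keep = [s for s in range(n) if s != l]
    Q = P[np.ix_(keep, keep)]
    N = np.zeros((n, n))
    N[np.ix_(keep, keep)] = np.linalg.inv(np.eye(n - 1) - Q)
    return N

l = 4
N = expected_visits_before(l)
beta = N / pi[np.newaxis, :]   # beta[i, j] = pi_j^{-1} E_i(visits to j before T_l)
assert np.allclose(beta, beta.T)   # beta_{ijl} = beta_{jil}
```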

Just as random walks on undirected graphs are as general as reversible Markov chains, so random walks on directed graphs are as general as general Markov chains. In particular, one usually has no simple expression like (3.15) for the stationary distribution. The one tractable case is a balanced directed graph, where the in-degree $d_v$ of each vertex $v$ equals its out-degree.
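In the balanced case the stationary distribution is proportional to out-degree, just as in (3.15) for the undirected case. A small sketch (the particular multigraph below is our own illustrative choice):

```python
import numpy as np

# A balanced directed multigraph on 5 vertices: two long directed cycles
# plus a shorter cycle 0 -> 1 -> 2 -> 0, giving nonuniform degrees.
n = 5
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] += 1   # cycle v -> v+1
    A[v, (v + 2) % n] += 1   # cycle v -> v+2
for (u, v) in [(0, 1), (1, 2), (2, 0)]:
    A[u, v] += 1             # extra short cycle
assert np.allclose(A.sum(axis=0), A.sum(axis=1))   # in-degree = out-degree

d = A.sum(axis=1)            # out-degrees: [3, 3, 3, 2, 2]
P = A / d[:, None]
pi = d / d.sum()             # claimed stationary distribution
assert np.allclose(pi @ P, pi)
```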

Section 3.2.1. Yet another way to associate a continuous-time reversible chain with a weighted graph is to set $q_{ij} = w_{ij}/\sqrt{w_i w_j}$. This construction was used by Chung and Yau [93] as the simplest way to set up discrete analogs of certain results from differential geometry.

Another interpretation of continuous-time random walk on a weighted graph is to write $w_{ij} = 1/L_{ij}$ and interpret $L_{ij}$ as edge-length. Then run Brownian motion on the edges of the graph. Starting from vertex $i$, the chance that $j$ is the first vertex other than $i$ visited is $w_{ij}/\sum_k w_{ik}$, so the embedded discrete chain is the usual discrete random walk. This construction could be used as an intermediate step in the context of approximating Brownian motion on a manifold by random walk on a graph embedded in the manifold.

Section 3.3. Doyle and Snell [131] gave a detailed elementary textbook exposition of Proposition 3.10 and the whole random walk / electrical network connection. Previous brief textbook accounts were given by Kemeny et al [214] and Kelly [213]. Our development follows closely that of Chapter 2 in Lyons and Peres [249]. As mentioned in the text, the first explicit use (known to us) of the mean commute interpretation was given by Chandra et al [85]. One can combine the commute formula with the general identities of Chapter 2 (9/10/99 version) to obtain numerous identities relating mean hitting times and resistances, some of which are given (using bare-hands proofs instead) in Tetali [325]. The connection between Foster’s theorem and Lemma 3.9 was noted in [99].

Section 3.4. The spectral theory is of course classical. In devising a symmetric matrix one could use $\pi_i p_{ij}$ or $p_{ij}\pi_j^{-1}$ instead of $\pi_i^{1/2} p_{ij} \pi_j^{-1/2}$; there doesn’t seem to be any systematic advantage to a particular choice. We learned the eigentime identity from Andrei Broder, who used it in [65], and Lemma 3.15 from David Zuckerman, who used it in [341]. Apparently no one has studied whether Lemma 3.15 holds for general chains. Mark Brown (personal communication) has noted several variations on the theme of Lemma 3.15, for example that the unweighted average of $(E_i T_j;\ i,j\in A)$ is bounded by the unweighted average of $(E_\pi T_j;\ j\in A)$. The name eigentime identity is our own coinage: once we call $1/\lambda_2$ the relaxation time, it is natural to start thinking of the other $1/\lambda_m$ as “eigentimes”.
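The eigentime identity is easy to illustrate numerically (our own sketch, using the convention that $\lambda_m$ are the eigenvalues of $I-P$): the weighted average $\sum_j \pi_j E_i T_j$ is the same for every starting state $i$ and equals $\sum_{m\geq 2} 1/\lambda_m$.

```python
import numpy as np

# Random walk on a weighted graph: a reversible chain.
rng = np.random.default_rng(1)
n = 6
W = rng.random((n, n)); W = W + W.T; np.fill_diagonal(W, 0)
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()

def hit(j):
    """h[i] = E_i T_j, from the linear system (I - Q) h = 1 off state j."""
    keep = [s for s in range(n) if s != j]
    h = np.zeros(n)
    h[keep] = np.linalg.solve(np.eye(n - 1) - P[np.ix_(keep, keep)],
                              np.ones(n - 1))
    return h

ET = np.column_stack([hit(j) for j in range(n)])   # ET[i, j] = E_i T_j
lhs = ET @ pi                                      # sum_j pi_j E_i T_j, each i
# Symmetrize by the similarity transform of Section 3.4, then take spectrum.
S = np.diag(np.sqrt(pi)) @ P @ np.diag(1 / np.sqrt(pi))
lam = np.sort(1 - np.linalg.eigvalsh((S + S.T) / 2))  # eigenvalues of I - P
rhs = np.sum(1 / lam[1:])                             # sum over m >= 2
assert np.allclose(lhs, rhs)                          # same value for every i
```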

Section 3.5. We regard complete monotonicity as a name for “mixtures of exponentials”, and have not used the analytic characterization via derivatives of alternating signs. Of course the CM property is implicit in much analysis of reversible Markov processes, but we find it helpful to exhibit explicitly its use in obtaining inequalities. This idea in general, and in particular the “stochastic ordering of exit times” result (Proposition 3.21), were first emphasized by Keilson [212] in the context of reliability and queueing models. Brown [73] gives other interesting consequences of monotonicity.

Section 3.5.1. Parts of Proposition 3.18 have been given by several authors; e.g., Broder and Karlin [65] Corollary 18 give (3.53). One can invent many variations. Consider for instance $\min_i \max_j E_i T_j$. On the complete graph this equals $n-1$, but this is not the minimum value, as observed by Erik Ordentlich in a homework exercise. If we take the complete graph, distinguish a vertex $i_0$, let the edges involving $i_0$ have weight $\varepsilon$ and the other edges have weight $1$, then as $\varepsilon\rightarrow 0$ we have (for $j\neq i_0$)

$$E_{i_0}T_j \rightarrow \frac{1}{n-1}+\frac{n-2}{n-1}\bigl(1+(n-2)\bigr) = n-2+\frac{1}{n-1}.$$

By the random target lemma and (3.53), the quantity under consideration is at least $\tau_0 \geq n-2+\frac{1}{n}$, so the example is close to optimal.
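The limit above can be confirmed numerically (an illustrative sketch; the choices $n=8$ and $\varepsilon = 10^{-6}$ are arbitrary):

```python
import numpy as np

# Complete graph on n vertices; edges at a distinguished vertex i0 get weight
# eps, all other edges weight 1.  For small eps and j != i0,
# E_{i0} T_j should be close to n - 2 + 1/(n-1).
n, eps, i0, j = 8, 1e-6, 0, 1
W = np.ones((n, n))
W[i0, :] = W[:, i0] = eps
np.fill_diagonal(W, 0)
P = W / W.sum(axis=1, keepdims=True)

keep = [s for s in range(n) if s != j]
h = np.zeros(n)                       # h[i] = E_i T_j via (I - Q) h = 1
h[keep] = np.linalg.solve(np.eye(n - 1) - P[np.ix_(keep, keep)],
                          np.ones(n - 1))
assert abs(h[i0] - (n - 2 + 1 / (n - 1))) < 1e-3
```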

Section 3.5.4. The simple result quoted as Proposition 3.22 is actually weaker than the result proved in Brown [72]. The ideas in the proof of Proposition 3.23 are in Aldous [18] and in Brown [73], the latter containing a shorter Laplace transform argument for (3.73). Aldous and Brown [6] give a more detailed account of the exponential approximation, including the following result, which is useful in precisely the situation where Proposition 3.23 is applicable, that is, when $E_\pi T_A$ is large compared to $\tau_2$.

Theorem 3.43

Let $\alpha_A$ be the quasistationary distribution on $A^c$ defined at (3.82). Then

$$P_\pi(T_A > t) \geq \left(1-\frac{\tau_2}{E_{\alpha_A}T_A}\right)\exp\left(\frac{-t}{E_{\alpha_A}T_A}\right), \quad t>0$$
$$E_\pi T_A \geq E_{\alpha_A}T_A - \tau_2.$$

Using this requires only a lower bound on $E_{\alpha_A}T_A$, which can often be obtained using the extremal characterization (3.84). Connections with “interleaving of eigenvalues” results are discussed in Brown [74].
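The second inequality of Theorem 3.43 can be checked numerically. In this sketch (our own, for the continuized walk on a random weighted graph) the quasistationary mean $E_{\alpha_A}T_A$ is computed as $1/\lambda_A$, where $\lambda_A$ is the smallest eigenvalue of $I-P$ restricted to $A^c$:

```python
import numpy as np

# Reversible chain: random walk on a symmetric-weighted graph.
rng = np.random.default_rng(2)
n = 8
W = rng.random((n, n)); W = W + W.T; np.fill_diagonal(W, 0)
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()

A = [0]                                            # target set
keep = [s for s in range(n) if s not in A]
Q = P[np.ix_(keep, keep)]
h = np.zeros(n)                                    # h[i] = E_i T_A
h[keep] = np.linalg.solve(np.eye(len(keep)) - Q, np.ones(len(keep)))
E_pi_TA = pi @ h

# Symmetrize via the similarity transform D^{1/2} P D^{-1/2} (reversibility),
# then read off tau_2 and the restricted smallest eigenvalue lambda_A.
S = np.diag(np.sqrt(pi)) @ P @ np.diag(1 / np.sqrt(pi))
S = (S + S.T) / 2
lam = np.sort(np.linalg.eigvalsh(np.eye(n) - S))
tau2 = 1 / lam[1]                                  # relaxation time
S_A = S[np.ix_(keep, keep)]
lam_A = np.min(np.linalg.eigvalsh(np.eye(len(keep)) - S_A))
E_alpha_TA = 1 / lam_A                             # = E_{alpha_A} T_A
assert E_pi_TA >= E_alpha_TA - tau2                # Theorem 3.43, 2nd bound
```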

For general chains, explicit bounds on exponential approximation are much messier: see Aldous [11] for a bound based upon total variation mixing and Iscoe and McDonald [193] for a bound involving spectral gaps.

Section 3.6.1. Dirichlet forms were developed for use with continuous-space continuous-time Markov processes, where existence and uniqueness questions can be technically difficult—see, e.g., Fukushima [159]. Their use subsequently trickled down to the discrete world, influenced, e.g., by the paper of Diaconis and Stroock [122]. Chen [88] is the most accessible introduction.

Section 3.6.2. Since mean commute times have two dual extremal characterizations, as sups over potential functions and as infs over flows, it is natural to ask

Open Problem 3.44

Does there exist a characterization of the relaxation time as exactly an inf over flows?

We will see in Chapter 4 (10/11/94 version) Theorem 32 an inequality giving an upper bound on the relaxation time in terms of an inf over flows, but it would be more elegant to derive such inequalities from some exact characterization.

Section 3.6.4. Lemma 3.32 is sometimes used to show, by comparison with the i.i.d. chain,

$$\mbox{if } \min_{i,j:\,i\neq j}(p_{ij}/\pi_j) = \delta > 0 \mbox{ then } \tau_2 \leq \delta^{-1}.$$

But this is inefficient: direct use of submultiplicativity of variation distance gives a stronger conclusion.
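The displayed bound itself is easy to verify numerically (a sketch of our own; the spectral gap is computed after symmetrizing by the similarity transform of Section 3.4):

```python
import numpy as np

# A reversible chain whose off-diagonal transition probabilities are all
# positive, so delta = min_{i != j} p_ij / pi_j is strictly positive.
rng = np.random.default_rng(3)
n = 5
W = rng.random((n, n)) + 0.5; W = W + W.T; np.fill_diagonal(W, 0)
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()

mask = ~np.eye(n, dtype=bool)
delta = np.min((P / pi[None, :])[mask])

# tau_2 = 1/(1 - second-largest eigenvalue of P), via the symmetrization.
S = np.diag(np.sqrt(pi)) @ P @ np.diag(1 / np.sqrt(pi))
lam = np.sort(np.linalg.eigvalsh((S + S.T) / 2))
tau2 = 1 / (1 - lam[-2])
assert tau2 <= 1 / delta     # the comparison bound
```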

Section 3.6.5. Quasistationary distributions for general chains have long been studied in applied probability, but the topic lacks a good survey article. Corollary 3.34 is a good example of a repeatedly-rediscovered simple-yet-useful result which defies attempts at attribution.

Kahale [203] Corollary 6.1 gives a discrete-time variant of Lemma 3.35, that is to say an upper bound on $P_\pi(T_A \geq t)$, and Alon et al [26] Proposition 2.4 give a lower bound in terms of the smallest eigenvalue (both results are phrased in the context of random walk on an undirected graph). In studying bounds on $T_A$ such as Lemma 3.35 we usually have in mind that $\pi(A)$ is small. One is sometimes interested in exit times from a set $A$ with $\pi(A)$ small, i.e., hitting times on $A^c$ where $\pi(A^c)$ is near $1$. In this setting one can replace inequalities using $\tau_2$ or $\tau_c$ (parameters which involve the whole chain) by inequalities involving analogous parameters for the chain restricted to $A$ and its boundary. See Babai [36] for uses of such bounds.

Section 3.7.1. Use of Thompson’s principle and the Dirichlet principle to study transience / recurrence of countably infinite state space chains is given an elementary treatment in Doyle and Snell [131] and more technical treatments in papers of Nash-Williams [266], Griffeath and Liggett [171] and Lyons [251]. Some reformulations of Thompson’s principle are discussed by Berman and Konsowa [45].

We learned about the work of Borre and Meissl [57] on leveling networks from Persi Diaconis. Here is another characterization of effective resistance that is of a similar spirit. Given a weighted graph, assign independent Normal$(0, w_{ij})$ random variables $X_{ij}$ to the edges $(i,j)$, with $X_{ji} = -X_{ij}$. Then condition on the event that the sum around any cycle vanishes. The conditional process $(Y_{ij})$ is still Gaussian. Fix a reference vertex $v_*$ and for each vertex $v$ let $S_v$ be the sum of the $Y$-values along a path from $v_*$ to $v$. (The choice of path doesn’t matter, because of the conditioning event.) Then (obviously) $S_v$ is mean-zero Normal but (not obviously) its variance is the effective resistance between $v_*$ and $v$. This is discussed and proved in Janson [194] Section 9.4.
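This identity can be verified by linear algebra on a small example, since conditioning a Gaussian vector on a linear event has an explicit covariance. We take unit weights, so that Normal$(0, w_{ij})$ is simply standard normal (the particular graph and path below are our own illustrative choices):

```python
import numpy as np

# Triangle on {0,1,2} plus a pendant edge (2,3), all weights 1.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n, m = 4, len(edges)

# One independent cycle: 0 -> 1 -> 2 -> 0, all edges in forward orientation.
B = np.zeros((1, m))
for e in [(0, 1), (1, 2), (2, 0)]:
    B[0, edges.index(e)] = 1
D = np.eye(m)                                   # covariance of the X_e

# Conditional covariance of X given the cycle-sum event B X = 0.
C = D - D @ B.T @ np.linalg.solve(B @ D @ B.T, B @ D)

# S_v for v = 1, reference vertex v* = 0: the single-edge path 0 -> 1.
path = np.zeros(m); path[edges.index((0, 1))] = 1
var_S1 = path @ C @ path

# Effective resistance between 0 and 1 via the Laplacian pseudoinverse.
L = np.zeros((n, n))
for (u, v) in edges:
    L[u, u] += 1; L[v, v] += 1; L[u, v] -= 1; L[v, u] -= 1
Lp = np.linalg.pinv(L)
R01 = Lp[0, 0] + Lp[1, 1] - 2 * Lp[0, 1]
assert np.isclose(var_S1, R01)                  # both equal 2/3
```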

Section 3.7.2. We have never seen in the literature an explicit statement of the extremal characterizations for mean hitting times from a stationary start (Proposition 3.41), but these are undoubtedly folklore, at least in the “potential” form. Iscoe et al [192] implicitly contains the analogous characterization of $E_\pi \exp(-\theta T_A)$. Steve Evans once showed us an argument for Corollary 3.42 based on the usual Dirichlet principle, and that motivated us to present the “natural explanation” given by Proposition 3.41.