Textbooks. Almost all the results have long been known to (different groups of) experts, but it has not been easy to find accessible textbook treatments. Of the three books on reversible chains at roughly the same level of sophistication as ours, Kelly [213] emphasizes stationary distributions of stochastic networks; Keilson [212] emphasizes mathematical properties such as complete monotonicity; and Chen [88] discusses those aspects useful in the study of interacting particle systems.
Section 3.1. In abstract settings reversible chains are called symmetrizable, but that’s a much less evocative term. Elementary textbooks often give Kolmogorov’s criterion ([213] Thm 1.7) for reversibility, but we have never found it to be useful.
The following figure may be helpful in seeing why $E_iT_j \ne E_jT_i$ for a general reversible chain, even if $\pi$ is uniform. Run such a chain for a long time and record only the times when the chain is in state $i$ or state $j$. Then $E_iT_j$ is the long-run empirical average of the passage times from $i$ to $j$, indicated by arrows $\rightarrow$; and $E_jT_i$ is the long-run empirical average of the passage times from $j$ to $i$ in the reverse time direction, indicated by arrows $\leftarrow$. One might think these two quantities were averaging the same empirical intervals, but a glance at the figure shows they are not.
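The asymmetry can be checked numerically (a sketch, not from the text: the 3-state transition matrix below is an arbitrary symmetric, hence reversible, chain with uniform stationary distribution). Solving the standard linear equations for mean hitting times shows the two directions genuinely differ.

```python
import numpy as np

# An arbitrary symmetric transition matrix: doubly stochastic, hence
# reversible with uniform stationary distribution.
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.5, 0.4],
              [0.1, 0.4, 0.5]])

def mean_hitting_time(P, i, j):
    """E_i T_j: solve h = 1 + P h on states k != j, with h_j = 0."""
    n = P.shape[0]
    keep = [k for k in range(n) if k != j]
    A = np.eye(len(keep)) - P[np.ix_(keep, keep)]
    h = np.linalg.solve(A, np.ones(len(keep)))
    return h[keep.index(i)]

t02 = mean_hitting_time(P, 0, 2)   # = 20/3
t20 = mean_hitting_time(P, 2, 0)   # = 10
print(t02, t20)                    # unequal, despite uniform pi
```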
Section 3.1.1. Though probabilists would regard the “cyclic tour” Lemma 3.2 as obvious, László Lovász pointed out a complication: with a careful definition of starts and ends of tours, these times are not invariant under time-reversal. The sophisticated fix is to use doubly-infinite stationary chains and observe that tours in reversed time just interleave tours in forward time, so by ergodicity their asymptotic rates are equal. Tetali [326] shows that the cyclic tour property implies reversibility. Tanushev and Arratia [321] show that the distributions of forward and reverse tour times are equal.
Cat-and-mouse game 1 is treated more opaquely in Coppersmith et al [99], whose deeper results are discussed in Chapter 9, Section 4.4. Underlying the use of the optional sampling theorem in game 2 is a general result about optimal stopping, but it’s much easier to prove what we want here than to appeal to general theory. Several algorithmic variations on Proposition 3.3 are discussed in Coppersmith et al [101] and Tetali and Winkler [324].
Section 3.2. Many textbooks on Markov chains note the simple explicit form of the stationary distribution for random walks on graphs. An historical note (taken from [107]) is that the first explicit treatment of random walk on a general finite graph was apparently given in 1935 by Bottema [58], who proved the convergence theorem. Amongst subsequent papers specializing Markov theory to random walks on graphs let us mention Gobel and Jagers [168], which contains a variety of the more elementary facts given in this book, for instance the unweighted version of Lemma 3.9. Another observation from [168] is that for a reversible chain the quantity
satisfies . Indeed, by Chapter 2, Lemma 9, we have
and so the result follows from the cyclic tour property.
Just as random walks on undirected graphs are as general as reversible Markov chains, so random walks on directed graphs are as general as general Markov chains. In particular, one usually has no simple expression like (3.15) for the stationary distribution. The one tractable case is a balanced directed graph, where the in-degree of each vertex equals its out-degree; there the stationary distribution is again proportional to degree.
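A quick sanity check of the balanced case (a sketch; the particular digraph is an arbitrary example): for the walk that moves along a uniformly chosen out-edge, the measure proportional to degree is stationary.

```python
import numpy as np

# Edges of a small balanced digraph: each vertex has in-degree == out-degree.
edges = [(0, 1), (1, 2), (2, 0), (0, 2), (2, 3), (3, 0)]
n = 4
out_deg = np.zeros(n)
in_deg = np.zeros(n)
for u, v in edges:
    out_deg[u] += 1
    in_deg[v] += 1
assert np.array_equal(out_deg, in_deg)   # balanced

# Random walk: from u, move along a uniformly chosen out-edge.
P = np.zeros((n, n))
for u, v in edges:
    P[u, v] += 1.0 / out_deg[u]

pi = out_deg / out_deg.sum()     # candidate stationary distribution
print(np.allclose(pi @ P, pi))   # True: pi is stationary
```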
Section 3.2.1. Yet another way to associate a continuous-time reversible chain with a weighted graph is to set . This construction was used by Chung and Yau [93] as the simplest way to set up discrete analogs of certain results from differential geometry.
Another interpretation of continuous-time random walk on a weighted graph is to regard $1/w_e$ as the length of edge $e$, and then run Brownian motion on the edges of the graph. Starting from vertex $v$, the chance that a neighbor $x$ is the first vertex other than $v$ visited is $w_{vx}/w_v$, so the embedded discrete chain is the usual discrete random walk. This construction could be used as an intermediate step in the context of approximating Brownian motion on a manifold by random walk on a graph embedded in the manifold.
Section 3.3. Doyle and Snell [131] gave a detailed elementary textbook exposition of Proposition 3.10 and the whole random walk / electrical network connection. Previous brief textbook accounts were given by Kemeny et al [214] and Kelly [213]. Our development follows closely that of Chapter 2 in Lyons and Peres [249]. As mentioned in the text, the first explicit use (known to us) of the mean commute interpretation was given by Chandra et al [85]. One can combine the commute formula with the general identities of Chapter 2 to obtain numerous identities relating mean hitting times and resistances, some of which are given (using bare-hands proofs instead) in Tetali [325]. The connection between Foster’s theorem and Lemma 3.9 was noted in [99].
Section 3.4. The spectral theory is of course classical. In devising a symmetric matrix one could use $\pi_i p_{ij}$ or $p_{ij}/\pi_j$ instead of $\pi_i^{1/2} p_{ij} \pi_j^{-1/2}$; there doesn’t seem to be any systematic advantage to a particular choice. We learned the eigentime identity from Andrei Broder, who used it in [65], and Lemma 3.15 from David Zuckerman, who used it in [341]. Apparently no-one has studied whether Lemma 3.15 holds for general chains. Mark Brown (personal communication) has noted several variations on the theme of Lemma 3.15, for example that the unweighted average of is bounded by the unweighted average of . The name eigentime identity is our own coinage: once we call $\lambda_2^{-1}$ the relaxation time, it is natural to start thinking of the other $\lambda_m^{-1}$ as “eigentimes”.
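The eigentime identity can be illustrated numerically (a sketch under the discrete-time convention, on an arbitrary randomly weighted graph): for a reversible chain, $\sum_j \pi_j E_iT_j = \sum_{m\ge 2} (1-\lambda_m)^{-1}$, with the left side not depending on the starting state $i$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
W = rng.random((n, n))
W = W + W.T                      # symmetric edge weights -> reversible chain
np.fill_diagonal(W, 0.0)
w = W.sum(axis=1)
P = W / w[:, None]               # random walk on the weighted graph
pi = w / w.sum()

def mean_hitting_time(P, i, j):
    n = P.shape[0]
    keep = [k for k in range(n) if k != j]
    A = np.eye(n - 1) - P[np.ix_(keep, keep)]
    h = np.linalg.solve(A, np.ones(n - 1))
    return h[keep.index(i)]

# Left side of the identity, from each start i (the j = i term is 0).
lhs = [sum(pi[j] * mean_hitting_time(P, i, j) for j in range(n) if j != i)
       for i in range(n)]

# Right side from the spectrum of the symmetrized matrix.
S = np.diag(pi**0.5) @ P @ np.diag(pi**-0.5)
lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # lam[0] == 1
rhs = sum(1.0 / (1.0 - lam[m]) for m in range(1, n))

print(np.allclose(lhs, rhs))     # True: same value for every i
```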
Section 3.5. We regard complete monotonicity as a name for “mixtures of exponentials”, and have not used the analytic characterization via derivatives of alternating signs. Of course the CM property is implicit in much analysis of reversible Markov processes, but we find it helpful to exhibit explicitly its use in obtaining inequalities. This idea in general, and in particular the “stochastic ordering of exit times” result (Proposition 3.21), were first emphasized by Keilson [212] in the context of reliability and queueing models. Brown [73] gives other interesting consequences of monotonicity.
Section 3.5.1. Parts of Proposition 3.18 have been given by several authors, e.g., Broder and Karlin [65] Corollary 18 give (3.53). One can invent many variations. Consider for instance . On the complete graph this equals , but this is not the minimum value, as observed by Erik Ordentlich in a homework exercise. If we take the complete graph, distinguish a vertex , let the edges involving have weight and the other edges have weight , then as we have (for )
By the random target lemma and (3.53), the quantity under consideration is at least , so the example is close to optimal.
Section 3.5.4. The simple result quoted as Proposition 3.22 is actually weaker than the result proved in Brown [72]. The ideas in the proof of Proposition 3.23 are in Aldous [18] and in Brown [73], the latter containing a shorter Laplace transform argument for (3.73). Aldous and Brown [6] give a more detailed account of the exponential approximation, including the following result which is useful in precisely the situation where Proposition 3.23 is applicable, that is, when is large compared to .
Let be the quasistationary distribution on defined at (3.82). Then
Using this requires only a lower bound on , which can often be obtained using the extremal characterization (3.84). Connections with “interleaving of eigenvalues” results are discussed in Brown [74].
For general chains, explicit bounds on exponential approximation are much messier: see Aldous [11] for a bound based upon total variation mixing and Iscoe and McDonald [193] for a bound involving spectral gaps.
Section 3.6.1. Dirichlet forms were developed for use with continuous-space continuous-time Markov processes, where existence and uniqueness questions can be technically difficult—see, e.g., Fukushima [159]. Their use subsequently trickled down to the discrete world, influenced, e.g., by the paper of Diaconis and Stroock [122]. Chen [88] is the most accessible introduction.
Section 3.6.2. Since mean commute times have two dual extremal characterizations, as sups over potential functions and as infs over flows, it is natural to ask
Does there exist a characterization of the relaxation time as exactly an inf over flows?
We will see in Chapter 4, Theorem 32, an inequality giving an upper bound on the relaxation time in terms of an inf over flows, but it would be more elegant to derive such inequalities from some exact characterization.
Section 3.6.4. Lemma 3.32 is sometimes used to show, by comparison with the i.i.d. chain,
But this is inefficient: direct use of submultiplicativity of variation distance gives a stronger conclusion.
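The submultiplicativity in question, $\bar d(s+t) \le \bar d(s)\,\bar d(t)$ where $\bar d(t)$ is the maximal variation distance between $t$-step distributions from different starts, can be checked numerically (a sketch on an arbitrary chain):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 4
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)        # an arbitrary transition matrix

def d_bar(P, t):
    """Max over state pairs of total variation distance between rows of P^t."""
    Pt = np.linalg.matrix_power(P, t)
    return max(0.5 * np.abs(Pt[x] - Pt[y]).sum()
               for x, y in combinations(range(P.shape[0]), 2))

s, t = 2, 3
lhs_d = d_bar(P, s + t)
rhs_d = d_bar(P, s) * d_bar(P, t)
print(lhs_d <= rhs_d + 1e-12)            # True: submultiplicative
```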
Section 3.6.5. Quasistationary distributions for general chains have long been studied in applied probability, but the topic lacks a good survey article. Corollary 3.34 is a good example of a repeatedly-rediscovered simple-yet-useful result which defies attempts at attribution.
Kahale [203] Corollary 6.1 gives a discrete-time variant of Lemma 3.35, that is to say an upper bound on , and Alon et al [26] Proposition 2.4 give a lower bound in terms of the smallest eigenvalue (both results are phrased in the context of random walk on an undirected graph). In studying bounds such as Lemma 3.35 we usually have in mind that the target set has small stationary probability. One is sometimes interested in exit times from a set of small stationary probability, i.e., hitting times on a complementary set whose stationary probability is near $1$. In this setting one can replace inequalities using parameters which involve the whole chain by inequalities involving analogous parameters for the chain restricted to the set and its boundary. See Babai [36] for uses of such bounds.
Section 3.7.1. Use of Thompson’s principle and the Dirichlet principle to study transience / recurrence of countably infinite state space chains is given an elementary treatment in Doyle and Snell [131] and more technical treatments in papers of Nash-Williams [266], Griffeath and Liggett [171] and Lyons [251]. Some reformulations of Thompson’s principle are discussed by Berman and Konsowa [45].
We learned about the work of Borre and Meissl [57] on leveling networks from Persi Diaconis. Here is another characterization of effective resistance in a similar spirit. Given a weighted graph, assign independent Normal random variables $Z_e$ to the edges $e$, with $E Z_e = 0$ and $\operatorname{var} Z_e = 1/w_e$. Then condition on the event that the signed sum of the $Z_e$ around any cycle vanishes. The conditional process is still Gaussian. Fix a reference vertex $v_0$, and for each vertex $v$ let $U_v$ be the sum of the $Z$-values along a path from $v_0$ to $v$. (The choice of path doesn’t matter, because of the conditioning event.) Then (obviously) $U_v$ is mean-zero Normal, but (not obviously) its variance is the effective resistance between $v_0$ and $v$. This is discussed and proved in Janson [194] Section 9.4.
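The claim can be checked by linear algebra on a small example (a sketch; the graph, its cycle basis, and all names are illustrative): conditioning the Gaussian edge variables on vanishing cycle sums and reading off the variance of a path sum reproduces the effective resistance computed from the Laplacian pseudoinverse.

```python
import numpy as np

# Oriented edges of a small weighted graph on vertices {0,1,2,3}.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # conductances
r = 1.0 / w                                  # var Z_e = 1/w_e

# Signed cycle basis: 0->1->2 returning against edge (0,2), and 0->2->3->0.
B = np.array([[1, 1, 0, 0, -1],
              [0, 0, 1, 1, 1]], dtype=float)

# Covariance of Z conditioned on B Z = 0 (standard Gaussian conditioning).
C = np.diag(r)
Sigma = C - C @ B.T @ np.linalg.solve(B @ C @ B.T, B @ C)

# U_2 = sum of Z along the single-edge path 0->2, the last coordinate.
var_U2 = Sigma[4, 4]

# Effective resistance 0 <-> 2 from the weighted Laplacian pseudoinverse.
n = 4
L = np.zeros((n, n))
for (u, v), we in zip(edges, w):
    L[u, u] += we; L[v, v] += we
    L[u, v] -= we; L[v, u] -= we
e = np.zeros(n); e[0], e[2] = 1.0, -1.0
R_eff = e @ np.linalg.pinv(L) @ e

print(np.isclose(var_U2, R_eff))  # True
```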
Section 3.7.2. We have never seen in the literature an explicit statement of the extremal characterizations for mean hitting times from a stationary start (Proposition 3.41), but these are undoubtedly folklore, at least in the “potential” form. Iscoe et al [192] implicitly contains the analogous characterization of . Steve Evans once showed us an argument for Corollary 3.42 based on the usual Dirichlet principle, and that motivated us to present the “natural explanation” given by Proposition 3.41.