Section 4.3.1. The definition of involves the idea of a stopping time such that has distribution and is independent of the starting position. This idea is central to the standard modern theory of Harris-recurrent Markov chains, i.e. chains on continuous space which mimic the asymptotic behavior of discrete recurrent chains, and does not require reversibility. See [133] sec. 5.6 for an introduction, and [35, 263] for more comprehensive treatments. In that field, researchers have usually been content to obtain some finite bound on , and haven’t faced up to our issue of the quantitative dependence of the bound on the underlying chain.
Separation and strong stationary times were introduced in Aldous and Diaconis [8], who gave some basic theory. These constructions can be used to bound convergence times in examples, but in practice are used in examples with much special structure, e.g. non-necessarily-symmetric random walks on groups. Examples can be found in [7, 8] and Matthews [256]. Development of theory, mostly for stochastically monotone chains on -dimensional state space, is in Diaconis and Fill [112, 113], Fill [150, 151] and Matthews [260].
The recurrent balayage theorem (Chapter 1 yyy) can be combined with the mean hitting time formula to get
(4.47) |
Curiously, this elegant result doesn’t seem to help much with the inequalities in Theorem 4.6.
What happens with the -family of parameters for general chains remains rather obscure. Some counter-examples to equivalence, and weaker inequalities containing factors, can be found in [12]. Recently, Lovasz and Winkler [240] initiated a detailed study of for general chains which promises to shed more light on this question.
Our choice of as the “representative” of the family of ’s is somewhat arbitrary. One motivation was that it gives the constant “1” in the inequality . It would be interesting to know whether the constants in other basic inequalities relating the -family to other parameters could be made “1”:
(a) Is ?
(b) Is ?
Much of recent sophisticated theory xxx refs bounds by bounding and appealing to Lemma 4.7(b). But it is not clear whether there is an analog of Theorem 4.6 relating the -threshold to other quantities.
Section 4.3.2. The parts of Theorem 4.6 involving and are implicit rather than explicit in [12]. That paper had an unnecessarily complicated proof of Lemma 4.13. The proof of (4.15) in [12] gives a constant . It would be interesting to obtain a smaller constant! Failing this, a small constant in the inequality would be desirable. As a weaker result, it is easy to show
(4.48) |
which has some relevance to later examples (yyy).
Section 4.3.3. The analog of Open Problem 4.17 in which we measure distance from stationarity by instead of is straightforward, using the “CM proxy” property of discrete time chains:
Open Problem 4.17 itself seems deeper, though the weaker form in which we require only that can probably be proved by translating the proof of (4.15) into discrete time and using the CM proxy property.
Section 4.3.4. The cat-and-mouse game was treated briefly in Aleliunas et al [25], who gave a bare-hands proof of a result like (4.21). Variations in which the cat is also allowed an arbitrary strategy have been called “princess and monster” games – see Isaacs [190] for results in a different setting.
Section 4.3.5. Sinclair [307] points out that “hard” results of Leighton and Rao [224] on multicommodity flow imply
(4.49) |
This follows from Corollary 4.22 and Lemma 4.23 when is uniform, but Sinclair posed
Section 4.4. As an example of historical interest, before this topic became popular Fiedler [147] proved
For random walk on a -vertex weighted graph where the stationary distribution is uniform,
where is the minimum cut defined at (4.4).
This upper bound is sharp. On the other hand, Proposition 4.2 gave the same upper bound (up to the numerical constant) for the a priori larger quantity , and so is essentially a stronger result.
Section 4.4.1. In the non-reversible case the definition of the maximal correlation makes sense, and there is similar asymptotic behavior:
where is the “spectral gap”. But we cannot pull back from asymptotia to the real world so easily: it is not true that can be bounded by for universal . A dramatic example from Aldous [17] section 4 has for each an -state chain with spectral gap bounded away from but with also bounded away from , instead of being exponentially small. So implicit claims in the literature that estimates of the spectral gap for general chains have implications for finite-time behavior should be treated with extreme skepticism.
It is not surprising that the classical Berry-Esseen Theorem for i.i.d. sums ([133] Thm. 2.4.10) has an analog for chains. Write for the asymptotic variance rate in Proposition 4.29 and write for a standard Normal r.v.
There is a constant , depending on the chain, such that
for all and all standardized .
This result is usually stated for infinite-state chains satisfying various mixing conditions, which are automatically satisfied by finite chains. See e.g. Bolthausen [55]. At first sight the constant depends on the function as well as the chain, but a finiteness argument shows that the dependence on can be removed. Unfortunately the usual proofs don’t give any useful indications of how depends on the chain, and so don’t help with Open Problem 4.30.
The variance results in Proposition 4.29 are presumably classical, being straightforward consequences of the spectral representation. Their use in algorithmic settings such as Corollary 4.31 goes back at least to [16].
Section 4.4.3. Systematic study of the optimal choice of weights in the Cauchy-Schwarz argument for Theorem 4.32 may lead to improved bounds in examples. Alan Sokal has unpublished notes on this subject.
Section 4.5.1. The quantity , or rather this quantity with the alternate definition of mentioned in the text, has been called conductance. I avoid that term, which invites unnecessary confusion with the electrical network terminology. However, the subscript can be regarded as standing for “Cheeger” or “conductance”.
In connection with Open Problem 4.38 we mention the following result. Suppose that in the definition (section 4.4.1) of the maximal correlation function we considered only events, i.e. suppose we defined
Then , but in fact the two definitions are equivalent in the sense that there is a universal function as such that . This is a result about “measures of dependence” which has nothing to do with Markovianness – see e.g. Bradley et al [59].
Section 4.5.2. The history of Cheeger-type inequalities up to 1987 is discussed in [223] section 6. Briefly, Cheeger [87] proved a lower bound for the eigenvalues of the Laplacian on a compact Riemannian manifold, and this idea was subsequently adapted to different settings – in particular, by Alon [29] to the relationship between eigenvalues and expansion properties of graphs. Lawler and Sokal [223], and independently Jerrum and Sinclair [309], were the first to discuss the relationship between and at the level of reversible Markov chains. Their work was modified by Diaconis and Stroock [122], whose proof we followed for Lemmas 4.39 and 4.41. The only novelty in my presentation is talking explicitly about quasistationary distributions, which makes the relationships easier to follow.