4 Hitting and Convergence Time, and Flow Rate, Parameters for Reversible Markov Chains (October 11, 1994)

4.7 Notes on Chapter 4

Section 4.3.1. The definition of τ1(2)\tau^{(2)}_{1} involves the idea of a stopping time UU such that XUX_{U} has distribution π\pi and is independent of the starting position. This idea is central to the standard modern theory of Harris-recurrent Markov chains, i.e. chains on continuous space which mimic the asymptotic behavior of discrete recurrent chains, and does not require reversibility. See [133] sec. 5.6 for an introduction, and [35, 263] for more comprehensive treatments. In that field, researchers have usually been content to obtain some finite bound on EUEU, and haven’t faced up to our issue of the quantitative dependence of the bound on the underlying chain.

Separation and strong stationary times were introduced in Aldous and Diaconis [8], who gave some basic theory. These constructions can be used to bound convergence times in examples, but in practice are used in examples with much special structure, e.g. non-necessarily-symmetric random walks on groups. Examples can be found in [7, 8] and Matthews [256]. Development of theory, mostly for stochastically monotone chains on 11-dimensional state space, is in Diaconis and Fill [112, 113], Fill [150, 151] and Matthews [260].

The recurrent balayage theorem (Chapter 1 yyy) can be combined with the mean hitting time formula to get

τ1(2)=maxij-Zijπj.\tau_{1}^{(2)}=\max_{ij}\frac{-Z_{ij}}{\pi_{j}}. (4.47)

Curiously, this elegant result doesn’t seem to help much with the inequalities in Theorem 4.6.

What happens with the τ1\tau_{1}-family of parameters for general chains remains rather obscure. Some counter-examples to equivalence, and weaker inequalities containing log1/π*\log 1/\pi_{*} factors, can be found in [12]. Recently, Lovasz and Winkler [240] initiated a detailed study of τ1(2)\tau_{1}^{(2)} for general chains which promises to shed more light on this question.

Our choice of τ1\tau_{1} as the “representative” of the family of τ1(i)\tau_{1}^{(i)}’s is somewhat arbitrary. One motivation was that it gives the constant “1” in the inequality τ2τ1\tau_{2}\leq\tau_{1}. It would be interesting to know whether the constants in other basic inequalities relating the τ1\tau_{1}-family to other parameters could be made “1”:

Open Problem 4.48

(a) Is τ1τ0\tau_{1}\leq\tau_{0}?

(b) Is τ2τ1(2)\tau_{2}\leq\tau_{1}^{(2)}?

Much of recent sophisticated theory xxx refs bounds d(t)d(t) by bounding d^(t)\hat{d}(t) and appealing to Lemma 4.7(b). But it is not clear whether there is an analog of Theorem 4.6 relating the d^\hat{d}-threshold to other quantities.

Section 4.3.2. The parts of Theorem 4.6 involving τ1(1)\tau_{1}^{(1)} and τ1(3)\tau_{1}^{(3)} are implicit rather than explicit in [12]. That paper had an unnecessarily complicated proof of Lemma 4.13. The proof of (4.15) in [12] gives a constant Ke13K\approx e^{13}. It would be interesting to obtain a smaller constant! Failing this, a small constant in the inequality τ1(1)Kτ1(3)\tau_{1}^{(1)}\leq K\tau_{1}^{(3)} would be desirable. As a weaker result, it is easy to show

τ1(1)10minjmaxiEiTj\tau_{1}^{(1)}\leq 10\min_{j}\max_{i}\ E_{i}T_{j} (4.48)

which has some relevance to later examples (yyy).

Section 4.3.3. The analog of Open Problem 4.17 in which we measure distance from stationarity by d^\hat{d} instead of d(t)d(t) is straightforward, using the “CM proxy” property of discrete time chains:

Pi(X2t=i)+Pi(X2t+1=i)0 as t.P_{i}(X_{2t}=i)+P_{i}(X_{2t+1}=i)\downarrow 0\mbox{ as }t\rightarrow\infty.

Open Problem 4.17 itself seems deeper, though the weaker form in which we require only that ϕ(t)=O(t)\phi(t)=O(t) can probably be proved by translating the proof of (4.15) into discrete time and using the CM proxy property.

Section 4.3.4. The cat-and-mouse game was treated briefly in Aleliunas et al [25], who gave a bare-hands proof of a result like (4.21). Variations in which the cat is also allowed an arbitrary strategy have been called “princess and monster” games – see Isaacs [190] for results in a different setting.

Section 4.3.5. Sinclair [307] points out that “hard” results of Leighton and Rao [224] on multicommodity flow imply

inf𝐟ψ(𝐟)Kτ2logn.\inf_{{\bf f}}\psi({\bf f})\leq K\tau_{2}\log n. (4.49)

This follows from Corollary 4.22 and Lemma 4.23 when π\pi is uniform, but Sinclair posed

Open Problem 4.49

(i) Is there a simple proof of (4.49) in general?

(ii) Does (4.49) hold with the diameter Δ\Delta in place of logn\log n?

Section 4.4. As an example of historical interest, before this topic became popular Fiedler [147] proved

Proposition 4.50

For random walk on a nn-vertex weighted graph where the stationary distribution is uniform,

τ2w4ncsin2π2nwnπ2c\tau_{2}\leq\frac{w}{4nc\sin^{2}\frac{\pi}{2n}}\sim\frac{wn}{\pi^{2}c}

where cc is the minimum cut defined at (4.4).

This upper bound is sharp. On the other hand, Proposition 4.2 gave the same upper bound (up to the numerical constant) for the a priori larger quantity τ*\tau^{*}, and so is essentially a stronger result.

Section 4.4.1. In the non-reversible case the definition of the maximal correlation ρ(t)\rho(t) makes sense, and there is similar asymptotic behavior:

ρ(t)cexp(-λt) as t\rho(t)\sim c\exp(-\lambda t)\mbox{ as }t\rightarrow\infty

where λ\lambda is the “spectral gap”. But we cannot pull back from asymptotia to the real world so easily: it is not true that ρ(t)\rho(t) can be bounded by Kexp(-λt)K\exp(-\lambda t) for universal KK. A dramatic example from Aldous [17] section 4 has for each nn an nn-state chain with spectral gap bounded away from 00 but with ρ(n)\rho(n) also bounded away from 00, instead of being exponentially small. So implicit claims in the literature that estimates of the spectral gap for general chains have implications for finite-time behavior should be treated with extreme skepticism.

It is not surprising that the classical Berry-Esseen Theorem for i.i.d. sums ([133] Thm. 2.4.10) has an analog for chains. Write σ2\sigma^{2} for the asymptotic variance rate in Proposition 4.29 and write ZZ for a standard Normal r.v.

Proposition 4.51

There is a constant KK, depending on the chain, such that

supx|Pπ(Stσt1/2x)-P(Zx)|Kt-1/2\sup_{x}|P_{\pi}(\frac{S_{t}}{\sigma t^{1/2}}\leq x)-P(Z\leq x)|\leq Kt^{-1/2}

for all t1t\geq 1 and all standardized gg.

This result is usually stated for infinite-state chains satisfying various mixing conditions, which are automatically satisfied by finite chains. See e.g. Bolthausen [55]. At first sight the constant KK depends on the function gg as well as the chain, but a finiteness argument shows that the dependence on gg can be removed. Unfortunately the usual proofs don’t give any useful indications of how KK depends on the chain, and so don’t help with Open Problem 4.30.

The variance results in Proposition 4.29 are presumably classical, being straightforward consequences of the spectral representation. Their use in algorithmic settings such as Corollary 4.31 goes back at least to [16].

Section 4.4.3. Systematic study of the optimal choice of weights in the Cauchy-Schwarz argument for Theorem 4.32 may lead to improved bounds in examples. Alan Sokal has unpublished notes on this subject.

Section 4.5.1. The quantity 1/τc1/\tau_{c}, or rather this quantity with the alternate definition of τc\tau_{c} mentioned in the text, has been called conductance. I avoid that term, which invites unnecessary confusion with the electrical network terminology. However, the subscript cc can be regarded as standing for “Cheeger” or “conductance”.

In connection with Open Problem 4.38 we mention the following result. Suppose that in the definition (section 4.4.1) of the maximal correlation function ρ(t)\rho(t) we considered only events, i.e. suppose we defined

ρ~(t)supA,Bcor(1(X0A),1(XtB)).\tilde{\rho}(t)\equiv\sup_{A,B}\mbox{cor}(1_{(X_{0}\in A)},1_{(X_{t}\in B)}).

Then ρ~(t)ρ(t)\tilde{\rho}(t)\leq\rho(t), but in fact the two definitions are equivalent in the sense that there is a universal function ψ(x)0\psi(x)\downarrow 0 as x0x\downarrow 0 such that ρ(t)ψ(ρ~(t))\rho(t)\leq\psi(\tilde{\rho}(t)). This is a result about “measures of dependence” which has nothing to do with Markovianness – see e.g. Bradley et al [59].

Section 4.5.2. The history of Cheeger-type inequalities up to 1987 is discussed in [223] section 6. Briefly, Cheeger [87] proved a lower bound for the eigenvalues of the Laplacian on a compact Riemannian manifold, and this idea was subsequently adapted to different settings – in particular, by Alon [29] to the relationship between eigenvalues and expansion properties of graphs. Lawler and Sokal [223], and independently Jerrum and Sinclair [309], were the first to discuss the relationship between τc\tau_{c} and τ2\tau_{2} at the level of reversible Markov chains. Their work was modified by Diaconis and Stroock [122], whose proof we followed for Lemmas 4.39 and 4.41. The only novelty in my presentation is talking explicitly about quasistationary distributions, which makes the relationships easier to follow.

xxx give forward pointer to results of [238, 158].

Section 4.6.2. See Efron-Stein [140] for the origin of their inequality. Inequality (4.45), or rather the variant mentioned above Corollary 4.47 involving the 2n2n i.i.d. variables