Reversible Markov Chains and Random Walks on Graphs

Chapter 8 Advanced L2L^{2} Techniques for Bounding Mixing Times (May 19 1999)

xxx In next revision, we should change the definition [in Chapter 4, yyy:(14)] of d^(t){\hat{d}}(t) so that what is now d^(2t)\sqrt{{\hat{d}}(2t)} becomes d^(t){\hat{d}}(t).

This chapter concerns advanced L2L^{2}-based techniques, developed mainly by Persi Diaconis and Laurent Saloff-Coste [115, 116, 117, 118] for bounding mixing times for (finite, irreducible) reversible Markov chains. For convenience, we will work in continuous time throughout this chapter, unless otherwise noted. Many of the results are conveniently expressed in terms of an L2L^{2} threshold time” τ^{\hat{\tau}} (xxx use different notation?) defined by

τ^:=inf{t>0:maxiPi(Xt)-π()2e-1}.{\hat{\tau}}:=\inf\{t>0:\max_{i}\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq e^% {-1}\}. (8.1)

xxx For NOTES: Discussion of discrete time, esp. negative eigenvalues.

Several preliminary comments are in order here. First, the definition of the L2L^{2} distance Pi(Xt)-π()2\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2} may be recalled from Chapter 2 section yyy:6.2, and Chapter 3 yyy:(55) and the spectral representation give useful reexpressions:

Pi(Xt)-π()22\displaystyle\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}_{2} =\displaystyle= jπj(pij(t)πj-1)2\displaystyle\sum_{j}\pi_{j}\left(\frac{p_{ij}(t)}{\pi_{j}}-1\right)^{2} (8.2)
=\displaystyle= pii(2t)πi-1\displaystyle\frac{p_{ii}(2t)}{\pi_{i}}-1
=\displaystyle= πi-1m=2nexp(-2λmt)uim2.\displaystyle\pi^{-1}_{i}\sum_{m=2}^{n}\exp(-2\lambda_{m}t)u^{2}_{im}.

Second, from (8.2) and Chapter 4 yyy:(14) we may also write the maximum L2L^{2} distance appearing in (8.1) using

maxiPi(Xt)-π()22=maxipii(2t)πi-1=maxi,jpij(2t)πj-1=d^(2t).\max_{i}\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}_{2}=\max_{i}\frac{p_{ii}(2t)}{% \pi_{i}}-1=\max_{i,j}\frac{p_{ij}(2t)}{\pi_{j}}-1={\hat{d}}(2t).

Third, by the application of the Cauchy–Schwarz lemma in Chapter 4 Lemma yyy:8, variation distance can be bounded by L2L^{2} distance:

4di2(t)\displaystyle 4d^{2}_{i}(t) :=\displaystyle:= 4Pi(Xt)-π()2Pi(Xt)-π()22,\displaystyle 4\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}\leq\|P_{i}(X_{t}\in% \cdot)-\pi(\cdot)\|^{2}_{2}, (8.3)
4d2(t)\displaystyle 4d^{2}(t) :=\displaystyle:= 4maxiPi(Xt)-π()2d^(2t);\displaystyle 4\max_{i}\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}\leq{\hat{d}}(2t); (8.4)

these inequalities are the primary motivation for studying L2L^{2} distance.

As argued in Chapter 4 yyy:just following (23),

d^(2t)π*-1e-2t/τ2,{\hat{d}}(2t)\leq\pi^{-1}_{*}e^{-2t/\tau_{2}}, (8.5)

where τ2:=λ2-1\tau_{2}:=\lambda^{-1}_{2} is the relaxation time and π*:=miniπi\pi_{*}:=\min_{i}\pi_{i}. Thus if



d(t)12d^(2t)12e-c,d(t)\leq{\textstyle\frac{1}{2}}\sqrt{{\hat{d}}(2t)}\leq{\textstyle\frac{1}{2}}% e^{-c}, (8.6)

which is small if cc is large; in particular, (8.6) gives the upper bound in

τ2τ^τ2(12log1π*+1),\tau_{2}\leq{\hat{\tau}}\leq\tau_{2}\left(\frac{1}{2}\log\frac{1}{\pi_{*}}+1% \right), (8.7)

and the lower bound follows easily.

For many simple chains (see Chapter 5), τ2\tau_{2} can be computed exactly. Typically, however, τ2\tau_{2} can only be bounded. This can be done using the “distinguished paths” method of Chapter 4 Section yyy:3. In Section 1 we will see that that method may be regarded as a special case of a “comparison method” whereby a chain with “unknown” relaxation time is compared to a second chain with “known” relaxation time. The greater generality often leads to improved bounds on τ2\tau_{2}. As a bonus, the comparison method also gives bounds on the other “unknown” eigenvalues, and such bounds in turn can sometimes further decrease the time tt required to guarantee that d^(2t){\hat{d}}(2t), and hence also d(t)d(t), is small.

A second set of advanced techniques, encompassing the notions of Nash inequalities, moderate growth, and local Poincaré inequaltities, is described in Section 3. The development there springs from the inequality

Pi(Xt)-π()2N(s)e-(t-s)/τ2,\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq N(s)e^{-(t-s)/\tau_{2}}, (8.8)

established for all 0st0\leq s\leq t in Section 2, where

N(t)=maxiPi(Xt)2=maxipii(2t)πi,t0.N(t)=\max_{i}\|P_{i}(X_{t}\in\cdot)\|_{2}=\max_{i}\sqrt{\frac{p_{ii}(2t)}{\pi_% {i}}},\ \ \ t\geq 0. (8.9)

Choosing s=0s=0 in (8.8) gives


and maximizing over ii recaptures (8.5). The point of Section 3, however, is that one can sometimes reduce the bound by a better choice of ss and suitable estimates of the decay rate of N()N(\cdot). Such estimates can be provided by so-called Nash inequalities, which are implied by (1) moderate growth conditions and (2) local Poincaré inequalities. Roughly speaking, for chains satisfying these two conditions, judicious choice of ss shows that variation mixing time and τ^{\hat{\tau}} are both of order Δ2\Delta^{2}, where Δ\Delta is the diameter of the graph underlying the chain.

xxx Might not do (1) or (2), so need to modify the above.

To outline a third direction of improvement, we begin by noting that neither of the bounds in (8.7) can be much improved in general. Indeed, ignoring Θ(1)\Theta(1) factors as usual, the lower bound is equality for the nn-cycle (Chapter 5, Example yyy:7) and the upper bound is equality for the M/M/1/nn queue (Chapter 5, Example yyy:6) with traffic intensity ρ(0,1)\rho\in(0,1).

In Section 4 we introduce the log-Sobolev time τl\tau_{l} defined by

τl:=sup{(g)/(g,g):gconstant}\tau_{l}:=\sup\{\mbox{${\cal L}$}(g)/\mbox{${\cal E}$}(g,g):g\not\equiv\mbox{% constant}\} (8.10)

where (g)\mbox{${\cal L}$}(g) is the entropy-like quantity

(g):=iπig2(i)log(|g(i)|/g2),\mbox{${\cal L}$}(g):=\sum_{i}\pi_{i}g^{2}(i)\log(|g(i)|/\|g\|_{2}),

recalling g22=iπig2(i)\|g\|^{2}_{2}=\sum_{i}\pi_{i}g^{2}(i). Notice the similarity between (8.10) and the extremal characterization of τ2\tau_{2} (Chapter 3 Theorem yyy:22):

τ2=sup{g22/(g,g):iπig(i)=0, g0}.\tau_{2}=\sup\{\|g\|^{2}_{2}/\mbox{${\cal E}$}(g,g):\sum_{i}\pi_{i}g(i)=0,\ \ % g\not\equiv 0\}.

We will see that

τ2τlτ2log(1π*-1)2(1-2π*)\tau_{2}\leq\tau_{l}\leq\tau_{2}\frac{\log\left(\frac{1}{\pi_{*}}-1\right)}{2(% 1-2\pi_{*})}

and that τ^{\hat{\tau}} is more closely related to τl\tau_{l} than to τ2\tau_{2}, in the sense that

τlτ^τl(12loglog1π*+2).\tau_{l}\leq{\hat{\tau}}\leq\tau_{l}\left(\frac{1}{2}\log\log\frac{1}{\pi_{*}}% +2\right). (8.11)

To illustrate the improvement over (8.7), from the knowledge for the dd-cube (Chapter 5, Example yyy:15) that τ2=d/2\tau_{2}=d/2, one can deduce from (8.7) that

12dτ^14(log2)d2+12d.{\textstyle\frac{1}{2}}d\leq{\hat{\tau}}\leq{\textstyle\frac{1}{4}}(\log 2)d^{% 2}+{\textstyle\frac{1}{2}}d. (8.12)

In Section 4.4 (Example 27) we will see that τl=d/2\tau_{l}=d/2; then from (8.11) we can deduce the substantial improvement

12dτ^14dlogd+(1-14log1log2)d\frac{1}{2}d\leq{\hat{\tau}}\leq\frac{1}{4}d\log d+\left(1-\frac{1}{4}\log% \frac{1}{\log 2}\right)d (8.13)

upon (8.12).

ZZZ!: Recall also the corrections in my notes on pages 8.2.11–12 (and 8.4.27). Continue same paragraph:

The upper bound here is remarkably tight: from Chapter 5 yyy:(65),

τ^=14dlogd+(14log1log(1+e-2))d+o(d)  as dd\to\infty.{\hat{\tau}}=\frac{1}{4}d\log d+\left(\frac{1}{4}\log\frac{1}{\log(1+e^{-2})}% \right)d+o(d)\mbox{\ \ as $d\to\infty$.}

ZZZ!: In fact, the remainder term is O(1)O(1). Continue same paragraph:

Thus log-Sobolev techniques provide another means of improving mixing time bounds, both in L2L^{2} and, because of (8.3)–(8.4), in variation. As will be seen, these techniques can also be combined usefully with comparison methods and Nash inequalities.