# Chapter 8 Advanced $L^{2}$ Techniques for Bounding Mixing Times (May 19 1999)

xxx In next revision, we should change the definition [in Chapter 4, yyy:(14)] of ${\hat{d}}(t)$ so that what is now $\sqrt{{\hat{d}}(2t)}$ becomes ${\hat{d}}(t)$.

This chapter concerns advanced $L^{2}$-based techniques, developed mainly by Persi Diaconis and Laurent Saloff-Coste [115, 116, 117, 118], for bounding mixing times of (finite, irreducible) reversible Markov chains. For convenience, we will work in continuous time throughout this chapter, unless otherwise noted. Many of the results are conveniently expressed in terms of an “$L^{2}$ threshold time” ${\hat{\tau}}$ (xxx use different notation?) defined by

 ${\hat{\tau}}:=\inf\{t>0:\max_{i}\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq e^{-1}\}.$ (8.1)

xxx For NOTES: Discussion of discrete time, esp. negative eigenvalues.

Several preliminary comments are in order here. First, the definition of the $L^{2}$ distance $\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}$ may be recalled from Chapter 2 Section yyy:6.2; Chapter 3 yyy:(55) and the spectral representation then give useful reexpressions:

 $\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}_{2}=\sum_{j}\pi_{j}\left(\frac{p_{ij}(t)}{\pi_{j}}-1\right)^{2}=\frac{p_{ii}(2t)}{\pi_{i}}-1=\pi^{-1}_{i}\sum_{m=2}^{n}\exp(-2\lambda_{m}t)u^{2}_{im}.$ (8.2)
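
As a concrete check, the three expressions in (8.2) can be compared numerically. The sketch below builds an arbitrary small reversible chain from symmetric conductances $w_{ij}$ (an illustrative choice, not an example from the text) and evaluates all three forms via the symmetrized generator.

```python
import numpy as np

# An arbitrary small reversible chain, built from symmetric conductances:
# q_ij = w_ij / pi_i for i != j makes pi a reversible distribution for Q.
rng = np.random.default_rng(0)
n = 4
w = rng.uniform(0.5, 2.0, size=(n, n))
w = (w + w.T) / 2
np.fill_diagonal(w, 0.0)
pi = rng.uniform(0.5, 2.0, size=n)
pi /= pi.sum()

# Spectral data via the symmetric matrix A = D^{1/2} Q D^{-1/2}, D = diag(pi):
# A_ij = w_ij / sqrt(pi_i pi_j) off-diagonal, A_ii = q_ii.
A = w / np.sqrt(np.outer(pi, pi))
np.fill_diagonal(A, -(w.sum(axis=1) / pi))
eigvals, U = np.linalg.eigh(A)
lam, U = -eigvals[::-1], U[:, ::-1]      # 0 = lambda_1 <= lambda_2 <= ...

sq = np.sqrt(pi)

def P(t):
    # p_ij(t) = sum_m u_im u_jm e^{-lambda_m t} sqrt(pi_j / pi_i)
    return (U * np.exp(-lam * t)) @ U.T * sq[None, :] / sq[:, None]

t, i = 0.7, 0
first = np.sum(pi * (P(t)[i] / pi - 1.0) ** 2)   # sum_j pi_j (p_ij(t)/pi_j - 1)^2
second = P(2 * t)[i, i] / pi[i] - 1.0            # p_ii(2t)/pi_i - 1
third = np.sum(np.exp(-2.0 * lam[1:] * t) * U[i, 1:] ** 2) / pi[i]
print(first, second, third)                      # all three agree up to rounding
```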

Second, from (8.2) and Chapter 4 yyy:(14) we may also write the maximum $L^{2}$ distance appearing in (8.1) using

 $\max_{i}\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}_{2}=\max_{i}\frac{p_{ii}(2t)}{\pi_{i}}-1=\max_{i,j}\frac{p_{ij}(2t)}{\pi_{j}}-1={\hat{d}}(2t).$

Third, by the Cauchy–Schwarz argument in Chapter 4 Lemma yyy:8, variation distance can be bounded by $L^{2}$ distance:

 $4d^{2}_{i}(t):=4\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}\leq\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}_{2},$ (8.3)

 $4d^{2}(t):=4\max_{i}\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}\leq{\hat{d}}(2t);$ (8.4)

these inequalities are the primary motivation for studying $L^{2}$ distance.

As argued in Chapter 4 yyy:just following (23),

 ${\hat{d}}(2t)\leq\pi^{-1}_{*}e^{-2t/\tau_{2}},$ (8.5)

where $\tau_{2}:=\lambda^{-1}_{2}$ is the relaxation time and $\pi_{*}:=\min_{i}\pi_{i}$. Thus if

 $t\geq\tau_{2}\left(\frac{1}{2}\log\frac{1}{\pi_{*}}+c\right),$

then

 $d(t)\leq{\textstyle\frac{1}{2}}\sqrt{{\hat{d}}(2t)}\leq{\textstyle\frac{1}{2}}e^{-c},$ (8.6)

which is small if $c$ is large; in particular, (8.6) gives the upper bound in

 $\tau_{2}\leq{\hat{\tau}}\leq\tau_{2}\left(\frac{1}{2}\log\frac{1}{\pi_{*}}+1\right),$ (8.7)

and the lower bound follows easily: averaging the first expression in (8.2) over $\pi_{i}$ gives $\sum_{i}\pi_{i}\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|^{2}_{2}=\sum_{m=2}^{n}\exp(-2\lambda_{m}t)\geq e^{-2t/\tau_{2}}$, so ${\hat{d}}(2t)\geq e^{-2t/\tau_{2}}$ and hence ${\hat{\tau}}\geq\tau_{2}$.
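
As a numerical sanity check of (8.7), the following sketch builds an arbitrary small reversible chain (again an illustrative choice, not an example from the text), computes ${\hat{\tau}}$ by bisection on ${\hat{d}}(2t)$, and verifies the two-sided bound.

```python
import numpy as np

# An arbitrary small reversible chain from symmetric conductances w_ij.
n = 5
rng = np.random.default_rng(1)
w = rng.uniform(0.1, 1.0, size=(n, n))
w = (w + w.T) / 2
np.fill_diagonal(w, 0.0)
pi = rng.uniform(0.2, 1.0, size=n)
pi /= pi.sum()

# Symmetrized generator A = D^{1/2} Q D^{-1/2}, q_ij = w_ij / pi_i.
A = w / np.sqrt(np.outer(pi, pi))
np.fill_diagonal(A, -(w.sum(axis=1) / pi))
eigvals, U = np.linalg.eigh(A)
lam, U = -eigvals[::-1], U[:, ::-1]      # 0 = lambda_1 <= lambda_2 <= ...
tau2 = 1.0 / lam[1]

def dhat(s):
    # hat d(s) = max_i (p_ii(s)/pi_i - 1), via the spectral form in (8.2)
    return np.max((np.exp(-lam[1:] * s) * U[:, 1:] ** 2).sum(axis=1) / pi)

# tau_hat solves dhat(2t) = e^{-2}; dhat is decreasing in t, so bisect.
lo, hi = 0.0, 100.0 * tau2
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if dhat(2 * mid) > np.exp(-2) else (lo, mid)
tau_hat = (lo + hi) / 2

upper = tau2 * (0.5 * np.log(1.0 / pi.min()) + 1.0)
print(tau2, tau_hat, upper)              # tau2 <= tau_hat <= upper, per (8.7)
```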

For many simple chains (see Chapter 5), $\tau_{2}$ can be computed exactly. Typically, however, $\tau_{2}$ can only be bounded. This can be done using the “distinguished paths” method of Chapter 4 Section yyy:3. In Section 1 we will see that that method may be regarded as a special case of a “comparison method” whereby a chain with “unknown” relaxation time is compared to a second chain with “known” relaxation time. The greater generality often leads to improved bounds on $\tau_{2}$. As a bonus, the comparison method also gives bounds on the other “unknown” eigenvalues, and such bounds in turn can sometimes further decrease the time $t$ required to guarantee that ${\hat{d}}(2t)$, and hence also $d(t)$, is small.

A second set of advanced techniques, encompassing the notions of Nash inequalities, moderate growth, and local Poincaré inequalities, is described in Section 3. The development there springs from the inequality

 $\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq N(s)e^{-(t-s)/\tau_{2}},$ (8.8)

established for all $0\leq s\leq t$ in Section 2, where

 $N(t)=\max_{i}\|P_{i}(X_{t}\in\cdot)\|_{2}=\max_{i}\sqrt{\frac{p_{ii}(2t)}{\pi_{i}}},\ \ \ t\geq 0.$ (8.9)

Choosing $s=0$ in (8.8) gives

 $\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq\pi^{-1/2}_{i}e^{-t/\tau_{2}},$

and maximizing over $i$ recaptures (8.5). The point of Section 3, however, is that one can sometimes reduce the bound by a better choice of $s$ and suitable estimates of the decay rate of $N(\cdot)$. Such estimates can be provided by so-called Nash inequalities, which are implied by (1) moderate growth conditions and (2) local Poincaré inequalities. Roughly speaking, for chains satisfying these two conditions, judicious choice of $s$ shows that variation mixing time and ${\hat{\tau}}$ are both of order $\Delta^{2}$, where $\Delta$ is the diameter of the graph underlying the chain.
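
To make the role of $s$ concrete, the following sketch (for an arbitrary small reversible chain, not an example from the text) evaluates the right side of (8.8) over a grid of $s$ values and compares the optimized bound with the $s=0$ bound and the true maximal $L^{2}$ distance.

```python
import numpy as np

# An arbitrary small reversible chain from symmetric conductances w_ij.
n = 5
rng = np.random.default_rng(2)
w = rng.uniform(0.1, 1.0, size=(n, n))
w = (w + w.T) / 2
np.fill_diagonal(w, 0.0)
pi = rng.uniform(0.2, 1.0, size=n)
pi /= pi.sum()

A = w / np.sqrt(np.outer(pi, pi))        # symmetrized generator, q_ij = w_ij/pi_i
np.fill_diagonal(A, -(w.sum(axis=1) / pi))
eigvals, U = np.linalg.eigh(A)
lam, U = -eigvals[::-1], U[:, ::-1]
tau2 = 1.0 / lam[1]

def N(s):
    # N(s) = max_i sqrt(p_ii(2s)/pi_i), computed spectrally as in (8.9)
    return np.sqrt(np.max((np.exp(-2.0 * lam * s) * U ** 2).sum(axis=1) / pi))

def l2_dist(t):
    # max_i ||P_i(X_t in .) - pi||_2, from the spectral form in (8.2)
    return np.sqrt(np.max((np.exp(-2.0 * lam[1:] * t) * U[:, 1:] ** 2).sum(axis=1) / pi))

t = tau2
grid = np.linspace(0.0, t, 201)
bounds = np.array([N(s) * np.exp(-(t - s) / tau2) for s in grid])
print(l2_dist(t), bounds.min(), bounds[0])   # actual <= optimized <= s = 0 bound
```

The $s=0$ entry reproduces the bound $\pi_{*}^{-1/2}e^{-t/\tau_{2}}$ above; minimizing over the grid can only do better, which is the Section 3 idea in miniature.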

xxx Might not do (1) or (2), so need to modify the above.

To outline a third direction of improvement, we begin by noting that neither of the bounds in (8.7) can be much improved in general. Indeed, ignoring $\Theta(1)$ factors as usual, the lower bound is equality for the $n$-cycle (Chapter 5, Example yyy:7) and the upper bound is equality for the M/M/1/$n$ queue (Chapter 5, Example yyy:6) with traffic intensity $\rho\in(0,1)$.

In Section 4 we introduce the log-Sobolev time $\tau_{l}$ defined by

 $\tau_{l}:=\sup\{\mbox{{\cal L}}(g)/\mbox{{\cal E}}(g,g):g\not\equiv\mbox{constant}\}$ (8.10)

where $\mbox{{\cal L}}(g)$ is the entropy-like quantity

 $\mbox{{\cal L}}(g):=\sum_{i}\pi_{i}g^{2}(i)\log(|g(i)|/\|g\|_{2}),$

recalling $\|g\|^{2}_{2}=\sum_{i}\pi_{i}g^{2}(i)$. Notice the similarity between (8.10) and the extremal characterization of $\tau_{2}$ (Chapter 3 Theorem yyy:22):

 $\tau_{2}=\sup\{\|g\|^{2}_{2}/\mbox{{\cal E}}(g,g):\sum_{i}\pi_{i}g(i)=0,\ \ g\not\equiv 0\}.$

We will see that

 $\tau_{2}\leq\tau_{l}\leq\tau_{2}\frac{\log\left(\frac{1}{\pi_{*}}-1\right)}{2(1-2\pi_{*})}$

and that ${\hat{\tau}}$ is more closely related to $\tau_{l}$ than to $\tau_{2}$, in the sense that

 $\tau_{l}\leq{\hat{\tau}}\leq\tau_{l}\left(\frac{1}{2}\log\log\frac{1}{\pi_{*}}+2\right).$ (8.11)
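
Although the proofs are deferred to Section 4, the inequality $\tau_{2}\leq\tau_{l}$ can be seen from a standard perturbation of constants: take $g=1+\epsilon f$ with $\sum_{i}\pi_{i}f(i)=0$ and $f\not\equiv 0$; then expanding in $\epsilon$ gives

 $\mbox{{\cal L}}(1+\epsilon f)=\epsilon^{2}\,{\rm Var}_{\pi}f+O(\epsilon^{3}),\ \ \ \mbox{{\cal E}}(1+\epsilon f,1+\epsilon f)=\epsilon^{2}\,\mbox{{\cal E}}(f,f),$

so letting $\epsilon\to 0$ in (8.10) and then taking the supremum over $f$ in the extremal characterization of $\tau_{2}$ above yields $\tau_{l}\geq\tau_{2}$.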

To illustrate the improvement over (8.7): given that $\tau_{2}=d/2$ for the $d$-cube (Chapter 5, Example yyy:15), one can deduce from (8.7) that

 ${\textstyle\frac{1}{2}}d\leq{\hat{\tau}}\leq{\textstyle\frac{1}{4}}(\log 2)d^{2}+{\textstyle\frac{1}{2}}d.$ (8.12)

In Section 4.4 (Example 27) we will see that $\tau_{l}=d/2$; then from (8.11) we can deduce the substantial improvement

 $\frac{1}{2}d\leq{\hat{\tau}}\leq\frac{1}{4}d\log d+\left(1-\frac{1}{4}\log\frac{1}{\log 2}\right)d$ (8.13)

upon (8.12).
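
The bound (8.13) can also be checked numerically. Assuming the $d$-cube walk has eigenvalues $2j/d$ with multiplicities $\binom{d}{j}$ (consistent with $\tau_{2}=d/2$ above), one gets ${\hat{d}}(2t)=(1+e^{-4t/d})^{d}-1$, and the sketch below solves ${\hat{d}}(2{\hat{\tau}})=e^{-2}$ by bisection.

```python
import math

# d-cube walk: assuming eigenvalues 2j/d with multiplicity C(d, j),
# hat d(2t) = (1 + e^{-4t/d})^d - 1, so tau_hat solves this = e^{-2}.
def tau_hat_cube(d):
    lo, hi = 0.0, float(d * d)           # hat d(2t) is decreasing in t: bisect
    for _ in range(200):
        mid = (lo + hi) / 2
        if (1.0 + math.exp(-4.0 * mid / d)) ** d - 1.0 > math.exp(-2.0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

d = 10
t = tau_hat_cube(d)
lower = d / 2                                                     # from (8.13)
upper = d / 4 * math.log(d) + (1 - math.log(1 / math.log(2)) / 4) * d
print(lower, t, upper)                   # lower <= t <= upper
```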

xxx Recall also the corrections in my notes on pages 8.2.11–12 (and 8.4.27).

The upper bound here is remarkably tight: from Chapter 5 yyy:(65),

 ${\hat{\tau}}=\frac{1}{4}d\log d+\left(\frac{1}{2}+o(1)\right)d\ \ \mbox{as }d\to\infty;$

in fact, the remainder term is $O(1)$.

Thus log-Sobolev techniques provide another means of improving mixing time bounds, both in $L^{2}$ and, because of (8.3)–(8.4), in variation. As will be seen, these techniques can also be combined usefully with comparison methods and Nash inequalities.