# 8.2 Improved bounds on $L^{2}$ distance

The central theme of the remainder of this chapter is that norms other than the $L^{1}$ norm (and the closely related variation distance) and the $L^{2}$ norm can be used to improve substantially upon the bound

 $\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq\pi_{i}^{-1/2}e^{-t/\tau_{2}}.$ (8.26)

## 8.2.1 $L^{q}$ norms and operator norms

Our discussion here of $L^{q}$ norms will parallel and extend the discussion in Chapter 2 Section yyy:6.2 of $L^{1}$ and $L^{2}$ norms. Given $1\leq q\leq\infty$, both the $L^{q}$ norm of a function and the $L^{q}$ norm of a signed measure are defined with respect to some fixed reference probability distribution $\pi$ on $I$, which for our purposes will be the stationary distribution of some irreducible but not necessarily reversible chain under consideration. For $1\leq q<\infty$, the $L^{q}$ norm of a function $f:I\to{\bf R}$ is

 $\|f\|_{q}:=\left(\sum_{i}\pi_{i}|f(i)|^{q}\right)^{1/q},$

and we define the $L^{q}$ norm of a signed measure $\nu$ on $I$ to be the $L^{q}$ norm of its density function with respect to $\pi$:

 $\|\nu\|_{q}:=\left(\sum_{j}\pi^{1-q}_{j}\,|\nu_{j}|^{q}\right)^{1/q}.$

For $q=\infty$, the corresponding definitions are

 $\|f\|_{\infty}:=\max_{i}|f(i)|$

and

 $\|\nu\|_{\infty}:=\max_{j}(|\nu_{j}|/\pi_{j}).$
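
These definitions mechanize directly. The following sketch (NumPy; the helper names `f_norm` and `nu_norm` are ours, not the book's) computes both norms on a three-point space and checks two easy consequences of the definitions: $\|f\|_{q}$ is nondecreasing in $q$, and $\|\nu\|_{1}=\sum_{j}|\nu_{j}|$ regardless of $\pi$.

```python
import numpy as np

# Reference probability distribution pi on I = {0, 1, 2}.
pi = np.array([0.5, 0.3, 0.2])

def f_norm(f, q, pi):
    """L^q(pi) norm of a function f : I -> R."""
    if np.isinf(q):
        return np.max(np.abs(f))
    return (np.sum(pi * np.abs(f) ** q)) ** (1.0 / q)

def nu_norm(nu, q, pi):
    """L^q(pi) norm of a signed measure nu: the f_norm of its density nu/pi."""
    return f_norm(nu / pi, q, pi)

f = np.array([1.0, -2.0, 3.0])
nu = np.array([0.1, -0.3, 0.2])   # a signed measure with nu(I) = 0

# ||f||_q is nondecreasing in q (power-mean inequality), up to ||f||_inf.
norms = [f_norm(f, q, pi) for q in (1, 2, 4, np.inf)]
increasing = all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))

# For q = 1 the measure norm is sum_j |nu_j|, independent of pi.
l1_is_abs_sum = np.isclose(nu_norm(nu, 1, pi), np.sum(np.abs(nu)))
```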

Any matrix ${\bf A}:=(a_{ij}:i,j\in I)$ operates on functions $f:I\to{\bf R}$ by left-multiplication:

 $({\bf A}f)(i)=\sum_{j}a_{ij}f(j),$ (8.27)

and on signed measures $\nu$ by right-multiplication:

 $(\nu{\bf A})_{j}=\sum_{i}\nu_{i}a_{ij}.$ (8.28)

For (8.27), fix $1\leq q_{1}\leq\infty$ and $1\leq q_{2}\leq\infty$ and regard ${\bf A}$ as a linear operator mapping $L^{q_{1}}$ into $L^{q_{2}}$. The operator norm $\|{\bf A}\|_{q_{1}\to q_{2}}$ is defined by

 $\|{\bf A}\|_{q_{1}\to q_{2}}:=\sup\{\|{\bf A}f\|_{q_{2}}:\|f\|_{q_{1}}=1\}.$ (8.29)

The sup in (8.29) is always achieved, and there are many equivalent reexpressions, including

 $\|{\bf A}\|_{q_{1}\to q_{2}}=\max\{\|{\bf A}f\|_{q_{2}}:\|f\|_{q_{1}}\leq 1\}=\max\{\|{\bf A}f\|_{q_{2}}/\|f\|_{q_{1}}:f\neq 0\}.$

Note also that

 $\|{\bf B}{\bf A}\|_{q_{1}\to q_{3}}\leq\|{\bf A}\|_{q_{1}\to q_{2}}\,\|{\bf B}\|_{q_{2}\to q_{3}},\ \ \ 1\leq q_{1},q_{2},q_{3}\leq\infty.$ (8.30)

For (8.28), we may similarly regard ${\bf A}$ as a linear operator mapping signed measures $\nu$, measured by $\|\nu\|_{q_{1}}$, to signed measures $\nu{\bf A}$, measured by $\|\nu{\bf A}\|_{q_{2}}$. The corresponding definition of operator norm, call it $\||{\bf A}\||_{q_{1}\to q_{2}}$, is then

 $\||{\bf A}\||_{q_{1}\to q_{2}}:=\sup\{\|\nu{\bf A}\|_{q_{2}}:\|\nu\|_{q_{1}}=1\}.$

A brief calculation shows that

 $\||{\bf A}\||_{q_{1}\to q_{2}}=\|{\bf A}^{*}\|_{q_{1}\to q_{2}},$

where ${\bf A}^{*}$ is the matrix with $(i,j)$ entry $\pi_{j}a_{ji}/\pi_{i}$, that is, ${\bf A}^{*}$ is the adjoint operator to ${\bf A}$ (with respect to $\pi$).
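
The "brief calculation" is that the density of $\nu{\bf A}$ with respect to $\pi$ is ${\bf A}^{*}$ applied to the density of $\nu$, from which the equality of the two operator norms is immediate. A numerical sketch (NumPy, random data; the setup is our illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
pi = rng.random(n); pi /= pi.sum()      # reference distribution pi
A = rng.random((n, n))                  # an arbitrary matrix
nu = rng.normal(size=n)                 # a signed measure on I

# Adjoint with respect to pi: (A*)_{ij} = pi_j * a_{ji} / pi_i.
A_star = (pi[None, :] * A.T) / pi[:, None]

# Density of nu A w.r.t. pi equals A* applied to the density of nu,
# so ||nu A||_q = ||A* (nu/pi)||_q for every q.
density_nuA = (nu @ A) / pi
density_nu = nu / pi
match = np.allclose(density_nuA, A_star @ density_nu)
```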

Our applications in this chapter will all have ${\bf A}={\bf A}^{*}$, so we will not need to distinguish between the two operator norms. In fact, all our applications will take ${\bf A}$ to be either ${\bf P}_{t}$ or ${\bf P}_{t}-{\bf E}$ for some $t\geq 0$, where

 ${\bf P}_{t}:=(p_{ij}(t):i,j\in I)$

xxx notation ${\bf P}_{t}$ found elsewhere in book?

and ${\bf E}=\lim_{t\to\infty}{\bf P}_{t}$ is the transition matrix for the trivial discrete time chain that jumps in one step to stationarity:

 ${\bf E}=(\pi_{j}:i,j\in I),$

and where we assume that the chain for $({\bf P}_{t})$ is reversible. Note that ${\bf E}$ operates on functions essentially as expectation with respect to $\pi$:

 $({\bf E}f)(i)=\sum_{j}\pi_{j}f(j),\ \ \ i\in I.$

The effect of ${\bf E}$ on signed measures is to map $\nu$ to $(\sum_{i}\nu_{i})\pi$, and

 ${\bf P}_{t}{\bf E}={\bf E}={\bf E}{\bf P}_{t},\ \ \ t\geq 0.$ (8.31)

## 8.2.2 A more general bound on $L^{2}$ distance

The following preliminary result, a close relative of Chapter 3, Lemmas yyy:21 and 23, is used frequently enough in the sequel that we isolate it for reference. It is the simple identity in part (b) that shows why $L^{2}$-based techniques are so useful.

###### Lemma 8.10

(a) For any function $f$,

 $\frac{d}{dt}\|{\bf P}_{t}f\|^{2}_{2}=-2{\cal E}({\bf P}_{t}f,{\bf P}_{t}f)\leq-\frac{2}{\tau_{2}}{\rm var}_{\pi}\,{\bf P}_{t}f\leq 0.$

(b)

 $\|{\bf P}_{t}-{\bf E}\|_{2\to 2}=e^{-t/\tau_{2}},\ \ \ t\geq 0.$

Proof. (a) Using the backward equations

 $\frac{d}{dt}p_{ij}(t)=\sum_{k}q_{ik}\,p_{kj}(t)$

we find

 $\frac{d}{dt}({\bf P}_{t}f)(i)=\sum_{k}q_{ik}[({\bf P}_{t}f)(k)]$

and so

 $\frac{d}{dt}\|{\bf P}_{t}f\|^{2}_{2}=2\sum_{i}\sum_{k}\pi_{i}[({\bf P}_{t}f)(i)]\,q_{ik}\,[({\bf P}_{t}f)(k)]=-2{\cal E}({\bf P}_{t}f,{\bf P}_{t}f)\leq-\frac{2}{\tau_{2}}{\rm var}_{\pi}\,{\bf P}_{t}f,$

the second equality by Chapter 3 yyy:(70) and the inequality by the extremal characterization of $\tau_{2}$.

(b) From (a), for any $f$ we have

 $\frac{d}{dt}\|({\bf P}_{t}-{\bf E})f\|^{2}_{2}=\frac{d}{dt}\|{\bf P}_{t}(f-{\bf E}f)\|^{2}_{2}\leq-\frac{2}{\tau_{2}}\|({\bf P}_{t}-{\bf E})f\|^{2}_{2},$

which yields

 $\|({\bf P}_{t}-{\bf E})f\|^{2}_{2}\leq\|({\bf P}_{0}-{\bf E})f\|^{2}_{2}\,e^{-2t/\tau_{2}}=({\rm var}_{\pi}\,f)\,e^{-2t/\tau_{2}}\leq\|f\|^{2}_{2}\,e^{-2t/\tau_{2}}.$

Thus $\|{\bf P}_{t}-{\bf E}\|_{2\to 2}\leq e^{-t/\tau_{2}}$. Taking $f$ to be the eigenvector

 $f_{i}:=\pi^{-1/2}_{i}u_{i2},\ \ \ i\in I,$

of ${\bf P}_{t}-{\bf E}$ corresponding to eigenvalue $\exp(-t/\tau_{2})$ demonstrates equality and completes the proof of (b).
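
Part (b) is easy to test numerically. On a finite state space, $\|{\bf A}\|_{2\to 2}$ is the Euclidean spectral norm of ${\bf D}^{1/2}{\bf A}{\bf D}^{-1/2}$ with ${\bf D}={\rm diag}(\pi)$, so the identity can be checked against a random walk on a weighted graph (a generic reversible chain); the construction below is our own illustration, not from the text.

```python
import numpy as np

# Random walk on a weighted graph: a generic reversible chain.
rng = np.random.default_rng(1)
n = 5
W = rng.random((n, n)); W = W + W.T; np.fill_diagonal(W, 0.0)
w = W.sum(axis=1)
pi = w / w.sum()                               # stationary distribution
Q = W / w[:, None]; np.fill_diagonal(Q, -1.0)  # generator, unit jump rates

# Symmetrize: S = D^{1/2} Q D^{-1/2} is symmetric for a reversible chain.
rt = np.sqrt(pi)
S = (rt[:, None] * Q) / rt[None, :]
lam, U = np.linalg.eigh(S)                     # lam[-1] = 0, lam[-2] = -1/tau2
tau2 = -1.0 / lam[-2]

def P_t(t):
    """Transition matrix P_t = exp(tQ) via the spectral representation."""
    return (U * np.exp(t * lam)) @ U.T * rt[None, :] / rt[:, None]

E = np.tile(pi, (n, 1))                        # jump-to-stationarity kernel
t = 0.7
# ||A||_{2->2} in L^2(pi) = spectral norm of D^{1/2} A D^{-1/2}.
M = rt[:, None] * (P_t(t) - E) / rt[None, :]
opnorm = np.linalg.norm(M, 2)                  # should equal exp(-t/tau2)
```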

The key to all further developments in this chapter is the following result.

###### Lemma 8.11

For an irreducible reversible chain with arbitrary initial distribution and any $s,t\geq 0$,

 $\|P(X_{s+t}\in\cdot)-\pi(\cdot)\|_{2}\leq\|P(X_{s}\in\cdot)\|_{2}\,\|{\bf P}_{t}-{\bf E}\|_{2\to 2}=\|P(X_{s}\in\cdot)\|_{2}\,e^{-t/\tau_{2}}.$

Proof. The equality is Lemma 8.10(b), and

 $\|P(X_{s+t}\in\cdot)-\pi(\cdot)\|_{2}=\|P(X_{s}\in\cdot)({\bf P}_{t}-{\bf E})\|_{2}\leq\|P(X_{s}\in\cdot)\|_{2}\,\|{\bf P}_{t}-{\bf E}\|_{2\to 2}$

proves the inequality.

We have already discussed, in Section 8.1, a technique for bounding $\tau_{2}$ when (as is usually the case) it cannot be computed exactly. To utilize Lemma 8.11, we must also bound $\|P(X_{s}\in\cdot)\|_{2}$. Since

 $\|P(X_{s}\in\cdot)\|_{2}=\|P(X_{0}\in\cdot){\bf P}_{s}\|_{2}$ (8.32)

xxx For NOTES: By Jensen’s inequality (for $1\leq q<\infty$), any transition matrix contracts $L^{q}$ for any $1\leq q\leq\infty$.

and each ${\bf P}_{t}$ is contractive on $L^{2}$, i.e., $\|{\bf P}_{t}\|_{2\to 2}\leq 1$ (this follows, for example, from Lemma 8.10(a); and note that $\|{\bf P}_{t}\|_{2\to 2}=1$ by considering constant functions), it follows that

 $\|P(X_{s}\in\cdot)\|_{2}\mbox{ decreases to }1\mbox{ as }s\uparrow\infty,$ (8.33)

and the decrease is strictly monotone unless $P(X_{0}\in\cdot)=\pi(\cdot)$. From (8.32) follows

 $\|P(X_{s}\in\cdot)\|_{2}\leq\|P(X_{0}\in\cdot)\|_{q^{*}}\,\|{\bf P}_{s}\|_{q^{*}\to 2},\ \ \ 1\leq q^{*}\leq\infty,$ (8.34)

and again

 $\|{\bf P}_{s}\|_{q^{*}\to 2}\mbox{ decreases to }1\mbox{ as }s\uparrow\infty.$ (8.35)

The norm $\|{\bf P}_{s}\|_{q^{*}\to 2}$ decreases in $q^{*}$ (for fixed $s$) and is identically $1$ when $q^{*}\geq 2$, but in applications we will want to take $q^{*}<2$. The following duality lemma will then often prove useful. Recall that $1\leq q,q^{*}\leq\infty$ are said to be (Hölder-)conjugate exponents if

 $\frac{1}{q}+\frac{1}{q^{*}}=1.$ (8.36)
###### Lemma 8.12

For any operator ${\bf A}$, let

 ${\bf A}^{*}=(\pi_{j}a_{ji}/\pi_{i}:i,j\in I)$

denote its adjoint with respect to $\pi$. Then, for any $1\leq q_{1},q_{2}\leq\infty$,

 $\|{\bf A}\|_{q_{1}\to q_{2}}=\|{\bf A}^{*}\|_{q^{*}_{2}\to q^{*}_{1}}.$

In particular, for a reversible chain and any $1\leq q\leq\infty$ and $s\geq 0$,

 $\|{\bf P}_{s}\|_{2\to q}=\|{\bf P}_{s}\|_{q^{*}\to 2}.$ (8.37)

Proof. Classical duality for $L^{q}$ spaces (see, e.g., Chapter 6 in [303]) asserts that, given $1\leq q\leq\infty$ and $g$ on $I$,

 $\|g\|_{q^{*}}=\max\{|\langle f,g\rangle|:\|f\|_{q}=1\}$

where

 $\langle f,g\rangle:=\sum_{i}\pi_{i}f(i)g(i).$

Thus

 $\|{\bf A}^{*}g\|_{q^{*}_{1}}=\max\{|\langle f,{\bf A}^{*}g\rangle|:\|f\|_{q_{1}}=1\}=\max\{|\langle{\bf A}f,g\rangle|:\|f\|_{q_{1}}=1\},$

and also

 $|\langle{\bf A}f,g\rangle|\leq\|{\bf A}f\|_{q_{2}}\|g\|_{q^{*}_{2}}\leq\|{\bf A% }\|_{q_{1}\to q_{2}}\|f\|_{q_{1}}\|g\|_{q^{*}_{2}},$

so

 $\|{\bf A}^{*}g\|_{q^{*}_{1}}\leq\|{\bf A}\|_{q_{1}\to q_{2}}\|g\|_{q^{*}_{2}}.$

Since this is true for every $g$, we conclude $\|{\bf A}^{*}\|_{q^{*}_{2}\to q^{*}_{1}}\leq\|{\bf A}\|_{q_{1}\to q_{2}}$. Reverse roles to complete the proof.
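
The special case (8.37) can be checked numerically. On a finite state space both sides have closed forms: the $L^{1}(\pi)$ unit ball for measures has extreme points $\pm\delta_{j}/\pi_{j}$, so $\|{\bf P}_{s}\|_{1\to 2}$ is attained at a point mass, while $\|{\bf P}_{s}\|_{2\to\infty}$ is, row by row, the $L^{2}(\pi)$ norm of the density $p_{i\cdot}(s)/\pi$. A sketch (NumPy; the weighted-graph chain is our own test case):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
W = rng.random((n, n)); W = W + W.T; np.fill_diagonal(W, 0.0)
w = W.sum(axis=1)
pi = w / w.sum()
Q = W / w[:, None]; np.fill_diagonal(Q, -1.0)   # reversible generator
rt = np.sqrt(pi)
lam, U = np.linalg.eigh((rt[:, None] * Q) / rt[None, :])
s = 0.4
Ps = (U * np.exp(s * lam)) @ U.T * rt[None, :] / rt[:, None]

# ||P_s||_{1->2}: the sup over the measure L^1 ball is attained at a
# point mass delta_j, whose density w.r.t. pi is e_j / pi_j.
norm_1_to_2 = np.sqrt(np.max((pi[:, None] * Ps**2 / pi[None, :]**2).sum(axis=0)))

# ||P_s||_{2->inf}: maximize |(P_s f)(i)| over ||f||_2 = 1 row by row
# (Cauchy-Schwarz in L^2(pi) gives the dual norm of p_i.(s)/pi).
norm_2_to_inf = np.sqrt(np.max((Ps**2 / pi[None, :]).sum(axis=1)))
```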

As a corollary, if $q^{*}=1$ then (8.34) and (8.37) combine (using $\|P(X_{0}\in\cdot)\|_{1}=1$) to give

 $\|P(X_{s}\in\cdot)\|_{2}\leq\|{\bf P}_{s}\|_{1\to 2}=\|{\bf P}_{s}\|_{2\to\infty}$

and then

 $\|P(X_{s+t}\in\cdot)-\pi(\cdot)\|_{2}\leq\|{\bf P}_{s}\|_{2\to\infty}\,e^{-t/\tau_{2}}$

from Lemma 8.11. Thus

 $\sqrt{{\hat{d}}(2(s+t))}\leq\|{\bf P}_{s}\|_{2\to\infty}\,e^{-t/\tau_{2}}.$ (8.38)

Here is a somewhat different derivation of (8.38):

###### Lemma 8.13

For $0\leq s\leq t$,

 $\sqrt{{\hat{d}}(2t)}=\|{\bf P}_{t}-{\bf E}\|_{2\to\infty}\leq\|{\bf P}_{s}\|_{2\to\infty}\,\|{\bf P}_{t-s}-{\bf E}\|_{2\to 2}=\|{\bf P}_{s}\|_{2\to\infty}\,e^{-(t-s)/\tau_{2}}.$

Proof. In light of (8.31), (8.30), and Lemma 8.10(b), we need only establish the first equality. Indeed, $\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}$ is the $L^{2}$ norm of the function $(P_{i}(X_{t}\in\cdot)/\pi(\cdot))-1$ and so equals

 $\max\left\{\left|\sum_{j}(p_{ij}(t)-\pi_{j})f(j)\right|:\|f\|_{2}=1\right\}=\max\{|(({\bf P}_{t}-{\bf E})f)(i)|:\|f\|_{2}=1\}.$

Taking the maximum over $i\in I$ we obtain

 $\sqrt{{\hat{d}}(2t)}=\max\left\{\max_{i}|(({\bf P}_{t}-{\bf E})f)(i)|:\|f\|_{2}=1\right\}=\max\{\|({\bf P}_{t}-{\bf E})f\|_{\infty}:\|f\|_{2}=1\}=\|{\bf P}_{t}-{\bf E}\|_{2\to\infty}.$

Choosing $s=0$ in Lemma 8.11 recaptures (8.26), and choosing $s=0$ in Lemma 8.13 likewise recaptures the consequence (8.5) of (8.26). The central theme for both Nash and log-Sobolev techniques is that one can improve upon these results by more judicious choice of $s$.

## 8.2.3 Exact computation of $N(s)$

The proof of Lemma 8.13 can also be used to show that

 $N(s):=\|{\bf P}_{s}\|_{2\to\infty}=\max_{i}\|P_{i}(X_{s}\in\cdot)\|_{2}=\max_{i}\sqrt{\frac{p_{ii}(2s)}{\pi_{i}}},$ (8.39)

as at (8.9). In those rare instances when the spectral representation is known explicitly, this gives the formula

xxx Also useful in conjunction with comparison method—see Section 3.

xxx If we can compute this, we can compute ${\hat{d}}(2t)=N^{2}(t)-1$. But the point is to test out Lemma 8.13.

 $N^{2}(s)=1+\max_{i}\pi_{i}^{-1}\sum_{m=2}^{n}u^{2}_{im}\exp(-2\lambda_{m}s),$ (8.40)

and the techniques of later sections are not needed to compute $N(s)$. In particular, in the vertex-transitive case

 $N^{2}(s)=1+\sum_{m=2}^{n}\exp(-2\lambda_{m}s).$
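
The agreement between (8.39) and the spectral formula (8.40) is easy to confirm numerically for a small reversible chain; the weighted-graph example below is our own illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
W = rng.random((n, n)); W = W + W.T; np.fill_diagonal(W, 0.0)
w = W.sum(axis=1)
pi = w / w.sum()
Q = W / w[:, None]; np.fill_diagonal(Q, -1.0)   # reversible generator
rt = np.sqrt(pi)
lam, U = np.linalg.eigh((rt[:, None] * Q) / rt[None, :])  # lam[-1] = 0
s = 0.3

# (8.39): N(s)^2 = max_i p_{ii}(2s) / pi_i, with P_t in spectral form.
P2s = (U * np.exp(2 * s * lam)) @ U.T * rt[None, :] / rt[:, None]
N2_direct = np.max(np.diag(P2s) / pi)

# (8.40): N(s)^2 = 1 + max_i pi_i^{-1} sum_{m>=2} u_{im}^2 exp(-2 lambda_m s);
# here lam is ascending and negative, and the top eigenvector is sqrt(pi),
# so the sum excludes the last column of U.
N2_spectral = 1 + np.max((U[:, :-1] ** 2 * np.exp(2 * s * lam[:-1])).sum(axis=1) / pi)
```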

The norm $N(s)$ behaves nicely under the formation of products: if a chain is the product of two independent chains, with corresponding norms $N^{(1)}$ and $N^{(2)}$, then

 $N(s)=N^{(1)}(s)N^{(2)}(s).$ (8.41)
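
Identity (8.41) follows from (8.39), since both $p_{ii}(2s)$ and $\pi_{i}$ factor over the coordinates. A small numerical check (our own example: the product of two independent $2$-path walks, whose product generator is the Kronecker sum of the factors):

```python
import numpy as np

# Two independent copies of the continuized walk on the 2-path, run in
# parallel at rate 1 each; the product generator is the Kronecker sum.
Q1 = np.array([[-1.0, 1.0], [1.0, -1.0]])
I2 = np.eye(2)
Q = np.kron(Q1, I2) + np.kron(I2, Q1)
pi = np.full(4, 0.25)                      # product stationary distribution

s = 0.3
lam, U = np.linalg.eigh(Q)                 # Q symmetric here (pi uniform)
P2s = (U * np.exp(2 * s * lam)) @ U.T
N2_product = np.max(np.diag(P2s) / pi)     # (8.39) for the product chain

N2_factor = 1 + np.exp(-4 * s)             # N^2 for one 2-path factor
```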
###### Example 8.14

The two-state chain and the $d$-cube.

For the two-state chain, the results of Chapter 5 Example yyy:4 show

 $N^{2}(s)=1+\frac{\max(p,q)}{\min(p,q)}e^{-2(p+q)s}.$

In particular, for the continuized walk on the $2$-path,

 $N^{2}(s)=1+e^{-4s}.$

Since the continuized walk on the $d$-cube is the $d$-fold product of the walk on the $2$-path, with each coordinate run at rate $1/d$, the extension of (8.41) to higher-order products therefore gives

 $N^{2}(s)=(1+e^{-4s/d})^{d}$

for the continuized walk on the $d$-cube. This result is also easily derived from the results of Chapter 5 Example yyy:15. For $d\geq 2$ and $t\geq\frac{1}{4}d\log(d-1)$, the optimal choice of $s$ in Lemma 8.13 is therefore

 $s={\textstyle\frac{1}{4}}d\log(d-1)$

and this leads in straightforward fashion to the bound

 ${\hat{\tau}}\leq{\textstyle\frac{1}{4}}d(\log d+3).$

While this is a significant improvement on the bound [cf. (8.12)]

 ${\hat{\tau}}\leq{\textstyle\frac{1}{4}}(\log 2)d^{2}+{\textstyle\frac{1}{2}}d$

obtained by setting $s=0$, i.e., obtained using only information about $\tau_{2}$, it is not as sharp as the upper bound

 ${\hat{\tau}}\leq(1+o(1)){\textstyle\frac{1}{4}}d\log d$

in (8.13) that will be derived using log-Sobolev techniques.
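
The arithmetic behind the bound ${\hat{\tau}}\leq\frac{1}{4}d(\log d+3)$ can be checked numerically. Under our reading of the "straightforward fashion" above, Lemma 8.13 yields ${\hat{\tau}}\leq s+\tau_{2}(1+\log N(s))$ with $\tau_{2}=d/2$, $N^{2}(s)=(1+e^{-4s/d})^{d}$, and $s=\frac{1}{4}d\log(d-1)$; a sketch:

```python
import math

def tau_hat_bound(d):
    """s + tau2*(1 + log N(s)) for the d-cube with s = (d/4) log(d-1).

    Assumes the recipe tau_hat <= s + tau2*(1 + log N(s)) sketched in
    the text; tau2 = d/2 and N^2(s) = (1 + exp(-4s/d))^d.
    """
    tau2 = d / 2.0
    s = 0.25 * d * math.log(d - 1)
    logN = 0.5 * d * math.log(1.0 + math.exp(-4.0 * s / d))
    return s + tau2 * (1.0 + logN)

# The claimed bound (d/4)(log d + 3) dominates for every d >= 2 tested.
ok = all(tau_hat_bound(d) <= 0.25 * d * (math.log(d) + 3)
         for d in range(2, 200))
```

The margin shrinks toward a small constant as $d$ grows, so the constant $3$ in the bound is essentially the best this argument gives.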

###### Example 8.15

The complete graph.

For this graph, the results of Chapter 5 Example yyy:9 show

 $N^{2}(s)=1+(n-1)\exp\left(-\frac{2ns}{n-1}\right).$

It turns out for this example that $s=0$ is the optimal choice in Lemma 8.13. This is not surprising given the sharpness of the bound in (8.7) in this case. See Example 8.32 below for further details.

###### Example 8.16

Product random walk on a $d$-dimensional grid.

Consider again the benchmark product chain (i.e., the “tilde chain”) in Example 8.5. That chain has relaxation time

 $\tau_{2}=d\left(1-\cos\left(\frac{\pi}{m}\right)\right)^{-1}\leq\frac{1}{2}dm^{2},$

so choosing $s=0$ in Lemma 8.13 gives

 ${\hat{\tau}}\leq\frac{d}{1-\cos(\pi/m)}\left(\frac{1}{2}\log n+1\right)\leq{\textstyle\frac{1}{4}}dm^{2}(\log n+2).$ (8.42)

This bound can be improved using $N(\cdot)$. Indeed, if we first consider continuized random walk on the $m$-path with self-loops added at each end, the stationary distribution is uniform, the eigenvalues are

 $\lambda_{l}=1-\cos(\pi(l-1)/m),\ \ \ 1\leq l\leq m,$

and the eigenvectors are given by

 $u_{il}=(2/m)^{1/2}\cos(\pi(l-1)(i-{\textstyle\frac{1}{2}})/m),\ \ \ 1\leq i\leq m,\ \ \ 2\leq l\leq m.$

According to (8.40) and simple estimates, for $s>0$

 $N^{2}(s)-1\leq 2\sum_{l=2}^{m}\exp[-2s(1-\cos(\pi(l-1)/m))]\leq 2\sum_{l=1}^{m-1}\exp(-4sl^{2}/m^{2})$

and

 $\sum_{l=2}^{m-1}\exp(-4sl^{2}/m^{2})\leq\int_{x=1}^{\infty}\exp(-4sx^{2}/m^{2})\,dx=m\left(\frac{\pi}{4s}\right)^{1/2}P\left(Z\geq\frac{2(2s)^{1/2}}{m}\right)\leq\left(\frac{\pi/4}{4s/m^{2}}\right)^{1/2}\exp(-4s/m^{2})\leq(4s/m^{2})^{-1/2}\exp(-4s/m^{2})$

where $Z$ is standard normal; in particular, we have used the well-known (xxx: point to Ross book exercise) bound

 $P(Z\geq z)\leq{\textstyle\frac{1}{2}}e^{-z^{2}/2},\ \ \ z\geq 0.$

Thus

 $N^{2}(s)\leq 1+2[1+(4s/m^{2})^{-1/2}]\exp(-4s/m^{2}),\ \ \ s>0.$
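
This bound is easy to sanity-check numerically against the exact $N^{2}(s)$ for the path chain, computed from (8.39) via the eigendecomposition of the (symmetric) generator; the following sketch is our own illustration.

```python
import numpy as np

# Continuized walk on the m-path with a half self-loop at each end:
# uniform stationary distribution, eigenvalues 1 - cos(pi*(l-1)/m).
m = 20
P = np.zeros((m, m))
for i in range(m):
    if i > 0:
        P[i, i - 1] = 0.5
    if i < m - 1:
        P[i, i + 1] = 0.5
P[0, 0] = P[m - 1, m - 1] = 0.5
Qgen = P - np.eye(m)                     # symmetric generator, pi uniform

lam, U = np.linalg.eigh(Qgen)
checks = []
for s in (0.5, 2.0, 10.0, 50.0):
    p2s_diag = ((U ** 2) * np.exp(2 * s * lam)).sum(axis=1)  # p_ii(2s)
    N2_exact = m * p2s_diag.max()        # (8.39) with pi_i = 1/m
    u = 4 * s / m ** 2
    N2_bound = 1 + 2 * (1 + u ** -0.5) * np.exp(-u)
    checks.append(N2_exact <= N2_bound + 1e-9)
```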

Return now to the “tilde chain” of Example 8.5, and assume for simplicity that $m_{1}=\cdots=m_{d}=m$. Since this chain is a slowed-down $d$-fold product of the path chain, it has

 $N^{2}(s)\leq\left[1+2\left(1+\left(\frac{4s}{dm^{2}}\right)^{-1/2}\right)\exp% \left(-\frac{4s}{dm^{2}}\right)\right]^{d},\ \ \ s>0.$ (8.43)

In particular, since ${\hat{d}}(2t)=N^{2}(t)-1$, it is now easy to see that

 ${\hat{\tau}}\leq Km^{2}d\log d=Kn^{2/d}d\log d$ (8.44)

for a universal constant $K$.

xxx We’ve improved on (8.42), which gave order $m^{2}d^{2}\log m$.

xxx When used to bound $d(t)$ at (8.4), the bound (8.44) is “right”: see Theorem 4.1, p. 481, in [118].

The optimal choice of $s$ in Lemma 8.13 cannot be obtained explicitly when (8.43) is used to bound $N(s)=\|{\bf P}_{s}\|_{2\to\infty}$. For this reason, and for later purposes, it is useful to use the simpler, but more restricted, bound

 $N(s)\leq\left(\frac{4dm^{2}}{s}\right)^{d/4},\ \ \ 0<s\leq dm^{2}/16.$ (8.45)

To verify this bound, simply notice that $u+2ue^{-u^{2}}+2e^{-u^{2}}\leq 4$ for $0\leq u\leq\frac{1}{2}$, and apply this with $u=(4s/(dm^{2}))^{1/2}$ to bound each factor in (8.43) by $4/u$. When $s=dm^{2}/16$ and the bounds (8.45) and $\tau_{2}\leq dm^{2}/2$ are used in Lemma 8.13, we find

 ${\hat{\tau}}\leq{\textstyle\frac{3}{4}}m^{2}d^{2}(\log 2+{\textstyle\frac{3}{4}}d^{-1}).$
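
The elementary inequality invoked above, $u+2ue^{-u^{2}}+2e^{-u^{2}}\leq 4$ for $0\leq u\leq\frac{1}{2}$, is easy to confirm numerically; with $u=(4s/(dm^{2}))^{1/2}$ it bounds each factor of (8.43) by $4/u$. A quick grid check:

```python
import math

# g(u) = u + 2u*exp(-u^2) + 2*exp(-u^2); claim: g(u) <= 4 on [0, 1/2].
# Dividing g(u) <= 4 by u shows each factor of (8.43),
# 1 + 2*(1 + 1/u)*exp(-u^2), is at most 4/u.
def g(u):
    return u + 2 * u * math.exp(-u ** 2) + 2 * math.exp(-u ** 2)

# Dense grid over [0, 1/2]; g is smooth, so this is a reliable check.
max_g = max(g(0.5 * k / 10000.0) for k in range(10001))
```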

xxx Improvement over (8.42) by factor $\Theta(\log m)$, but display following (8.43) shows still off by factor $\Theta(d/\log d)$.