8 Advanced L2L^{2} Techniques for Bounding Mixing Times (May 19 1999)

8.3 Nash inequalities

xxx For NOTES: Nash vs. Sobolev

A Nash inequality for a chain is an inequality of the form

g22+1DC[(g,g)+1Tg22]g11/D\|g\|^{2+\frac{1}{D}}_{2}\leq C\,[\mbox{${\cal E}$}(g,g)+{\textstyle\frac{1}{T% }}\|g\|^{2}_{2}]\,\|g\|^{1/D}_{1} (8.46)

that holds for some positive constants CC, DD, and TT and for all functions gg. We connect Nash inequalities to mixing times in Section 8.3.1, and in Section 8.3.2 we discuss a comparison method for establishing such inequalities.

8.3.1 Nash inequalities and mixing times

A Nash inequality implies a useful bound on the quantity

N(t)=𝐏t2N(t)=\|{\bf P}_{t}\|_{2\to\infty} (8.47)

appearing in the mixing time Lemma 8.13. This norm is continuous in tt and decreases to 𝐄2=1\|{\bf E}\|_{2\to\infty}=1 as tt\uparrow\infty. Here is the main result:

Theorem 8.17

If the Nash inequality (8.46) holds for a continuous-time reversible chain, some C,D,T>0C,D,T>0, and all gg, then the norm N(s)N(s) at (8.47) satisfies

N(t)e(DC/t)D   for 0<tT0<t\leq T.N(t)\leq e(DC/t)^{D}\mbox{\ \ \ for $0<t\leq T$.\/}

Proof. First note N(t)=𝐏t12N(t)=\|{\bf P}_{t}\|_{1\to 2} by Lemma 8.12. Thus we seek a bound on h(t):=𝐏tg22h(t):=\|{\bf P}_{t}g\|^{2}_{2} independent of gg satisfying g1=1\|g\|_{1}=1; the square root of such a bound will also bound N(t)N(t).

Substituting 𝐏tg{\bf P}_{t}g for gg in (8.46) and utilizing the identity in Lemma 8.10(a) and the fact that 𝐏t{\bf P}_{t} is contractive on L1L^{1}, we obtain the differential inequality

h(t)1+12DC[-12h(t)+1Th(t)],   t0.h(t)^{1+\frac{1}{2D}}\leq C\,[-{\textstyle\frac{1}{2}}h^{\prime}(t)+{% \textstyle\frac{1}{T}}h(t)],\ \ \ t\geq 0.

Writing

H(t):=[12Ch(t)e-2t/T]-1/(2D),H(t):=\left[{\textstyle\frac{1}{2}}Ch(t)e^{-2t/T}\right]^{-1/(2D)},

the inequality can be equivalently rearranged to

H(t)[2D(C/2)1+12D]-1etDT,   t0.H^{\prime}(t)\geq\left[2D(C/2)^{1+\frac{1}{2D}}\right]^{-1}e^{\frac{t}{DT}},\ % \ \ t\geq 0.

Since H(0)>0H(0)>0, it follows that

H(t)T[2(C/2)1+12D]-1(etDT-1),   t0,H(t)\geq T\left[2(C/2)^{1+\frac{1}{2D}}\right]^{-1}\left(e^{\frac{t}{DT}}-1% \right),\ \ \ t\geq 0,

or equivalently

h(t)[TC(1-e-t/(DT))]-2D,   t0.h(t)\leq\left[\frac{T}{C}\left(1-e^{-t/(DT)}\right)\right]^{-2D},\ \ \ t\geq 0.

But

e-t/(DT)1-tT(1-e-1/D)   for 0<tT0<t\leq T,e^{-t/(DT)}\leq 1-\frac{t}{T}(1-e^{-1/D})\mbox{\ \ \ for $0<t\leq T$,}

so for these same values of tt we have

h(t)\displaystyle h(t) \displaystyle\leq [tC(1-e-1/D)]-2D\displaystyle\left[\frac{t}{C}\left(1-e^{-1/D}\right)\right]^{-2D}
=\displaystyle= e2[tC(e1/D-1)]-2D[e(DC/t)D]2,\displaystyle e^{2}\left[\frac{t}{C}\left(e^{1/D}-1\right)\right]^{-2D}\leq[e(% DC/t)^{D}]^{2},

as desired.    

We now return to Lemma 8.13 and, for tTt\geq T, set s=Ts=T. (Indeed, using the bound on N(s)N(s) in Theorem 8.17, this is the optimal choice of ss if T<Dτ2T<D\tau_{2}.) This gives

xxx NOTE: In next theorem, only need conclusion of Theorem 8.17, not hypothesis!

Theorem 8.18

In Theorem 8.17, if c1c\geq 1 and

tT+τ2(Dlog(DCT)+c),t\geq T+\tau_{2}\left(D\log\left(\frac{DC}{T}\right)+c\right),

then d^(2t)e1-c\sqrt{{\hat{d}}(2t)}\leq e^{1-c}; in particular,

τ^T+τ2(Dlog(DCT)+2).{\hat{\tau}}\leq T+\tau_{2}\left(D\log\left(\frac{DC}{T}\right)+2\right).

The following converse of sorts to Theorem 8.17 will be very useful in conjunction with the comparison method.

Theorem 8.19

If a continuous time reversible chain satisfies

N(t)Ct-D   for 0<tT0<t\leq T,N(t)\leq Ct^{-D}\mbox{\ \ \ for\/ $0<t\leq T$,}

then it satisfies the Nash inequality

g22+1DC[(g,g)+12Tg22]g11/D   for all gg\|g\|^{2+\frac{1}{D}}_{2}\leq C^{\prime}\,[\mbox{${\cal E}$}(g,g)+{\textstyle% \frac{1}{2T}}\|g\|^{2}_{2}]\,\|g\|^{1/D}_{1}\mbox{\ \ \ for all\/~{}$g$}

with

C:=2(1+12D)[(1+2D)1/2C]1/D22+12DC1/D.C^{\prime}:=2\,(1+{\textstyle\frac{1}{2D}})\,[(1+2D)^{1/2}C]^{1/D}\leq 2^{2+% \frac{1}{2D}}\,C^{1/D}.

xxx Detail in proof to be filled in (I have notes): Show (𝐏tg,𝐏tg)\mbox{${\cal E}$}({\bf P}_{t}g,{\bf P}_{t}g)\downarrow as tt\uparrow, or at least that it’s maximized at t=0t=0. Stronger of two statements is equivalent to assertion that 𝐏tf22\|{\bf P}_{t}f\|^{2}_{2} is convex in tt.

Proof. As in the proof of Theorem 8.17, we note N(t)=𝐏t12N(t)=\|{\bf P}_{t}\|_{1\to 2}. Hence, for any gg and any 0<tT0<t\leq T,

g22\displaystyle\|g\|^{2}_{2} =\displaystyle= 𝐏tg22-s=0tdds𝐏sg22ds\displaystyle\|{\bf P}_{t}g\|^{2}_{2}-\int_{s=0}^{t}{\textstyle\frac{d}{ds}}\|% {\bf P}_{s}g\|^{2}_{2}\,ds
=\displaystyle= 𝐏tg22+2s=0t(𝐏sg,𝐏sg)ds   by Lemma 8.10(a)\displaystyle\|{\bf P}_{t}g\|^{2}_{2}+2\int_{s=0}^{t}\mbox{${\cal E}$}({\bf P}% _{s}g,{\bf P}_{s}g)\,ds\mbox{\ \ \ by Lemma~{}\ref{basic-1}(a)}
\displaystyle\leq 𝐏tg22+2(g,g)t   xxx see above\displaystyle\|{\bf P}_{t}g\|^{2}_{2}+2\mbox{${\cal E}$}(g,g)t\mbox{\ \ \ xxx % see above}
\displaystyle\leq 2(g,g)t+C2t-2Dg12.\displaystyle 2\mbox{${\cal E}$}(g,g)t+C^{2}t^{-2D}\|g\|^{2}_{1}.

This gives

g22t[2(g,g)+1Tg22]+C2t-2Dg12\|g\|^{2}_{2}\leq t[2\mbox{${\cal E}$}(g,g)+{\textstyle\frac{1}{T}}\|g\|^{2}_{% 2}]+C^{2}t^{-2D}\|g\|^{2}_{1}

for any t>0t>0. The righthand side here is convex in tt and minimized (for g0g\neq 0) at

t=(2DC2g122(g,g)+T-1g22)1/(2D+1).t=\left(\frac{2DC^{2}\|g\|^{2}_{1}}{2\mbox{${\cal E}$}(g,g)+T^{-1}\|g\|^{2}_{2% }}\right)^{1/(2D+1)}.

Plugging in this value, raising both sides to the power 1+12D1+{\textstyle\frac{1}{2D}} and simplifying yields the desired Nash inequality. The upper bound for CC^{\prime} is derived with a little bit of calculus.    

8.3.2 The comparison method for bounding N()N(\cdot)

In Section 8.1 we compared relaxation times for two chains by comparing Dirichlet forms and variances. The point of this subsection is that the comparison can be extended to the norm function N()N(\cdot) of (8.47) using Nash inequalities. Then results on N()N(\cdot) like those in Section 8.2.3 can be used to bound mixing times for other chains on the same state space.

xxx For NOTES?: Can even use different spaces. New paragraph:

To see how this goes, suppose that a benchmark chain is known to satisfy

N~(t)C~t-D~   for 0<tT~0<t\leq{\tilde{T}}.{\tilde{N}}(t)\leq{\tilde{C}}t^{-{\tilde{D}}}\mbox{\ \ \ for $0<t\leq{\tilde{T% }}$.} (8.48)

By Theorem 8.19, it then satisfies a Nash inequality. The L1L^{1}- and L2L^{2}-norms appearing in this inequality can be compared in the obvious fashion [cf. (8.18)] and the Dirichlet forms can be compared as in Theorem 8.1. This shows that the chain of interest also satisfies a Nash inequality. But then Theorem 8.17 gives a bound like (8.48) for the chain of interest, and Theorem 8.18 can then be used to bound the L2L^{2} threshold time τ^{\hat{\tau}}.

Here is the precise result; the details of the proof are left to the reader.

Theorem 8.20 (comparison of bounds on N()N(\cdot))

If a reversible benchmark chain satisfies

N~(t)C~t-D~   for 0<tT~0<t\leq{\tilde{T}}{\tilde{N}}(t)\leq{\tilde{C}}t^{-{\tilde{D}}}\mbox{\ \ \ for\/ $0<t\leq{\tilde% {T}}$}

for constants C~,D~,T~>0{\tilde{C}},{\tilde{D}},{\tilde{T}}>0, then any other reversible chain on the same state space satisfies

N(t)e(DC/t)D   for 0<tT0<t\leq T,N(t)\leq e(DC/t)^{D}\mbox{\ \ \ for\/ $0<t\leq T$,}

where, with aa and AA as defined in Corollary 8.2, and with

a:=maxi(π~i/πi),a^{\prime}:=\max_{i}({\tilde{\pi}}_{i}/\pi_{i}),

we set

D\displaystyle D =\displaystyle= D~,\displaystyle{\tilde{D}},
C\displaystyle C =\displaystyle= a-(2+1D)a1/DA×2(1+12D)[(1+2D)1/2C~]1/D\displaystyle a^{-(2+\frac{1}{D})}{a^{\prime}}^{1/D}A\times 2(1+{\textstyle% \frac{1}{2D}})[(1+2D)^{1/2}{\tilde{C}}]^{1/D}
\displaystyle\leq a-(2+1D)a1/DA×22+12DC~1/D,\displaystyle a^{-(2+\frac{1}{D})}{a^{\prime}}^{1/D}A\times 2^{2+\frac{1}{2D}}% {\tilde{C}}^{1/D},
T\displaystyle T =\displaystyle= 2Aa2T~.\displaystyle\frac{2A}{{a^{\prime}}^{2}}{\tilde{T}}.

xxx Must correct this slightly. Works for any AA such that ~A{\tilde{\cal E}}\leq A\mbox{${\cal E}$}, not just minimal one. This is important since we need a lower bound on TT but generally only have an upper bound on ~/{\tilde{\cal E}}/\mbox{${\cal E}$}. The same goes for aa^{\prime} (only need upper bound on π~i/πi{\tilde{\pi}}_{i}/\pi_{i}): we also need an upper bound on TT.

Example 8.21

Random walk on a dd-dimensional grid.

As in Example 8.5, consider the continuized walk on the dd-dimensional grid I={0,,m-1}dI=\{0,\ldots,m-1\}^{d}. In Example 8.5 we compared the Dirichlet form and variance for this walk to the dd-fold product of random walk on the mm-path with end self-loops to obtain

τ2τ~2=d(1-cosπm)-112dm2;\tau_{2}\leq{\tilde{\tau}}_{2}=d(1-\cos{\textstyle\frac{\pi}{m}})^{-1}\leq{% \textstyle\frac{1}{2}}dm^{2}; (8.49)

using the simple bound d^(2t)π*-1/2e-t/τ2(2n)-1/2exp(-2tdm2)\sqrt{{\hat{d}}(2t)}\leq\pi^{-1/2}_{*}e^{-t/\tau_{2}}\leq(2n)^{-1/2}\exp\left(% -\frac{2t}{dm^{2}}\right) we then get

τ^14m2d[log(2n)+2],{\hat{\tau}}\leq{\textstyle\frac{1}{4}}m^{2}d[\log(2n)+2], (8.50)

which is of order m2d2logmm^{2}d^{2}\log m. Here we will see how comparing N()N(\cdot), too, gives a bound of order m2d2logdm^{2}d^{2}\log d. In Example 8.43, we will bring log-Sobolev techniques to bear, too, to lower this bound to order m2dlogdm^{2}d\log d

xxx which is correct, at least for TV. New paragraph:

Recalling (8.45), we may apply Theorem 8.20 with

D~=d/4,   C~=(4dm2)d/4,   T~=dm2/16,{\tilde{D}}=d/4,\ \ \ {\tilde{C}}=(4dm^{2})^{d/4},\ \ \ {\tilde{T}}=dm^{2}/16,

and, from the considerations in Example 8.5,

a12,   A=1,   a=2.a\geq{\textstyle\frac{1}{2}},\ \ \ A=1,\ \ \ a^{\prime}=2.

xxx See xxx following Theorem 8.20. Same paragraph:

This gives

D=d/4,   C26+10ddm2216dm2,   T=dm2/32.D=d/4,\ \ \ C\leq 2^{6+\frac{10}{d}}dm^{2}\leq 2^{16}dm^{2},\ \ \ T=dm^{2}/32.

Plugging these into Theorem 8.18 yields

τ^18m2d2logd+(198log2)m2d2+3332m2d,{\hat{\tau}}\leq{\textstyle\frac{1}{8}}m^{2}d^{2}\log d+({\textstyle\frac{19}{% 8}}\log 2)m^{2}d^{2}+{\textstyle\frac{33}{32}}m^{2}d, (8.51)

which is 3m2d2logd\leq 3m^{2}d^{2}\log d for d2d\geq 2.

Other variants of the walk, including the thinned-grid walk of Example 8.6, can be handled in a similar fashion.

xxx Do moderate growth and local Poincaré? Probably not, to keep length manageable. Also, will need to rewrite intro a little, since not doing Δ2\Delta^{2}-stuff (in any detail).