8 Advanced L2L^{2} Techniques for Bounding Mixing Times (May 19 1999)

8.4 Logarithmic Sobolev inequalities

xxx For NOTES: For history and literature, see ([117], first paragraph and end of Section 1).

xxx For NOTES: Somewhere mention relaxing to nonreversible chains.

8.4.1 The log-Sobolev time τl\tau_{l}

Given a probability distribution π\pi on a finite set II, define

xxx For NOTES: Persi’s (g)\mbox{${\cal L}$}(g) is double ours.

(g):=iπig2(i)log(|g(i)|/g2)\mbox{${\cal L}$}(g):=\sum_{i}\pi_{i}g^{2}(i)\log(|g(i)|/\|g\|_{2}) (8.52)

for g0g\not\equiv 0, recalling g22=iπig2(i)\|g\|^{2}_{2}=\sum_{i}\pi_{i}g^{2}(i) and using the convention 0log0=00\log 0=0. By Jensen’s inequality,

(g)0\mbox{${\cal L}$}(g)\geq 0, with equality if and only if |g||g| is constant.

Given a finite, irreducible, reversible Markov chain with stationary distribution π\pi, define the logarithmic Sobolev (or log-Sobolev) time by

xxx For NOTES: Persi’s α\alpha is 1/(2τl)1/(2\tau_{l}).

xxx Note τl<\tau_{l}<\infty. (Show?) See also Corollary 8.27.

τl:=sup{(g)/(g,g):gconstant}.\tau_{l}:=\sup\{\mbox{${\cal L}$}(g)/\mbox{${\cal E}$}(g,g):g\not\equiv\mbox{% constant}\}. (8.53)

Notice the similarity between (8.53) and the extremal characterization of τ2\tau_{2} (Chapter 3, Theorem yyy:22):

τ2=sup{g22/(g,g):iπig(i)=0, g0}.\tau_{2}=\sup\{\|g\|^{2}_{2}/\mbox{${\cal E}$}(g,g):\sum_{i}\pi_{i}g(i)=0,\ \ % g\not\equiv 0\}.

We discuss exact computation of τl\tau_{l} in Section 8.4.3, the behavior of τl\tau_{l} for product chains in Section 8.4.4, and a comparison method for bounding τl\tau_{l} in Section 8.4.5. In Section 8.4.2 we focus on the connection between τl\tau_{l} and mixing times. A first such result asserts that the relaxation time does not exceed the log-Sobolev time:

Lemma 8.22

τ2τl\tau_{2}\leq\tau_{l}.

xxx Remarks about how “usually” equality?

xxx For NOTES: Proof from [302], via [117].

Proof. Given gconstantg\not\equiv\mbox{constant} and ϵ\epsilon, let f:=1+ϵgf:=1+\epsilon g. Then, writing g¯=iπig(i)\bar{g}=\sum_{i}\pi_{i}g(i), and with all asymptotics as ϵ0\epsilon\to 0,

log|f|2\displaystyle\log|f|^{2} =\displaystyle= 2ϵg-ϵ2g2+O(ϵ3),\displaystyle 2\epsilon g-\epsilon^{2}g^{2}+O(\epsilon^{3}),
logf22\displaystyle\log\|f\|^{2}_{2} =\displaystyle= 2ϵg¯+ϵ2g22-2ϵ2g¯2+O(ϵ3),\displaystyle 2\epsilon\bar{g}+\epsilon^{2}\|g\|^{2}_{2}-2\epsilon^{2}\bar{g}^% {2}+O(\epsilon^{3}),
log|f|2f22\displaystyle\log\frac{|f|^{2}}{\|f\|^{2}_{2}} =\displaystyle= 2ϵ(g-g¯)+ϵ2(2g¯2-g22-g2)+O(ϵ3).\displaystyle 2\epsilon(g-\bar{g})+\epsilon^{2}(2\bar{g}^{2}-\|g\|^{2}_{2}-g^{% 2})+O(\epsilon^{3}).

Also,

f2=1+2ϵg+ϵ2g2;f^{2}=1+2\epsilon g+\epsilon^{2}g^{2};

thus

f2log(|f|2/f22)=2ϵ(g-g¯)+ϵ2(3g2-g22-4gg¯+2g¯2)+O(ϵ3)f^{2}\log(|f|^{2}/\|f\|^{2}_{2})=2\epsilon(g-\bar{g})+\epsilon^{2}(3g^{2}-\|g% \|^{2}_{2}-4g\bar{g}+2\bar{g}^{2})+O(\epsilon^{3})

and so

(f)=ϵ2(g22-g¯2)+O(ϵ3)=ϵ2varπg+O(ϵ3).\mbox{${\cal L}$}(f)=\epsilon^{2}(\|g\|^{2}_{2}-\bar{g}^{2})+O(\epsilon^{3})=% \epsilon^{2}{{\rm var}}_{\pi}\,g+O(\epsilon^{3}).

Furthermore, (f,f)=ϵ2(g,g)\mbox{${\cal E}$}(f,f)=\epsilon^{2}\mbox{${\cal E}$}(g,g); therefore

τl(f)(f,f)=varπg(g,g)+O(ϵ).\tau_{l}\geq\frac{\mbox{${\cal L}$}(f)}{\mbox{${\cal E}$}(f,f)}=\frac{{{\rm var% }}_{\pi}\,g}{\mbox{${\cal E}$}(g,g)}+O(\epsilon).

Finish by letting ϵ0\epsilon\to 0 and then taking the supremum over gg.    

8.4.2 τl\tau_{l}, mixing times, and hypercontractivity

In this subsection we discuss the connection between the L2L^{2} threshold time parameter

τ^=inf{t>0:d^(2t)=maxiPi(Xt)-π()2e-1}{\hat{\tau}}=\inf\{t>0:\sqrt{{\hat{d}}(2t)}=\max_{i}\|P_{i}(X_{t}\in\cdot)-\pi% (\cdot)\|_{2}\leq e^{-1}\} (8.54)

and the log-Sobolev time τl\tau_{l}. As in Section 8.3, we again consider the fundamental quantity

N(s)=𝐏s2N(s)=\|{\bf P}_{s}\|_{2\to\infty}

arising in the bound on d^(2t)\sqrt{{\hat{d}}(2t)} in Lemma 8.13, and recall from Section 8.3.1 that

N(s)N(s) decreases strictly monotonically from π*-1/2\pi^{-1/2}_{*} at s=0s=0 to 11 as ss\uparrow\infty.

The function NN is continuous. It would be nice (especially for use in conjunction with the comparison technique) if we could characterize, in terms of the Dirichlet form {\cal E}, the value of ss, call it s*s^{*}, such that N(s)N(s) equals 22 (say), but such a characterization is not presently available.

xxx For NOTES?: A partial result is Theorem 3.9 in [117], taking q=q=\infty.

Open Problem 8.23

Characterize s*s^{*} in terms of {\cal E}.

To carry on along these general lines, it turns out to be somewhat more convenient to substitute use of

P(Xt)-π()2P(X0)qq-1𝐏s2qe-(t-s)/τ2,   2q<,\hskip{21.681pt}\|P(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq\|P(X_{0}\in\cdot)\|_{% \frac{q}{q-1}}\,\|{\bf P}_{s}\|_{2\to q}\,e^{-(t-s)/\tau_{2}},\ \ \ 2\leq q<\infty, (8.55)

an immediate consequence of Lemmas 8.11 and 8.12 and (8.34), for use of Lemma 8.13. The reason is that, like N(s)N(s), 𝐏s2q\|{\bf P}_{s}\|_{2\to q} decreases monotonically to 11 as ss\uparrow\infty; but, unlike N(s)N(s), it turns out that

for each q2q\geq 2, 𝐏s2q\|{\bf P}_{s}\|_{2\to q} equals 11 for all sufficiently large ss. (8.56)

The property (8.56) is called hypercontractivity, in light of the facts that, for fixed ss, 𝐏s{\bf P}_{s} is a contraction on L2L^{2} and 𝐏s2q\|{\bf P}_{s}\|_{2\to q} is increasing in qq. Let

sq:=inf{s0:𝐏s2q1}=inf{s:𝐏s2q=1};s_{q}:=\inf\{s\geq 0:\|{\bf P}_{s}\|_{2\to q}\leq 1\}=\inf\{s:\|{\bf P}_{s}\|_% {2\to q}=1\};

then s2=0<sqs_{2}=0<s_{q}, and we will see presently that sq<s_{q}<\infty for q2q\geq 2. The following theorem affords a connections with the log-Sobolev time τl\tau_{l} (and hence with the Dirichlet form {\cal E}).

Theorem 8.24

For any finite, irreducible, reversible chain,

τl=sup2<q<2sqlog(q-1).\tau_{l}=\sup_{2<q<\infty}\frac{2s_{q}}{\log(q-1)}.

Proof. The theorem is equivalently rephrased as follows:

𝐏t2q1\|{\bf P}_{t}\|_{2\to q}\leq 1 for all t0t\geq 0 and 2q<2\leq q<\infty satisfying e2t/uq-1e^{2t/u}\geq q-1 (8.57)

if and only if uτlu\geq\tau_{l}. The proof will make use of the generalization

q(g):=iπi|g(i)|qlog(|g(i)|/gq)\mbox{${\cal L}$}_{q}(g):=\sum_{i}\pi_{i}|g(i)|^{q}\log(|g(i)|/\|g\|_{q})

of (8.52). Fixing 0g00\not\equiv g\geq 0 and u>0u>0, we will also employ the notation

q(t)\displaystyle q(t) :=\displaystyle:= 1+e2t/u,   G(t):=𝐏tgq(t)q(t),\displaystyle 1+e^{2t/u},\ \ \ G(t):=\|{\bf P}_{t}g\|^{q(t)}_{q(t)}, (8.58)
F(t)\displaystyle F(t) :=\displaystyle:= 𝐏tgq(t)=exp[1q(t)logG(t)]\displaystyle\|{\bf P}_{t}g\|_{q(t)}=\exp\left[\frac{1}{q(t)}\log G(t)\right]

for t0t\geq 0.

As a preliminary, we compute the derivative of FF. To begin, we can proceed as at the start of the proof of Lemma 8.10(a) to derive

G(t)=-q(t)(𝐏tg,(𝐏tg)q(t)-1)+q(t)q(t)Eπ[(𝐏tg)q(t)log((𝐏tg)q(t))].G^{\prime}(t)=-q(t)\,\mbox{${\cal E}$}\left({\bf P}_{t}g,({\bf P}_{t}g)^{q(t)-% 1}\right)+\frac{q^{\prime}(t)}{q(t)}E_{\pi}\left[({\bf P}_{t}g)^{q(t)}\log% \left(({\bf P}_{t}g)^{q(t)}\right)\right].

Then

F(t)\displaystyle F^{\prime}(t) =\displaystyle= [G(t)q(t)G(t)-q(t)logG(t)q2(t)]F(t)\displaystyle\left[\frac{G^{\prime}(t)}{q(t)G(t)}-\frac{q^{\prime}(t)\log G(t)% }{q^{2}(t)}\right]F(t) (8.59)
=\displaystyle= F(t)-(q(t)-1)[q(t)q(t)q(t)(𝐏tg)-(𝐏tg,(𝐏tg)q(t)-1)].\displaystyle F(t)^{-(q(t)-1)}\left[\frac{q^{\prime}(t)}{q(t)}\mbox{${\cal L}$% }_{q(t)}({\bf P}_{t}g)-\mbox{${\cal E}$}\left({\bf P}_{t}g,({\bf P}_{t}g)^{q(t% )-1}\right)\right].

For the first half of the proof we suppose that (8.57) holds and must prove τlu\tau_{l}\leq u, that is, we must establish the log-Sobolev inequality

(g)u(g,g)\mbox{${\cal L}$}(g)\leq u\,\mbox{${\cal E}$}(g,g) for every gg. (8.60)

To establish (8.60) it is enough to consider 0g>00\not\equiv g>0,

xxx Do we actually use g0g\geq 0 here?

since for arbitrary gg we have

(g)=(|g|)\mbox{${\cal L}$}(g)=\mbox{${\cal L}$}(|g|) and (g,g)(|g|,|g|)\mbox{${\cal E}$}(g,g)\geq\mbox{${\cal E}$}(|g|,|g|). (8.61)

Plugging the specific formula (8.58) for q(t)q(t) into (8.59) and setting t=0t=0 gives

F(0)=g2-1(u-1(g)-(g,g)).F^{\prime}(0)=\|g\|^{-1}_{2}(u^{-1}\mbox{${\cal L}$}(g)-\mbox{${\cal E}$}(g,g)). (8.62)

Moreover, since

F(t)=𝐏tgq(t)\displaystyle F(t)=\|{\bf P}_{t}g\|_{q(t)} \displaystyle\leq 𝐏t2q(t)g2g2   by (8.57)\displaystyle\|{\bf P}_{t}\|_{2\to q(t)}\|g\|_{2}\leq\|g\|_{2}\mbox{\ \ \ by~{% }(\ref{rephrase})}
=\displaystyle= 𝐏0g2=F(0),\displaystyle\|{\bf P}_{0}g\|_{2}=F(0),

the (right-hand) derivative of FF at 00 must be nonpositive. The inequality (8.60) now follows from (8.62).

For the second half of the proof, we may assume u=τlu=\tau_{l} and must establish (8.57). For g0g\geq 0, (8.53) and Lemma 8.25 (to follow) give

q2q(g)=(gq/2)τl(gq/2,gq/2)q2τl4(q-1)(g,gq-1)\frac{q}{2}\mbox{${\cal L}$}_{q}(g)=\mbox{${\cal L}$}(g^{q/2})\leq\tau_{l}\,% \mbox{${\cal E}$}(g^{q/2},g^{q/2})\leq\frac{q^{2}\tau_{l}}{4(q-1)}\mbox{${\cal E% }$}(g,g^{q-1}) (8.63)

for any 1<q<1<q<\infty. With q(t):=1+e2t/τlq(t):=1+e^{2t/\tau_{l}}, we have q(t)=2τl(q(t)-1)q^{\prime}(t)=\frac{2}{\tau_{l}}(q(t)-1), and replacing gg by 𝐏tg{\bf P}_{t}g in (8.63) we obtain

q(t)q(t)q(t)(𝐏tg)-(𝐏tg,(𝐏tg)q(t)-1)0.\frac{q^{\prime}(t)}{q(t)}\mbox{${\cal L}$}_{q(t)}({\bf P}_{t}g)-\mbox{${\cal E% }$}\left({\bf P}_{t}g,({\bf P}_{t}g)^{q(t)-1}\right)\leq 0.

From (8.59) we then find F(t)0F^{\prime}(t)\leq 0 for all t0t\geq 0. Since F(0)=g2F(0)=\|g\|_{2}, this implies

𝐏tgq(t)g2.\|{\bf P}_{t}g\|_{q(t)}\leq\|g\|_{2}. (8.64)

We have assumed g0g\geq 0, but (8.64) now extends trivially to general gg, and therefore

𝐏t2q(t)1.\|{\bf P}_{t}\|_{2\to q(t)}\leq 1.

This gives the desired hypercontractivity assertion (8.57).    

Here is the technical Dirichlet form lemma that was used in the proof of Theorem 8.24.

Lemma 8.25

(g,gq-1)4(q-1)q2(gq/2,gq/2)\mbox{${\cal E}$}(g,g^{q-1})\geq\frac{4(q-1)}{q^{2}}\mbox{${\cal E}$}(g^{q/2},% g^{q/2}) for g0g\geq 0 and 1<q<1<q<\infty.

xxx Do we somewhere have the following?:

(f,g)=12ijiπiqij(f(i)-f(j))(g(i)-g(j)).\mbox{${\cal E}$}(f,g)={\textstyle\frac{1}{2}}\sum_{i}\sum_{j\neq i}\pi_{i}q_{% ij}(f(i)-f(j))(g(i)-g(j)). (8.65)

Proof. For any 0a<b0\leq a<b

(bq/2-aq/2b-a)2\displaystyle\left(\frac{b^{q/2}-a^{q/2}}{b-a}\right)^{2} =\displaystyle= (q2(b-a)abtq2-1dt)2\displaystyle\left(\frac{q}{2(b-a)}\int_{a}^{b}t^{\frac{q}{2}-1}\,dt\right)^{2}
\displaystyle\leq q24(b-a)abtq-2dt=q24(q-1)bq-1-aq-1b-a.\displaystyle\frac{q^{2}}{4(b-a)}\int_{a}^{b}t^{q-2}\,dt=\frac{q^{2}}{4(q-1)}% \frac{b^{q-1}-a^{q-1}}{b-a}.

This shows that

(bq-1-aq-1)(b-a)4(q-1)q2(bq/2-aq/2)2(b^{q-1}-a^{q-1})(b-a)\geq\frac{4(q-1)}{q^{2}}(b^{q/2}-a^{q/2})^{2}

and the lemma follows easily from this and (8.65).    

Now we are prepared to bound τ^{\hat{\tau}} in terms of τl\tau_{l}.

Theorem 8.26

(a) If c0c\geq 0, then for any state ii with πie-1\pi_{i}\leq e^{-1},

Pi(Xt)-π()2e1-c\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq e^{1-c} for t12τlloglog1πi+cτ2t\geq{\textstyle\frac{1}{2}}\tau_{l}\log\log{\textstyle\frac{1}{\pi_{i}}}+c% \tau_{2}.

(b)

τ^12τlloglog1π*+2τ2τl(12loglog1π*+2).{\hat{\tau}}\leq{\textstyle\frac{1}{2}}\tau_{l}\log\log{\textstyle\frac{1}{\pi% _{*}}}+2\tau_{2}\leq\tau_{l}({\textstyle\frac{1}{2}}\log\log{\textstyle\frac{1% }{\pi_{*}}}+2).

Proof. Part (b) follows immediately from (8.54), part (a), and Lemma 8.22. To prove part (a), we begin with (8.55):

Pi(Xt)-π()2πi-1/q𝐏s2qe-(t-s)/τ2.\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq\pi^{-1/q}_{i}\,\|{\bf P}_{s}\|_{2% \to q}\,e^{-(t-s)/\tau_{2}}.

As in the second half of the proof of Theorem 8.24, let q=q(s):=1+e2s/τlq=q(s):=1+e^{2s/\tau_{l}}. Then 𝐏s2q(s)1\|{\bf P}_{s}\|_{2\to q(s)}\leq 1. Thus

Pi(Xt)-π()2πi-1/q(s)e-(t-s)/τ2,   0st.\|P_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq\pi^{-1/q(s)}_{i}\,e^{-(t-s)/\tau_{% 2}},\ \ \ 0\leq s\leq t.

Choosing s=12τlloglog(1πi)s=\frac{1}{2}\tau_{l}\log\log(\frac{1}{\pi_{i}}) we have q(s)=1+log(1πi)q(s)=1+\log(\frac{1}{\pi_{i}}) and thus

𝐏i(Xt)-π()2exp(1-t-sτ2)\|{\bf P}_{i}(X_{t}\in\cdot)-\pi(\cdot)\|_{2}\leq\exp(1-\frac{t-s}{\tau_{2}}) for tst\geq s.    

We have established the upper bound in the following corollary; for the lower bound, see Corollary 3.11 in [117].

Corollary 8.27
τlτ^τl(12loglog1π*+2).\tau_{l}\leq{\hat{\tau}}\leq\tau_{l}({\textstyle\frac{1}{2}}\log\log{% \textstyle\frac{1}{\pi_{*}}}+2).

Examples illustrating the improvement Corollary 8.27 affords over the similar result (8.7) in terms of τ2\tau_{2} are offered in Examples 8.37 and 8.40.

8.4.3 Exact computation of τl\tau_{l}

Exact computation of τl\tau_{l} is exceptionally difficult—so difficult, in fact, that τl\tau_{l} is known only for a handful of examples. We present some of these examples in this subsection.

Example 8.28

Trivial two-state chains.

We consider a discrete-time chain on {0,1}\{0,1\} that jumps in one step to stationarity (or, since the value of τl\tau_{l} is unaffected by continuization, the corresponding continuized chain). Thus (p00,p01,p10,p11)=(θ,1-θ,θ,1-θ)(p_{00},p_{01},p_{10},p_{11})=(\theta,1-\theta,\theta,1-\theta) with θ=π0=1-π1\theta=\pi_{0}=1-\pi_{1}. We also assume 0<θ1/20<\theta\leq 1/2. The claim is that

τl={log[(1-θ)/θ]2(1-2θ)if θ1/2\theta\neq 1/21if θ=1/2\theta=1/2.\tau_{l}=\left\{\begin{array}[]{ll}\frac{\log[(1-\theta)/\theta]}{2(1-2\theta)% }&\mbox{if $\theta\neq 1/2$}\\ 1&\mbox{if $\theta=1/2$.}\end{array}\right. (8.66)

Note that this is continuous and decreasing for θ(0,1/2]\theta\in(0,1/2].

To prove (8.66), we need to show that (g)/(g,g)τ(θ)\mbox{${\cal L}$}(g)/\mbox{${\cal E}$}(g,g)\leq\tau(\theta) for every nonconstant gg on {0,1}\{0,1\}, where τ(θ)\tau(\theta) denotes the righthand side of (8.66), with equality for some g0g_{0}. First suppose θ1/2\theta\neq 1/2. For the inequality, as at (8.60)–(8.61) we may suppose g0g\geq 0 and, by homogeneity,

Eπg=θg(0)+(1-θ)g(1)=1.E_{\pi}g=\theta g(0)+(1-\theta)g(1)=1.

We will work in terms of the single variable

x:=1/(g(0)-g(1)),x:=1/(g(0)-g(1)),

so that

g(0)=1+1-θx,   g(1)=1-θxg(0)=1+\frac{1-\theta}{x},\ \ \ g(1)=1-\frac{\theta}{x}

and we must consider x(-,-(1-θ)][θ,)x\in(-\infty,-(1-\theta)]\cup[\theta,\infty). We calculate

(g,g)\displaystyle\mbox{${\cal E}$}(g,g) =\displaystyle= θ(1-θ)(g(0)-g(1))2=θ(1-θ)/x2,\displaystyle\theta(1-\theta)(g(0)-g(1))^{2}=\theta(1-\theta)/x^{2},
g22\displaystyle\|g\|^{2}_{2} =\displaystyle= θ(1+1-θx)2+(1-θ)(1-θx)2\displaystyle\theta\left(1+\frac{1-\theta}{x}\right)^{2}+(1-\theta)\left(1-% \frac{\theta}{x}\right)^{2}
=\displaystyle= [θ(x+1-θ)2+(1-θ)(x-θ)2]/x2=1+θ(1-θ)x2,\displaystyle[\theta(x+1-\theta)^{2}+(1-\theta)(x-\theta)^{2}]/x^{2}=1+\frac{% \theta(1-\theta)}{x^{2}},
(x)\displaystyle\ell(x) :=\displaystyle:= (g)=θ(1+1-θx)2log(1+1-θx)\displaystyle\mbox{${\cal L}$}(g)=\theta\left(1+\frac{1-\theta}{x}\right)^{2}% \log\left(1+\frac{1-\theta}{x}\right)
     +(1-θ)(1-θx)2log(1-θx)\displaystyle     +(1-\theta)\left(1-\frac{\theta}{x}\right)^{2}\log\left(1-% \frac{\theta}{x}\right)
     -12θ(1+θ(1-θ)x2)log(1+θ(1-θ)x2)\displaystyle     -\frac{1}{2}\theta\left(1+\frac{\theta(1-\theta)}{x^{2}}% \right)\log\left(1+\frac{\theta(1-\theta)}{x^{2}}\right)
=\displaystyle= [θ(x+1-θ)2log(x+1-θ)2+(1-θ)(x-θ)2log(x-θ)2\displaystyle\left[\theta(x+1-\theta)^{2}\log(x+1-\theta)^{2}+(1-\theta)(x-% \theta)^{2}\log(x-\theta)^{2}\right.
     -(x2+θ(1-θ))log(x2+θ(1-θ))]/(2x2),\displaystyle     \left.-(x^{2}+\theta(1-\theta))\log(x^{2}+\theta(1-\theta))% \right]/(2x^{2}),
r(x)\displaystyle r(x) :=\displaystyle:= 2θ(1-θ)(g)(g,g)=2x2(x)\displaystyle 2\theta(1-\theta)\frac{\mbox{${\cal L}$}(g)}{\mbox{${\cal E}$}(g% ,g)}=2x^{2}\ell(x)
=\displaystyle= θ(x+1-θ)2log(x+1-θ)2+(1-θ)(x-θ)2log(x-θ)2\displaystyle\theta(x+1-\theta)^{2}\log(x+1-\theta)^{2}+(1-\theta)(x-\theta)^{% 2}\log(x-\theta)^{2}
     -(x2+θ(1-θ))log(x2+θ(1-θ)).\displaystyle     -(x^{2}+\theta(1-\theta))\log(x^{2}+\theta(1-\theta)).

¿From here, a straightforward but very tedious calculus exercise shows that rr decreases over (-,-(1-θ)](-\infty,-(1-\theta)], with r(-)=2θ(1-θ)r(-\infty)=2\theta(1-\theta), and that rr is strictly unimodal over [θ,)[\theta,\infty), with r(θ)=0r(\theta)=0 and r()=2θ(1-θ)r(\infty)=2\theta(1-\theta). It follows that r(x)r(x) is maximized over (-,-(1-θ)][θ,)(-\infty,-(1-\theta)]\cup[\theta,\infty) by taking xx to be the unique root to

0\displaystyle 0 =\displaystyle= r(x)=4θ(x+(1-θ))log(x+(1-θ))\displaystyle r^{\prime}(x)=4\theta(x+(1-\theta))\log(x+(1-\theta)) (8.67)
     + 4(1-θ)(x-θ)log(x-θ)-2xlog(x2+θ(1-θ))\displaystyle     +\,4(1-\theta)(x-\theta)\log(x-\theta)-2x\log(x^{2}+\theta(1% -\theta))

over (θ,)(\theta,\infty).

There is on hope for solving (8.67) explicitly unless

x2+θ(1-θ)=(x+1-θ)(x-θ),x^{2}+\theta(1-\theta)=(x+1-\theta)(x-\theta),

i.e., x=2θ(1-θ)/(1-2θ)x=2\theta(1-\theta)/(1-2\theta). Fortunately, this is a solution to (8.67), and it falls in (θ,)(\theta,\infty). The corresponding value of rr is θ(1-θ)1-2θlog1-θθ\frac{\theta(1-\theta)}{1-2\theta}\log\frac{1-\theta}{\theta}, so (8.66) follows, and we learn furthermore that the function gg maximizing (g)/(g,g)\mbox{${\cal L}$}(g)/\mbox{${\cal E}$}(g,g) is g0g_{0}, with g0(0)=12θg_{0}(0)=\frac{1}{2\theta} and g0(1)=12(1-θ)g_{0}(1)=\frac{1}{2(1-\theta)}.

For θ=1/2\theta=1/2, the major change is that now rr is increasing, rather than unimodal, over [θ,)[\theta,\infty). Thus rsup=2θ(1-θ)=1/2r_{\sup}=2\theta(1-\theta)=1/2, and (8.66) again follows.

Example 8.29

Two-state chains.

Now consider any irreducible chain (automatically reversible) on {0,1}\{0,1\}, with stationary distribution π\pi. Without loss of generality we may suppose π0π1\pi_{0}\leq\pi_{1}. We claim that

τl={π1log(π1/π0)2p01(1-2π0)if π01/2\pi_{0}\neq 1/21/(2p01)if π0=1/2\pi_{0}=1/2.\tau_{l}=\left\{\begin{array}[]{ll}\frac{\pi_{1}\log(\pi_{1}/\pi_{0})}{2p_{01}% (1-2\pi_{0})}&\mbox{if $\pi_{0}\neq 1/2$}\\ &\\ 1/(2p_{01})&\mbox{if $\pi_{0}=1/2$.}\end{array}\right.

The proof is easy. The functional (g)\mbox{${\cal L}$}(g) depends only on π\pi and so is unchanged from Example 8.28, and the Dirichlet form changes from (g,g)=π0π1(g(0)-g(1))2\mbox{${\cal E}$}(g,g)=\pi_{0}\pi_{1}(g(0)-g(1))^{2} in Example 8.28 to (g,g)=p01(g(0)-g(1))2\mbox{${\cal E}$}(g,g)=p_{01}(g(0)-g(1))^{2} here.

Remark. Recall from Chapter 5, Example yyy:4 that τ2=1/(p01+p10)=π1/p01\tau_{2}=1/(p_{01}+p_{10})=\pi_{1}/p_{01}. It follows that

τlτ2={log(π1/π0)2(1-2π0)if 0<π0<1/20<\pi_{0}<1/21if π0=1/2\pi_{0}=1/2\frac{\tau_{l}}{\tau_{2}}=\left\{\begin{array}[]{ll}\frac{\log(\pi_{1}/\pi_{0}% )}{2(1-2\pi_{0})}&\mbox{if $0<\pi_{0}<1/2$}\\ &\\ 1&\mbox{if $\pi_{0}=1/2$}\end{array}\right.

is a continuous and decreasing function of π0\pi_{0}. In particular, we have equality in Lemma 8.22 for a two-state chain if and only if π0=1/2\pi_{0}=1/2. Moreover,

τl/τ212log(1/π0) as π00\pi_{0}\to 0.\tau_{l}/\tau_{2}\sim{\textstyle\frac{1}{2}}\log(1/\pi_{0})\to\infty\mbox{\ as% $\pi_{0}\to 0$.}
Example 8.30

Trivial chains.

The proof of Lemma 8.22 and the result of Example 8.28 can be combined to prove the following result: For the “trivial” chain with pijπjp_{ij}\equiv\pi_{j}, the log-Sobolev time τl\tau_{l} is given (when π*<1/2\pi_{*}<1/2) by

τl=log(1π*-1)2(1-2π*).\tau_{l}=\frac{\log(\frac{1}{\pi_{*}}-1)}{2(1-2\pi_{*})}.

We omit the details, referring the reader to Theorem 5.1 of [117].

As an immediate corollary, we get a reverse-inequality complement to Lemma 8.22:

Corollary 8.31

For any reversible chain (with π*<1/2\pi_{*}<1/2, which is automatic for n3n\geq 3),

τlτ2log(1π*-1)2(1-2π*).\tau_{l}\leq\tau_{2}\frac{\log(\frac{1}{\pi_{*}}-1)}{2(1-2\pi_{*})}.

Proof. The result of Example 8.30 can be written

(g)(varπg)log(1π*-1)2(1-2π*),\mbox{${\cal L}$}(g)\leq({{\rm var}}_{\pi}\,g)\frac{\log(\frac{1}{\pi_{*}}-1)}% {2(1-2\pi_{*})},

and varπgτ2(g,g){{\rm var}}_{\pi}\,g\leq\tau_{2}\mbox{${\cal E}$}(g,g) by the extremal characterization of τ2\tau_{2}.    

Example 8.32

The complete graph.

It follows readily from Example 8.30 that the continuized walk of the complete graph has

τl=(n-1)log(n-1)2(n-2)12logn.\tau_{l}=\frac{(n-1)\log(n-1)}{2(n-2)}\sim\frac{1}{2}\log n.

Since τ2=(n-1)/n\tau_{2}=(n-1)/n, equality holds in Corollary 8.31 for this example.

xxx Move the following warning to follow Corollary 8.27, perhaps?

Warning.  Although the ratio of the upper bound on τ^{\hat{\tau}} to lower bound in Corollary 8.27 is smaller than that in (8.7), the upper bound in Corollary 8.27 is sometimes of larger order of magnitude than the upper bound in (8.7). For the complete graph, (8.7) says

n-1nτ^n-1n(12logn+1){\textstyle\frac{n-1}{n}}\leq{\hat{\tau}}\leq{\textstyle\frac{n-1}{n}}({% \textstyle\frac{1}{2}}\log n+1)

and Corollary 8.27 yields

(1+o(1))12lognτ^(1+o(1))14(logn)(loglogn),(1+o(1)){\textstyle\frac{1}{2}}\log n\leq{\hat{\tau}}\leq(1+o(1)){\textstyle% \frac{1}{4}}(\log n)(\log\log n),

while, from Chapter 5, yyy:(33) it follows that

τ^=12logn+O(lognn).{\hat{\tau}}={\textstyle\frac{1}{2}}\log n+O\left(\frac{\log n}{n}\right).

As another example, the product chain development in the next subsection together with Example 8.29 will give τl\tau_{l} exactly for the dd-cube. On the other hand, the exact value of τl\tau_{l} is unknown even for many of the simplest examples in Chapter 5. For instance,

Open Problem 8.33

Calculate τl\tau_{l} for the nn-cycle (Chapter 5 Example yyy:7) when n4n\geq 4.

xxx For NOTES: n=3n=3 is complete graph K3K_{3}, covered by Example 8.32. (τl=log2\tau_{l}=\log 2 for n=3n=3.)

Notwithstanding Open Problem 8.33, the value of τl\tau_{l} is known up to multiplicative constants. Indeed, it is shown in Section 4.2 in [117] that

14π2n2τl2516π2n2.\frac{1}{4\pi^{2}}n^{2}\leq\tau_{l}\leq\frac{25}{16\pi^{2}}n^{2}.

Here is a similar result we will find useful later in dealing with our running example of the grid.

Example 8.34

The mm-path with end self-loops.

For this example, discussed above in Example 8.16, we claim

2π2m2τlm2.\frac{2}{\pi^{2}}m^{2}\leq\tau_{l}\leq m^{2}.

The lower bound is easy, using Lemma 8.22:

τlτ2=(1-cos(π/m))-12π2m2.\tau_{l}\geq\tau_{2}=(1-\cos(\pi/m))^{-1}\geq\frac{2}{\pi^{2}}m^{2}.

For the upper bound we use Corollary 8.27 and estimation of τ^{\hat{\tau}}. Indeed, in Example 8.16 it was shown that

d^(2t)=N2(t)-1[1+(4t/m2)-1/2]exp(-4t/m2),   t>0.{\hat{d}}(2t)=N^{2}(t)-1\leq\left[1+(4t/m^{2})^{-1/2}\right]\,\exp(-4t/m^{2}),% \ \ \ t>0.

Substituting t=m2t=m^{2} gives d^(2t)3/2e-2<e-1\sqrt{{\hat{d}}(2t)}\leq\sqrt{3/2}\,e^{-2}<e^{-1}, so τlτ^m2\tau_{l}\leq{\hat{\tau}}\leq m^{2}.

xxx P.S. Persi (98/07/02) points out that H. T. Yau showed τl=Θ(nlogn)\tau_{l}=\Theta(n\log n) for random transpositions by combining τlτ2\tau_{l}\geq\tau_{2} (Lemma 8.22) and τl(g0)/(g0,g0)\tau_{l}\leq\mbox{${\cal L}$}(g_{0})/\mbox{${\cal E}$}(g_{0},g_{0}) with g0=delta functiong_{0}=\mbox{delta function}. I have written notes generalizing and discussing this and will incorporate them into a later version.

8.4.4 τl\tau_{l} and product chains

xxx Remind reader of definition of product chain in continuous time given in Chapter 4 Section yyy:6.2.

xxx Motivate study as providing benchmark chains for comparison method.

xxx Recall from Chapter 4, yyy:(42):

τ2=max(τ2(1),τ2(2)).\tau_{2}=\max(\tau^{(1)}_{2},\tau^{(2)}_{2}). (8.68)

xxx Product chain has transition rates equal (off diagonal) to

q(i1,i2),(j1,j2)={qi1,j1(1)if i1j1i_{1}\neq j_{1} and i2=j2i_{2}=j_{2}qi2,j2(2)if i1=j1i_{1}=j_{1} and i2j2i_{2}\neq j_{2}0otherwise.q_{(i_{1},i_{2}),(j_{1},j_{2})}=\left\{\begin{array}[]{ll}q^{(1)}_{i_{1},j_{1}% }&\mbox{if $i_{1}\neq j_{1}$ and $i_{2}=j_{2}$}\\ &\\ q^{(2)}_{i_{2},j_{2}}&\mbox{if $i_{1}=j_{1}$ and $i_{2}\neq j_{2}$}\\ &\\ 0&\mbox{otherwise.}\\ \end{array}\right. (8.69)

xxx Dirichlet form works out very nicely for products:

Lemma 8.35
(g,g)=i2πi2(2)(1)(g(,i2),g(,i2))+i1πi1(1)(2)(g(i1,),g(i1,)).\mbox{${\cal E}$}(g,g)=\sum_{i_{2}}\pi^{(2)}_{i_{2}}\mbox{${\cal E}$}^{(1)}(g(% \cdot,i_{2}),g(\cdot,i_{2}))+\sum_{i_{1}}\pi^{(1)}_{i_{1}}\mbox{${\cal E}$}^{(% 2)}(g(i_{1},\cdot),g(i_{1},\cdot)).

Proof. This follows easily from (8.69) and the definition of {\cal E} in Chapter 3 Section yyy:6.1 (cf. (68)).    

The analogue of (8.68) for the log-Sobolev time is also true:

xxx For NOTES?: Can give analagous proof of (8.68): see my notes, page 8.4.24A.

Theorem 8.36

For a continuous-time product chain,

τl=max(τl(1),τl(2)).\tau_{l}=\max(\tau^{(1)}_{l},\tau^{(2)}_{l}).

Proof. The keys to the proof are Lemma 8.35 and the following “law of total {\cal L}-functional.” Given a function g0g\not\equiv 0 on the product state space I=I1×I2I=I_{1}\times I_{2}, define a function G20G_{2}\not\equiv 0 on I2I_{2} by

G2(i2):=g(,i2)2=(i1πi1g2(i1,i2))1/2.G_{2}(i_{2}):=\|g(\cdot,i_{2})\|_{2}=\left(\sum_{i_{1}}\pi_{i_{1}}g^{2}(i_{1},% i_{2})\right)^{1/2}.

Then

(g)\displaystyle\mbox{${\cal L}$}(g) =\displaystyle= i1,i2πi1,i2g2(i1,i2)[log(|g(i1,i2)|/G2(i2))+log(G2(i2)/g2)]\displaystyle\sum_{i_{1},i_{2}}\pi_{i_{1},i_{2}}\,g^{2}(i_{1},i_{2})\left[\log% (|g(i_{1},i_{2})|/G_{2}(i_{2}))+\log(G_{2}(i_{2})/\|g\|_{2})\right]
=\displaystyle= i2πi2(2)(1)(g(,i2))+(2)(G2),\displaystyle\sum_{i_{2}}\pi^{(2)}_{i_{2}}\mbox{${\cal L}$}^{(1)}(g(\cdot,i_{2% }))+\mbox{${\cal L}$}^{(2)}(G_{2}),

where we have used

G222=g22.\|G_{2}\|^{2}_{2}=\|g\|^{2}_{2}.

Thus, using the extremal characterization (definition) (8.53) of τl(1)\tau^{(1)}_{l} and τl(2)\tau^{(2)}_{l},

(g)τl(1)i2πi2(2)(1)(g(,i2),g(,i2))+τl(2)(2)(G2,G2).\mbox{${\cal L}$}(g)\leq\tau^{(1)}_{l}\sum_{i_{2}}\pi^{(2)}_{i_{2}}\mbox{${% \cal E}$}^{(1)}(g(\cdot,i_{2}),g(\cdot,i_{2}))+\tau^{(2)}_{l}\mbox{${\cal E}$}% ^{(2)}(G_{2},G_{2}). (8.70)

But from

|G2(j2)-G2(i2)|=|g(,j2)2-g(,i2)2|g(,j2)-g(,i2)2|G_{2}(j_{2})-G_{2}(i_{2})|=\left|\|g(\cdot,j_{2})\|_{2}-\|g(\cdot,i_{2})\|_{2% }\right|\leq\|g(\cdot,j_{2})-g(\cdot,i_{2})\|_{2}

follows

(2)(G2,G2)πi1(1)(2)(g(i1, ),g(i1, )).\mbox{${\cal E}$}^{(2)}(G_{2},G_{2})\leq\sum\pi^{(1)}_{i_{1}}\mbox{${\cal E}$}% ^{(2)}(g(i_{1},\cdot),g(i_{1},\cdot)). (8.71)

From (8.70), (8.71), Lemma 8.35, and the extremal characterization of τl\tau_{l} we conclude τlmax(τl(1),τl(2))\tau_{l}\leq\max(\tau^{(1)}_{l},\tau^{(2)}_{l}). Testing on functions that depend only on one of the two variables shows that τl=max(τl(1),τl(2))\tau_{l}=\max(\tau^{(1)}_{l},\tau^{(2)}_{l}).    

Theorem 8.36 extends in the obvious fashion to higher-dimensional products.

Example 8.37

The dd-cube.

The continuized walk on the dd-cube (Chapter 5, Example yyy:15) is simply the product of dd copies of the continuized walk on the 22-path, each run at rate 1/d1/d. Therefore, since the log-Sobolev time for the 22-path equals 1/21/2 by Example 8.29, the corresponding time for the dd-cube is

τl=d/2=τ2.\tau_{l}=d/2=\tau_{2}.

¿From this and the upper bound in Corollary 8.27 we can deduce

τ^14dlogd+(1-14log1log2)d.{\hat{\tau}}\leq{\textstyle\frac{1}{4}}d\log d+(1-{\textstyle\frac{1}{4}}\log{% \textstyle\frac{1}{\log 2}})d.

As discussed in this chapter’s introduction, this bound is remarkably sharp and improves significantly upon the analogous bound that uses only knowledge of τ2\tau_{2}.  xxx Recall corrections marked on pages 8.2.11–12 of my notes.

8.4.5 The comparison method for bounding τl\tau_{l}

In Section 8.1 we compared relaxation times for two chains by using the extremal characterization and comparing Dirichlet forms and variances. For comparing variances, we used the characterization

varπg=minc𝐑g-c2.{{\rm var}}_{\pi}\,g=\min_{c\in{\bf R}}\|g-c\|_{2}.

To extend the comparison method to log-Sobolev times, we need the following similar characterization of {\cal L}.

xxx For NOTES: Cite [181].

Lemma 8.38

The functional {\cal L} in (8.52) satisfies

(g)=minc>0iπiL(g(i),c),   g0,\mbox{${\cal L}$}(g)=\min_{c>0}\sum_{i}\pi_{i}L(g(i),c),\ \ \ g\not\equiv 0, (8.72)

with

L(g(i),c):=g2(i)log(|g(i)|/c)-12(g2(i)-c2)0.L(g(i),c):=g^{2}(i)\log(|g(i)|/c)-{\textstyle\frac{1}{2}}(g^{2}(i)-c^{2})\geq 0. (8.73)

Proof. We compute

f(c)\displaystyle f(c) :=\displaystyle:= 2iπiL(g(i),c1/2)=Eπ(g2log|g|2)-g22logc-g22+c,\displaystyle 2\sum_{i}\pi_{i}L(g(i),c^{1/2})=E_{\pi}(g^{2}\log|g|^{2})-\|g\|^% {2}_{2}\log c-\|g\|^{2}_{2}+c,
f(c)\displaystyle f^{\prime}(c) =\displaystyle= 1-c-1g22,   f′′(c)=c-2g22>0.\displaystyle 1-c^{-1}\|g\|^{2}_{2},\ \ \ f^{\prime\prime}(c)=c^{-2}\|g\|^{2}_% {2}>0.

Thus ff is strictly convex and minimized by the choice c=g22c=\|g\|^{2}_{2}, and so

minc>0iπiL(g(i),c)=12minc>0f(c)=12f(g22)=(g).\min_{c>0}\sum_{i}\pi_{i}L(g(i),c)={\textstyle\frac{1}{2}}\min_{c>0}f(c)={% \textstyle\frac{1}{2}}f(\|g\|^{2}_{2})=\mbox{${\cal L}$}(g).

This proves (8.72). Finally, applying the inequality

xlog(x/y)-(x-y)0   for all x0x\geq 0y>0y>0x\log(x/y)-(x-y)\geq 0\mbox{\ \ \ for all $x\geq 0$, $y>0$}

to x=g2(i)x=g^{2}(i) and y=c2y=c^{2} gives the inequality in (8.73).    

Now it’s easy to see how to compare log-Sobolev times, since, adopting the notation of Section 8.1, Lemma 8.38 immediately yields the analogue

(g)~(g)maxi(πi/π~i)\mbox{${\cal L}$}(g)\leq{\tilde{\cal L}}(g)\max_{i}(\pi_{i}/{\tilde{\pi}}_{i})

of (8.18). In the notation of Corollary 8.2, we therefore have

Corollary 8.39 (comparison of log-Sobolev times)
τlAaτ~l.\tau_{l}\leq\frac{A}{a}{\tilde{\tau}}_{l}.
Example 8.40

Random walk on a dd-dimensional grid.

xxx Remarked in Example 8.34 that τlm2\tau_{l}\leq m^{2} for mm-path with end self-loops.

xxx So by Theorem 8.36, benchmark product chain has τ~ldm2{\tilde{\tau}}_{l}\leq dm^{2}.

Recalling A1A\leq 1 and a1/2a\geq 1/2 from Example 8.5, we therefore find

τl2m2d\tau_{l}\leq 2m^{2}d (8.74)

for random walk on the grid. Then Theorem 8.26(b) gives

τ^m2d(loglog(2n)+4),{\hat{\tau}}\leq m^{2}d(\log\log(2n)+4),

which is of order m2d(logd+loglogm)m^{2}d(\log d+\log\log m). This is an improvement on the τ2\tau_{2}-only bound O(m2d2logm)O(m^{2}d^{2}\log m) of (8.50) and may be compared with the Nash-based bound O(m2d2logd)O(m^{2}d^{2}\log d) of (8.51). In Example 8.43 we will combine Nash-inequality and log-Sobolev techniques to get a bound of order m2dlogdm^{2}d\log d

xxx right for TV.