4 Hitting and Convergence Time, and Flow Rate, Parameters for Reversible Markov Chains (October 11, 1994)

4.4 The relaxation time τ2\tau_{2}

The parameter τ2\tau_{2} is the relaxation time, defined in terms of the eigenvalue λ2\lambda_{2} (Chapter 3 section yyy) as

τ2\displaystyle\tau_{2} =\displaystyle= (1-λ2)-1 in discrete time\displaystyle(1-\lambda_{2})^{-1}\mbox{ in discrete time}
=\displaystyle= λ2-1 in continuous time.\displaystyle\lambda_{2}^{-1}\mbox{ in continuous time.}

In Chapter 3 yyy we proved a lower bound for an nn-state discrete-time chain:

τ21-1n\tau_{2}\geq 1-\frac{1}{n}

which is attained by random walk on the complete graph. We saw in Chapter 3 Theorem yyy the extremal characterization

τ2=sup{||g||22/(g,g):iπig(i)=0}.\tau_{2}=\sup\{||g||_{2}^{2}/\mbox{${\cal E}$}(g,g):\sum_{i}\pi_{i}g(i)=0\}. (4.22)

The next three lemmas give inequalities between τ2\tau_{2} and the parameters studied earlier in this chapter. Write π*miniπi\pi_{*}\equiv\min_{i}\pi_{i}.

Lemma 4.23

In continuous time,

τ2τ1τ2(1+12log1π*).\tau_{2}\leq\tau_{1}\leq\tau_{2}\left(1+\frac{1}{2}\log\frac{1}{\pi_{*}}\right).

In discrete time,

τ1(2)4ee-1τ2(1+12log1π*).\tau_{1}^{(2)}\leq\frac{4e}{e-1}\ \tau_{2}\left(1+\frac{1}{2}\log\frac{1}{\pi_% {*}}\right).
Lemma 4.24

τ2τ0(n-1)τ2\tau_{2}\leq\tau_{0}\leq(n-1)\tau_{2}.

Lemma 4.25

τ*2τ2π*\tau^{*}\leq\frac{2\tau_{2}}{\pi_{*}}.

Proof of Lemma 4.23. Consider first the continuous time case. By the spectral representation, as tt\rightarrow\infty we have pii(t)-πiciexp(-t/τ2)p_{ii}(t)-\pi_{i}\sim c_{i}\exp(-t/\tau_{2}) with some ci0c_{i}\neq 0. But by Lemma 4.5 we have |pii(t)-πi|=O(exp(-t/τ1))|p_{ii}(t)-\pi_{i}|=O(\exp(-t/\tau_{1})). This shows τ2τ1\tau_{2}\leq\tau_{1}. For the right inequality, the spectral representation gives

pii(t)-πie-t/τ2.p_{ii}(t)-\pi_{i}\leq e^{-t/\tau_{2}}. (4.23)

Recalling the definition (4.14) of d^\hat{d},

d¯(t)\displaystyle\bar{d}(t) \displaystyle\leq 2d(t)\displaystyle 2d(t)
\displaystyle\leq d^(2t) by Lemma 4.8(b)\displaystyle\sqrt{\hat{d}(2t)}\mbox{ by Lemma }\ref{hatinq}(b)
\displaystyle\leq maxie-2t/τ2πi by (4.14) and (4.23)\displaystyle\sqrt{\max_{i}\frac{e^{-2t/\tau_{2}}}{\pi_{i}}}\mbox{ by }(\ref{% dhat})\mbox{ and }(\ref{sr7})
=\displaystyle= π*-1/2e-t/τ2\displaystyle\pi_{*}^{-1/2}e^{-t/\tau_{2}}

and the result follows. The upper bound on τ1(2)\tau_{1}^{(2)} holds in continuous time by Lemmas 4.11 and 4.12, and so holds in discrete time because τ1(2)\tau_{1}^{(2)} and τ2\tau_{2} are unaffected by continuization.

Proof of Lemma 4.24. τ2τ0\tau_{2}\leq\tau_{0} because τ2\tau_{2} is the first term in the eigentime identity for τ0\tau_{0}. For the other bound, Chapter 3 Lemma yyy gives the inequality in

τ0=jπjEπTjj(1-πj)τ2=(n-1)τ2.\tau_{0}=\sum_{j}\pi_{j}E_{\pi}T_{j}\leq\sum_{j}(1-\pi_{j})\tau_{2}=(n-1)\tau_% {2}.

Proof of Lemma 4.25. Fix states a,ba,b such that EaTb+EbTa=τ*E_{a}T_{b}+E_{b}T_{a}=\tau^{*} and fix a function 0g10\leq g\leq 1 attaining the sup in the extremal characterization (Chapter 3 Theorem yyy), so that

τ*=1(g,g), g(a)=0,g(b)=1.\tau^{*}=\frac{1}{\mbox{${\cal E}$}(g,g)},\ \ g(a)=0,g(b)=1.

Write c=iπig(i)c=\sum_{i}\pi_{i}g(i). Applying the extremal characterization of τ2\tau_{2} to the centered function g-cg-c,

τ2||g-c,g-c||22(g-c,g-c)=varπg(X0)(g,g)=τ*varπg(X0).\tau_{2}\geq\frac{||g-c,g-c||_{2}^{2}}{\mbox{${\cal E}$}(g-c,g-c)}=\frac{{\rm var% }\ _{\pi}g(X_{0})}{\mbox{${\cal E}$}(g,g)}=\tau^{*}\ {\rm var}\ _{\pi}g(X_{0}).

But

varπg(X0)\displaystyle{\rm var}\ _{\pi}g(X_{0}) \displaystyle\geq πac2+πb(1-c)2\displaystyle\pi_{a}c^{2}+\pi_{b}(1-c)^{2}
\displaystyle\geq inf0y1(πay2+πb(1-y)2)\displaystyle\inf_{0\leq y\leq 1}(\pi_{a}y^{2}+\pi_{b}(1-y)^{2})
=\displaystyle= πaπbπa+πb\displaystyle\frac{\pi_{a}\pi_{b}}{\pi_{a}+\pi_{b}}
\displaystyle\geq 12min(πa,πb)\displaystyle\frac{1}{2}\min(\pi_{a},\pi_{b})
\displaystyle\geq π*/2\displaystyle\pi_{*}/2

establishing the lemma. \Box

Simple examples show that the bounds in these Lemmas cannot be much improved in general. Specifically

(a) on the complete graph (Chapter 5 Example yyy) τ2=(n-1)τ0\tau_{2}=(n-1)\tau_{0} and τ2*=2τ2π*\tau_{2}^{*}=\frac{2\tau_{2}}{\pi_{*}}.

(b) On the barbell (Chapter 5 Example yyy), τ2,τ1\tau_{2},\tau_{1} and τ0\tau_{0} are asymptotic to each other.

(c) In the M/M/1/nM/M/1/n queue, τ1/τ2=Θ(log1/π*)\tau_{1}/\tau_{2}=\Theta(\log 1/\pi_{*}) as nn\rightarrow\infty. \Box

In the context of Lemma 4.23, if we want to relate τ1disc\tau_{1}^{{\mbox{disc}}} itself to eigenvalues in discrete time we need to take almost-periodicity into account and use β=max(λ2,-λn)\beta=\max(\lambda_{2},-\lambda_{n}) in place of τ2\tau_{2}. Rephrasing the proof of Lemma 4.23 gives

Lemma 4.26

In discrete time,

1log1/βτ1𝑑𝑖𝑠𝑐1+12log1/π*log1/β\lceil\frac{1}{\log 1/\beta}\rceil\leq\tau_{1}^{{\mbox{disc}}}\leq\lceil\frac{% 1+\frac{1}{2}\log 1/\pi_{*}}{\log 1/\beta}\rceil

Regarding a discrete-time chain as random walk on a weighted graph, let Δ\Delta be the diameter of the graph. By considering the definition of the variation distance d¯(t)\bar{d}(t) and initial vertices i,ji,j at distance Δ\Delta, it is obvious that d¯(t)=1\bar{d}(t)=1 for t<Δ/2t<\Delta/2, and hence τ1discΔ/2\tau_{1}^{{\mbox{disc}}}\geq\lceil\Delta/2\rceil. Combining with the upper bound in Lemma 4.26 leads to a relationship between the diameter and the eigenvalues of a weighted graph.

Corollary 4.27
log1β2+log(1/π*)Δ.\log\frac{1}{\beta}\leq\frac{2+\log(1/\pi_{*})}{\Delta}.

This topic will be discussed further in Chapter yyy.

4.4.1 Correlations and variances for the stationary chain

Perhaps the most natural probabilistic interpretation of τ2\tau_{2} is as follows. Recall that the correlation between random variables Y,ZY,Z is

cor(Y,Z)E(YZ)-(EY)(EZ)varYvarZ.\mbox{cor}(Y,Z)\equiv\frac{E(YZ)-(EY)(EZ)}{\sqrt{{\rm var}\ Y\ {\rm var}\ Z}}.

For a stationary Markov chain define the maximal correlation function

ρ(t)maxh,gcor(h(X0),g(Xt))\rho(t)\equiv\max_{h,g}\mbox{cor}(h(X_{0}),g(X_{t}))

This makes sense for general chains (see Notes for further comments), but under our standing assumption of reversibility we have

Lemma 4.28

In continuous time,

ρ(t)=exp(-t/τ2),  t0.\rho(t)=\exp(-t/\tau_{2}),\ t\geq 0.

In discrete time,

ρ(t)=βt,  t0\rho(t)=\beta^{t},\ t\geq 0

where β=max(λ2,-λn)\beta=\max(\lambda_{2},-\lambda_{n}).

This is a translation of the Rayleigh-Ritz characterization of eigenvalues (Chapter 3 yyy) – we leave the details to the reader.

Now consider a function gg with Eπg(X0)=0E_{\pi}g(X_{0})=0 and ||g||22Eπg2(X0)>0||g||_{2}^{2}\equiv E_{\pi}g^{2}(X_{0})>0. Write

St\displaystyle S_{t}\equiv 0tg(Xs)ds\displaystyle\int_{0}^{t}g(X_{s})ds in continuous time
St\displaystyle S_{t}\equiv s=0t-1g(Xs)\displaystyle\sum_{s=0}^{t-1}g(X_{s}) in discrete time.

Recall from Chapter 2 yyy that for general chains there is a limit variance σ2=limtt-1varSt\sigma^{2}=\lim_{t\rightarrow\infty}t^{-1}{\rm var}\ S_{t}. Reversibility gives extra qualitative and quantitative information. The first result refers to the stationary chain.

Proposition 4.29

In continuous time, t-1varπStσ2t^{-1}{\rm var}\ _{\pi}S_{t}\uparrow\sigma^{2}, where

0<σ22τ2||g||22.0<\sigma^{2}\leq 2\tau_{2}||g||_{2}^{2}.

And A(t/τ2)tσ2varπSttσ2A(t/\tau_{2})t\sigma^{2}\leq{\rm var}\ _{\pi}S_{t}\leq t\sigma^{2}, where

A(u)0u(1-su)e-sds=1-u-1(1-e-u)1 as u.A(u)\equiv\int_{0}^{u}\left(1-\frac{s}{u}\right)e^{-s}ds=1-u^{-1}(1-e^{-u})% \uparrow 1\mbox{ as }u\uparrow\infty.

In discrete time,

t-1varπStσ22τ2||g||22t^{-1}{\rm var}\ _{\pi}S_{t}\rightarrow\sigma^{2}\leq 2\tau_{2}||g||_{2}^{2}
σ2t(1-2τ2t)varπStσ2t+||g||22\sigma^{2}t\left(1-\frac{2\tau_{2}}{t}\right)\leq{\rm var}\ _{\pi}S_{t}\leq% \sigma^{2}t+||g||_{2}^{2}

and so in particular

varπStt||g||22(2τ2+1t).{\rm var}\ _{\pi}S_{t}\leq t||g||_{2}^{2}(2\tau_{2}+\frac{1}{t}). (4.24)

Proof. Consider first the continuous time case. A brief calculation using the spectral representation (Chapter 3 yyy) gives

Eπg(X0)g(Xt)=m2gm2e-λmtE_{\pi}g(X_{0})g(X_{t})=\sum_{m\geq 2}g^{2}_{m}e^{-\lambda_{m}t} (4.25)

where gm=iπi1/2uimg(i)g_{m}=\sum_{i}\pi^{1/2}_{i}u_{im}g(i). So

t-1varπSt\displaystyle t^{-1}{\rm var}\ _{\pi}S_{t} =\displaystyle= t-10t0tEπg(Xu)g(Xs)dsdu\displaystyle t^{-1}\int_{0}^{t}\int_{0}^{t}E_{\pi}g(X_{u})g(X_{s})\ dsdu (4.26)
=\displaystyle= 2t-10t(t-s)Eπg(X0)g(Xs)ds\displaystyle 2t^{-1}\int_{0}^{t}(t-s)E_{\pi}g(X_{0})g(X_{s})ds
=\displaystyle= 20t(1-st)m2gm2e-λmsds\displaystyle 2\int_{0}^{t}\left(1-\frac{s}{t}\right)\sum_{m\geq 2}g^{2}_{m}e^% {-\lambda_{m}s}\ ds
=\displaystyle= 2m2gm2λmA(λmt)\displaystyle 2\sum_{m\geq 2}\frac{g^{2}_{m}}{\lambda_{m}}A(\lambda_{m}t) (4.27)

by change of variables in the integral defining A(u)A(u). The right side increases with tt to

σ22m2gm2/λm,\sigma^{2}\equiv 2\sum_{m\geq 2}g^{2}_{m}/\lambda_{m}, (4.28)

and the sum here is at most m2gm2/λ2=||g||22τ2\sum_{m\geq 2}g^{2}_{m}/\lambda_{2}=||g||_{2}^{2}\tau_{2}. On the other hand, A()A(\cdot) is increasing, so

t-1varπ(St)2m2gm2λmA(λ2t)=σ2A(t/τ2).t^{-1}{\rm var}\ _{\pi}(S_{t})\geq 2\sum_{m\geq 2}\frac{g^{2}_{m}}{\lambda_{m}% }A(\lambda_{2}t)=\sigma^{2}A(t/\tau_{2}).

In discrete time the arguments are messier, and we will omit details of calculations. The analog of (4.26) becomes

t-1varπSt=s=-(t-1)t-1(1-|i|t)m2gm2λms.t^{-1}{\rm var}\ _{\pi}S_{t}=\sum_{s=-(t-1)}^{t-1}\left(1-\frac{|i|}{t}\right)% \sum_{m\geq 2}g_{m}^{2}\lambda_{m}^{s}.

In place of the change of variables argument for (4.27), one needs an elementary calculation to get

t-1varπSt=2m2gm21-λmB(λm,t)t^{-1}{\rm var}\ _{\pi}S_{t}=2\sum_{m\geq 2}\frac{g_{m}^{2}}{1-\lambda_{m}}\ B% (\lambda_{m},t) (4.29)
where B(λ,t)=1+λ2-λ(1-λt)t(1-λ).\mbox{where }B(\lambda,t)=\frac{1+\lambda}{2}-\frac{\lambda(1-\lambda^{t})}{t(% 1-\lambda)}.

This shows

t-1varπStσ2m2gm21+λm1-λmt^{-1}{\rm var}\ _{\pi}S_{t}\rightarrow\sigma^{2}\equiv\sum_{m\geq 2}g_{m}^{2}% \frac{1+\lambda_{m}}{1-\lambda_{m}}

and the sum is bounded above by

1+λ21-λ2m2gm2=1+λ21-λ2||g||222τ2||g||22.\frac{1+\lambda_{2}}{1-\lambda_{2}}\sum_{m\geq 2}g_{m}^{2}=\frac{1+\lambda_{2}% }{1-\lambda_{2}}||g||_{2}^{2}\leq 2\tau_{2}||g||_{2}^{2}.

Next, rewrite (4.29) as

varπSt-σ2t=-2m2gm2λm(1-λmt)(1-λm)2.{\rm var}\ _{\pi}S_{t}-\sigma^{2}t=-2\sum_{m\geq 2}g_{m}^{2}\frac{\lambda_{m}(% 1-\lambda_{m}^{t})}{(1-\lambda_{m})^{2}}.

Then the upper bound for varπSt{\rm var}\ _{\pi}S_{t} follows by checking

inf-1λ<1λ(1-λt)(1-λ)2-12.\inf_{-1\leq\lambda<1}\frac{\lambda(1-\lambda^{t})}{(1-\lambda)^{2}}\geq-\frac% {1}{2}.

For the lower bound, one has to verify

sup-1λλ22λ(1-λt)(1-λ)(1+λ) is attained at λ2 (and equals CC, say)\sup_{-1\leq\lambda\leq\lambda_{2}}\frac{2\lambda(1-\lambda^{t})}{(1-\lambda)(% 1+\lambda)}\mbox{ is attained at }\lambda_{2}\mbox{ (and equals $C$, say)}

where in the sequel we may assume λ2>0\lambda_{2}>0. Then

B(λm,t)1+λm2(1-C/t),  m2B(\lambda_{m},t)\geq\frac{1+\lambda_{m}}{2}(1-C/t),\ m\geq 2

and so

t-1varπStσ2(1-C/t).t^{-1}{\rm var}\ _{\pi}S_{t}\geq\sigma^{2}\ (1-C/t).

But

Ct=2λ2(1-λ2t)t(1-λ2)(1+λ2)2t(1-λ2)=2τ2/t\frac{C}{t}=\frac{2\lambda_{2}(1-\lambda_{2}^{t})}{t(1-\lambda_{2})(1+\lambda_% {2})}\leq\frac{2}{t(1-\lambda_{2})}=2\tau_{2}/t

giving the lower bound. \Box

Note that even in discrete time it is τ2\tau_{2} that matters in Proposition 4.29. Eigenvalues near -1-1 are irrelevant, except that for a periodic chain we have σ=0\sigma=0 for one particular function gg (which?).

Continuing the study of St0tg(Xs)dsS_{t}\equiv\int_{0}^{t}g(X_{s})ds or its discrete analog for a stationary chain, standardize to the case where Eπg(X0)=0,Eπg2(X0)=1E_{\pi}g(X_{0})=0,E_{\pi}g^{2}(X_{0})=1. Proposition 4.29 provides finite-time bounds for the asymptotic approximation of variance. One would like a similar finite-time bound for the asymptotic Normal approximation of the distribution of StS_{t}.

Open Problem 4.30

Is there some explicit function ψ(b,s)0\psi(b,s)\rightarrow 0 as ss\rightarrow\infty, not depending on the chain, such that for standardized gg and continuous-time chains,

supx|Pπ(Stσt1/2x)-P(Zx)|ψ(||g||,t/τ2)\sup_{x}\left|P_{\pi}\left(\frac{S_{t}}{\sigma t^{1/2}}\leq x\right)-P(Z\leq x% )\right|\leq\psi(||g||_{\infty},t/\tau_{2})

where ||g||maxi|g(i)|||g||_{\infty}\equiv\max_{i}|g(i)| and ZZ has Normal(0,1)(0,1) distribution?

See the Notes for further comments. For the analogous result about large deviations see Chapter yyy.

4.4.2 Algorithmic issues

Suppose we want to estimate the average g¯iπig(i)\bar{g}\equiv\sum_{i}\pi_{i}g(i) of a function gg defined on state space. If we could sample i.i.d. from π\pi we would need order ε-2\varepsilon^{-2} samples to get an estimator with error about εvarπg\varepsilon\sqrt{{\rm var}\ _{\pi}g}. Now consider the setting where we cannot directly sample from π\pi but instead use the “Markov Chain Monte Carlo” method of setting up a reversible chain with stationary distribution π\pi. How many steps of the chain do we need to get the same accuracy? As in section 4.3.3, because we typically can’t quantify the closeness to π\pi of a feasible initial distribution, we consider bounds which hold for arbitrary initial states. In assessing the number of steps required, there are two opposite traps to avoid. The first is to say (cf. Proposition 4.29) that ε-2τ2\varepsilon^{-2}\tau_{2} steps suffice. This is wrong because the relaxation time bounds apply to the stationary chain and cannot be directly applied to a non-stationary chain. The second trap is to say that because it take Θ(τ1)\Theta(\tau_{1}) steps to obtain one sample from the stationary distribution, we therefore need order ε-2τ1\varepsilon^{-2}\tau_{1} steps in order to get ε-2\varepsilon^{-2} independent samples. This is wrong because we don’t need independent samples. The correct answer is order (τ1+ε-2τ2)(\tau_{1}+\varepsilon^{-2}\tau_{2}) steps. The conceptual idea (cf. the definition of τ1(2)\tau_{1}^{(2)}) is to find a stopping time achieving distribution π\pi and use it as an initial state for simulating the stationary chain. More feasible to implement is the following algorithm.

Algorithm. For a specified real number t1>0t_{1}>0 and an integer m21m_{2}\geq 1, generate M(t1)M(t_{1}) with Poisson(t1)(t_{1}) distribution. Simulate the chain XtX_{t} from arbitrary initial distribution for M(t1)+m2-1M(t_{1})+m_{2}-1 steps and calculate

A(g,t1,m2)1m2t=M(t1)M(t1)+m2-1g(Xt).A(g,t_{1},m_{2})\equiv\frac{1}{m_{2}}\sum_{t=M(t_{1})}^{M(t_{1})+m_{2}-1}g(X_{% t}).
Corollary 4.31
P(|A(g,t1,m2)-g¯|>ε||g||2)s(t1)+2τ2+1/m2ε2m2P\left(\left|A(g,t_{1},m_{2})-\bar{g}\right|>\varepsilon||g||_{2}\right)\leq s% (t_{1})+\frac{2\tau_{2}+1/m_{2}}{\varepsilon^{2}m_{2}}

where s(t)s(t) is separation (recall section 4.3.1) for the continuized chain.

To make the right side approximately δ\delta we may take

t1=τ1(1)log(2/δ);   m2=4τ2ε2δ.t_{1}=\tau_{1}^{(1)}\lceil\log(2/\delta)\rceil;\ \ \ m_{2}=\lceil\frac{4\tau_{% 2}}{\varepsilon^{2}\delta}\rceil.

Since the mean number of steps is t1+m2-1t_{1}+m_{2}-1, this formalizes the idea that we can estimate g¯\bar{g} to within ε||g||2\varepsilon||g||_{2} in order (τ1+ε-2τ2)(\tau_{1}+\varepsilon^{-2}\tau_{2}) steps.

xxx if don’t know tau‘s

Proof. We may suppose g¯=0\bar{g}=0. Since XM(t1)X_{M(t_{1})} has the distribution of the continuized chain at time t1t_{1}, we may use the definition of s(t1)s(t_{1}) to write

P(XM(t1))=(1-s(t1))π+s(t1)ρP(X_{M(t_{1})}\in\cdot)=(1-s(t_{1}))\pi\ +s(t_{1})\rho

for some probability distribution ρ\rho. It follows that

P(|A(g,t1,m2)|>ε||g||2)s(t1)+Pπ(|1m2t=0m2-1g(Xt)|>ε||g||2)P\left(\left|A(g,t_{1},m_{2})\right|>\varepsilon||g||_{2}\right)\leq s(t_{1})+% P_{\pi}\left(\left|\frac{1}{m_{2}}\sum_{t=0}^{m_{2}-1}g(X_{t})\right|>% \varepsilon||g||_{2}\right)
s(t1)+1m22ε2||g||22varπ(t=0m2-1g(Xt)).\leq s(t_{1})+\frac{1}{m_{2}^{2}\varepsilon^{2}||g||_{2}^{2}}\ {\rm var}\ _{% \pi}\left(\sum_{t=0}^{m_{2}-1}g(X_{t})\right).

Apply (4.24).

4.4.3 τ2\tau_{2} and distinguished paths

The extremal characterization (4.22) can be used to get lower bounds on τ2\tau_{2} by considering a tractable test function gg. (xxx list examples). As mentioned in Chapter 3, it is an open problem to give an extremal characterization of τ2\tau_{2} as exactly an inf over flows or similar constructs. As an alternative, Theorem 4.32 gives a non-exact upper bound on τ2\tau_{2} involving quantities derived from arbitrary choices of paths between states. An elegant exposition of this idea, expressed by the first inequality in Theorem 4.32, was given by Diaconis and Stroock [122], and Sinclair [307] noted the second inequality. We copy their proofs.

We first state the result in the setting of random walk on a weighted graph. As in section 4.1, consider a path x=i0,i1,,im=yx=i_{0},i_{1},\ldots,i_{m}=y, and call this path γxy\gamma_{xy}. This path has length |γxy|=m|\gamma_{xy}|=m and has “resistance”

r(γxy)eγxy1/wer(\gamma_{xy})\equiv\sum_{e\in\gamma_{xy}}1/w_{e}

where here and below ee denotes a directed edge.

Theorem 4.32

For each ordered pair (x,y)(x,y) of vertices in a weighted graph, let γxy\gamma_{xy} be a path from xx to yy. Then for discrete-time random walk,

τ2wmaxexyπxπyr(γxy)1(eγxy)\tau_{2}\leq w\max_{e}\sum_{x}\sum_{y}\pi_{x}\pi_{y}r(\gamma_{xy})1_{(e\in% \gamma_{xy})}
τ2wmaxe1wexyπxπy|γxy|1(eγxy).\tau_{2}\leq w\max_{e}\frac{1}{w_{e}}\sum_{x}\sum_{y}\pi_{x}\pi_{y}|\gamma_{xy% }|1_{(e\in\gamma_{xy})}.

Note that the two inequalities coincide on an unweighted graph.

Proof. For an edge e=(i,j)e=(i,j) write Δg(e)=g(j)-g(i)\Delta g(e)=g(j)-g(i). The first equality below is the fact 2var(Y1)=E(Y1-Y2)22\ {\rm var}\ (Y_{1})=E(Y_{1}-Y_{2})^{2} for i.i.d. YY’s, and the first inequality is Cauchy-Schwarz.

2||g||22\displaystyle 2||g||_{2}^{2} =\displaystyle= xyπxπy(g(y)-g(x))2\displaystyle\sum_{x}\sum_{y}\pi_{x}\pi_{y}(g(y)-g(x))^{2} (4.30)
=\displaystyle= xyπxπy(eγxyΔg(e))2\displaystyle\sum_{x}\sum_{y}\pi_{x}\pi_{y}\left(\sum_{e\in\gamma_{xy}}\Delta g% (e)\right)^{2}
=\displaystyle= xyπxπyr(γxy)(eγxy1wer(γxy)weΔg(e))2\displaystyle\sum_{x}\sum_{y}\pi_{x}\pi_{y}r(\gamma_{xy})\left(\sum_{e\in% \gamma_{xy}}\frac{1}{\sqrt{w_{e}r(\gamma_{xy})}}\sqrt{w_{e}}\Delta g(e)\right)% ^{2}
\displaystyle\leq xyπxπyr(γxy)eγxywe(Δg(e))2\displaystyle\sum_{x}\sum_{y}\pi_{x}\pi_{y}r(\gamma_{xy})\sum_{e\in\gamma_{xy}% }w_{e}(\Delta g(e))^{2} (4.31)
=\displaystyle= xyπxπyr(γxy)ewe(Δg(e))21(eγxy)\displaystyle\sum_{x}\sum_{y}\pi_{x}\pi_{y}r(\gamma_{xy})\sum_{e}w_{e}(\Delta g% (e))^{2}1_{(e\in\gamma_{xy})}
\displaystyle\leq κewe(Δg(e))2=κ 2w(g,g)\displaystyle\kappa\sum_{e}w_{e}(\Delta g(e))^{2}=\kappa\ 2w\mbox{${\cal E}$}(% g,g) (4.32)

where κ\kappa is the max in the first inequality in the statement of the Theorem. The first inequality now follows from the extremal characterization (4.22). The second inequality makes a simpler use of the Cauchy-Schwarz inequality, in which we replace (4.30,4.31,4.32) by

=\displaystyle= xyπxπy(eγxy1Δg(e))2\displaystyle\sum_{x}\sum_{y}\pi_{x}\pi_{y}\left(\sum_{e\in\gamma_{xy}}1\cdot% \Delta g(e)\right)^{2} (4.33)
\displaystyle\leq xyπxπy|γxy|eγxy(Δg(e))2\displaystyle\sum_{x}\sum_{y}\pi_{x}\pi_{y}|\gamma_{xy}|\sum_{e\in\gamma_{xy}}% \left(\Delta g(e)\right)^{2}
\displaystyle\leq κewe(Δg(e))2=κ 2w(g,g)\displaystyle\kappa^{\prime}\sum_{e}w_{e}(\Delta g(e))^{2}=\kappa^{\prime}\ 2w% \mbox{${\cal E}$}(g,g)

where κ\kappa^{\prime} is the max in the second inequality in the statement of the Theorem.

Remarks. (a) Theorem 4.32 applies to continuous-time (reversible) chains by setting wij=πiqijw_{ij}=\pi_{i}q_{ij}.

(b) One can replace the deterministic choice of paths γxy\gamma_{xy} by random paths Γxy\Gamma_{xy} of the form x=V0,V1,,VM=yx=V_{0},V_{1},\ldots,V_{M}=y of random length M=|Γxy|M=|\Gamma_{xy}|. The second inequality extends in the natural way, by taking expectations in (4.33) to give

xyπxπyE(|Γxy|1(eΓxy)e(Δg(e))2),\leq\sum_{x}\sum_{y}\pi_{x}\pi_{y}\ E\left(\ |\Gamma_{xy}|1_{(e\in\Gamma_{xy})% }\ \sum_{e}\left(\Delta g(e)\right)^{2}\right),

and the conclusion is

Corollary 4.33
τ2wmaxe1wexyπxπyE|Γxy|1(eΓxy).\tau_{2}\leq w\max_{e}\frac{1}{w_{e}}\sum_{x}\sum_{y}\pi_{x}\pi_{y}\ E\ |% \Gamma_{xy}|1_{(e\in\Gamma_{xy})}.

(c) Inequalities in the style of Theorem 4.32 are often called Poincaré inequalities because, to quote [122], they are “the discrete analog of the classical method of Poincaré for estimating the spectral gap of the Laplacian on a domain (see e.g. Bandle [39])”. I prefer the descriptive name the distinguished path method. This method has the same spirit as the coupling method for bounding τ1\tau_{1} (see Chapter yyy), in that we get to use our skill and judgement in making wise choices of paths in specific examples. xxx list examples. Though its main utility is in studying hard examples, we give some simple illustrations of its use below.

Write the conclusion of Corollary 4.33 as τ2wmaxe1weF(e)\tau_{2}\leq w\max_{e}\frac{1}{w_{e}}\ F(e). Consider a regular unweighted graph, and let Γx,y\Gamma_{x,y} be chosen uniformly from the set of minimum-length paths from xx to yy. Suppose that F(e)F(e) takes the same value FF for every directed edge ee. A sufficient condition for this is that the graph be arc-transitive (see Chapter 8 yyy). Then, summing over edges in Corollary 4.33,

τ2||wexyπxπyE|Γxy|1(eΓxy)=wxyπxπyE|Γxy|2\tau_{2}|\stackrel{\rightarrow}{{\cal E}}|\leq w\sum_{e}\sum_{x}\sum_{y}\pi_{x% }\pi_{y}E|\Gamma_{xy}|1_{(e\in\Gamma_{xy})}=w\sum_{x}\sum_{y}\pi_{x}\pi_{y}E|% \Gamma_{xy}|^{2}

where |||\stackrel{\rightarrow}{{\cal E}}| is the number of directed edges. Now w=||w=|\stackrel{\rightarrow}{{\cal E}}|, so we may reinterpret this inequality as follows.

Corollary 4.34

For random walk on an arc-transitive graph, τ2ED2\tau_{2}\leq ED^{2}, where D=d(ξ1,ξ2)D=d(\xi_{1},\xi_{2}) is the distance between independent uniform random vertices ξ1,ξ2\xi_{1},\xi_{2}.

In the context of the dd-dimensional torus ZNdZ^{d}_{N}, the upper bound is asymptotic (as NN\rightarrow\infty) to N2E(i=1dUi)2N^{2}E\left(\sum_{i=1}^{d}U_{i}\right)^{2} where the UiU_{i} are independent uniform [0,1/2][0,1/2], This bound is asymptotic to d(d+1/3)N2/16d(d+1/3)N^{2}/16. Here (Chapter 5 Example yyy) in fact τ2dN2/(2π2)\tau_{2}\sim dN^{2}/(2\pi^{2}), so for fixed dd the bound is off by only a constant. On the dd-cube (Chapter 5 Example yyy), DD has Binomial(d,1/2)(d,1/2) distribution and so the bound is ED2=(d2+d)/4ED^{2}=(d^{2}+d)/4, while in fact τ2=d/2\tau_{2}=d/2.

Intuitively one feels that the bound in Corollary 4.34 should hold for more general graphs, but the following example illustrates a difficulty.

Example 4.35

Consider the graph on n=2mn=2m vertices obtained from two complete graphs on mm vertices by adding mm edges comprising a matching of the two vertex-sets.

Here a straightforward implementation of Theorem 4.32 gives an upper bound of 2m2m, while in fact τ2=m/2\tau_{2}=m/2. On the other hand the conclusion of Corollary 4.34 would give an O(1)O(1) bound. Thus even though this example has a strong symmetry property (vertex-transitivity: Chapter 8 yyy) no bound like Corollary 4.34 can hold.