# 3.5 Complete monotonicity

One advantage of working in continuous time is that one can exploit complete monotonicity properties. Abstractly, call $f:[0,\infty)\rightarrow[0,\infty)$ completely monotone (CM) if there is a nonnegative measure $\mu$ on $[0,\infty)$ such that

 $f(t)=\int_{[0,\infty)}\!e^{-\theta t}\mu(d\theta),\ \ 0\leq t<\infty.$ (3.44)

Our applications will use only the special case of a finite sum

 $f(t)=\sum_{m}a_{m}e^{-\theta_{m}t},\mbox{\ \ for some\ }a_{m}>0,\ \theta_{m}\geq 0,$ (3.45)

but finiteness plays no essential role. If $f$ is CM then (provided they exist) so are

 $-f^{\prime}(t)\quad\mbox{and}\quad\bar{F}(t):=\int_{t}^{\infty}\!f(s)\,ds.$ (3.46)

A probability distribution $\nu$ on $[0,\infty)$ is called CM if its tail distribution function $\bar{F}(t):\equiv\nu(t,\infty)$ is CM; equivalently, if its density function $f$ is CM (except that here we must in the general case allow the possibility $f(0)=\infty$). In more probabilistic language, $\nu$ is CM iff it can be expressed as the distribution of $\xi/\Lambda$, where $\xi$ and $\Lambda$ are independent random variables such that

 $\xi\mbox{\ has Exponential(1) distribution;\ \ }\Lambda>0.$ (3.47)
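
The representation (3.47) can be checked numerically: for $T=\xi/\Lambda$ the tail is $P(T>t)=E\,e^{-\Lambda t}$, a mixture of exponentials. A minimal Monte Carlo sketch (not from the text; assumes NumPy, and the two-point distribution for $\Lambda$ is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-point mixing distribution for Lambda (arbitrary example).
rates = np.array([1.0, 3.0])
probs = np.array([0.5, 0.5])

# Monte Carlo sample of T = xi / Lambda with xi ~ Exponential(1).
n = 200_000
lam = rng.choice(rates, size=n, p=probs)
xi = rng.exponential(1.0, size=n)
T = xi / lam

# The tail of T should match the mixture of exponential tails E[exp(-Lambda t)].
t = 0.7
empirical = np.mean(T > t)
exact = np.sum(probs * np.exp(-rates * t))
assert abs(empirical - exact) < 0.01
```

The sampled tail agrees with the exponential mixture, illustrating why a CM distribution is exactly an "exponential with randomized rate".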

Given a CM function or distribution, the spectral gap $\lambda\geq 0$ can be defined consistently by

 $\displaystyle\begin{aligned}\lambda&:=\inf\{\theta>0:\mu[0,\theta]>0\}&&\mbox{in setting (3.44)}\\ \lambda&:=\min_{m}\theta_{m}&&\mbox{in setting (3.45)}\\ \lambda&:={\rm ess\,inf}\ \Lambda&&\mbox{in setting (3.47)}.\end{aligned}$

This $\lambda$ controls the behavior of $f(t)$ as $t\rightarrow\infty$. A key property of CM functions is that their value at a general time $t$ can be bounded in terms of their behavior at $0$ and at $\infty$, as follows.

###### Lemma 3.16

Let $f$ be CM with $0<f(0)<\infty$. Then

 $\exp\left(\frac{f^{\prime}(0)\,t}{f(0)}\right)\leq\frac{f(t)}{f(0)}\leq\frac{\bar{F}(t)}{\bar{F}(0)}\leq\exp(-\lambda t),\ \ 0\leq t<\infty$

where $\lambda$ is the spectral gap.

We might have $\bar{F}(0)=\infty$, but then $\bar{F}(t)=\infty$ and $\lambda=0$ so the convention $\infty/\infty=1$ works.
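
For the finite-mixture case (3.45), every quantity in Lemma 3.16 is explicit, so the chain of inequalities can be verified directly. A sketch (not from the text; assumes NumPy, with arbitrary mixture weights and rates):

```python
import numpy as np

# A CM function as a finite mixture (3.45): f(t) = sum_m a_m exp(-theta_m t).
a = np.array([0.5, 1.2, 0.3])
theta = np.array([0.4, 1.0, 2.5])
lam = theta.min()                      # spectral gap

def f(t):
    return np.sum(a * np.exp(-theta * t))

def Fbar(t):                           # \bar F(t) = \int_t^infty f(s) ds
    return np.sum((a / theta) * np.exp(-theta * t))

f0 = f(0.0)
fprime0 = -np.sum(a * theta)

# Lemma 3.16: exp(f'(0) t / f(0)) <= f(t)/f(0) <= Fbar(t)/Fbar(0) <= exp(-lam t)
for t in [0.1, 0.5, 1.0, 3.0, 10.0]:
    lower = np.exp(fprime0 * t / f0)
    assert lower <= f(t) / f0 <= Fbar(t) / Fbar(0.0) <= np.exp(-lam * t)
```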

Proof. By scaling we may suppose $f(0)=1$. So we can rewrite (3.44) as

 $f(t)=Ee^{-\Theta t}$ (3.48)

where $\Theta$ has distribution $\mu$. Then $f^{\prime}(t)=-E(\Theta e^{-\Theta t})$. Because $\theta\mapsto e^{-\theta t}$ is decreasing, the random variables $\Theta$ and $e^{-\Theta t}$ are negatively correlated (this fact is sometimes called “Chebyshev’s other inequality”, and makes a nice exercise [Hint: Symmetrize!]) and so $E(\Theta e^{-\Theta t})\leq(E\Theta)(Ee^{-\Theta t})$. This says $-f^{\prime}(t)\leq-f^{\prime}(0)\,f(t)$, or in other words $\frac{d}{dt}\log f(t)\geq f^{\prime}(0)$. Integrating gives $\log f(t)\geq tf^{\prime}(0)$, which is the leftmost inequality. (Recall we scaled to make $f(0)=1$.) For the second inequality,

 $\displaystyle\begin{aligned}\bar{F}(t)&=E(\Theta^{-1}e^{-\Theta t})&&\mbox{by integrating (3.48)}\\ &\geq(E\Theta^{-1})(Ee^{-\Theta t})&&\mbox{by positive correlation}\\ &=\bar{F}(0)\,f(t).\end{aligned}$

Finally, from the definition of the spectral gap $\lambda$ it is clear that $f(t)/f(0)\leq e^{-\lambda t}$. But $\bar{F}$ has the same spectral gap as $f$.
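
The negative-correlation step used above ("Chebyshev's other inequality") and the symmetrization hint behind it can be checked numerically for a discrete $\Theta$; the three-point distribution below is an arbitrary illustrative choice (assumes NumPy):

```python
import numpy as np

# Check E[Theta e^{-Theta t}] <= E[Theta] E[e^{-Theta t}]: Theta and exp(-Theta t)
# (increasing vs. decreasing functions of Theta) are negatively correlated.
theta = np.array([0.2, 1.0, 4.0])
p = np.array([0.3, 0.5, 0.2])
t = 0.8

lhs = np.sum(p * theta * np.exp(-theta * t))
rhs = np.sum(p * theta) * np.sum(p * np.exp(-theta * t))
assert lhs <= rhs

# The symmetrization identity behind it, for X, Y i.i.d. with distribution p:
# E[g h] - E[g] E[h] = (1/2) E[(g(X) - g(Y)) (h(X) - h(Y))].
g = theta
h = np.exp(-theta * t)
cov = lhs - rhs
sym = 0.5 * np.sum(p[:, None] * p[None, :] *
                   (g[:, None] - g[None, :]) * (h[:, None] - h[None, :]))
assert abs(cov - sym) < 1e-12
```

Since $g$ is increasing and $h$ decreasing in $\Theta$, each symmetrized term $(g(x)-g(y))(h(x)-h(y))$ is $\leq 0$, giving the inequality.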

Returning to the study of continuous-time reversible chains, the spectral representation (3.40) says that $P_{i}(X_{t}=i)$ is a CM function. It is often convenient to subtract the limit and say

 $P_{i}(X_{t}=i)-\pi_{i}\mbox{\ is a CM function.}$ (3.49)

More generally, given any function $g:I\rightarrow R$, the function

 $\rho(t):\equiv E[g(X_{t})g(X_{0})]$ (3.50)

is CM for the stationary chain, because by (3.33)

 $\displaystyle\begin{aligned}\rho(t)&=\sum_{m=1}^{n}\left(\sum_{i}\pi^{1/2}_{i}g(i)u_{im}\right)\left(\sum_{j}\pi^{1/2}_{j}g(j)u_{jm}\right)\exp(-\lambda_{m}t)\\ &=\sum_{m=1}^{n}\left(\sum_{i}\pi^{1/2}_{i}g(i)u_{im}\right)^{2}\exp(-\lambda_{m}t).\end{aligned}$ (3.51)

Specializing to the case $g=1_{A}$ and conditioning,

 $P(X_{t}\in A\,|\,X_{0}\in A)\mbox{\ is a CM function}$ (3.52)

again assuming the stationary chain. When $A$ is a singleton, this is (3.49).
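
A numerical sketch of (3.51): for a small reversible chain, the stationary correlation $\rho(t)$ computed from the matrix exponential agrees with the manifestly CM spectral sum. The three-state generator and the function $g$ are arbitrary illustrative choices, not from the text (assumes NumPy):

```python
import numpy as np

# A small reversible continuous-time chain: birth-and-death generator on {0,1,2}.
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
# Stationary distribution satisfying detailed balance pi_i q_ij = pi_j q_ji.
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()
assert np.allclose(pi @ Q, 0.0)

# Symmetrize and diagonalize: S = D^{1/2} Q D^{-1/2}, D = diag(pi), S symmetric.
Dh = np.diag(np.sqrt(pi)); Dhi = np.diag(1.0 / np.sqrt(pi))
evals, U = np.linalg.eigh(Dh @ Q @ Dhi)
lam = -evals                          # eigenvalues of -S; one of them is 0

g = np.array([1.0, -2.0, 0.5])
c = (np.sqrt(pi) * g) @ U             # c_m = sum_i pi_i^{1/2} g(i) u_{im}

def rho_spectral(t):
    # (3.51): rho(t) = sum_m c_m^2 exp(-lam_m t) -- manifestly CM.
    return np.sum(c**2 * np.exp(-lam * t))

def rho_direct(t):
    # E[g(X_0) g(X_t)] for the stationary chain, via p(t) = exp(tQ).
    P_t = Dhi @ U @ np.diag(np.exp(-lam * t)) @ U.T @ Dh
    return (pi * g) @ P_t @ g

for t in [0.0, 0.3, 1.0, 2.5]:
    assert abs(rho_spectral(t) - rho_direct(t)) < 1e-10
```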

Remark. To study discrete-time reversible chains directly, one would replace CM functions by sequences $(f_{n})$ of the form

 $f_{n}=\int_{-1}^{1}\theta^{n}\ \mu(d\theta).$

But analogs of Lemma 3.16 and subsequent results (e.g., Proposition 3.22) become messier—so we prefer to derive discrete-time results by continuization.

## 3.5.1 Lower bounds on mean hitting times

As a quick application, we give bounds on mean hitting times to a single state from a stationary start. Recall $q_{i}=\sum_{j\neq i}q_{ij}$ is the exit rate from $i$, and $\tau_{2}$ is the relaxation time of the chain.

###### Lemma 3.17

For any state $i$ in a continuous-time chain,

 $\frac{(1-\pi_{i})^{2}}{q_{i}\pi_{i}}\leq E_{\pi}T_{i}\leq\frac{\tau_{2}(1-\pi_{i})}{\pi_{i}}.$

By continuization, the Lemma holds in discrete time, replacing $q_{i}$ by $1-p_{ii}$.

Proof. The mean hitting time formula is

 $\pi_{i}E_{\pi}T_{i}=Z_{ii}=\int_{0}^{\infty}\!(P_{i}(X_{t}=i)-\pi_{i})\,dt.$

Write $f(t)$ for the integrand. We know $f$ is CM, and here $\lambda\geq\lambda_{2}$ by (3.40), and $f^{\prime}(0)=-q_{i}$, so the extreme bounds of Lemma 3.16 become, after multiplying by $f(0)=1-\pi_{i}$,

 $(1-\pi_{i})\exp(-q_{i}t/(1-\pi_{i}))\leq f(t)\leq(1-\pi_{i})e^{-\lambda_{2}t}.$

Integrating these bounds gives the result.
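
Both bounds of Lemma 3.17 can be checked numerically on a small example. The three-state generator below is an arbitrary illustrative choice, not from the text (assumes NumPy); $E_{\pi}T_{i}$ is computed by the usual linear solve for mean hitting times:

```python
import numpy as np

# A small reversible continuous-time chain (rows of Q sum to zero).
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()

# Relaxation time tau_2 = 1/lambda_2 from the symmetrized generator.
S = np.diag(np.sqrt(pi)) @ Q @ np.diag(1.0 / np.sqrt(pi))
lam = np.sort(-np.linalg.eigvalsh(S))   # 0 = lam[0] < lam[1] <= lam[2]
tau2 = 1.0 / lam[1]

for i in range(3):
    # h_j = E_j T_i solves (Q h)_j = -1 for j != i, with h_i = 0.
    idx = [j for j in range(3) if j != i]
    h = np.zeros(3)
    h[idx] = np.linalg.solve(Q[np.ix_(idx, idx)], -np.ones(2))
    EpiTi = pi @ h
    qi = -Q[i, i]                        # exit rate from state i
    # Lemma 3.17: (1-pi_i)^2/(q_i pi_i) <= E_pi T_i <= tau_2 (1-pi_i)/pi_i
    assert (1 - pi[i])**2 / (qi * pi[i]) <= EpiTi + 1e-12
    assert EpiTi <= tau2 * (1 - pi[i]) / pi[i] + 1e-12
```

For this chain the lower bound is attained at the middle state, where the hitting time from either neighbor is exactly Exponential(1).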

We can now give general lower bounds on some basic parameters we will study in Chapter 4.

###### Proposition 3.18

For a discrete-time chain on $n$ states,

 $\displaystyle\begin{aligned}\sum_{j}\pi_{j}E_{\pi}T_{j}&\geq\frac{(n-1)^{2}}{n}&&(3.53)\\ \max_{i,j}(E_{i}T_{j}+E_{j}T_{i})&\geq 2(n-1)&&(3.54)\\ \max_{i,j}E_{i}T_{j}&\geq n-1&&(3.55)\\ \tau_{2}&\geq\frac{n-1}{n}.&&(3.56)\end{aligned}$

Remark. These inequalities become equalities for random walk on the complete graph (Chapter 5, Example 9). By examining the proof, it can be shown that this is the only chain where an equality holds.
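
The equality cases can be verified numerically for random walk on the complete graph $K_n$; a sketch with $n=6$ (not from the text; assumes NumPy):

```python
import numpy as np

n = 6
# Random walk on the complete graph K_n: from i, jump to a uniform other state.
P = (np.ones((n, n)) - np.eye(n)) / (n - 1)
pi = np.ones(n) / n

# Mean hitting times E_k T_0 by first-step analysis: (I - P_sub) h = 1, h_0 = 0.
idx = list(range(1, n))
h = np.zeros(n)
h[idx] = np.linalg.solve(np.eye(n - 1) - P[np.ix_(idx, idx)], np.ones(n - 1))
EpiT0 = pi @ h                       # by symmetry, sum_j pi_j E_pi T_j = E_pi T_0

# Relaxation time: second-largest eigenvalue of P is -1/(n-1), and
# tau_2 = 1/(1 - eigenvalue_2) = (n-1)/n for the continuized chain.
evals = np.sort(np.linalg.eigvalsh(P))[::-1]
tau2 = 1.0 / (1.0 - evals[1])

assert abs(EpiT0 - (n - 1)**2 / n) < 1e-9          # (3.53) with equality
assert abs(h[1] + h[1] - 2 * (n - 1)) < 1e-9       # (3.54) with equality
assert abs(tau2 - (n - 1) / n) < 1e-9              # (3.56) with equality
```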

Proof. We go to the continuized chain, which has $q_{i}=1-p_{ii}\leq 1$. Then

 $\displaystyle\begin{aligned}\sum_{j}\pi_{j}E_{\pi}T_{j}&\geq\sum_{j}(1-\pi_{j})^{2}&&\mbox{by Lemma 3.17}\\ &=n-2+\sum_{j}\pi^{2}_{j}\\ &\geq n-2+{\textstyle\frac{1}{n}}\\ &=(n-1)^{2}/n,\end{aligned}$

giving (3.53). By the eigentime identity,

 $\sum_{j}\pi_{j}E_{\pi}T_{j}=\sum_{m\geq 2}\lambda_{m}^{-1}\leq(n-1)\tau_{2}$

and so (3.56) follows from (3.53).

Now fix $i$ and write $\tau_{0}=\sum_{j}\pi_{j}E_{k}T_{j}$, which (by the random target lemma) doesn’t depend on $k$. Then

 $\sum_{j\neq i}\frac{\pi_{j}}{1-\pi_{i}}(E_{i}T_{j}+E_{j}T_{i})=\frac{\tau_{0}+E_{\pi}T_{i}}{1-\pi_{i}}.$ (3.57)

If the right side were strictly less than $2(n-1)$ for all $i$, then

 $\sum_{i}\pi_{i}(\tau_{0}+E_{\pi}T_{i})<2(n-1)\sum_{i}\pi_{i}(1-\pi_{i}),$

which implies

 $2\tau_{0}<2(n-1)\left(1-\sum_{i}\pi^{2}_{i}\right)\leq 2(n-1)\left(1-\frac{1}{n}\right)=\frac{2(n-1)^{2}}{n},$

contradicting (3.53). Therefore there exists an $i$ such that

 $\sum_{j\neq i}\frac{\pi_{j}}{1-\pi_{i}}(E_{i}T_{j}+E_{j}T_{i})\geq 2(n-1)$

and so there exists $j\neq i$ such that $E_{i}T_{j}+E_{j}T_{i}\geq 2(n-1)$. This is (3.54), and (3.55) follows immediately.

There are several other results in the spirit of Lemma 3.17 and Proposition 3.18. For instance, (22) in Chapter 2 says that for a general discrete-time chain,

 ${\rm var}_{i}\,T^{+}_{i}=\frac{2E_{\pi}T_{i}+1}{\pi_{i}}-\frac{1}{\pi^{2}_{i}}.$

Appealing to Lemma 3.17 gives, after a little algebra,

###### Corollary 3.19

For any state $i$ in a discrete-time chain,

 ${\rm var}_{i}T^{+}_{i}\geq\frac{(1-\pi_{i})(1-2\pi_{i})}{\pi^{2}_{i}}.$

Again, equality holds for random walk on the complete graph.

## 3.5.2 Smoothness of convergence

We’re going to build some vague discussion around the following simple result.

###### Lemma 3.20
 $\sum_{j}\frac{p_{ij}^{2}(t)}{\pi_{j}}=\frac{p_{ii}(2t)}{\pi_{i}}$ (3.58)
 $\left|\frac{p_{ik}(t+s)}{\pi_{k}}-1\right|\leq\sqrt{\left(\frac{p_{ii}(2t)}{\pi_{i}}-1\right)\left(\frac{p_{kk}(2s)}{\pi_{k}}-1\right)}$ (3.59)
 $\max_{i,k}\frac{p_{ik}(2t)}{\pi_{k}}\leq\max_{i}\frac{p_{ii}(2t)}{\pi_{i}}$ (3.60)

Proof.

 $\frac{p_{ik}(t+s)}{\pi_{k}}=\sum_{j}p_{ij}(t)\frac{p_{jk}(s)}{\pi_{k}}=\sum_{j}p_{ij}(t)\frac{p_{kj}(s)}{\pi_{j}}$

by reversibility. Putting $k=i$, $s=t$ gives (3.58). Rewriting the above equality as

 $\frac{p_{ik}(t+s)}{\pi_{k}}-1=\sum_{j}\pi_{j}\,\frac{p_{ij}(t)-\pi_{j}}{\pi_{j}}\,\frac{p_{kj}(s)-\pi_{j}}{\pi_{j}}$

and applying the Cauchy–Schwarz inequality, we get the bound $\sqrt{a_{i}(t)a_{k}(s)}$, where

 $a_{i}(t)=\sum_{j}\frac{(p_{ij}(t)-\pi_{j})^{2}}{\pi_{j}}=\sum_{j}\frac{p^{2}_{ij}(t)}{\pi_{j}}-1=\frac{p_{ii}(2t)}{\pi_{i}}-1.$

This proves (3.59). The cruder bound (3.60) is sometimes easier to use than (3.59) and is proved similarly.
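
Both (3.58) and (3.59) are easy to check numerically once $p(t)=e^{tQ}$ is available via the symmetrized spectral decomposition. A sketch on an arbitrary three-state reversible chain (not from the text; assumes NumPy):

```python
import numpy as np

# A small reversible continuous-time chain (arbitrary example).
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()
Dh = np.diag(np.sqrt(pi)); Dhi = np.diag(1.0 / np.sqrt(pi))
evals, U = np.linalg.eigh(Dh @ Q @ Dhi)

def p(t):
    # Transition matrix p(t) = exp(tQ) via the spectral decomposition.
    return Dhi @ U @ np.diag(np.exp(evals * t)) @ U.T @ Dh

t, s, i, k = 0.4, 0.9, 0, 2

# (3.58): sum_j p_ij(t)^2 / pi_j = p_ii(2t) / pi_i
lhs = np.sum(p(t)[i, :]**2 / pi)
rhs = p(2 * t)[i, i] / pi[i]
assert abs(lhs - rhs) < 1e-10

# (3.59): |p_ik(t+s)/pi_k - 1| <= sqrt((p_ii(2t)/pi_i - 1)(p_kk(2s)/pi_k - 1))
left = abs(p(t + s)[i, k] / pi[k] - 1.0)
right = np.sqrt((p(2*t)[i, i]/pi[i] - 1.0) * (p(2*s)[k, k]/pi[k] - 1.0))
assert left <= right + 1e-12
```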

Discussion. Recalling from Chapter 2, Section 4.2 the definition of $L^{2}$ distance between distributions, (3.58) says

 $\|P_{i}(X_{t}\in\cdot)-\pi\|_{2}^{2}=\frac{p_{ii}(2t)}{\pi_{i}}-1.$ (3.61)

In continuous time, we may regard the assertion “$\|P_{i}(X_{t}\in\cdot)-\pi\|_{2}$ is decreasing in $t$” as a consequence of the equality in (3.61) and the CM property of $p_{ii}(t)$. This assertion in fact holds for general chains, as pointed out in Chapter 2, Lemma 35. Loosely, that general result says that in a general chain the ratios $(P_{\rho}(X_{t}=j)/\pi_{j},\,j\in I)$, considered as an unordered set, tend to smooth out as $t$ increases. For a reversible chain, much more seems to be true. There is some “intrinsic geometry” on the state space such that, for the chain started at $i$, the probability distribution as time increases from $0$ “spreads out smoothly” with respect to the geometry. It’s hard to formalize that idea convincingly. On the other hand, (3.61) does say convincingly that the rate of convergence of the single probability $p_{ii}(t)$ to $\pi_{i}$ is connected to a rate of convergence of the entire distribution $P_{i}(X_{t}\in\cdot)$ to $\pi(\cdot)$. This intimate connection between the local and the global behavior of reversible chains underlies many of the technical inequalities concerning mixing times in Chapter 4 and subsequent chapters.

## 3.5.3 Inequalities for hitting time distributions on subsets

We mentioned in Chapter 2, Section 2.2 that most of the simple identities there for mean hitting times $E_{i}T_{j}$ on singletons have no simple analogs for hitting times $T_{A}$ on subsets. One exception is Kac’s formula (Chapter 2, Corollary 24), which says that for a general discrete-time chain

 $E_{\pi_{A}}T^{+}_{A}=1/\pi(A).$ (3.62)
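
Kac's formula (3.62) is easy to verify by first-step analysis on a small example. The weighted graph and subset below are arbitrary illustrative choices, not from the text (assumes NumPy):

```python
import numpy as np

# A small discrete-time chain: random walk on a weighted graph (hence reversible,
# though Kac's formula holds for general chains).
W = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 1.0, 3.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 3.0, 1.0, 0.0]])
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()          # stationary: pi_i proportional to weight w_i

A = [0, 1]
Ac = [2, 3]
piA = pi[A] / pi[A].sum()             # stationary distribution conditioned to A

# h_j = E_j T_A for j outside A: (I - P_{A^c A^c}) h = 1, h = 0 on A.
h = np.zeros(4)
h[Ac] = np.linalg.solve(np.eye(2) - P[np.ix_(Ac, Ac)], np.ones(2))

# First return time from the conditioned stationary start:
# E_{pi_A} T_A^+ = 1 + sum_{i in A} pi_A(i) sum_j p_ij h_j.
EreturnA = 1.0 + piA @ (P[A, :] @ h)
assert abs(EreturnA - 1.0 / pi[A].sum()) < 1e-10   # Kac: E_{pi_A} T_A^+ = 1/pi(A)
```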

It turns out that for reversible chains there are useful inequalities relating the distributions of $T_{A}$ under different initial distributions. These are simplest in continuous time as consequences of CM: as always, interesting consequences may be applied to discrete-time chains via continuization.

Recall $\pi_{A}$ is the stationary distribution conditioned to $A$:

 $\pi_{A}(i)\equiv\pi(i)/\pi(A),\ i\in A.$

Trivially

 $\displaystyle\begin{aligned}P_{\pi}(T_{A}>t)&=\pi(A^{c})P_{\pi_{A^{c}}}(T_{A}>t)&&(3.63)\\ E_{\pi}T_{A}&=\pi(A^{c})E_{\pi_{A^{c}}}T_{A}.&&(3.64)\end{aligned}$

Define the ergodic exit distribution $\rho_{A}$ from $A$ by

 $\rho_{A}(j):=\frac{\sum_{i\in A}\pi_{i}q_{ij}}{Q(A,A^{c})},\ \ j\in A^{c},$ (3.65)

where $Q(A,A^{c})$ is the ergodic flow rate out of $A$:

 $Q(A,A^{c}):=\sum_{i\in A}\sum_{k\in A^{c}}\pi_{i}q_{ik}.$ (3.66)

By stationarity, $Q(A,A^{c})=Q(A^{c},A)$.

###### Proposition 3.21

Fix a subset $A$ in a continuous-time chain.

(i) $T_{A}$ has a CM distribution when the initial distribution of the chain is any of the three distributions $\pi$, $\pi_{A^{c}}$, or $\rho_{A}$.

(ii) The three hitting time distributions determine each other via (3.63) and

 $P_{\pi_{A^{c}}}(T_{A}\in(t,t+dt))=\frac{P_{\rho_{A}}(T_{A}>t)}{E_{\rho_{A}}T_{A}}\,dt.$ (3.67)

(iii) Write $\lambda_{A}$ for the spectral gap associated with $T_{A}$ (which is the same for each of the three initial distributions). Then

 $P_{\rho_{A}}(T_{A}>t)\leq P_{\pi_{A^{c}}}(T_{A}>t)=\frac{P_{\pi}(T_{A}>t)}{\pi(A^{c})}\leq\exp(-\lambda_{A}t),\ \ t>0$ (3.68)

and in particular

 $\frac{\pi(A^{c})}{Q(A,A^{c})}=E_{\rho_{A}}T_{A}\leq E_{\pi_{A^{c}}}T_{A}=\frac{E_{\pi}T_{A}}{\pi(A^{c})}\leq 1/\lambda_{A}.$ (3.69)

(iv)

 $E_{\pi}T_{A}\leq\frac{\tau_{2}\pi(A^{c})}{\pi(A)}.$ (3.70)

Remarks. (a) In discrete time we can define $\rho_{A}$ and $Q(A,A^{c})$ by replacing $q_{ij}$ by $p_{ij}$ in (3.65)–(3.66), and then (3.69) holds in discrete time. The left equality of (3.69) is then a reformulation of Kac’s formula (3.62), because

 $\displaystyle\begin{aligned}E_{\pi_{A}}T_{A}^{+}&=1+P_{\pi_{A}}(X_{1}\in A^{c})\,E_{\pi_{A}}(T^{+}_{A}-1\,|\,X_{1}\in A^{c})\\ &=1+\frac{Q(A,A^{c})}{\pi(A)}\,E_{\rho_{A}}T_{A}.\end{aligned}$

(b) Equation (3.83) and Corollary 3.34 [together with remark (b) following Theorem 3.33] later show that $1/\lambda_{A}\leq\tau_{2}/\pi(A)$; those results require the chain restricted to $A^{c}$ to be irreducible, but that requirement can presumably be dropped using a limiting argument. So (3.70) can be regarded as a consequence of (3.69). Reverse inequalities will be studied in Chapter 4.

Proof of Proposition 3.21. First consider the case where $A$ is a singleton $\{a\}$. Then (3.70) is an immediate consequence of Lemma 3.17. The equalities in (3.69) and in (3.67) are general identities for stationary processes [(24) and (23) in Chapter 2]. We shall prove below that $T_{A}$ is CM under $P_{\pi_{I\setminus\{a\}}}$. Then by (3.63), (3.67), and (3.46), $T_{A}$ is also CM under the other two initial distributions. Then the second inequality of (3.68) is the upper bound in Lemma 3.16, and the first is a consequence of (3.67) and Lemma 3.16. And (3.69) follows from (3.68) by integrating over $t$.

To prove that $T_{A}$ is CM under $P_{\pi_{I\setminus\{a\}}}$, introduce a parameter $0<\varepsilon<1$ and consider the modified chain $(X^{\varepsilon}_{t})$ with transition rates

 $q^{\varepsilon}_{ij}:=q_{ij},\ \ i\neq a;\qquad q^{\varepsilon}_{aj}:=\varepsilon q_{aj}.$

The modified chain remains reversible, and its stationary distribution is of the form

 $\pi^{\varepsilon}_{i}=b_{1}\pi_{i},\ \ i\neq a;\qquad\pi^{\varepsilon}_{a}=b_{2}$

where the weights $b_{1},b_{2}$ depend only on $\varepsilon$ and $\pi_{a}$. Now as $\varepsilon\rightarrow 0$ with $t$ fixed,

 $P_{\pi_{I\setminus\{a\}}}(X^{\varepsilon}_{t}\in I\setminus\{a\})\rightarrow P_{\pi_{I\setminus\{a\}}}(T_{a}>t)$ (3.71)

because the chain gets “stuck” upon hitting $a$. But the left side is CM by (3.52), so the right side (which does not depend on $\varepsilon$) is CM, because the class of CM distributions is closed under pointwise limits. (The last assertion is in general the continuity theorem for Laplace transforms [133] p. 83, though for our purposes we need only the simpler fact that the set of functions of the form (3.45) with at most $n$ summands is closed.)

This completes the proof when $A$ is a singleton. We now claim that the case of general $A$ follows from the collapsing principle (Chapter 2, Section 7.3), i.e., by applying the special case to the chain in which the subset $A$ is collapsed into a single state. This is clear for all the assertions of Proposition 3.21 except for (3.70), for which we need the fact that the relaxation time $\tau_{2}^{A}$ of the collapsed chain is at most $\tau_{2}$. This fact is proved as Corollary 3.27 below.
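
The chain of (in)equalities (3.69) can be checked numerically. The three-state generator and the singleton $A=\{0\}$ below are arbitrary illustrative choices, not from the text (assumes NumPy):

```python
import numpy as np

# A small reversible continuous-time chain; take A = {0}.
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()
A, Ac = [0], [1, 2]

# Ergodic flow rate (3.66) and ergodic exit distribution (3.65).
QA = sum(pi[i] * Q[i, k] for i in A for k in Ac)
rhoA = np.array([sum(pi[i] * Q[i, k] for i in A) for k in Ac]) / QA

# Mean hitting times h_j = E_j T_A on A^c: (Q_{A^c A^c}) h = -1.
h = np.linalg.solve(Q[np.ix_(Ac, Ac)], -np.ones(2))

piAc = pi[Ac] / pi[Ac].sum()
# Spectral gap of T_A: smallest eigenvalue of -Q restricted to A^c.
lamA = min(-np.linalg.eigvals(Q[np.ix_(Ac, Ac)]).real)

E_rho = rhoA @ h
E_piAc = piAc @ h

# (3.69): pi(A^c)/Q(A,A^c) = E_{rho_A} T_A <= E_{pi_{A^c}} T_A <= 1/lambda_A
assert abs(pi[Ac].sum() / QA - E_rho) < 1e-10
assert E_rho <= E_piAc + 1e-12
assert E_piAc <= 1.0 / lamA + 1e-12
```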

Remark. Note that the CM property implies a supermultiplicativity property for hitting times from stationarity in a continuous-time reversible chain:

 $P_{\pi}(T_{A}>s+t)\geq P_{\pi}(T_{A}>s)P_{\pi}(T_{A}>t).$

Contrast with the general submultiplicativity property (Chapter 2, Section 4.3) which holds when $P_{\pi}$ is replaced by $\max_{i}P_{i}$.

## 3.5.4 Approximate exponentiality of hitting times

In many circumstances, the distribution of the first hitting time $T_{A}$ on a subset $A$ of states with $\pi(A)$ small (equivalently, with $ET_{A}$ large) can be approximated by the exponential distribution with the same mean. As with the issue of convergence to the stationary distribution, such approximations can be proved for general chains (see Notes), but it is easier to get explicit bounds in the reversible setting. If $T$ has a CM distribution, then [as at (3.47), but replacing $1/\Lambda$ by $\Theta$] we may suppose $T\ \stackrel{d}{=}\ \Theta\xi$. We calculate

 $ET=(E\Theta)(E\xi)=E\Theta;\ \ \ ET^{2}=(E\Theta^{2})(E\xi^{2})=2E\Theta^{2}$

and so

 $\frac{ET^{2}}{2(ET)^{2}}=\frac{E\Theta^{2}}{(E\Theta)^{2}}\geq 1$

with equality iff $\Theta$ is constant, i.e., iff $T$ has exponential distribution. This suggests that the difference $\frac{ET^{2}}{2(ET)^{2}}-1$ can be used as a measure of “deviation from exponentiality”. Let us quote a result of Mark Brown ([72] Theorem 4.1(iii)) which quantifies this idea in a very simple way.

###### Proposition 3.22

Let $T$ have CM distribution. Then

 $\sup_{t}|P(T>t)-e^{-t/ET}|\leq\frac{ET^{2}}{2(ET)^{2}}-1.$
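
Brown's bound can be checked numerically for a CM distribution $T=\Theta\xi$ with discrete $\Theta$, where the tail is $P(T>t)=E\,e^{-t/\Theta}$. A sketch with an arbitrary two-point $\Theta$, scanning $t$ over a grid (not from the text; assumes NumPy):

```python
import numpy as np

# T = Theta * xi with xi ~ Exponential(1): tail P(T > t) = E[exp(-t/Theta)].
vals = np.array([0.5, 2.0])     # values of Theta (arbitrary example)
p = np.array([0.6, 0.4])

ET = np.sum(p * vals)                    # E T = E Theta
ET2 = 2.0 * np.sum(p * vals**2)          # E T^2 = 2 E Theta^2
bound = ET2 / (2.0 * ET**2) - 1.0        # Proposition 3.22 right side

ts = np.linspace(0.0, 20.0, 4001)
tails = np.array([np.sum(p * np.exp(-t / vals)) for t in ts])
dev = np.max(np.abs(tails - np.exp(-ts / ET)))
assert dev <= bound + 1e-12
```

Here the actual deviation is well below the bound, consistent with the bound being a worst-case estimate over all CM laws with the given first two moments.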

So we can use this bound for hitting times $T_{A}$ in a stationary reversible chain. At first sight the bound seems useful only if we can estimate $E_{\pi}T^{2}_{A}$ and $E_{\pi}T_{A}$ accurately. But the following remarkable variation shows that for the hitting time distribution to be approximately exponential it is sufficient that the mean hitting time be large compared to the relaxation time $\tau_{2}$.

###### Proposition 3.23

For a subset $A$ of a continuous-time chain,

 $\sup_{t}|P_{\pi}(T_{A}>t)-\exp(-t/E_{\pi}T_{A})|\leq\tau_{2}/E_{\pi}T_{A}.$

Proof. By the collapsing principle (Chapter 2, Section 7.3) we may suppose $A$ is a singleton $\{j\}$, because (Corollary 3.27 below) collapsing cannot increase the relaxation time. Combining the mean hitting time formula with the expression (3.41) for $Z_{jj}$ in terms of the spectral representation (3.33),

 $E_{\pi}T_{j}=\pi^{-1}_{j}Z_{jj}=\pi^{-1}_{j}\sum_{m\geq 2}u^{2}_{jm}\lambda^{-1}_{m}.$ (3.72)

A similar calculation, exhibited below, shows

 $\frac{E_{\pi}T^{2}_{j}-2(E_{\pi}T_{j})^{2}}{2}=\pi^{-1}_{j}\sum_{m\geq 2}u^{2}_{jm}\lambda^{-2}_{m}.$ (3.73)

But $\lambda_{m}^{-2}\leq\lambda_{2}^{-1}\lambda_{m}^{-1}=\tau_{2}\lambda_{m}^{-1}$ for $m\geq 2$, so the right side of (3.73) is bounded by $\pi^{-1}_{j}\tau_{2}\sum_{m\geq 2}u^{2}_{jm}\lambda^{-1}_{m}$, which by (3.72) equals $\tau_{2}E_{\pi}T_{j}$. Applying Proposition 3.22 gives Proposition 3.23.
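
The spectral identity (3.72) and the key bounding step can be checked numerically on a small chain; the generator below is an arbitrary illustrative choice, not from the text (assumes NumPy):

```python
import numpy as np

# A small reversible continuous-time chain (arbitrary example).
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()
S = np.diag(np.sqrt(pi)) @ Q @ np.diag(1.0 / np.sqrt(pi))
evals, U = np.linalg.eigh(S)           # orthonormal eigenvectors u_{.m}
lam = -evals                           # one eigenvalue is 0 (stationarity)
mask = lam > 1e-9                      # the m >= 2 terms

j = 0
# (3.72): E_pi T_j = pi_j^{-1} sum_{m>=2} u_{jm}^2 / lam_m
spectral = np.sum(U[j, mask]**2 / lam[mask]) / pi[j]

# Direct computation of E_pi T_j by solving for h_k = E_k T_j.
idx = [k for k in range(3) if k != j]
h = np.zeros(3)
h[idx] = np.linalg.solve(Q[np.ix_(idx, idx)], -np.ones(2))
assert abs(spectral - pi @ h) < 1e-10

# The right side of (3.73) is at most tau_2 * E_pi T_j: the key step above.
tau2 = 1.0 / np.min(lam[mask])
second = np.sum(U[j, mask]**2 / lam[mask]**2) / pi[j]
assert second <= tau2 * (pi @ h) + 1e-12
```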

We give a straightforward but tedious verification of (3.73) (see also Notes). The identity $x^{2}/2=\int_{0}^{\infty}(x-t)^{+}\,dt,\ x\geq 0$, starts the calculation

 $\displaystyle\begin{aligned}{\textstyle\frac{1}{2}}E_{\pi}T^{2}_{j}&=\int_{0}^{\infty}\!E_{\pi}(T_{j}-t)^{+}\,dt\\ &=\int_{0}^{\infty}\!\sum_{i}P_{\pi}(X_{t}=i,\,T_{j}>t)\,E_{i}T_{j}\,dt\\ &=\sum_{i}E_{i}T_{j}\int_{0}^{\infty}\!P_{\pi}(X_{t}=i,\,T_{j}>t)\,dt\\ &=\sum_{i}\frac{Z_{jj}-Z_{ij}}{\pi_{j}}\,\frac{Z_{jj}\pi_{i}-Z_{ji}\pi_{j}}{\pi_{j}}&&\mbox{by Chapter 2 Lemmas 12 and 15 (continuous-time version)}\\ &=\sum_{i}\pi^{-2}_{j}\pi_{i}(Z_{jj}-Z_{ij})^{2}&&\mbox{using reversibility, }\pi_{j}Z_{ji}=\pi_{i}Z_{ij}.\end{aligned}$

Expanding the square, the cross-term vanishes and the first term becomes $(Z_{jj}/\pi_{j})^{2}=(E_{\pi}T_{j})^{2}$, so

 ${\textstyle\frac{1}{2}}E_{\pi}T^{2}_{j}-(E_{\pi}T_{j})^{2}=\pi_{j}^{-2}\sum_{i}\pi_{i}Z^{2}_{ij}.$

To finish the calculation,

 $\displaystyle\begin{aligned}\pi^{-1}_{j}\sum_{i}\pi_{i}Z^{2}_{ij}&=\pi^{-1}_{j}\sum_{i}\pi_{i}\left(\int\!(p_{ij}(s)-\pi_{j})\,ds\right)\left(\int\!(p_{ij}(t)-\pi_{j})\,dt\right)\\ &=\sum_{i}\left(\int\!(p_{ji}(s)-\pi_{i})\,ds\right)\left(\int\!(p_{ij}(t)-\pi_{j})\,dt\right)\\ &=\int\!\!\int\!\left(p_{jj}(s+t)-\pi_{j}\right)\,ds\,dt\\ &=\int\!t\,(p_{jj}(t)-\pi_{j})\,dt\\ &=\int\!\sum_{m\geq 2}u^{2}_{jm}\,te^{-\lambda_{m}t}\,dt\\ &=\sum_{m\geq 2}u^{2}_{jm}\lambda^{-2}_{m}.\qquad\rule{4.3pt}{4.3pt}\end{aligned}$

See the Notes for a related result, Theorem 3.43.