# 3.5 Complete monotonicity

One advantage of working in continuous time is that one can exploit complete monotonicity properties. Abstractly, call $f:[0,\infty)\rightarrow[0,\infty)$ completely monotone (CM) if there is a nonnegative measure $\mu$ on $[0,\infty)$ such that

 $f(t)=\int_{[0,\infty)}\!e^{-\theta t}\mu(d\theta),\ \ 0\leq t<\infty.$ (3.44)

Our applications will use only the special case of a finite sum

 $f(t)=\sum_{m}a_{m}e^{-\theta_{m}t},\mbox{\ \ for some\ }a_{m}>0,\ \theta_{m}\geq 0,$ (3.45)

but finiteness plays no essential role. If $f$ is CM then (provided they exist) so are

 $-f^{\prime}(t)\quad\mbox{and}\quad\bar{F}(t):=\int_{t}^{\infty}\!f(s)\,ds.$ (3.46)

A probability distribution $\nu$ on $[0,\infty)$ is called CM if its tail distribution function $\bar{F}(t):\equiv\nu(t,\infty)$ is CM; equivalently, if its density function $f$ is CM (except that here we must in the general case allow the possibility $f(0)=\infty$). In more probabilistic language, $\nu$ is CM iff it can be expressed as the distribution of $\xi/\Lambda$, where $\xi$ and $\Lambda$ are independent random variables such that

 $\xi\mbox{\ has Exponential(1) distribution;\ \ }\Lambda>0.$ (3.47)
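
The representation (3.47) can be checked numerically: for $T=\xi/\Lambda$ the tail is $P(T>t)=E\,e^{-\Lambda t}$, a mixture of exponentials. A minimal Monte Carlo sketch (not from the text; assumes NumPy, and the two-point distribution for $\Lambda$ is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-point mixing distribution for Lambda (arbitrary example).
rates = np.array([1.0, 3.0])
probs = np.array([0.5, 0.5])

# Monte Carlo sample of T = xi / Lambda with xi ~ Exponential(1).
n = 200_000
lam = rng.choice(rates, size=n, p=probs)
xi = rng.exponential(1.0, size=n)
T = xi / lam

# The tail of T should match the mixture of exponential tails E[exp(-Lambda t)].
t = 0.7
empirical = np.mean(T > t)
exact = np.sum(probs * np.exp(-rates * t))
assert abs(empirical - exact) < 0.01
```

The sampled tail agrees with the exponential mixture, illustrating why a CM distribution is exactly an "exponential with randomized rate".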

Given a CM function or distribution, the spectral gap $\lambda\geq 0$ can be defined consistently by

 $\displaystyle\begin{aligned}\lambda&:=\inf\{\theta>0:\mu[0,\theta]>0\}&&\mbox{in setting (3.44)}\\ \lambda&:=\min_{m}\theta_{m}&&\mbox{in setting (3.45)}\\ \lambda&:={\rm ess\,inf}\ \Lambda&&\mbox{in setting (3.47)}.\end{aligned}$

This $\lambda$ controls the behavior of $f(t)$ as $t\rightarrow\infty$. A key property of CM functions is that their value at a general time $t$ can be bounded in terms of their behavior at $0$ and at $\infty$, as follows.

###### Lemma 3.16

Let $f$ be CM with $0<f(0)<\infty$. Then

 $\exp\left(\frac{f^{\prime}(0)\,t}{f(0)}\right)\leq\frac{f(t)}{f(0)}\leq\frac{\bar{F}(t)}{\bar{F}(0)}\leq\exp(-\lambda t),\ \ 0\leq t<\infty$

where $\lambda$ is the spectral gap.

We might have $\bar{F}(0)=\infty$, but then $\bar{F}(t)=\infty$ and $\lambda=0$ so the convention $\infty/\infty=1$ works.
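
For the finite-mixture case (3.45), every quantity in Lemma 3.16 is explicit, so the chain of inequalities can be verified directly. A sketch (not from the text; assumes NumPy, with arbitrary mixture weights and rates):

```python
import numpy as np

# A CM function as a finite mixture (3.45): f(t) = sum_m a_m exp(-theta_m t).
a = np.array([0.5, 1.2, 0.3])
theta = np.array([0.4, 1.0, 2.5])
lam = theta.min()                      # spectral gap

def f(t):
    return np.sum(a * np.exp(-theta * t))

def Fbar(t):                           # \bar F(t) = \int_t^infty f(s) ds
    return np.sum((a / theta) * np.exp(-theta * t))

f0 = f(0.0)
fprime0 = -np.sum(a * theta)

# Lemma 3.16: exp(f'(0) t / f(0)) <= f(t)/f(0) <= Fbar(t)/Fbar(0) <= exp(-lam t)
for t in [0.1, 0.5, 1.0, 3.0, 10.0]:
    lower = np.exp(fprime0 * t / f0)
    assert lower <= f(t) / f0 <= Fbar(t) / Fbar(0.0) <= np.exp(-lam * t)
```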

Proof. By scaling we may suppose $f(0)=1$. So we can rewrite (3.44) as

 $f(t)=Ee^{-\Theta t}$ (3.48)

where $\Theta$ has distribution $\mu$. Then $f^{\prime}(t)=-E(\Theta e^{-\Theta t})$. Because $\theta\mapsto e^{-\theta t}$ is decreasing, the random variables $\Theta$ and $e^{-\Theta t}$ are negatively correlated (this fact is sometimes called “Chebyshev’s other inequality”, and makes a nice exercise [Hint: Symmetrize!]) and so $E(\Theta e^{-\Theta t})\leq(E\Theta)(Ee^{-\Theta t})$. This says $-f^{\prime}(t)\leq-f^{\prime}(0)\,f(t)$, or in other words $\frac{d}{dt}\log f(t)\geq f^{\prime}(0)$. Integrating gives $\log f(t)\geq tf^{\prime}(0)$, which is the leftmost inequality. (Recall we scaled to make $f(0)=1$.) For the second inequality,

 $\displaystyle\begin{aligned}\bar{F}(t)&=E(\Theta^{-1}e^{-\Theta t})&&\mbox{by integrating (3.48)}\\ &\geq(E\Theta^{-1})(Ee^{-\Theta t})&&\mbox{by positive correlation}\\ &=\bar{F}(0)\,f(t).\end{aligned}$

Finally, from the definition of the spectral gap $\lambda$ it is clear that $f(t)/f(0)\leq e^{-\lambda t}$. But $\bar{F}$ has the same spectral gap as $f$.
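
The negative-correlation step used above ("Chebyshev's other inequality") and the symmetrization hint behind it can be checked numerically for a discrete $\Theta$; the three-point distribution below is an arbitrary illustrative choice (assumes NumPy):

```python
import numpy as np

# Check E[Theta e^{-Theta t}] <= E[Theta] E[e^{-Theta t}]: Theta and exp(-Theta t)
# (increasing vs. decreasing functions of Theta) are negatively correlated.
theta = np.array([0.2, 1.0, 4.0])
p = np.array([0.3, 0.5, 0.2])
t = 0.8

lhs = np.sum(p * theta * np.exp(-theta * t))
rhs = np.sum(p * theta) * np.sum(p * np.exp(-theta * t))
assert lhs <= rhs

# The symmetrization identity behind it, for X, Y i.i.d. with distribution p:
# E[g h] - E[g] E[h] = (1/2) E[(g(X) - g(Y)) (h(X) - h(Y))].
g = theta
h = np.exp(-theta * t)
cov = lhs - rhs
sym = 0.5 * np.sum(p[:, None] * p[None, :] *
                   (g[:, None] - g[None, :]) * (h[:, None] - h[None, :]))
assert abs(cov - sym) < 1e-12
```

Since $g$ is increasing and $h$ decreasing in $\Theta$, each symmetrized term $(g(x)-g(y))(h(x)-h(y))$ is $\leq 0$, giving the inequality.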

Returning to the study of continuous-time reversible chains, the spectral representation (3.40) says that $P_{i}(X_{t}=i)$ is a CM function. It is often convenient to subtract the limit and say

 $P_{i}(X_{t}=i)-\pi_{i}\mbox{\ is a CM function.}$ (3.49)

More generally, given any function $g:I\rightarrow R$, the function

 $\rho(t):\equiv E[g(X_{t})g(X_{0})]$ (3.50)

is CM for the stationary chain, because by (3.33)

 $\displaystyle\begin{aligned}\rho(t)&=\sum_{m=1}^{n}\left(\sum_{i}\pi^{1/2}_{i}g(i)u_{im}\right)\left(\sum_{j}\pi^{1/2}_{j}g(j)u_{jm}\right)\exp(-\lambda_{m}t)\\ &=\sum_{m=1}^{n}\left(\sum_{i}\pi^{1/2}_{i}g(i)u_{im}\right)^{2}\exp(-\lambda_{m}t).\end{aligned}$ (3.51)

Specializing to the case $g=1_{A}$ and conditioning,

 $P(X_{t}\in A\,|\,X_{0}\in A)\mbox{\ is a CM function}$ (3.52)

again assuming the stationary chain. When $A$ is a singleton, this is (3.49).
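
A numerical sketch of (3.51): for a small reversible chain, the stationary correlation $\rho(t)$ computed from the matrix exponential agrees with the manifestly CM spectral sum. The three-state generator and the function $g$ are arbitrary illustrative choices, not from the text (assumes NumPy):

```python
import numpy as np

# A small reversible continuous-time chain: birth-and-death generator on {0,1,2}.
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
# Stationary distribution satisfying detailed balance pi_i q_ij = pi_j q_ji.
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()
assert np.allclose(pi @ Q, 0.0)

# Symmetrize and diagonalize: S = D^{1/2} Q D^{-1/2}, D = diag(pi), S symmetric.
Dh = np.diag(np.sqrt(pi)); Dhi = np.diag(1.0 / np.sqrt(pi))
evals, U = np.linalg.eigh(Dh @ Q @ Dhi)
lam = -evals                          # eigenvalues of -S; one of them is 0

g = np.array([1.0, -2.0, 0.5])
c = (np.sqrt(pi) * g) @ U             # c_m = sum_i pi_i^{1/2} g(i) u_{im}

def rho_spectral(t):
    # (3.51): rho(t) = sum_m c_m^2 exp(-lam_m t) -- manifestly CM.
    return np.sum(c**2 * np.exp(-lam * t))

def rho_direct(t):
    # E[g(X_0) g(X_t)] for the stationary chain, via p(t) = exp(tQ).
    P_t = Dhi @ U @ np.diag(np.exp(-lam * t)) @ U.T @ Dh
    return (pi * g) @ P_t @ g

for t in [0.0, 0.3, 1.0, 2.5]:
    assert abs(rho_spectral(t) - rho_direct(t)) < 1e-10
```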

Remark. To study discrete-time reversible chains directly, one would replace CM functions by sequences $(f_{n})$ of the form

 $f_{n}=\int_{-1}^{1}\theta^{n}\ \mu(d\theta).$

But analogs of Lemma 3.16 and subsequent results (e.g., Proposition 3.22) become messier—so we prefer to derive discrete-time results by continuization.

## 3.5.1 Lower bounds on mean hitting times

As a quick application, we give bounds on mean hitting times to a single state from a stationary start. Recall $q_{i}=\sum_{j\neq i}q_{ij}$ is the exit rate from $i$, and $\tau_{2}$ is the relaxation time of the chain.

###### Lemma 3.17

For any state $i$ in a continuous-time chain,

 $\frac{(1-\pi_{i})^{2}}{q_{i}\pi_{i}}\leq E_{\pi}T_{i}\leq\frac{\tau_{2}(1-\pi_{i})}{\pi_{i}}.$

By continuization, the Lemma holds in discrete time, replacing $q_{i}$ by $1-p_{ii}$.

Proof. The mean hitting time formula is

 $\pi_{i}E_{\pi}T_{i}=Z_{ii}=\int_{0}^{\infty}\!(P_{i}(X_{t}=i)-\pi_{i})\,dt.$

Write $f(t)$ for the integrand. We know $f$ is CM, and here $\lambda\geq\lambda_{2}$ by (3.40), and $f^{\prime}(0)=-q_{i}$, so the extreme bounds of Lemma 3.16 become, after multiplying by $f(0)=1-\pi_{i}$,

 $(1-\pi_{i})\exp(-q_{i}t/(1-\pi_{i}))\leq f(t)\leq(1-\pi_{i})e^{-\lambda_{2}t}.$

Integrating these bounds gives the result.
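
Both bounds of Lemma 3.17 can be checked numerically on a small example. The three-state generator below is an arbitrary illustrative choice, not from the text (assumes NumPy); $E_{\pi}T_{i}$ is computed by the usual linear solve for mean hitting times:

```python
import numpy as np

# A small reversible continuous-time chain (rows of Q sum to zero).
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()

# Relaxation time tau_2 = 1/lambda_2 from the symmetrized generator.
S = np.diag(np.sqrt(pi)) @ Q @ np.diag(1.0 / np.sqrt(pi))
lam = np.sort(-np.linalg.eigvalsh(S))   # 0 = lam[0] < lam[1] <= lam[2]
tau2 = 1.0 / lam[1]

for i in range(3):
    # h_j = E_j T_i solves (Q h)_j = -1 for j != i, with h_i = 0.
    idx = [j for j in range(3) if j != i]
    h = np.zeros(3)
    h[idx] = np.linalg.solve(Q[np.ix_(idx, idx)], -np.ones(2))
    EpiTi = pi @ h
    qi = -Q[i, i]                        # exit rate from state i
    # Lemma 3.17: (1-pi_i)^2/(q_i pi_i) <= E_pi T_i <= tau_2 (1-pi_i)/pi_i
    assert (1 - pi[i])**2 / (qi * pi[i]) <= EpiTi + 1e-12
    assert EpiTi <= tau2 * (1 - pi[i]) / pi[i] + 1e-12
```

For this chain the lower bound is attained at the middle state, where the hitting time from either neighbor is exactly Exponential(1).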

We can now give general lower bounds on some basic parameters we will study in Chapter 4.

###### Proposition 3.18

For a discrete-time chain on $n$ states,

 $\displaystyle\begin{aligned}\sum_{j}\pi_{j}E_{\pi}T_{j}&\geq\frac{(n-1)^{2}}{n}&&(3.53)\\ \max_{i,j}(E_{i}T_{j}+E_{j}T_{i})&\geq 2(n-1)&&(3.54)\\ \max_{i,j}E_{i}T_{j}&\geq n-1&&(3.55)\\ \tau_{2}&\geq\frac{n-1}{n}.&&(3.56)\end{aligned}$

Remark. These inequalities become equalities for random walk on the complete graph (Chapter 5, Example 9). By examining the proof, it can be shown that this is the only chain where an equality holds.
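
The equality cases can be verified numerically for random walk on the complete graph $K_n$; a sketch with $n=6$ (not from the text; assumes NumPy):

```python
import numpy as np

n = 6
# Random walk on the complete graph K_n: from i, jump to a uniform other state.
P = (np.ones((n, n)) - np.eye(n)) / (n - 1)
pi = np.ones(n) / n

# Mean hitting times E_k T_0 by first-step analysis: (I - P_sub) h = 1, h_0 = 0.
idx = list(range(1, n))
h = np.zeros(n)
h[idx] = np.linalg.solve(np.eye(n - 1) - P[np.ix_(idx, idx)], np.ones(n - 1))
EpiT0 = pi @ h                       # by symmetry, sum_j pi_j E_pi T_j = E_pi T_0

# Relaxation time: second-largest eigenvalue of P is -1/(n-1), and
# tau_2 = 1/(1 - eigenvalue_2) = (n-1)/n for the continuized chain.
evals = np.sort(np.linalg.eigvalsh(P))[::-1]
tau2 = 1.0 / (1.0 - evals[1])

assert abs(EpiT0 - (n - 1)**2 / n) < 1e-9          # (3.53) with equality
assert abs(h[1] + h[1] - 2 * (n - 1)) < 1e-9       # (3.54) with equality
assert abs(tau2 - (n - 1) / n) < 1e-9              # (3.56) with equality
```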

Proof. We go to the continuized chain, which has $q_{i}=1-p_{ii}\leq 1$. Then

 $\displaystyle\begin{aligned}\sum_{j}\pi_{j}E_{\pi}T_{j}&\geq\sum_{j}(1-\pi_{j})^{2}&&\mbox{by Lemma 3.17}\\ &=n-2+\sum_{j}\pi^{2}_{j}\\ &\geq n-2+{\textstyle\frac{1}{n}}\\ &=(n-1)^{2}/n,\end{aligned}$

giving (3.53). By the eigentime identity,

 $\sum_{j}\pi_{j}E_{\pi}T_{j}=\sum_{m\geq 2}\lambda_{m}^{-1}\leq(n-1)\tau_{2}$

and so (3.56) follows from (3.53).

Now fix $i$ and write $\tau_{0}=\sum_{j}\pi_{j}E_{k}T_{j}$, which (by the random target lemma) doesn’t depend on $k$. Then

 $\sum_{j\neq i}\frac{\pi_{j}}{1-\pi_{i}}(E_{i}T_{j}+E_{j}T_{i})=\frac{\tau_{0}+E_{\pi}T_{i}}{1-\pi_{i}}.$ (3.57)

If the right side were strictly less than $2(n-1)$ for all $i$, then

 $\sum_{i}\pi_{i}(\tau_{0}+E_{\pi}T_{i})<2(n-1)\sum_{i}\pi_{i}(1-\pi_{i}),$

which implies

 $2\tau_{0}<2(n-1)\left(1-\sum_{i}\pi^{2}_{i}\right)\leq 2(n-1)\left(1-\frac{1}{n}\right)=\frac{2(n-1)^{2}}{n},$

contradicting (3.53). Therefore there exists an $i$ such that

 $\sum_{j\neq i}\frac{\pi_{j}}{1-\pi_{i}}(E_{i}T_{j}+E_{j}T_{i})\geq 2(n-1)$

and so there exists $j\neq i$ such that $E_{i}T_{j}+E_{j}T_{i}\geq 2(n-1)$. This is (3.54), and (3.55) follows immediately.

There are several other results in the spirit of Lemma 3.17 and Proposition 3.18. For instance, (22) in Chapter 2 says that for a general discrete-time chain,

 ${\rm var}_{i}\,T^{+}_{i}=\frac{2E_{\pi}T_{i}+1}{\pi_{i}}-\frac{1}{\pi^{2}_{i}}.$

Appealing to Lemma 3.17 gives, after a little algebra,

###### Corollary 3.19

For any state $i$ in a discrete-time chain,

 ${\rm var}_{i}T^{+}_{i}\geq\frac{(1-\pi_{i})(1-2\pi_{i})}{\pi^{2}_{i}}.$

Again, equality holds for random walk on the complete graph.

## 3.5.2 Smoothness of convergence

We’re going to build some vague discussion around the following simple result.

###### Lemma 3.20
 $\sum_{j}\frac{p_{ij}^{2}(t)}{\pi_{j}}=\frac{p_{ii}(2t)}{\pi_{i}}$ (3.58)
 $\left|\frac{p_{ik}(t+s)}{\pi_{k}}-1\right|\leq\sqrt{\left(\frac{p_{ii}(2t)}{\pi_{i}}-1\right)\left(\frac{p_{kk}(2s)}{\pi_{k}}-1\right)}$ (3.59)
 $\max_{i,k}\frac{p_{ik}(2t)}{\pi_{k}}\leq\max_{i}\frac{p_{ii}(2t)}{\pi_{i}}$ (3.60)

Proof.

 $\frac{p_{ik}(t+s)}{\pi_{k}}=\sum_{j}p_{ij}(t)\frac{p_{jk}(s)}{\pi_{k}}=\sum_{j}p_{ij}(t)\frac{p_{kj}(s)}{\pi_{j}}$

by reversibility. Putting $k=i$, $s=t$ gives (3.58). Rewriting the above equality as

 $\frac{p_{ik}(t+s)}{\pi_{k}}-1=\sum_{j}\pi_{j}\,\frac{p_{ij}(t)-\pi_{j}}{\pi_{j}}\,\frac{p_{kj}(s)-\pi_{j}}{\pi_{j}}$

and applying the Cauchy–Schwarz inequality, we get the bound $\sqrt{a_{i}(t)a_{k}(s)}$, where

 $a_{i}(t)=\sum_{j}\frac{(p_{ij}(t)-\pi_{j})^{2}}{\pi_{j}}=\sum_{j}\frac{p^{2}_{ij}(t)}{\pi_{j}}-1=\frac{p_{ii}(2t)}{\pi_{i}}-1.$

This proves (3.59). The cruder bound (3.60) is sometimes easier to use than (3.59) and is proved similarly.
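
Both (3.58) and (3.59) are easy to check numerically once $p(t)=e^{tQ}$ is available via the symmetrized spectral decomposition. A sketch on an arbitrary three-state reversible chain (not from the text; assumes NumPy):

```python
import numpy as np

# A small reversible continuous-time chain (arbitrary example).
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()
Dh = np.diag(np.sqrt(pi)); Dhi = np.diag(1.0 / np.sqrt(pi))
evals, U = np.linalg.eigh(Dh @ Q @ Dhi)

def p(t):
    # Transition matrix p(t) = exp(tQ) via the spectral decomposition.
    return Dhi @ U @ np.diag(np.exp(evals * t)) @ U.T @ Dh

t, s, i, k = 0.4, 0.9, 0, 2

# (3.58): sum_j p_ij(t)^2 / pi_j = p_ii(2t) / pi_i
lhs = np.sum(p(t)[i, :]**2 / pi)
rhs = p(2 * t)[i, i] / pi[i]
assert abs(lhs - rhs) < 1e-10

# (3.59): |p_ik(t+s)/pi_k - 1| <= sqrt((p_ii(2t)/pi_i - 1)(p_kk(2s)/pi_k - 1))
left = abs(p(t + s)[i, k] / pi[k] - 1.0)
right = np.sqrt((p(2*t)[i, i]/pi[i] - 1.0) * (p(2*s)[k, k]/pi[k] - 1.0))
assert left <= right + 1e-12
```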

Discussion. Recalling from Chapter 2, Section 4.2 the definition of $L^{2}$ distance between distributions, (3.58) says

 $\|P_{i}(X_{t}\in\cdot)-\pi\|_{2}^{2}=\frac{p_{ii}(2t)}{\pi_{i}}-1.$ (3.61)

In continuous time, we may regard the assertion “$\|P_{i}(X_{t}\in\cdot)-\pi\|_{2}$ is decreasing in $t$” as a consequence of the equality in (3.61) and the CM property of $p_{ii}(t)$. This assertion in fact holds for general chains, as pointed out in Chapter 2, Lemma 35. Loosely, that general result says that in a general chain the ratios $(P_{\rho}(X_{t}=j)/\pi_{j},\,j\in I)$, considered as an unordered set, tend to smooth out as $t$ increases. For a reversible chain, much more seems to be true. There is some “intrinsic geometry” on the state space such that, for the chain started at $i$, the probability distribution as time increases from $0$ “spreads out smoothly” with respect to the geometry. It’s hard to formalize that idea convincingly. On the other hand, (3.61) does say convincingly that the rate of convergence of the single probability $p_{ii}(t)$ to $\pi_{i}$ is connected to a rate of convergence of the entire distribution $P_{i}(X_{t}\in\cdot)$ to $\pi(\cdot)$. This intimate connection between the local and the global behavior of reversible chains underlies many of the technical inequalities concerning mixing times in Chapter 4 and subsequent chapters.

## 3.5.3 Inequalities for hitting time distributions on subsets

We mentioned in Chapter 2, Section 2.2 that most of the simple identities there for mean hitting times $E_{i}T_{j}$ on singletons have no simple analogs for hitting times $T_{A}$ on subsets. One exception is Kac’s formula (Chapter 2, Corollary 24), which says that for a general discrete-time chain

 $E_{\pi_{A}}T^{+}_{A}=1/\pi(A).$ (3.62)
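
Kac's formula (3.62) is easy to verify by first-step analysis on a small example. The weighted graph and subset below are arbitrary illustrative choices, not from the text (assumes NumPy):

```python
import numpy as np

# A small discrete-time chain: random walk on a weighted graph (hence reversible,
# though Kac's formula holds for general chains).
W = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 1.0, 3.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 3.0, 1.0, 0.0]])
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()          # stationary: pi_i proportional to weight w_i

A = [0, 1]
Ac = [2, 3]
piA = pi[A] / pi[A].sum()             # stationary distribution conditioned to A

# h_j = E_j T_A for j outside A: (I - P_{A^c A^c}) h = 1, h = 0 on A.
h = np.zeros(4)
h[Ac] = np.linalg.solve(np.eye(2) - P[np.ix_(Ac, Ac)], np.ones(2))

# First return time from the conditioned stationary start:
# E_{pi_A} T_A^+ = 1 + sum_{i in A} pi_A(i) sum_j p_ij h_j.
EreturnA = 1.0 + piA @ (P[A, :] @ h)
assert abs(EreturnA - 1.0 / pi[A].sum()) < 1e-10   # Kac: E_{pi_A} T_A^+ = 1/pi(A)
```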

It turns out that for reversible chains there are useful inequalities relating the distributions of $T_{A}$ under different initial distributions. These are simplest in continuous time as consequences of CM: as always, interesting consequences may be applied to discrete-time chains via continuization.

Recall $\pi_{A}$ is the stationary distribution conditioned to $A$:

 $\pi_{A}(i)\equiv\pi(i)/\pi(A),\ i\in A.$

Trivially

 $\displaystyle\begin{aligned}P_{\pi}(T_{A}>t)&=\pi(A^{c})P_{\pi_{A^{c}}}(T_{A}>t)&&(3.63)\\ E_{\pi}T_{A}&=\pi(A^{c})E_{\pi_{A^{c}}}T_{A}.&&(3.64)\end{aligned}$

Define the ergodic exit distribution $\rho_{A}$ from $A$ by

 $\rho_{A}(j):=\frac{\sum_{i\in A}\pi_{i}q_{ij}}{Q(A,A^{c})},\ \ j\in A^{c},$ (3.65)

where $Q(A,A^{c})$ is the ergodic flow rate out of $A$:

 $Q(A,A^{c}):=\sum_{i\in A}\sum_{k\in A^{c}}\pi_{i}q_{ik}.$ (3.66)

By stationarity, $Q(A,A^{c})=Q(A^{c},A)$.

###### Proposition 3.21

Fix a subset $A$ in a continuous-time chain.

(i) $T_{A}$ has a CM distribution when the initial distribution of the chain is any of the three distributions $\pi$, $\pi_{A^{c}}$, or $\rho_{A}$.

(ii) The three hitting time distributions determine each other via (3.63) and

 $P_{\pi_{A^{c}}}(T_{A}\in(t,t+dt))=\frac{P_{\rho_{A}}(T_{A}>t)}{E_{\rho_{A}}T_{A}}\,dt.$ (3.67)

(iii) Write $\lambda_{A}$ for the spectral gap associated with $T_{A}$ (which is the same for each of the three initial distributions). Then

 $P_{\rho_{A}}(T_{A}>t)\leq P_{\pi_{A^{c}}}(T_{A}>t)=\frac{P_{\pi}(T_{A}>t)}{\pi(A^{c})}\leq\exp(-\lambda_{A}t),\ \ t>0$ (3.68)

and in particular

 $\frac{\pi(A^{c})}{Q(A,A^{c})}=E_{\rho_{A}}T_{A}\leq E_{\pi_{A^{c}}}T_{A}=\frac{E_{\pi}T_{A}}{\pi(A^{c})}\leq 1/\lambda_{A}.$ (3.69)

(iv)

 $E_{\pi}T_{A}\leq\frac{\tau_{2}\pi(A^{c})}{\pi(A)}.$ (3.70)

Remarks. (a) In discrete time we can define $\rho_{A}$ and $Q(A,A^{c})$ by replacing $q_{ij}$ by $p_{ij}$ in (3.65)–(3.66), and then (3.69) holds in discrete time. The left equality of (3.69) is then a reformulation of Kac’s formula (3.62), because

 $\displaystyle\begin{aligned}E_{\pi_{A}}T_{A}^{+}&=1+P_{\pi_{A}}(X_{1}\in A^{c})\,E_{\pi_{A}}(T^{+}_{A}-1\,|\,X_{1}\in A^{c})\\ &=1+\frac{Q(A,A^{c})}{\pi(A)}\,E_{\rho_{A}}T_{A}.\end{aligned}$

(b) Equation (3.83) and Corollary 3.34 [together with remark (b) following Theorem 3.33] later show that $1/\lambda_{A}\leq\tau_{2}/\pi(A)$; those results require the chain restricted to $A^{c}$ to be irreducible, but that requirement can presumably be dropped using a limiting argument. So (3.70) can be regarded as a consequence of (3.69). Reverse inequalities will be studied in Chapter 4.

Proof of Proposition 3.21. First consider the case where $A$ is a singleton $\{a\}$. Then (3.70) is an immediate consequence of Lemma 3.17. The equalities in (3.69) and in (3.67) are general identities for stationary processes [(24) and (23) in Chapter 2]. We shall prove below that $T_{A}$ is CM under $P_{\pi_{I\setminus\{a\}}}$. Then by (3.63), (3.67), and (3.46), $T_{A}$ is also CM under the other two initial distributions. Then the second inequality of (3.68) is the upper bound in Lemma 3.16, and the first is a consequence of (3.67) and Lemma 3.16. And (3.69) follows from (3.68) by integrating over $t$.

To prove that $T_{A}$ is CM under $P_{\pi_{I\setminus\{a\}}}$, introduce a parameter $0<\varepsilon<1$ and consider the modified chain $(X^{\varepsilon}_{t})$ with transition rates

 $q^{\varepsilon}_{ij}:=q_{ij},\ \ i\neq a;\qquad q^{\varepsilon}_{aj}:=\varepsilon q_{aj}.$

The modified chain remains reversible, and its stationary distribution is of the form

 $\pi^{\varepsilon}_{i}=b_{1}\pi_{i},\ \ i\neq a;\qquad\pi^{\varepsilon}_{a}=b_{2}$

where the weights $b_{1},b_{2}$ depend only on $\varepsilon$ and $\pi_{a}$. Now as $\varepsilon\rightarrow 0$ with $t$ fixed,

 $P_{\pi_{I\setminus\{a\}}}(X^{\varepsilon}_{t}\in I\setminus\{a\})\rightarrow P_{\pi_{I\setminus\{a\}}}(T_{a}>t)$ (3.71)

because the chain gets “stuck” upon hitting $a$. But the left side is CM by (3.52), so the right side (which does not depend on $\varepsilon$) is CM, because the class of CM distributions is closed under pointwise limits. (The last assertion is in general the continuity theorem for Laplace transforms [133] p. 83, though for our purposes we need only the simpler fact that the set of functions of the form (3.45) with at most $n$ summands is closed.)

This completes the proof when $A$ is a singleton. We now claim that the case of general $A$ follows from the collapsing principle (Chapter 2, Section 7.3), i.e., by applying the special case to the chain in which the subset $A$ is collapsed into a single state. This is clear for all the assertions of Proposition 3.21 except for (3.70), for which we need the fact that the relaxation time $\tau_{2}^{A}$ of the collapsed chain is at most $\tau_{2}$. This fact is proved as Corollary 3.27 below.
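
The chain of (in)equalities (3.69) can be checked numerically. The three-state generator and the singleton $A=\{0\}$ below are arbitrary illustrative choices, not from the text (assumes NumPy):

```python
import numpy as np

# A small reversible continuous-time chain; take A = {0}.
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()
A, Ac = [0], [1, 2]

# Ergodic flow rate (3.66) and ergodic exit distribution (3.65).
QA = sum(pi[i] * Q[i, k] for i in A for k in Ac)
rhoA = np.array([sum(pi[i] * Q[i, k] for i in A) for k in Ac]) / QA

# Mean hitting times h_j = E_j T_A on A^c: (Q_{A^c A^c}) h = -1.
h = np.linalg.solve(Q[np.ix_(Ac, Ac)], -np.ones(2))

piAc = pi[Ac] / pi[Ac].sum()
# Spectral gap of T_A: smallest eigenvalue of -Q restricted to A^c.
lamA = min(-np.linalg.eigvals(Q[np.ix_(Ac, Ac)]).real)

E_rho = rhoA @ h
E_piAc = piAc @ h

# (3.69): pi(A^c)/Q(A,A^c) = E_{rho_A} T_A <= E_{pi_{A^c}} T_A <= 1/lambda_A
assert abs(pi[Ac].sum() / QA - E_rho) < 1e-10
assert E_rho <= E_piAc + 1e-12
assert E_piAc <= 1.0 / lamA + 1e-12
```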

Remark. Note that the CM property implies a supermultiplicativity property for hitting times from stationarity in a continuous-time reversible chain:

 $P_{\pi}(T_{A}>s+t)\geq P_{\pi}(T_{A}>s)P_{\pi}(T_{A}>t).$

Contrast with the general submultiplicativity property (Chapter 2, Section 4.3) which holds when $P_{\pi}$ is replaced by $\max_{i}P_{i}$.

## 3.5.4 Approximate exponentiality of hitting times

In many circumstances, the distribution of the first hitting time $T_{A}$ on a subset $A$ of states with $\pi(A)$ small (equivalently, with $ET_{A}$ large) can be approximated by the exponential distribution with the same mean. As with the issue of convergence to the stationary distribution, such approximations can be proved for general chains (see Notes), but it is easier to get explicit bounds in the reversible setting. If $T$ has a CM distribution, then [as at (3.47), but replacing $1/\Lambda$ by $\Theta$] we may suppose $T\ \stackrel{d}{=}\ \Theta\xi$. We calculate

 $ET=(E\Theta)(E\xi)=E\Theta;\ \ \ ET^{2}=(E\Theta^{2})(E\xi^{2})=2E\Theta^{2}$

and so

 $\frac{ET^{2}}{2(ET)^{2}}=\frac{E\Theta^{2}}{(E\Theta)^{2}}\geq 1$

with equality iff $\Theta$ is constant, i.e., iff $T$ has exponential distribution. This suggests that the difference $\frac{ET^{2}}{2(ET)^{2}}-1$ can be used as a measure of “deviation from exponentiality”. Let us quote a result of Mark Brown ([72] Theorem 4.1(iii)) which quantifies this idea in a very simple way.

###### Proposition 3.22

Let $T$ have CM distribution. Then

 $\sup_{t}|P(T>t)-e^{-t/ET}|\leq\frac{ET^{2}}{2(ET)^{2}}-1.$
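
Brown's bound can be checked numerically for a CM distribution $T=\Theta\xi$ with discrete $\Theta$, where the tail is $P(T>t)=E\,e^{-t/\Theta}$. A sketch with an arbitrary two-point $\Theta$, scanning $t$ over a grid (not from the text; assumes NumPy):

```python
import numpy as np

# T = Theta * xi with xi ~ Exponential(1): tail P(T > t) = E[exp(-t/Theta)].
vals = np.array([0.5, 2.0])     # values of Theta (arbitrary example)
p = np.array([0.6, 0.4])

ET = np.sum(p * vals)                    # E T = E Theta
ET2 = 2.0 * np.sum(p * vals**2)          # E T^2 = 2 E Theta^2
bound = ET2 / (2.0 * ET**2) - 1.0        # Proposition 3.22 right side

ts = np.linspace(0.0, 20.0, 4001)
tails = np.array([np.sum(p * np.exp(-t / vals)) for t in ts])
dev = np.max(np.abs(tails - np.exp(-ts / ET)))
assert dev <= bound + 1e-12
```

Here the actual deviation is well below the bound, consistent with the bound being a worst-case estimate over all CM laws with the given first two moments.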

So we can use this bound for hitting times $T_{A}$ in a stationary reversible chain. At first sight the bound seems useful only if we can estimate $E_{\pi}T^{2}_{A}$ and $E_{\pi}T_{A}$ accurately. But the following remarkable variation shows that for the hitting time distribution to be approximately exponential it is sufficient that the mean hitting time be large compared to the relaxation time $\tau_{2}$.

###### Proposition 3.23

For a subset $A$ of a continuous-time chain,

 $\sup_{t}|P_{\pi}(T_{A}>t)-\exp(-t/E_{\pi}T_{A})|\leq\tau_{2}/E_{\pi}T_{A}.$

Proof. By the collapsing principle (Chapter 2, Section 7.3) we may suppose $A$ is a singleton $\{j\}$, because (Corollary 3.27 below) collapsing cannot increase the relaxation time. Combining the mean hitting time formula with the expression (3.41) for $Z_{jj}$ in terms of the spectral representation (3.33),

 $E_{\pi}T_{j}=\pi^{-1}_{j}Z_{jj}=\pi^{-1}_{j}\sum_{m\geq 2}u^{2}_{jm}\lambda^{-1}_{m}.$ (3.72)

A similar calculation, exhibited below, shows

 $\frac{E_{\pi}T^{2}_{j}-2(E_{\pi}T_{j})^{2}}{2}=\pi^{-1}_{j}\sum_{m\geq 2}u^{2}_{jm}\lambda^{-2}_{m}.$ (3.73)

But $\lambda_{m}^{-2}\leq\lambda_{2}^{-1}\lambda_{m}^{-1}=\tau_{2}\lambda_{m}^{-1}$ for $m\geq 2$, so the right side of (3.73) is bounded by $\pi^{-1}_{j}\tau_{2}\sum_{m\geq 2}u^{2}_{jm}\lambda^{-1}_{m}$, which by (3.72) equals $\tau_{2}E_{\pi}T_{j}$. Applying Proposition 3.22 gives Proposition 3.23.
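
The spectral identity (3.72) and the key bounding step can be checked numerically on a small chain; the generator below is an arbitrary illustrative choice, not from the text (assumes NumPy):

```python
import numpy as np

# A small reversible continuous-time chain (arbitrary example).
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -5.0, 3.0],
              [0.0, 1.0, -1.0]])
pi = np.array([2.0, 1.0, 3.0]); pi = pi / pi.sum()
S = np.diag(np.sqrt(pi)) @ Q @ np.diag(1.0 / np.sqrt(pi))
evals, U = np.linalg.eigh(S)           # orthonormal eigenvectors u_{.m}
lam = -evals                           # one eigenvalue is 0 (stationarity)
mask = lam > 1e-9                      # the m >= 2 terms

j = 0
# (3.72): E_pi T_j = pi_j^{-1} sum_{m>=2} u_{jm}^2 / lam_m
spectral = np.sum(U[j, mask]**2 / lam[mask]) / pi[j]

# Direct computation of E_pi T_j by solving for h_k = E_k T_j.
idx = [k for k in range(3) if k != j]
h = np.zeros(3)
h[idx] = np.linalg.solve(Q[np.ix_(idx, idx)], -np.ones(2))
assert abs(spectral - pi @ h) < 1e-10

# The right side of (3.73) is at most tau_2 * E_pi T_j: the key step above.
tau2 = 1.0 / np.min(lam[mask])
second = np.sum(U[j, mask]**2 / lam[mask]**2) / pi[j]
assert second <= tau2 * (pi @ h) + 1e-12
```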

We give a straightforward but tedious verification of (3.73) (see also Notes). The identity $x^{2}/2=\int_{0}^{\infty}(x-t)^{+}\,dt,\ x\geq 0$, starts the calculation

 $\displaystyle\begin{aligned}{\textstyle\frac{1}{2}}E_{\pi}T^{2}_{j}&=\int_{0}^{\infty}\!E_{\pi}(T_{j}-t)^{+}\,dt\\ &=\int_{0}^{\infty}\!\sum_{i}P_{\pi}(X_{t}=i,\,T_{j}>t)\,E_{i}T_{j}\,dt\\ &=\sum_{i}E_{i}T_{j}\int_{0}^{\infty}\!P_{\pi}(X_{t}=i,\,T_{j}>t)\,dt\\ &=\sum_{i}\frac{Z_{jj}-Z_{ij}}{\pi_{j}}\,\frac{Z_{jj}\pi_{i}-Z_{ji}\pi_{j}}{\pi_{j}}&&\mbox{by Chapter 2 Lemmas 12 and 15 (continuous-time version)}\\ &=\sum_{i}\pi^{-2}_{j}\pi_{i}(Z_{jj}-Z_{ij})^{2}&&\mbox{using reversibility, }\pi_{j}Z_{ji}=\pi_{i}Z_{ij}.\end{aligned}$

Expanding the square, the cross-term vanishes and the first term becomes $(Z_{jj}/\pi_{j})^{2}=(E_{\pi}T_{j})^{2}$, so

 ${\textstyle\frac{1}{2}}E_{\pi}T^{2}_{j}-(E_{\pi}T_{j})^{2}=\pi_{j}^{-2}\sum_{i}\pi_{i}Z^{2}_{ij}.$

To finish the calculation,

 $\displaystyle\begin{aligned}\pi^{-1}_{j}\sum_{i}\pi_{i}Z^{2}_{ij}&=\pi^{-1}_{j}\sum_{i}\pi_{i}\left(\int\!(p_{ij}(s)-\pi_{j})\,ds\right)\left(\int\!(p_{ij}(t)-\pi_{j})\,dt\right)\\ &=\sum_{i}\left(\int\!(p_{ji}(s)-\pi_{i})\,ds\right)\left(\int\!(p_{ij}(t)-\pi_{j})\,dt\right)\\ &=\int\!\!\int\!\left(p_{jj}(s+t)-\pi_{j}\right)\,ds\,dt\\ &=\int\!t\,(p_{jj}(t)-\pi_{j})\,dt\\ &=\int\!\sum_{m\geq 2}u^{2}_{jm}\,te^{-\lambda_{m}t}\,dt\\ &=\sum_{m\geq 2}u^{2}_{jm}\lambda^{-2}_{m}.\qquad\rule{4.3pt}{4.3pt}\end{aligned}$

See the Notes for a related result, Theorem 3.43.