3 Reversible Markov Chains (September 10, 2002)

3.6 Extremal characterizations of eigenvalues

3.6.1 The Dirichlet formalism

A reversible chain has an associated Dirichlet form ${\cal E}$, defined as follows. For functions $g:I\rightarrow R$ write

{\cal E}(g,g):=\frac{1}{2}\sum_{i}\sum_{j\neq i}\pi_{i}p_{ij}(g(j)-g(i))^{2}  (3.74)

in discrete time, and substitute $q_{ij}$ for $p_{ij}$ in continuous time. One can immediately check the following equivalent definitions. In discrete time

{\cal E}(g,g)=\frac{1}{2}E_{\pi}(g(X_{1})-g(X_{0}))^{2}=E_{\pi}[g(X_{0})(g(X_{0})-g(X_{1}))].  (3.75)

In continuous time

{\cal E}(g,g) = \frac{1}{2}\lim_{t\rightarrow 0}t^{-1}E_{\pi}(g(X_{t})-g(X_{0}))^{2}  (3.76)
             = \lim_{t\rightarrow 0}t^{-1}E_{\pi}[g(X_{0})(g(X_{0})-g(X_{t}))]
             = -\sum_{i}\sum_{j}\pi_{i}g(i)q_{ij}g(j)

where the sum includes $j=i$. Note also that for random walk on a weighted graph, (3.74) becomes

{\cal E}(g,g):=\frac{1}{2}\sum_{i}\sum_{j\neq i}\frac{w_{ij}}{w}(g(j)-g(i))^{2}.  (3.77)
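The equivalence of (3.74), (3.75) and (3.77) is easy to check numerically. The following is a minimal Python sketch; the small weighted triangle and the test function $g$ are arbitrary choices of ours, not examples from the text.

import numpy as np

# Random walk on a hypothetical weighted triangle: p_ij = w_ij / w_i, pi_i = w_i / w.
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
w_i = W.sum(axis=1)
w = w_i.sum()
P = W / w_i[:, None]
pi = w_i / w

g = np.array([1.0, -0.5, 2.0])     # arbitrary test function g : I -> R

# (3.74): (1/2) sum_i sum_{j != i} pi_i p_ij (g(j) - g(i))^2  (diagonal terms vanish anyway)
sq_diff = (g[None, :] - g[:, None]) ** 2
E_374 = 0.5 * np.sum(pi[:, None] * P * sq_diff)

# (3.75): E_pi[ g(X_0)(g(X_0) - g(X_1)) ] under the stationary two-step law
E_375 = np.sum(pi[:, None] * P * (g[:, None] * (g[:, None] - g[None, :])))

# (3.77): (1/2) sum_i sum_{j != i} (w_ij / w) (g(j) - g(i))^2
E_377 = 0.5 * np.sum(W / w * sq_diff)

print(E_374, E_375, E_377)         # all three agree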

Recall from Chapter 2 (9/10/99 version), Section 6.2, the discussion of $L^{2}$ norms for functions and measures. In particular

\|g\|_{2}^{2} = \sum_{i}\pi_{i}g^{2}(i)=E_{\pi}g^{2}(X_{0})
\|\mu-\pi\|_{2}^{2} = \sum_{i}\frac{\mu_{i}^{2}}{\pi_{i}}-1 \qquad \mbox{for a probability distribution $\mu$.}

The relevance of ${\cal E}$ can be seen in the following lemma.

Lemma 3.24

Write $\rho(t)=(\rho_{j}(t))$ for the distribution at time $t$ of a continuous-time chain, with arbitrary initial distribution. Write $f_{j}(t)=\rho_{j}(t)/\pi_{j}$. Then

\frac{d}{dt}\|\rho(t)-\pi\|_{2}^{2}=-2{\cal E}(f(t),f(t)).

Proof. $\|\rho(t)-\pi\|_{2}^{2}=\sum_{j}\pi^{-1}_{j}\rho^{2}_{j}(t)-1$, so using the forward equations

\frac{d}{dt}\rho_{j}(t)=\sum_{i}\rho_{i}(t)q_{ij}

we get

\frac{d}{dt}\|\rho(t)-\pi\|_{2}^{2} = \sum_{j}\sum_{i}2\pi^{-1}_{j}\rho_{j}(t)\rho_{i}(t)q_{ij}
                                    = 2\sum_{j}\sum_{i}f_{j}(t)f_{i}(t)\pi_{i}q_{ij}

and the result follows from (3.76).    

3.6.2 Summary of extremal characterizations

For ease of comparison we state below three results which will be proved in subsequent sections. These results are commonly presented “the other way up”, using infs rather than sups, but our presentation is forced by our convention of consistently defining parameters to have dimensions of “time” rather than “1/time”. The sups are over functions $g:I\rightarrow R$ satisfying specified constraints, and excluding $g\equiv 0$. The results below are the same in continuous and discrete time—that is, continuization doesn’t change the numerical values of the quantities we consider. We shall give the proofs in discrete time.

Extremal characterization of relaxation time. The relaxation time $\tau_{2}$ satisfies

\tau_{2}=\sup\{\|g\|_{2}^{2}/{\cal E}(g,g):\ \sum_{i}\pi_{i}g(i)=0\}.

Extremal characterization of quasistationary mean hitting time. Given a subset $A$, let $\alpha_{A}$ be the quasistationary distribution on $A^{c}$ defined at (3.82). Then the quasistationary mean exit time is

E_{\alpha_{A}}T_{A}=\sup\{\|g\|_{2}^{2}/{\cal E}(g,g):\ g\geq 0,\ g=0\ \mbox{on $A$}\}.

Extremal characterization of mean commute times. For distinct states $i,j$ the mean commute time satisfies

E_{i}T_{j}+E_{j}T_{i}=\sup\{1/{\cal E}(g,g):\ 0\leq g\leq 1,\ g(i)=1,\ g(j)=0\}.

Because the state space is finite, the sups are attained, and there are theoretical descriptions of the $g$ attaining the extrema in all three cases. An immediate practical use of these characterizations in concrete examples is to obtain lower bounds on the parameters by inspired guesswork, that is by choosing some simple explicit “test function” $g$ which seems qualitatively right and computing the right-hand quantity. See Chapter 14 (3/10/94 version), Example 32, for a typical example. Of course we cannot obtain upper bounds this way, but extremal characterizations can be used as a starting point for further theoretical work (see in particular the bounds on $\tau_{2}$ in Chapter 4 (10/11/94 version), Section 4).

3.6.3 The extremal characterization of relaxation time

The first two extremal characterizations are in fact just reformulations of the classical Rayleigh–Ritz extremal characterization of eigenvalues, which goes as follows ([183] Theorem 4.2.2 and eq. 4.2.7). Let $S$ be a symmetric matrix with eigenvalues $\mu_{1}\geq\mu_{2}\geq\cdots$. Then

\mu_{1}=\sup_{{\bf x}}\frac{\sum_{i}\sum_{j}x_{i}s_{ij}x_{j}}{\sum_{i}x^{2}_{i}}  (3.78)

and an ${\bf x}$ attaining the sup is an eigenvector corresponding to $\mu_{1}$ (of course sups are over ${\bf x}\not\equiv 0$). And

\mu_{2}=\sup_{{\bf y}:\sum_{i}y_{i}x_{i}=0}\frac{\sum_{i}\sum_{j}y_{i}s_{ij}y_{j}}{\sum_{i}y^{2}_{i}}  (3.79)

and a ${\bf y}$ attaining the sup is an eigenvector corresponding to $\mu_{2}$.

As observed in Section 3.4, given a discrete-time chain with transition matrix $P$, the symmetric matrix $(s_{ij}=\pi^{1/2}_{i}p_{ij}\pi^{-1/2}_{j})$ has maximal eigenvalue $1$ with corresponding eigenvector $(\pi^{1/2}_{i})$. So applying (3.79) and writing $y_{i}=\pi^{1/2}_{i}g(i)$, the second-largest eigenvalue (of $S$ and hence of $P$) is given by

\lambda_{2}=\sup_{g:\sum_{i}\pi_{i}g(i)=0}\frac{\sum_{i}\sum_{j}\pi_{i}g(i)p_{ij}g(j)}{\sum_{i}\pi_{i}g^{2}(i)}.

In probabilistic notation the fraction is

\frac{E_{\pi}[g(X_{0})g(X_{1})]}{E_{\pi}g^{2}(X_{0})}=1-\frac{E_{\pi}[g(X_{0})(g(X_{0})-g(X_{1}))]}{E_{\pi}g^{2}(X_{0})}=1-\frac{{\cal E}(g,g)}{\|g\|_{2}^{2}}.

Since $\tau_{2}=1/(1-\lambda_{2})$ in discrete time we have proved the first of our extremal characterizations.

Theorem 3.25 (Extremal characterization of relaxation time)

The relaxation time $\tau_{2}$ satisfies

\tau_{2}=\sup\{\|g\|_{2}^{2}/{\cal E}(g,g):\ \sum_{i}\pi_{i}g(i)=0\}.
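As a quick numerical illustration of Theorem 3.25 (a sketch of ours on an arbitrary small weighted graph), one can compute $\lambda_{2}$ from the symmetrized matrix $s_{ij}=\pi^{1/2}_{i}p_{ij}\pi^{-1/2}_{j}$ of Section 3.4, check that the Rayleigh quotient $\|g\|_{2}^{2}/{\cal E}(g,g)$ of the corresponding eigenvector equals $1/(1-\lambda_{2})$, and check that any other $\pi$-mean-zero test function gives only a lower bound.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weighted graph on 4 vertices (our own example).
W = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 3.0, 1.0],
              [1.0, 3.0, 0.0, 2.0],
              [0.0, 1.0, 2.0, 0.0]])
w_i = W.sum(axis=1)
P = W / w_i[:, None]
pi = w_i / w_i.sum()

def dirichlet(g):
    return 0.5 * np.sum(pi[:, None] * P * (g[None, :] - g[:, None]) ** 2)

def norm2sq(g):
    return np.sum(pi * g * g)

# lambda_2 from the symmetrized matrix s_ij = pi_i^{1/2} p_ij pi_j^{-1/2}
S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in increasing order
lam2 = eigvals[-2]
tau2 = 1.0 / (1.0 - lam2)

g0 = eigvecs[:, -2] / np.sqrt(pi)             # right eigenvector of P for lambda_2
print(tau2, norm2sq(g0) / dirichlet(g0))      # the sup is attained: these agree

g = rng.normal(size=4)
g -= np.sum(pi * g)                           # any pi-mean-zero g gives a lower bound
print(norm2sq(g) / dirichlet(g) <= tau2 + 1e-9)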

A function $g_{0}$, say, attaining the sup in the extremal characterization is, by examining the argument above, a right eigenvector associated with $\lambda_{2}$:

\sum_{j}p_{ij}g_{0}(j)=\lambda_{2}g_{0}(i).

(From this point on in the discussion, we assume $g_{0}$ is normalized so that $\|g_{0}\|_{2}=1$.) The corresponding left eigenvector $\theta$:

\sum_{i}\theta_{i}p_{ij}=\lambda_{2}\theta_{j}\qquad\mbox{for all $j$}

is the signed measure $\theta$ such that $\theta_{i}=\pi_{i}g_{0}(i)$. To continue a somewhat informal discussion of the interpretation of $g_{0}$, it is convenient to switch to continuous time (to avoid issues of negative eigenvalues) and to assume $\lambda_{2}$ has multiplicity $1$.

\rho_{j}(t)=\sum_{i}\rho_{i}(0)p_{ij}(t),

can also be used to define signed measures evolving from an initial signed measure. For the initial measure $\theta$ we have

\theta(t)=e^{-t/\tau_{2}}\theta.

For any signed measure $\nu=\nu(0)$ with $\sum_{i}\nu_{i}(0)=0$ we have

\nu(t)\sim ce^{-t/\tau_{2}}\theta;\qquad c=\sum_{i}\nu_{i}(0)\theta_{i}/\pi_{i}=\sum_{i}\nu_{i}(0)g_{0}(i).

So $\theta$ can be regarded as “the signed measure which relaxes to $0$ most slowly”. For a probability measure $\rho(0)$, considering $\rho(0)-\pi$ gives

\rho(t)-\pi\sim ce^{-t/\tau_{2}}\theta,\qquad c=\sum_{i}(\rho_{i}(0)-\pi_{i})g_{0}(i)=\sum_{i}\rho_{i}(0)g_{0}(i).  (3.80)

So $\theta$ has the interpretation of “the asymptotic normalized difference between the true distribution at time $t$ and the stationary distribution”. Finally, from (3.80) with $\rho(0)$ concentrated at $i$ (or from the spectral representation)

P_{i}(X_{t}\in\cdot)-\pi\sim g_{0}(i)e^{-t/\tau_{2}}\theta.

So $g_{0}$ has the interpretation of “the asymptotic normalized size of deviation from stationarity, as a function of the starting state”. When the state space has some geometric structure – jumps go to nearby states – one expects $g_{0}$ to be a “smooth” function, exemplified by the cosine function arising in the $n$-cycle (Chapter 5 (4/22/96 version), Example 7).

3.6.4 Simple applications

Here is a fundamental “finite-time” result.

Lemma 3.26 ($L^{2}$ contraction lemma)

Write $\rho(t)=(\rho_{j}(t))$ for the distribution at time $t$ of a continuous-time chain, with arbitrary initial distribution. Then

\|\rho(t)-\pi\|_{2}\leq e^{-t/\tau_{2}}\|\rho(0)-\pi\|_{2}.

Proof. Write $f_{j}(t)=\rho_{j}(t)/\pi_{j}$. Then

\frac{d}{dt}\|\rho(t)-\pi\|_{2}^{2} = -2{\cal E}(f(t),f(t)) \qquad\mbox{by Lemma 3.24}
                                    = -2{\cal E}(f(t)-1,f(t)-1)
                                    \leq -2\,\frac{\|f(t)-1\|_{2}^{2}}{\tau_{2}} \qquad\mbox{by the extremal characterization of $\tau_{2}$}
                                    = \frac{-2}{\tau_{2}}\,\|\rho(t)-\pi\|_{2}^{2}.

Integrating, $\|\rho(t)-\pi\|_{2}^{2}\leq e^{-2t/\tau_{2}}\|\rho(0)-\pi\|_{2}^{2}$, and the result follows.

Alternatively, Lemma 3.26 follows by observing that

\|\rho(t)-\pi\|^{2}_{2}\ \mbox{is CM with spectral gap at least}\ 2\lambda_{2}=2/\tau_{2}  (3.81)

and applying Lemma 3.16. The fact (3.81) can be established directly from the spectral representation, but we will instead apply the observation at (3.50)–(3.51). Indeed, with $g(i):\equiv\rho_{i}(0)/\pi_{i}$, we have

E_{\pi}\left[g(X_{2t})g(X_{0})\right] = \sum_{i}\rho_{i}(0)\sum_{j}p_{ij}(2t)\frac{\rho_{j}(0)}{\pi_{j}}
                                      = \sum_{i}\rho_{i}(0)\sum_{j}\sum_{k}p_{ik}(t)p_{kj}(t)\frac{\rho_{j}(0)}{\pi_{j}}
                                      = \sum_{k}\frac{1}{\pi_{k}}\left[\sum_{i}\rho_{i}(0)p_{ik}(t)\right]\left[\sum_{j}\rho_{j}(0)p_{jk}(t)\right]
                                      = \sum_{k}\frac{1}{\pi_{k}}\rho^{2}_{k}(t)=\|\rho(t)-\pi\|^{2}_{2}+1.

Thus by (3.51)

\|\rho(t)-\pi\|^{2}_{2}=\sum_{m=2}^{n}\left(\sum_{i}\pi^{1/2}_{i}g(i)u_{im}\right)^{2}\exp(-\lambda_{m}t).
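Lemma 3.26 is also easy to test numerically. The sketch below (ours; it assumes scipy for the matrix exponential and reuses the small weighted-graph example above) continuizes the chain via $Q=P-I$ and compares $\|\rho(t)-\pi\|_{2}$ with the stated bound.

import numpy as np
from scipy.linalg import expm

# Continuization Q = P - I of the small weighted-graph walk used above.
W = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 3.0, 1.0],
              [1.0, 3.0, 0.0, 2.0],
              [0.0, 1.0, 2.0, 0.0]])
w_i = W.sum(axis=1)
P = W / w_i[:, None]
pi = w_i / w_i.sum()
Q = P - np.eye(4)

S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
tau2 = 1.0 / (1.0 - np.linalg.eigvalsh(S)[-2])   # continuization keeps tau_2

def l2_dist(rho):                                # ||rho - pi||_2
    return np.sqrt(np.sum((rho - pi) ** 2 / pi))

rho0 = np.array([1.0, 0.0, 0.0, 0.0])            # start at state 0
for t in [0.5, 1.0, 2.0, 4.0]:
    rho_t = rho0 @ expm(t * Q)                   # distribution at time t
    print(l2_dist(rho_t) <= np.exp(-t / tau2) * l2_dist(rho0) + 1e-12)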

Our main use of the extremal characterization is to compare relaxation times of different chains on the same (or essentially the same) state space. Here are three instances. The first is a result we have already exploited in Section 3.5.

Corollary 3.27

Given a chain with relaxation time $\tau_{2}$, let $\tau^{A}_{2}$ be the relaxation time of the chain with subset $A$ collapsed to a singleton $\{a\}$ (Chapter 2 (9/10/99 version), Section 7.3). Then $\tau^{A}_{2}\leq\tau_{2}$.

Proof. Any function $g$ on the states of the collapsed chain can be extended to the original state space by setting $g=g(a)$ on $A$, and ${\cal E}(g,g)$ and $\sum_{i}\pi_{i}g(i)$ and $\|g\|_{2}^{2}$ are unchanged. So consider a $g$ attaining the sup in the extremal characterization of $\tau^{A}_{2}$ and use this as a test function in the extremal characterization of $\tau_{2}$.

Remark. An extension of Corollary 3.27 will be provided by the contraction principle (Chapter 4 (10/11/94 version), Proposition 44).
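A small check of Corollary 3.27 (our own sketch): in the weighted-graph picture, collapsing $A$ to a single vertex $a$ merges the edges from each outside vertex into $A$ and turns the edges within $A$ into a loop at $a$; the collapsed walk's relaxation time is then no larger.

import numpy as np

def tau2_from_weights(W):
    """Relaxation time of random walk on a weighted graph (loops allowed on the diagonal)."""
    w_i = W.sum(axis=1)
    P = W / w_i[:, None]
    pi = w_i / w_i.sum()
    S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
    return 1.0 / (1.0 - np.linalg.eigvalsh(S)[-2])

rng = np.random.default_rng(1)
W = rng.uniform(0.5, 2.0, size=(5, 5))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)                    # hypothetical weighted graph on 5 vertices

A, keep = [3, 4], [0, 1, 2]                 # collapse A = {3,4} to a single vertex a
Wc = np.zeros((4, 4))
Wc[:3, :3] = W[np.ix_(keep, keep)]
Wc[:3, 3] = W[np.ix_(keep, A)].sum(axis=1)  # edges from outside into A are merged
Wc[3, :3] = Wc[:3, 3]
Wc[3, 3] = W[np.ix_(A, A)].sum()            # edges within A become a loop at a

print(tau2_from_weights(Wc) <= tau2_from_weights(W) + 1e-12)   # Corollary 3.27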

Corollary 3.28

Let $\tau_{2}$ be the relaxation time for a “fluid model” continuous-time chain associated with a graph with weights $(w_{e})$ [recall (3.16)] and let $\tau_{2}^{*}$ be the relaxation time when the weights are $(w^{*}_{e})$. If $w^{*}_{e}\geq w_{e}$ for all edges $e$ then $\tau_{2}^{*}\leq\tau_{2}$.

Proof. Each stationary distribution is uniform, so $\|g\|_{2}^{2}=\|g\|_{2}^{*2}$ while ${\cal E}^{*}(g,g)\geq{\cal E}(g,g)$. So the result is immediate from the extremal characterization.

The next result is a prototype for more complicated “indirect comparison” arguments later. It is convenient to state it in terms of random walk on a weighted graph. Recall (Section 3.2) that a reversible chain specifies a weighted graph with edge-weights $w_{ij}=\pi_{i}p_{ij}$, vertex-weights $w_{i}=\pi_{i}$, and total weight $w=1$.

Lemma 3.29 (the direct comparison lemma)

Let $(w_{e})$ and $(w^{*}_{e})$ be edge-weights on a graph, let $(w_{i})$ and $(w^{*}_{i})$ be the vertex-weights, and let $\tau_{2}$ and $\tau_{2}^{*}$ be the relaxation times for the associated random walks. Then

\frac{\min_{i}(w_{i}/w^{*}_{i})}{\max_{e}(w_{e}/w^{*}_{e})}\leq\frac{\tau_{2}}{\tau_{2}^{*}}\leq\frac{\max_{i}(w_{i}/w^{*}_{i})}{\min_{e}(w_{e}/w^{*}_{e})}

where in $\min_{e}$ and $\max_{e}$ we don’t count loops $e=(v,v)$.

Proof. For any $g$, by (3.77)

w^{*}{\cal E}^{*}(g,g)\geq w{\cal E}(g,g)\min_{e}(w^{*}_{e}/w_{e}).

And since $w\|g\|_{2}^{2}=\sum_{i}w_{i}g^{2}(i)$,

w^{*}\|g\|_{2}^{*2}\leq w\|g\|_{2}^{2}\,\max_{i}(w^{*}_{i}/w_{i}).

So if $g$ has $\pi^{*}$-mean $0$ and $\pi$-mean $b$ then

\frac{\|g\|_{2}^{*2}}{{\cal E}^{*}(g,g)}\leq\frac{\|g-b\|_{2}^{*2}}{{\cal E}^{*}(g-b,g-b)}\leq\frac{\|g-b\|_{2}^{2}}{{\cal E}(g-b,g-b)}\ \frac{\max_{i}(w^{*}_{i}/w_{i})}{\min_{e}(w^{*}_{e}/w_{e})}.

By considering the $g$ attaining the extremal characterization of $\tau_{2}^{*}$,

\tau^{*}_{2}\leq\tau_{2}\ \frac{\max_{i}(w^{*}_{i}/w_{i})}{\min_{e}(w^{*}_{e}/w_{e})}.

This is the lower bound in the lemma, and the upper bound follows by reversing the roles of $w_{e}$ and $w^{*}_{e}$.
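The two bounds are easy to test numerically. The sketch below (ours; two random edge-weightings of the complete graph $K_{6}$) checks Lemma 3.29 as stated above.

import numpy as np

def tau2_and_vertex_weights(W):
    w_i = W.sum(axis=1)
    P = W / w_i[:, None]
    pi = w_i / w_i.sum()
    S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
    return 1.0 / (1.0 - np.linalg.eigvalsh(S)[-2]), w_i

rng = np.random.default_rng(2)
n = 6

def random_weights():
    W = rng.uniform(1.0, 3.0, size=(n, n))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)
    return W

W, W_star = random_weights(), random_weights()      # two edge-weightings of K_6
tau2, wv = tau2_and_vertex_weights(W)
tau2_star, wv_star = tau2_and_vertex_weights(W_star)

edges = ~np.eye(n, dtype=bool)                      # loops excluded from the edge min/max
r_e = W[edges] / W_star[edges]                      # w_e / w*_e over edges
r_i = wv / wv_star                                  # w_i / w*_i over vertices

print(r_i.min() / r_e.max() <= tau2 / tau2_star <= r_i.max() / r_e.min())   # Lemma 3.29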

Remarks. Sometimes $\tau_{2}$ is very sensitive to apparently small changes in the chain. Consider random walk on an unweighted graph. If we add extra edges, keeping the total number of added edges small relative to the number of original edges, we might guess that $\tau_{2}$ could not increase or decrease much. But the examples outlined below show that $\tau_{2}$ may in fact change substantially in either direction.

Example 3.30

Take two complete graphs on $n$ vertices and join them with a single edge. Then $w=2n(n-1)+2$ and $\tau_{2}\sim n^{2}/2$. But if we extend the single join-edge to an $n$-edge matching of the vertices in the original two complete graphs, then $w^{*}=2n(n-1)+2n\sim w$ but $\tau_{2}^{*}\sim n/2$.
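For moderate $n$ the two relaxation times can be computed exactly and compared with the asymptotics $n^{2}/2$ and $n/2$; the sketch below (ours) does this for $n=40$.

import numpy as np

def tau2_unweighted(adj):
    deg = adj.sum(axis=1)
    P = adj / deg[:, None]
    pi = deg / deg.sum()
    S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
    return 1.0 / (1.0 - np.linalg.eigvalsh(S)[-2])

n = 40
# Two copies of K_n (vertices 0..n-1 and n..2n-1) joined by the single edge {0, n}.
A_single = np.zeros((2 * n, 2 * n))
A_single[:n, :n] = 1.0
A_single[n:, n:] = 1.0
np.fill_diagonal(A_single, 0.0)
A_single[0, n] = A_single[n, 0] = 1.0

# The same two copies of K_n joined by the perfect matching {i, n+i}, i = 0..n-1.
A_match = A_single.copy()
for i in range(n):
    A_match[i, n + i] = A_match[n + i, i] = 1.0

print(tau2_unweighted(A_single), n ** 2 / 2)   # comparable for large n
print(tau2_unweighted(A_match), n / 2)         # comparable for large n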

Example 3.31

Take a complete graph on $n$ vertices. Take $k=o(n^{1/2})$ new vertices and attach each to distinct vertices of the original complete graph. Then $w=n(n-1)+2k$ and $\tau_{2}$ is bounded. But if we now add all edges within the new $k$ vertices, $w^{*}=n(n-1)+2k+k(k-1)\sim w$ but $\tau_{2}^{*}\sim k$ provided $k\rightarrow\infty$.

As these examples suggest, comparison arguments are most effective when the stationary distributions coincide. Specializing Lemma 3.29 to this case, and rephrasing in terms of (reversible) chains, gives

Lemma 3.32 (the direct comparison lemma)

For transition matrices ${\bf P}$ and ${\bf P}^{*}$ with the same stationary distribution $\pi$, if

p_{ij}\geq\delta p^{*}_{ij}\qquad\mbox{for all $j\neq i$}

then $\tau_{2}\leq\delta^{-1}\tau^{*}_{2}$.

Remarks. The hypothesis can be rephrased as ${\bf P}=\delta{\bf P}^{*}+(1-\delta){\bf Q}$, where ${\bf Q}$ is a (maybe not irreducible) reversible transition matrix with stationary distribution $\pi$. When ${\bf Q}={\bf I}$ we have $\tau_{2}=\delta^{-1}\tau^{*}_{2}$, so an interpretation of the lemma is that “combining transitions of ${\bf P}^{*}$ with noise can’t increase mixing time any more than combining transitions with holds”.
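A quick check of Lemma 3.32 (our own sketch): build two transition matrices reversible with respect to the same $\pi$ by symmetrizing small kernels $K$ (so that $\pi_{i}p_{ij}=K_{ij}$ off the diagonal), mix them with weight $\delta$, and compare relaxation times.

import numpy as np

rng = np.random.default_rng(3)
n = 5
pi = rng.uniform(1.0, 2.0, size=n)
pi /= pi.sum()

def reversible_from_kernel(K):
    """Transition matrix with pi_i p_ij = K_ij off the diagonal (K symmetric, small),
    and the leftover probability placed on the diagonal; reversible w.r.t. pi."""
    P = K / pi[:, None]
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

def tau2(P):
    S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
    return 1.0 / (1.0 - np.linalg.eigvalsh(S)[-2])

def small_symmetric():
    K = rng.uniform(0.0, 0.02, size=(n, n))        # small enough to keep diagonals >= 0
    return (K + K.T) / 2

P_star = reversible_from_kernel(small_symmetric())
Q = reversible_from_kernel(small_symmetric())

delta = 0.3
P = delta * P_star + (1 - delta) * Q               # p_ij >= delta p*_ij for j != i
print(tau2(P) <= tau2(P_star) / delta + 1e-12)     # Lemma 3.32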

3.6.5 Quasistationarity

Given a subset $A$ of states in a discrete-time chain, let ${\bf P}^{A}$ be ${\bf P}$ restricted to $A^{c}$. Then ${\bf P}^{A}$ will be a substochastic matrix, i.e., the row-sums are at most $1$, and some row-sum is strictly less than $1$. Suppose ${\bf P}^{A}$ is irreducible. As a consequence of the Perron–Frobenius theorem (e.g., [183] Theorem 8.4.4) for the nonnegative matrix ${\bf P}^{A}$, there is a unique $0<\lambda<1$ (specifically, the largest eigenvalue of ${\bf P}^{A}$) such that there is a probability distribution $\alpha$ satisfying

\alpha=0\ \mbox{on $A$},\qquad\sum_{i}\alpha_{i}p_{ij}=\lambda\alpha_{j},\quad j\in A^{c}.  (3.82)

Writing $\alpha_{A}$ and $\lambda_{A}$ to emphasize dependence on $A$, (3.82) implies that under $P_{\alpha_{A}}$ the hitting time $T_{A}$ has geometric distribution

P_{\alpha_{A}}(T_{A}>m)=\lambda_{A}^{m},\qquad m\geq 0,

whence

E_{\alpha_{A}}T_{A}=\frac{1}{1-\lambda_{A}}.

Call $\alpha_{A}$ the quasistationary distribution and $E_{\alpha_{A}}T_{A}$ the quasistationary mean exit time.
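A numerical sketch of the discrete-time construction (ours; the weighted path and the choice $A=\{4\}$ are an arbitrary example): compute $\lambda_{A}$ and $\alpha_{A}$ from the substochastic restriction ${\bf P}^{A}$, and check the geometric exit law and the mean exit time.

import numpy as np

# Random walk on a hypothetical weighted path 0-1-2-3-4; take A = {4}.
W = np.diag([2.0, 1.0, 3.0, 1.5], k=1)
W = W + W.T
w_i = W.sum(axis=1)
P = W / w_i[:, None]
pi = w_i / w_i.sum()

Ac = [0, 1, 2, 3]
PA = P[np.ix_(Ac, Ac)]                       # P restricted to A^c: substochastic

# Perron-Frobenius: lambda_A = largest eigenvalue of P^A, alpha_A = normalized
# nonnegative left eigenvector, as in (3.82).
vals, left = np.linalg.eig(PA.T)
k = np.argmax(vals.real)
lam_A = vals[k].real
alpha = np.abs(left[:, k].real)
alpha /= alpha.sum()

m = 7
stay = alpha @ np.linalg.matrix_power(PA, m) @ np.ones(len(Ac))
print(stay, lam_A ** m)                      # P_{alpha_A}(T_A > m) = lambda_A^m
print(1.0 / (1.0 - lam_A))                   # quasistationary mean exit time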

Similarly, for a continuous-time chain let ${\bf Q}^{A}$ be ${\bf Q}$ restricted to $A^{c}$. Assuming irreducibility of the substochastic chain with generator ${\bf Q}^{A}$, there is a unique $\lambda\equiv\lambda_{A}>0$ such that there is a probability distribution $\alpha\equiv\alpha_{A}$ (called the quasistationary distribution) satisfying

\alpha=0\ \mbox{on $A$},\qquad\sum_{i}\alpha_{i}q_{ij}=-\lambda\alpha_{j},\quad j\in A^{c}.

This implies that under $P_{\alpha_{A}}$ the hitting time $T_{A}$ has exponential distribution

P_{\alpha_{A}}(T_{A}>t)=\exp(-\lambda_{A}t),\qquad t>0,

whence the quasistationary mean exit time is

E_{\alpha_{A}}T_{A}=1/\lambda_{A}.  (3.83)

Note that both $\alpha_{A}$ and $E_{\alpha_{A}}T_{A}$ are unaffected by continuization of a discrete-time chain.

The facts above do not depend on reversibility, but invoking now our standing assumption that chains are reversible we will show in remark (b) following Theorem 3.33 that, for continuous-time chains, $\lambda_{A}$ here agrees with the spectral gap $\lambda_{A}$ discussed in Proposition 3.21, and we can also now prove our second extremal characterization.

Theorem 3.33

(Extremal characterization of quasistationary mean hitting time)  The quasistationary mean exit time satisfies

E_{\alpha_{A}}T_{A}=\sup\{\|g\|_{2}^{2}/{\cal E}(g,g):\ g\geq 0,\ g=0\ \mbox{on $A$}\}.  (3.84)

Proof. As usual, we give the proof in discrete time. The matrix $(s^{A}_{ij}=\pi^{1/2}_{i}p^{A}_{ij}\pi^{-1/2}_{j})$ is symmetric with largest eigenvalue $\lambda_{A}$. Putting $x_{i}=\pi^{1/2}_{i}g(i)$ in the characterization (3.78) gives

\lambda_{A}=\sup_{g}\frac{\sum_{i}\sum_{j}\pi_{i}g(i)p^{A}_{ij}g(j)}{\sum_{i}\pi_{i}g^{2}(i)}.

Clearly the sup is attained by nonnegative $g$, and though the sums above are technically over $A^{c}$ we can sum over all $I$ by setting $g=0$ on $A$. So

\lambda_{A}=\sup\left\{\frac{\sum_{i}\sum_{j}\pi_{i}g(i)p^{A}_{ij}g(j)}{\sum_{i}\pi_{i}g^{2}(i)}:\ g\geq 0,\ g=0\ \mbox{on $A$}\right\}.

As in the proof of Theorem 3.25 this rearranges to

\frac{1}{1-\lambda_{A}}=\sup\{\|g\|_{2}^{2}/{\cal E}(g,g):\ g\geq 0,\ g=0\ \mbox{on $A$}\},

establishing Theorem 3.33.    

Remarks. (a) These remarks closely parallel the remarks at the end of Section 3.6.3. The sup in Theorem 3.33 is attained by the function $g_{0}$ which is the right eigenvector associated with $\lambda_{A}$, and by reversibility this is

g_{0}(i)=\alpha_{A}(i)/\pi_{i}.  (3.85)

It easily follows from (3.82) that

P_{\alpha_{A}}(X_{t}=j\,|\,T_{A}>t)=\alpha_{A}(j)\qquad\mbox{for all $j$ and $t$},

which explains the name quasistationary distribution for $\alpha_{A}$. A related interpretation of $\alpha_{A}$ is as the distribution of the Markov chain conditioned on having been in $A^{c}$ for the infinite past. More precisely, one can use Perron–Frobenius theory to prove that

P(X_{t}=j\,|\,T_{A}>t)\rightarrow\alpha_{A}(j)\qquad\mbox{as $t\rightarrow\infty$}  (3.86)

provided ${\bf P}^{A}$ is aperiodic as well as irreducible.

(b) Relation (3.86) holds in continuous time as well (assuming irreducibility of the chain restricted to $A^{c}$), yielding

\exp(-\lambda_{A}t) = P_{\alpha_{A}}(T_{A}>t)
                    = \lim_{s\rightarrow\infty}P_{\pi}(T_{A}>t+s\,|\,T_{A}>s)=\lim_{s\rightarrow\infty}\frac{P_{\pi}(T_{A}>t+s)}{P_{\pi}(T_{A}>s)}.

Since by Proposition 3.21 the distribution of $T_{A}$ for the stationary chain is CM with spectral gap (say) $\sigma_{A}$, the limit here is $\exp(-\sigma_{A}t)$. Thus $\lambda_{A}=\sigma_{A}$, that is, our two uses of $\lambda_{A}$ refer to the same quantity.

(c) We conclude from remark (b), (3.83), and the final inequality in (3.69) that, in either continuous or discrete time,

E_{\alpha_{A}}T_{A}\geq\frac{E_{\pi}T_{A}}{\pi(A^{c})}\geq E_{\pi}T_{A}.  (3.87)

Our fundamental use of quasistationarity is the following.

Corollary 3.34

For any subset $A$, the quasistationary mean hitting time satisfies

E_{\alpha_{A}}T_{A}\leq\tau_{2}/\pi(A).

Proof. As at (3.85) set $g(i)=\alpha_{A}(i)/\pi_{i}$, so

E_{\alpha_{A}}T_{A}=\|g\|_{2}^{2}/{\cal E}(g,g).  (3.88)

Now $E_{\pi}g(X_{0})=1$, so applying the extremal characterization of relaxation time to $g-1$,

\tau_{2}\geq\frac{\|g-1\|_{2}^{2}}{{\cal E}(g-1,g-1)}=\frac{\|g\|_{2}^{2}-1}{{\cal E}(g,g)}=\left(E_{\alpha_{A}}T_{A}\right)\left(1-\frac{1}{\|g\|_{2}^{2}}\right),  (3.89)

the last equality using (3.88). Since $\alpha_{A}$ is a probability distribution on $A^{c}$ we have

1=E_{\pi}\left[1_{A^{c}}(X_{0})g(X_{0})\right]

and so by Cauchy–Schwarz

1^{2}\leq\left(E_{\pi}1_{A^{c}}(X_{0})\right)\|g\|_{2}^{2}=(1-\pi(A))\|g\|_{2}^{2}.

Rearranging,

1-\frac{1}{\|g\|_{2}^{2}}\geq\pi(A)

and substituting into (3.89) gives the desired bound.    
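Continuing the numerical sketch from earlier in this section (ours; the weighted path 0-1-2-3-4 with $A=\{4\}$), one can verify (3.88) and the bound of Corollary 3.34 directly.

import numpy as np

W = np.diag([2.0, 1.0, 3.0, 1.5], k=1)       # the weighted path 0-1-2-3-4, A = {4}
W = W + W.T
w_i = W.sum(axis=1)
P = W / w_i[:, None]
pi = w_i / w_i.sum()
Ac = [0, 1, 2, 3]
PA = P[np.ix_(Ac, Ac)]

vals, left = np.linalg.eig(PA.T)
k = np.argmax(vals.real)
lam_A = vals[k].real
alpha = np.abs(left[:, k].real)
alpha /= alpha.sum()

g = np.zeros(5)
g[Ac] = alpha / pi[Ac]                       # g = alpha_A / pi, zero on A, as at (3.85)

dirichlet = 0.5 * np.sum(pi[:, None] * P * (g[None, :] - g[:, None]) ** 2)
norm2sq = np.sum(pi * g * g)

S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
tau2 = 1.0 / (1.0 - np.linalg.eigvalsh(S)[-2])

E_exit = 1.0 / (1.0 - lam_A)                 # E_{alpha_A} T_A
print(E_exit, norm2sq / dirichlet)           # equal, as in (3.88)
print(E_exit <= tau2 / pi[4] + 1e-12)        # Corollary 3.34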

Combining Corollary 3.34 with (3.68) and (3.83) gives the result below.

Lemma 3.35
\mbox{(continuous time)}\qquad P_{\pi}(T_{A}>t)\leq\exp(-t\pi(A)/\tau_{2}),\qquad t\geq 0.