# 3.6 Extremal characterizations of eigenvalues

## 3.6.1 The Dirichlet formalism

A reversible chain has an associated Dirichlet form ${\cal E}$, defined as follows. For functions $g:I\rightarrow R$ write

 $\mbox{{\cal E}}(g,g):={\textstyle\frac{1}{2}}\sum_{i}\sum_{j\neq i}\pi_{i}p_% {ij}(g(j)-g(i))^{2}$ (3.74)

in discrete time, and substitute $q_{ij}$ for $p_{ij}$ in continuous time. One can immediately check the following equivalent definitions. In discrete time

 $\mbox{{\cal E}}(g,g)={\textstyle\frac{1}{2}}E_{\pi}(g(X_{1})-g(X_{0}))^{2}=E% _{\pi}[g(X_{0})(g(X_{0})-g(X_{1}))].$ (3.75)

In continuous time

 $\displaystyle\mbox{{\cal E}}(g,g)$ $\displaystyle=$ $\displaystyle{\textstyle\frac{1}{2}}\lim_{t\rightarrow 0}t^{-1}E_{\pi}(g(X_{t}% )-g(X_{0}))^{2}$ (3.76) $\displaystyle=$ $\displaystyle\lim_{t\rightarrow 0}t^{-1}E_{\pi}[g(X_{0})(g(X_{0})-g(X_{t}))]$ $\displaystyle=$ $\displaystyle-\ \sum_{i}\sum_{j}\pi_{i}g(i)q_{ij}g(j)$

where the sum includes $j=i$. Note also that for random walk on a weighted graph, (3.74) becomes

 $\mbox{{\cal E}}(g,g):=\frac{1}{2}\sum_{i}\sum_{j\neq i}\frac{w_{ij}}{w}(g(j)% -g(i))^{2}.$ (3.77)

Recall from Chapter 2 margin: 9/10/99 versionSection 6.2 the discussion of $L^{2}$ norms for functions and measures. In particular

 $\displaystyle||g||_{2}^{2}$ $\displaystyle=$ $\displaystyle\sum_{i}\pi_{i}g^{2}(i)=E_{\pi}g^{2}(X_{0})$ $\displaystyle\|\mu-\pi\|_{2}^{2}$ $\displaystyle=$ $\mu$

The relevance of ${\cal E}$ can be seen in the following lemma.

###### Lemma 3.24

Write $\rho(t)=(\rho_{j}(t))$ for the distribution at time $t$ of a continuous-time chain, with arbitrary initial distribution. Write $f_{j}(t)=\rho_{j}(t)/\pi_{j}$. Then

 $\frac{d}{dt}\|\rho(t)-\pi\|_{2}^{2}=-2\mbox{{\cal E}}(f(t),f(t)).$

Proof. $\|\rho(t)-\pi\|_{2}^{2}=\sum_{j}\pi^{-1}_{j}\rho^{2}_{j}(t)\,-1$, so using the forward equations

 $\frac{d}{dt}\rho_{j}(t)=\sum_{i}\rho_{i}(t)q_{ij}$

we get

 $\displaystyle\frac{d}{dt}\|\rho(t)-\pi\|_{2}^{2}$ $\displaystyle=$ $\displaystyle\sum_{j}\sum_{i}2\pi^{-1}_{j}\rho_{j}(t)\rho_{i}(t)q_{ij}$ $\displaystyle=$ $\displaystyle 2\sum_{j}\sum_{i}f_{j}(t)f_{i}(t)\pi_{i}q_{ij}$

and the result follows from (3.76).

## 3.6.2 Summary of extremal characterizations

For ease of comparison we state below three results which will be proved in subsequent sections. These results are commonly presented “the other way up” using infs rather than sups, but our presentation is forced by our convention of consistently defining parameters to have dimensions of “time” rather than “$1/\mbox{time}$”. The sups are over functions $g:I\rightarrow R$ satisfying specified constraints, and excluding $g\equiv 0$. The results below are the same in continuous and discrete time—that is, continuization doesn’t change the numerical values of the quantities we consider. We shall give the proofs in discrete time.

Extremal characterization of relaxation time. The relaxation time $\tau_{2}$ satisfies

 $\tau_{2}=\sup\{||g||_{2}^{2}/\mbox{{\cal E}}(g,g):\sum_{i}\pi_{i}g(i)=0\}.$

Extremal characterization of quasistationary mean hitting time. Given a subset $A$, let $\alpha_{A}$ be the quasistationary distribution on $A^{c}$ defined at (3.82). Then the quasistationary mean exit time is

 $g\geq 0$

Extremal characterization of mean commute times. For distinct states $i,j$ the mean commute time satisfies

 $E_{i}T_{j}+E_{j}T_{i}=\sup\{1/\mbox{{\cal E}}(g,g):0\leq g\leq 1,\ \,g(i)=1,% \,\ g(j)=0\}.$

Because the state space is finite, the sups are attained, and there are theoretical descriptions of the $g$ attaining the extrema in all three cases. An immediate practical use of these characterizations in concrete examples is to obtain lower bounds on the parameters by inspired guesswork, that is by choosing some simple explicit “test function” $g$ which seems qualitatively right and computing the right-hand quantity. See Chapter 14 margin: 3/10/94 versionExample 32 for a typical example. Of course we cannot obtain upper bounds this way, but extremal characterizations can be used as a starting point for further theoretical work (see in particular the bounds on $\tau_{2}$ in Chapter 4 margin: 10/11/94 versionSection 4).

## 3.6.3 The extremal characterization of relaxation time

The first two extremal characterizations are in fact just reformulations of the classical Rayleigh–Ritz extremal characterization of eigenvalues, which goes as follows ([183] Theorem 4.2.2 and eq. 4.2.7). Let $S$ be a symmetric matrix with eigenvalues $\mu_{1}\geq\mu_{2}\geq\cdots$. Then

 $\mu_{1}=\sup_{{\bf x}}\frac{\sum_{i}\sum_{j}x_{i}s_{ij}x_{j}}{\sum_{i}x^{2}_{i}}$ (3.78)

and an ${\bf x}$ attaining the sup is an eigenvalue corresponding to $\mu_{1}$ (of course sups are over ${\bf x}\not\equiv 0$). And

 $\mu_{2}=\sup_{{\bf y}:\sum_{i}y_{i}x_{i}=0}\frac{\sum_{i}\sum_{j}y_{i}s_{ij}y_% {j}}{\sum_{i}y^{2}_{i}}$ (3.79)

and a ${\bf y}$ attaining the sup is an eigenvalue corresponding to $\mu_{2}$.

As observed in Section 3.4, given a discrete-time chain with transition matrix $P$, the symmetric matrix $(s_{ij}=\pi^{1/2}_{i}p_{ij}\pi^{-1/2}_{j})$ has maximal eigenvalue $1$ with corresponding eigenvector $(\pi^{1/2}_{i})$. So applying (3.79) and writing $y_{i}=\pi^{1/2}_{i}g(i)$, the second-largest eigenvalue (of $S$ and hence of $P$) is given by

 $\lambda_{2}=\sup_{g:\sum_{i}\pi_{i}g(i)=0}\frac{\sum_{i}\sum_{j}\pi_{i}g(i)p_{% ij}g(j)}{\sum_{i}\pi_{i}g^{2}(i)}.$

In probabilistic notation the fraction is

 $\frac{E_{\pi}[g(X_{0})g(X_{1})]}{E_{\pi}g^{2}(X_{0})}=1-\frac{E_{\pi}[g(X_{0})% (g(X_{1})-g(X_{0}))]}{E_{\pi}g^{2}(X_{0})}=1-\frac{\mbox{{\cal E}}(g,g)}{\|g% \|_{2}^{2}}.$

Since $\tau_{2}=1/(1-\lambda_{2})$ in discrete time we have proved the first of our extremal characterizations.

###### Theorem 3.25 (Extremal characterization of relaxation time)

The relaxation time $\tau_{2}$ satisfies

 $\tau_{2}=\sup\{||g||_{2}^{2}/\mbox{{\cal E}}(g,g):\sum_{i}\pi_{i}g(i)=0\}.$

A function $g_{0}$, say, attaining the sup in the extremal characterization is, by examining the argument above, a right eigenvector associated with $\lambda_{2}$:

 $\sum_{j}p_{ij}g_{0}(j)=\lambda_{2}g_{0}(i).$

(From this point on in the discussion, we assume $g_{0}$ is normalized so that $\|g_{0}\|_{2}=1$.)  The corresponding left eigenvector $\theta$ :

 $j$

is the signed measure $\theta$ such that $\theta_{i}=\pi_{i}g_{0}(i)$. To continue a somewhat informal discussion of the interpretation of $g_{0}$, it is convenient to switch to continuous time (to avoid issues of negative eigenvalues) and to assume $\lambda_{2}$ has multiplicity $1$. The equation which relates distribution at time $t$ to initial distribution,

 $\rho_{j}(t)=\sum_{i}\rho_{i}(0)p_{ij}(t),$

can also be used to define signed measures evolving from an initial signed measure. For the initial measure $\theta$ we have

 $\theta(t)=e^{-t/\tau_{2}}\theta.$

For any signed measure $\nu=\nu(0)$ with $\sum_{i}\nu_{i}(0)=0$ we have

 $\nu(t)\sim ce^{-t/\tau_{2}}\theta;\ \ c=\sum_{i}\nu_{i}(0)\theta_{i}/\pi_{i}=% \sum_{i}\nu_{i}(0)g_{0}(i).$

So $\theta$ can be regarded as “the signed measure which relaxes to $0$ most slowly”. For a probability measure $\rho(0)$, considering $\rho(0)-\pi$ gives

 $\rho(t)-\pi\sim ce^{-t/\tau_{2}}\theta,\ \ c=\sum_{i}(\rho_{i}(0)-\pi_{i})g_{0% }(i)=\sum_{i}\rho_{i}(0)g_{0}(i).$ (3.80)

So $\theta$ has the interpretation of “the asymptotic normalized difference between the true distribution at time $t$ and the stationary distribution”. Finally, from (3.80) with $\rho(0)$ concentrated at $i$ (or from the spectral representation)

 $P_{i}(X_{t}\in\cdot)-\pi\sim g_{0}(i)e^{-t/\tau_{2}}\theta.$

So $g_{0}$ has the interpretation of “the asymptotic normalized size of deviation from stationarity, as a function of the starting state”. When the state space has some geometric structure – jumps go to nearby states – one expects $g_{0}$ to be a “smooth” function, exemplified by the cosine function arising in the $n$-cycle (Chapter 5 margin: 4/22/96 versionExample 7).

## 3.6.4 Simple applications

Here is a fundamental “finite-time” result. margin: Good name for Lemma 3.26 – looks good to DA !

###### Lemma 3.26 ($L^{2}$ contraction lemma)

Write $\rho(t)=(\rho_{j}(t))$ for the distribution at time $t$ of a continuous-time chain, with arbitrary initial distribution. Then

 $\|\rho(t)-\pi\|_{2}\leq e^{-t/\tau_{2}}\|\rho(0)-\pi\|_{2}.$

Proof. Write $f_{j}(t)=\rho_{j}(t)/\pi_{j}$. Then

 $\displaystyle\frac{d}{dt}\|\rho(t)-\pi\|_{2}^{2}$ $\displaystyle=$ $\displaystyle-2\mbox{{\cal E}}(f(t),f(t))\mbox{\ \ by Lemma~{}\ref{Efe}}$ $\displaystyle=$ $\displaystyle-2\mbox{{\cal E}}(f(t)-1,f(t)-1)$ $\displaystyle\leq$ $\tau_{2}$ $\displaystyle=$ $\displaystyle\frac{-2}{\tau_{2}}\,\|\rho(t)-\pi\|_{2}^{2}.$

Integrating, $\|\rho(t)-\pi\|_{2}^{2}\leq e^{-2t/\tau_{2}}\|\rho(0)-\pi\|_{2}^{2}$, and the result follows.

Alternatively, Lemma 3.26 follows by observing that

 $\|\rho(t)-\pi\|^{2}_{2}$ is CM with spectral gap at least $2\lambda_{2}=2/\tau_{2}$ (3.81)

and applying Lemma 3.16. The fact (3.81) can be established directly from the spectral representation, but we will instead apply the observation at (3.50)–(3.51). Indeed, with $g(i):\equiv\rho_{i}(0)/\pi_{i}$, we have

 $\displaystyle E_{\pi}\left[g(X_{2t})g(X_{0})\right]$ $\displaystyle=$ $\displaystyle\sum_{i}\rho_{i}(0)\sum_{j}p_{ij}(2t)\frac{\rho_{j}(0)}{\pi_{j}}$ $\displaystyle=$ $\displaystyle\sum_{i}\rho_{i}(0)\sum_{j}\sum_{k}p_{ik}(t)p_{kj}(t)\frac{\rho_{% j}(0)}{\pi_{j}}$ $\displaystyle=$ $\displaystyle\sum_{k}\frac{1}{\pi_{k}}\left[\sum_{i}\rho_{i}(0)p_{ik}(t)\right% ]\left[\sum_{j}\rho_{j}(0)p_{jk}(t)\right]$ $\displaystyle=$ $\displaystyle\sum_{k}\frac{1}{\pi_{k}}\rho^{2}_{k}(t)=\|\rho(t)-\pi\|^{2}_{2}+1.$

Thus by (3.51)

 $\|\rho(t)-\pi\|^{2}_{2}=\sum_{m=2}^{n}\left(\sum_{i}\pi^{1/2}_{i}g(i)u_{im}% \right)^{2}\exp(-\lambda_{m}t).$

Our main use of the extremal characterization is to compare relaxation times of different chains on the same (or essentially the same) state space. Here are three instances. The first is a result we have already exploited in Section 3.5.

###### Corollary 3.27

Given a chain with relaxation time $\tau_{2}$, let $\tau^{A}_{2}$ be the relaxation time of the chain with subset $A$ collapsed to a singleton $\{a\}$ (Chapter 2 margin: 9/10/99 versionSection 7.3). Then $\tau^{A}_{2}\leq\tau_{2}$.

Proof. Any function $g$ on the states of the collapsed chain can be extended to the original state space by setting $g=g(a)$ on $A$, and $\mbox{{\cal E}}(g,g)$ and $\sum_{i}\pi_{i}g(i)$ and $||g||_{2}^{2}$ are unchanged. So consider a $g$ attaining the sup in the extremal characterization of $\tau^{A}_{2}$ and use this as a test function in the extremal characterization of $\tau_{2}$.

Remark. An extension of Corollary 3.27 will be provided by the contraction principle (Chapter 4 margin: 10/11/94 versionProposition 44).

###### Corollary 3.28

Let $\tau_{2}$ be the relaxation time for a “fluid model” continuous-time chain associated with a graph with weights $(w_{e})$ [recall (3.16)] and let $\tau_{2}^{*}$ be the relaxation time when the weights are $(w^{*}_{e})$. If $w^{*}_{e}\geq w_{e}$ for all edges $e$ then $\tau_{2}^{*}\leq\tau_{2}$.

Proof. Each stationary distribution is uniform, so $\|g\|_{2}^{2}=\|g\|_{2}^{*2}$ while $\mbox{{\cal E}}^{*}(g,g)\geq\mbox{{\cal E}}(g,g)$. So the result is immediate from the extremal characterization.

The next result is a prototype for more complicated “indirect comparison” arguments later. margin: refer to not-net-written Poincare chapter It is convenient to state it in terms of random walk on a weighted graph. Recall (Section 3.2) that a reversible chain specifies a weighted graph with edge-weights $w_{ij}=\pi_{i}p_{ij}$, vertex-weights $w_{i}=\pi_{i}$, and total weight $w=1$.

###### Lemma 3.29 (the direct comparison lemma)

Let $(w_{e})$ and $(w^{*}_{e})$ be edge-weights on a graph, let $(w_{i})$ and $(w^{*}_{i})$ be the vertex-weights, and let $\tau_{2}$ and $\tau_{2}^{*}$ be the relaxation times for the associated random walks. Then

 $\frac{\min_{e}(w_{e}/w^{*}_{e})}{\max_{i}(w_{i}/w^{*}_{i})}\leq\frac{\tau_{2}}% {\tau_{2}^{*}}\leq\frac{\max_{i}(w_{i}/w^{*}_{i})}{\min_{e}(w_{e}/w^{*}_{e})}$

where in $\min_{e}$ we don’t count loops $e=(v,v)$.

Proof. For any $g$, by (3.77)

 $w^{*}\mbox{{\cal E}}^{*}(g,g)\geq w\mbox{{\cal E}}(g,g)\min_{e}(w^{*}_{e}/% w_{e}).$

And since $w\|g\|_{2}^{2}=\sum_{i}w_{i}g^{2}(i)$,

 $w^{*}\|g\|_{2}^{*2}\leq w\|g\|_{2}^{2}\,\max_{i}(w^{*}_{i}/w_{i}).$

So if $g$ has $\pi^{*}$-mean $0$ and $\pi$-mean $b$ then

 $\frac{\|g\|_{2}^{*2}}{\mbox{{\cal E}}^{*}(g,g)}\leq\frac{\|g-b\|_{2}^{*2}}{% \mbox{{\cal E}}^{*}(g-b,g-b)}\leq\frac{\|g-b\|_{2}^{2}}{\mbox{{\cal E}}(g-% b,g-b)}\ \frac{\max_{i}(w^{*}_{i}/w_{i})}{\min_{e}(w^{*}_{e}/w_{e})}.$

By considering the $g$ attaining the extremal characterization of $\tau_{2}^{*}$,

 $\tau^{*}_{2}\leq\tau_{2}\ \frac{\max_{i}(w^{*}_{i}/w_{i})}{\min_{e}(w^{*}_{e}/% w_{e})}.$

This is the lower bound in the lemma, and the upper bound follows by reversing the roles of $w_{e}$ and $w^{*}_{e}$.

Remarks. Sometimes $\tau_{2}$ is very sensitive to apparently-small changes in the chain. Consider random walk on an unweighted graph. If we add extra edges, but keeping the total number of added edges small relative to the number of original edges, then we might guess that $\tau_{2}$ could not increase or decrease much. But the examples outlined below show that $\tau_{2}$ may in fact change substantially in either direction.

###### Example 3.30

Take two complete graphs on $n$ vertices and join with a single edge. Then $w=2n(n-1)+2$ and $\tau_{2}\sim n^{2}/2$. But if we extend the single join-edge to an $n$-edge matching of the vertices in the original two complete graphs, then $w^{*}=2n(n-1)+2n\sim w$ but $\tau_{2}^{*}\sim n/2$.

###### Example 3.31

Take a complete graph on $n$ vertices. Take $k=o(n^{1/2})$ new vertices and attach each to distinct vertices of the original complete graph. Then $w=n(n-1)+2k$ and $\tau_{2}$ is bounded. But if we now add all edges within the new $k$ vertices, $w^{*}=n(n-1)+2k+k(k-1)\sim w$ but $\tau_{2}^{*}\sim k$ provided $k\rightarrow\infty$.

As these examples suggest, comparison arguments are most effective when the stationary distributions coincide. Specializing Lemma 3.29 to this case, and rephrasing in terms of (reversible) chains, gives

###### Lemma 3.32 (the direct comparison lemma)

For transition matrices ${\bf P}$ and ${\bf P}^{*}$ with the same stationary distribution $\pi$, if

 $j\neq i$

then $\tau_{2}\leq\delta^{-1}\tau^{*}_{2}$.

Remarks. The hypothesis can be rephrased as ${\bf P}=\delta{\bf P}^{*}+(1-\delta){\bf Q}$, where ${\bf Q}$ is a (maybe not irreducible) reversible transition matrix with stationary distribution $\pi$. When ${\bf Q}={\bf I}$ we have $\tau_{2}=\delta^{-1}\tau^{*}_{2}$, so an interpretation of the lemma is that “combining transitions of ${\bf P}^{*}$ with noise can’t increase mixing time any more than combining transitions with holds”.

## 3.6.5 Quasistationarity

Given a subset $A$ of states in a discrete-time chain, let ${\bf P}^{A}$ be ${\bf P}$ restricted to $A^{c}$. Then ${\bf P}^{A}$ will be a substochastic matrix, i.e., the row-sums are at most $1$, and some row-sum is strictly less than $1$. Suppose ${\bf P}^{A}$ is irreducible. As a consequence of the Perron–Frobenius theorem (e.g., [183] Theorem 8.4.4) for the nonnegative matrix ${\bf P}^{A}$, there is a unique $0<\lambda<1$ (specifically, the largest eigenvalue of ${\bf P}^{A}$) such that there is a probability distribution $\alpha$ satisfying

 $A$ (3.82)

Writing $\alpha_{A}$ and $\lambda_{A}$ to emphasize dependence on $A$, (3.82) implies that under $P_{\alpha_{A}}$ the hitting time $T_{A}$ has geometric distribution

 $P_{\alpha_{A}}(T_{A}\geq m)=\lambda_{A}^{m},\ \ m\geq 0,$

whence

 $E_{\alpha_{A}}T_{A}=\frac{1}{1-\lambda_{A}}.$

Call $\alpha_{A}$ the quasistationary distribution and $E_{\alpha_{A}}T_{A}$ the quasistationary mean exit time.

Similarly, for a continuous-time chain let ${\bf Q}^{A}$ be ${\bf Q}$ restricted to $A^{c}$. Assuming irreducibility of the substochastic chain with generator ${\bf Q}^{A}$, there is a unique $\lambda\equiv\lambda_{A}>0$ such that there is a probability distribution $\alpha\equiv\alpha_{A}$ (called the quasistationary distribution) satisfying

 $A$

This implies that under $P_{\alpha_{A}}$ the hitting time $T_{A}$ has exponential distribution

 $P_{\alpha_{A}}(T_{A}>t)=\exp(-\lambda_{A}t),\ \ t>0,$

whence the quasistationary mean exit time is

 $E_{\alpha_{A}}T_{A}=1/\lambda_{A}.$ (3.83)

Note that both $\alpha_{A}$ and $E_{\alpha_{A}}T_{A}$ are unaffected by continuization of a discrete-time chain.

The facts above do not depend on reversibility, but invoking now our standing assumption that chains are reversible we will show in remark (c) following Theorem 3.33 that, for continuous-time chains, $\lambda_{A}$ here agrees with the spectral gap $\lambda_{A}$ discussed in Proposition 3.21, and we can also now prove our second extremal characterization.

###### Theorem 3.33

(Extremal characterization of quasistationary mean hitting time)  The quasistationary mean exit time satisfies

 $A$ (3.84)

Proof. As usual, we give the proof in discrete time. The matrix $(s^{A}_{ij}=\pi^{1/2}_{i}p^{A}_{ij}\pi^{-1/2}_{j})$ is symmetric with largest eigenvalue $\lambda_{A}$. Putting $x_{i}=\pi^{1/2}_{i}g(i)$ in the characterization (3.78) gives

 $\lambda_{A}=\sup_{g}\frac{\sum_{i}\sum_{j}\pi_{i}g(i)p^{A}_{ij}g(j)}{\sum_{i}% \pi_{i}g^{2}(i)}.$

Clearly the sup is attained by nonnegative $g$, and though the sums above are technically over $A^{c}$ we can sum over all $I$ by setting $g=0$ on $A$. So

 $A$

As in the proof of Theorem 3.25 this rearranges to

 $A$

establishing Theorem 3.33.

Remarks. (a) These remarks closely parallel the remarks at the end of Section 3.6.3. The sup in Theorem 3.33 is attained by the function $g_{0}$ which is the right eigenvector associated with $\lambda_{A}$, and by reversibility this is

 $g_{0}(i)=\alpha_{A}(i)/\pi_{i}.$ (3.85)

It easily follows from (3.82) that

 $j$

which explains the name quasistationary distribution for ${\alpha_{A}}$. A related interpretation of ${\alpha_{A}}$ is as the distribution of the Markov chain conditioned on having been in $A^{c}$ for the infinite past. More precisely, one can use Perron–Frobenius theory to prove that

 $t\rightarrow\infty$ (3.86)

provided $P^{A}$ is aperiodic as well as irreducible.

(b) Relation (3.86) holds in continuous time as well (assuming irreducibility of the chain restricted to $A^{c}$), yielding

 $\displaystyle\exp(-\lambda_{A}t)$ $\displaystyle=$ $\displaystyle P_{\alpha_{A}}(T>t)$ $\displaystyle=$ $\displaystyle\lim_{s\rightarrow\infty}P_{\pi}(T_{A}>t+s\,|\,T_{A}>s)=\lim_{s% \rightarrow\infty}\frac{P_{\pi}(T_{A}>t+s)}{P_{\pi}(T_{A}>s)}.$

Since by Proposition 3.21 the distribution of $T_{A}$ for the stationary chain is CM with spectral gap (say) $\sigma_{A}$, the limit here is $\exp(-\sigma_{A}t)$. Thus $\lambda_{A}=\sigma_{A}$, that is, our two uses of $\lambda_{A}$ refer to the same quantity.

(c) We conclude from remark (b), (3.83), and the final inequality in (3.69) that, margin: This is needed at the bottom of page 27 (9/22/96 version) in Chapter 5. in either continuous or discrete time,

 $E_{\alpha_{A}}T_{A}\geq\frac{E_{\pi}T_{A}}{\pi(A^{c})}\geq E_{\pi}T_{A}.$ (3.87)

Our fundamental use of quasistationarity is the following.

###### Corollary 3.34

For any subset $A$, the quasistationary mean hitting time satisfies

 $E_{\alpha_{A}}T_{A}\leq\tau_{2}/\pi(A).$

Proof. As at (3.85) set $g(i)=\alpha_{A}(i)/\pi_{i}$, so

 $E_{\alpha_{A}}T_{A}=||g||_{2}^{2}/\mbox{{\cal E}}(g,g).$ (3.88)

Now $E_{\pi}g(X_{0})=1$, so applying the extremal characterization of relaxation time to $g-1$,

 $\tau_{2}\geq\frac{\|g-1\|_{2}^{2}}{\mbox{{\cal E}}(g-1,g-1)}=\frac{\|g\|_{2}% ^{2}-1}{\mbox{{\cal E}}(g,g)}=\left(E_{\alpha_{A}}T_{A}\right)\left(1-\frac{% 1}{\|g\|_{2}^{2}}\right),$ (3.89)

the last equality using (3.88). Since ${\alpha_{A}}$ is a probability distribution on $A^{c}$ we have

 $1=E_{\pi}\left[1_{A^{c}}(X_{0})g(X_{0})\right]$

and so by Cauchy–Schwarz

 $1^{2}\leq(E_{\pi}1_{A^{c}}(X_{0}))\times||g||_{2}^{2}=(1-\pi(A))||g||_{2}^{2}.$

Rearranging,

 $1-\frac{1}{||g||_{2}^{2}}\geq\pi(A)$

and substituting into (3.89) gives the desired bound.

Combining Corollary 3.34 with (3.68) and (3.83) gives the result below. margin: DA has deleted discrete-time claim in previous version. It seems true but not worth sweating over.

###### Lemma 3.35
 $\begin{array}[]{rccll}\mbox{{\rm(}continuous time{\rm)}}&P_{\pi}(T_{A}>t)&\leq% &\exp(-t\pi(A)/\tau_{2}),&\ \ t\geq 0\end{array}$