3.6.1 The Dirichlet formalism
A reversible chain has an associated Dirichlet form $\mathcal{E}$,
defined as follows.
For functions $g$ write
\[
\mathcal{E}(g,g) = \frac{1}{2}\sum_i\sum_j \pi_i\, p_{ij}\,(g(j)-g(i))^2 \tag{3.74}
\]
in discrete time, and substitute $q_{ij}$ for $p_{ij}$ in continuous time.
One can immediately check the following equivalent definitions.
In discrete time
\[
\mathcal{E}(g,g) = \langle g, (I-\mathbf{P})g\rangle_\pi . \tag{3.75}
\]
In continuous time
\[
\mathcal{E}(g,g) = \langle g, -\mathbf{Q}g\rangle_\pi
= -\sum_i\sum_j \pi_i\, q_{ij}\, g(i)\, g(j) \tag{3.76}
\]
where the sum includes $j = i$.
Note also that for random walk on a weighted graph, (3.74) becomes
\[
\mathcal{E}(g,g) = w^{-1} \sum_{\text{edges } (i,j)} w_{ij}\,(g(j)-g(i))^2 \tag{3.77}
\]
where each edge is counted once.
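As a numerical sanity check, the three expressions (3.74), (3.75) and (3.77) can be compared on a small example. The weighted graph below is a hypothetical illustration of ours, not an example from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Our own illustrative example (not from the text): random walk on a small
# weighted graph with random symmetric edge-weights.
n = 6
W = rng.random((n, n))
W = W + W.T                      # symmetric edge-weights w_ij
np.fill_diagonal(W, 0.0)         # no self-loops
wv = W.sum(axis=1)               # vertex-weights w_i
P = W / wv[:, None]              # p_ij = w_ij / w_i
pi = wv / wv.sum()               # stationary distribution pi_i = w_i / w

g = rng.standard_normal(n)

# (3.74): half the pi-weighted sum over ordered pairs
E74 = 0.5 * np.sum(pi[:, None] * P * (g[None, :] - g[:, None]) ** 2)
# (3.75): <g, (I - P)g>_pi
E75 = np.sum(pi * g * ((np.eye(n) - P) @ g))
# (3.77): w^{-1} * sum over unordered edges of w_ij (g(j) - g(i))^2
E77 = np.sum(np.triu(W, 1) * (g[None, :] - g[:, None]) ** 2) / wv.sum()

print(E74, E75, E77)
```

The three values agree to machine precision, as the algebra predicts.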
Recall from Chapter 2 ††margin: 9/10/99 version Section 6.2 the discussion
of $L^2$ norms
for functions and measures.
In particular
\[
\|g\|_2^2 = \sum_i \pi_i\, g^2(i), \qquad
\|\nu\|_2^2 = \sum_i \nu_i^2/\pi_i .
\]
The relevance of $\mathcal{E}$ can be seen in the following lemma.
Lemma 3.24
Write $\rho_t$ for the distribution at time $t$ of
a continuous-time chain, with arbitrary initial distribution.
Write $g_t(i) = \rho_t(i)/\pi_i$. Then
\[
\frac{d}{dt}\,\|\rho_t - \pi\|_2^2 = -2\,\mathcal{E}(g_t, g_t).
\]
Proof.
$\|\rho_t - \pi\|_2^2 = \sum_j \rho_t^2(j)/\pi_j - 1$,
so using the forward equations
\[
\frac{d\rho_t(j)}{dt} = \sum_i \rho_t(i)\, q_{ij}
\]
we get
\[
\frac{d}{dt}\,\|\rho_t - \pi\|_2^2
= 2\sum_j \frac{\rho_t(j)}{\pi_j}\sum_i \rho_t(i)\, q_{ij}
= 2\sum_i\sum_j \pi_i\, q_{ij}\, g_t(i)\, g_t(j)
\]
and the result follows from (3.76).
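The identity in this proof is purely algebraic, so it can be checked at a single arbitrary distribution $\rho$. The chain below is our own illustrative example, continuized via $\mathbf{Q} = \mathbf{P} - I$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Our own illustrative reversible chain, continuized via Q = P - I.
n = 6
W = rng.random((n, n)); W = W + W.T
np.fill_diagonal(W, 0.0)
wv = W.sum(axis=1)
P = W / wv[:, None]
pi = wv / wv.sum()
Q = P - np.eye(n)                # generator of the continuized chain

rho = rng.random(n); rho /= rho.sum()   # arbitrary distribution rho_t
g = rho / pi                            # g_t(i) = rho_t(i)/pi_i

# d/dt ||rho_t - pi||_2^2 = 2 sum_j (rho_t(j)/pi_j) (rho_t Q)_j by the
# forward equations (the "-1" term drops since rho_t Q sums to zero)
deriv = 2.0 * np.sum(g * (rho @ Q))

# Dirichlet form (3.74) with q_ij in place of p_ij (off-diagonal terms)
offQ = Q - np.diag(np.diag(Q))
E = 0.5 * np.sum(pi[:, None] * offQ * (g[None, :] - g[:, None]) ** 2)

print(deriv, -2.0 * E)
```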
3.6.2 Summary of extremal characterizations
For ease of comparison we state below three results which will be
proved in subsequent sections.
These results are commonly presented “the other way up” using
infs rather than sups,
but our presentation is forced by our convention of consistently
defining parameters to have dimensions of “time” rather than
“1/time”.
The sups are over functions $g$ satisfying specified
constraints, and excluding $g \equiv 0$.
The results below are the same in continuous and
discrete time—that is, continuization doesn’t change the numerical
values of the quantities we consider.
We shall give the proofs in discrete time.
Extremal characterization of relaxation time.
The relaxation time $\tau_2$ satisfies
\[
\tau_2 = \sup\Big\{\frac{\|g\|_2^2}{\mathcal{E}(g,g)} : \sum_i \pi_i\, g(i) = 0\Big\}.
\]
Extremal characterization of quasistationary mean hitting time.
Given a subset $A$, let $\alpha$ be the quasistationary distribution
on $A^c$ defined at (3.82).
Then the quasistationary mean exit time is
\[
E_\alpha T_A = \sup\Big\{\frac{\|g\|_2^2}{\mathcal{E}(g,g)} : g \ge 0,\ g = 0 \text{ on } A\Big\}.
\]
Extremal characterization of mean commute times.
For distinct states $i$ and $j$ the mean commute time satisfies
\[
E_i T_j + E_j T_i = \sup\Big\{\frac{1}{\mathcal{E}(g,g)} : g(i) = 1,\ g(j) = 0\Big\}.
\]
Because the state space is finite, the sups are attained, and
there are theoretical descriptions of the functions $g$ attaining the extrema
in all three cases.
An immediate practical use of these characterizations in concrete examples
is to obtain lower bounds on the parameters
by inspired guesswork,
that is, by choosing some simple explicit “test function” $g$ which
seems qualitatively
right and computing the right-hand quantity.
See
Chapter 14
††margin: 3/10/94 version Example 32 for a typical example.
Of course we cannot obtain upper bounds this way, but extremal
characterizations can be used as a starting point for further
theoretical work (see in particular the bounds on $\tau_2$ in
Chapter 4
††margin: 10/11/94 version Section 4).
3.6.3 The extremal characterization of relaxation time
The first two extremal characterizations are in fact just reformulations
of the classical Rayleigh–Ritz extremal characterization of
eigenvalues, which goes as follows
([183] Theorem 4.2.2 and eq. 4.2.7).
Let $\mathbf{A}$ be a symmetric matrix with eigenvalues
$\lambda_1 \ge \lambda_2 \ge \cdots$.
Then
\[
\lambda_1 = \sup_x \frac{x^T \mathbf{A} x}{x^T x} \tag{3.78}
\]
and an $x$ attaining the sup is an eigenvector corresponding
to $\lambda_1$
(of course sups are over $x \ne 0$).
And
\[
\lambda_2 = \sup\Big\{\frac{x^T \mathbf{A} x}{x^T x} : x^T x^{(1)} = 0\Big\} \tag{3.79}
\]
and an $x$ attaining the sup is an eigenvector corresponding
to $\lambda_2$, where $x^{(1)}$ is an eigenvector corresponding to $\lambda_1$.
As observed in Section 3.4,
given a discrete-time chain with transition matrix $\mathbf{P}$,
the symmetric matrix
$\operatorname{diag}(\pi^{1/2})\,\mathbf{P}\,\operatorname{diag}(\pi^{-1/2})$
has maximal eigenvalue $1$ with corresponding eigenvector $x^{(1)}_i = \pi_i^{1/2}$.
So applying (3.79) and writing $x_i = \pi_i^{1/2}\, g(i)$,
the second-largest eigenvalue $\lambda_2$ (of this matrix and hence of $\mathbf{P}$) is given by
\[
\lambda_2 = \sup\Big\{\frac{\sum_i\sum_j \pi_i\, g(i)\, p_{ij}\, g(j)}{\sum_i \pi_i\, g^2(i)} : \sum_i \pi_i\, g(i) = 0\Big\}.
\]
In probabilistic notation the fraction is
\[
\frac{\langle g, \mathbf{P} g\rangle_\pi}{\|g\|_2^2} = 1 - \frac{\mathcal{E}(g,g)}{\|g\|_2^2}.
\]
Since $\tau_2 = (1-\lambda_2)^{-1}$ in discrete time, we have proved the first of our extremal characterizations.
Theorem 3.25 (Extremal characterization of relaxation time)
The relaxation time $\tau_2$ satisfies
\[
\tau_2 = \sup\Big\{\frac{\|g\|_2^2}{\mathcal{E}(g,g)} : \sum_i \pi_i\, g(i) = 0\Big\}.
\]
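A hedged numerical illustration of the characterization, on a small chain of our own: any test function with $\pi$-mean $0$ gives a lower bound on $\tau_2$, and the eigenvector associated with $\lambda_2$ attains the sup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Our own illustrative chain (random walk on a weighted graph).
n = 6
W = rng.random((n, n)); W = W + W.T
np.fill_diagonal(W, 0.0)
wv = W.sum(axis=1)
P = W / wv[:, None]
pi = wv / wv.sum()

s = np.sqrt(pi)
lam, U = np.linalg.eigh((s[:, None] * P) / s[None, :])  # ascending eigenvalues
tau2 = 1.0 / (1.0 - lam[-2])

def dirichlet(g):
    return 0.5 * np.sum(pi[:, None] * P * (g[None, :] - g[:, None]) ** 2)

# Any test function with pi-mean 0 gives a lower bound on tau2 ...
g = rng.standard_normal(n); g -= np.dot(pi, g)
ratio = np.dot(pi, g * g) / dirichlet(g)

# ... and the right eigenvector for lambda_2 attains the sup.
g2 = U[:, -2] / s
ratio2 = np.dot(pi, g2 * g2) / dirichlet(g2)

print(ratio, ratio2, tau2)
```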
A function attaining the sup in the extremal
characterization, $g^{(2)}$ say, is,
by examining the argument above, a right eigenvector associated
with $\lambda_2$:
\[
\mathbf{P} g^{(2)} = \lambda_2\, g^{(2)}.
\]
(From this point on in the discussion, we assume $g^{(2)}$ is normalized
so that $\|g^{(2)}\|_2 = 1$.) The corresponding left eigenvector $\theta$:
\[
\theta \mathbf{P} = \lambda_2\, \theta
\]
is the signed measure
such that $\theta_i = \pi_i\, g^{(2)}(i)$.
To continue a somewhat informal discussion of the interpretation of $\theta$,
it is convenient to switch to continuous time
(to avoid issues of negative eigenvalues)
and to assume $\lambda_2$ has multiplicity $1$.
The equation which relates distribution at time $t$ to initial distribution,
\[
\rho_t = \rho_0 \mathbf{P}_t,
\]
can also be used to define signed measures $\nu_t = \nu_0 \mathbf{P}_t$ evolving from an initial
signed measure $\nu_0$.
For the initial measure $\theta$ we have
\[
\theta \mathbf{P}_t = e^{-t/\tau_2}\, \theta .
\]
For any signed measure $\nu$ with $\sum_i \nu_i = 0$
we have
\[
\|\nu \mathbf{P}_t\|_2 \le e^{-t/\tau_2}\, \|\nu\|_2 .
\]
So $\theta$ can be regarded as “the signed measure which relaxes to $0$
most slowly”.
For a probability measure $\rho_0$, considering $\nu = \rho_0 - \pi$ gives
\[
e^{t/\tau_2}\,(\rho_t - \pi) \to c\,\theta \ \text{ as } t \to \infty,
\quad\text{where } c = \sum_j \rho_0(j)\, g^{(2)}(j). \tag{3.80}
\]
So $\theta$ has the interpretation of
“the asymptotic normalized difference between the true distribution at
time $t$ and the stationary distribution”.
Finally, from (3.80) with $\rho_0$ concentrated at $i$
(or from the spectral representation)
\[
e^{t/\tau_2}\,\big(P_i(X_t = j) - \pi_j\big) \to g^{(2)}(i)\,\theta_j \ \text{ as } t \to \infty .
\]
So $g^{(2)}$ has the interpretation of
“the asymptotic normalized size of deviation from stationarity,
as a function of the starting state”.
When the state space has some geometric structure – jumps go to nearby states –
one expects $g^{(2)}$ to be a “smooth” function, exemplified by
the cosine function arising in
the $n$-cycle
(Chapter 5 ††margin: 4/22/96 version Example 7).
3.6.4 Simple applications
Here is a fundamental “finite-time” result.
††margin: Good name for Lemma 3.26 – looks good to DA !
Lemma 3.26 ($L^2$ contraction lemma)
Write $\rho_t$ for the distribution at time $t$ of
a continuous-time chain, with arbitrary initial distribution.
Then
\[
\|\rho_t - \pi\|_2 \le \|\rho_0 - \pi\|_2\; e^{-t/\tau_2}, \quad 0 \le t < \infty .
\]
Proof.
Write $g_t(i) = \rho_t(i)/\pi_i$. Then
\begin{align*}
\frac{d}{dt}\,\|\rho_t - \pi\|_2^2
&= -2\,\mathcal{E}(g_t, g_t) && \text{by Lemma 3.24}\\
&\le -\frac{2}{\tau_2}\,\|g_t - 1\|_2^2
&& \text{by Theorem 3.25, since } \textstyle\sum_i \pi_i (g_t(i)-1) = 0\\
&= -\frac{2}{\tau_2}\,\|\rho_t - \pi\|_2^2 .
\end{align*}
Integrating,
$\|\rho_t - \pi\|_2^2 \le \|\rho_0 - \pi\|_2^2\, e^{-2t/\tau_2}$,
and the result follows.
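The contraction bound can be illustrated numerically. In this sketch (our own example chain), the continuized semigroup $\exp(t(\mathbf{P}-I))$ is computed from the symmetrized spectral decomposition.

```python
import numpy as np

rng = np.random.default_rng(3)

# Our own illustrative chain; the continuized semigroup exp(t(P - I)) is
# computed from the symmetrized spectral decomposition.
n = 6
W = rng.random((n, n)); W = W + W.T
np.fill_diagonal(W, 0.0)
wv = W.sum(axis=1)
P = W / wv[:, None]
pi = wv / wv.sum()

s = np.sqrt(pi)
lam, U = np.linalg.eigh((s[:, None] * P) / s[None, :])
tau2 = 1.0 / (1.0 - lam[-2])

def P_t(t):  # exp(t(P - I)) = diag(pi)^{-1/2} exp(t(S - I)) diag(pi)^{1/2}
    return ((U * np.exp((lam - 1.0) * t)) @ U.T) * s[None, :] / s[:, None]

def l2dist(rho):  # ||rho - pi||_2, the norm for signed measures
    return np.sqrt(np.sum((rho - pi) ** 2 / pi))

rho0 = np.zeros(n); rho0[0] = 1.0    # start at state 0
ts = np.array([0.5, 1.0, 5.0])
lhs = np.array([l2dist(rho0 @ P_t(t)) for t in ts])
rhs = l2dist(rho0) * np.exp(-ts / tau2)

print(lhs, rhs)
```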
Alternatively, Lemma 3.26 follows by observing that
\[
t \mapsto \|\rho_t - \pi\|_2^2 \text{ is CM with spectral gap at least } 2/\tau_2 \tag{3.81}
\]
and applying Lemma 3.16. The fact (3.81) can be
established directly
from the spectral representation, but we will instead apply the observation
at (3.50)–(3.51). Indeed, with $h(i) = \rho_0(i)/\pi_i - 1$,
so that $\sum_i \pi_i\, h(i) = 0$, we
have
\[
\|\rho_t - \pi\|_2^2
= \sum_j \pi_j\,\big(\mathbf{P}_t h\,(j)\big)^2
= \langle \mathbf{P}_t h, \mathbf{P}_t h\rangle_\pi
= \langle \mathbf{P}_{2t} h, h\rangle_\pi
= E_\pi\, h(X_0)\, h(X_{2t}) .
\]
Thus by (3.51)
\[
t \mapsto \|\rho_t - \pi\|_2^2 \text{ is CM with spectral gap at least } 2/\tau_2,
\]
as claimed at (3.81).
Our main use of the extremal characterization is to compare relaxation
times of different chains on the same (or essentially the same)
state space.
Here are three instances.
The first is a result we have already exploited in Section 3.5.
Corollary 3.27
Given a chain with relaxation time $\tau_2$, let $\tau_2'$ be the
relaxation time of the chain with subset $A$ collapsed to a singleton
(Chapter 2 ††margin: 9/10/99 version Section 7.3).
Then $\tau_2' \le \tau_2$.
Proof.
Any function $g$ on the states of the collapsed chain can be extended
to the original state space by making $g$ constant on $A$,
and then $\sum_i \pi_i\, g(i)$ and $\|g\|_2$ and $\mathcal{E}(g,g)$ are unchanged.
So consider a $g$ attaining the sup in the extremal characterization
for the collapsed chain and use this as a test function in the extremal
characterization of $\tau_2$.
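A small numerical check of the corollary, on our own illustrative weighted graph (we zero one internal edge of the collapsed pair so the collapsed graph needs no self-loop, sidestepping loop conventions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Our own illustrative weighted graph; we collapse A = {0, 1} to a single
# state. W[0,1] is set to 0 so the collapsed graph needs no self-loop.
n = 6
W = rng.random((n, n)); W = W + W.T
np.fill_diagonal(W, 0.0)
W[0, 1] = W[1, 0] = 0.0

def tau2_of(Wmat):
    wv = Wmat.sum(axis=1)
    Pm = Wmat / wv[:, None]
    piv = wv / wv.sum()
    sv = np.sqrt(piv)
    lam = np.linalg.eigvalsh((sv[:, None] * Pm) / sv[None, :])
    return 1.0 / (1.0 - lam[-2])

# Collapse states 0 and 1: edge-weights to the remaining states are summed.
Wc = np.zeros((n - 1, n - 1))
Wc[1:, 1:] = W[2:, 2:]
Wc[0, 1:] = W[0, 2:] + W[1, 2:]
Wc[1:, 0] = Wc[0, 1:]

tau2_coll, tau2_orig = tau2_of(Wc), tau2_of(W)
print(tau2_coll, tau2_orig)
```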
Remark.
An extension of Corollary 3.27 will be provided by
the contraction principle (Chapter 4 ††margin: 10/11/94 versionProposition 44).
Corollary 3.28
Let $\tau_2$ be the relaxation time for a “fluid model” continuous-time
chain associated with a graph with edge-weights $(w_e)$
[recall (3.16)] and let $\tilde\tau_2$ be the
relaxation time
when the weights are $(\tilde w_e)$.
If $\tilde w_e \ge w_e$ for all edges $e$ then $\tilde\tau_2 \le \tau_2$.
Proof.
Each stationary distribution is uniform, so
$\|g\|_2$ and the constraint $\sum_i \pi_i\, g(i) = 0$ are the same for both chains,
while $\tilde{\mathcal{E}}(g,g) \ge \mathcal{E}(g,g)$.
So the result is immediate from the extremal characterization.
The next result is a prototype for more complicated “indirect comparison”
arguments later.
††margin: refer to not-yet-written Poincaré chapter
It is convenient to state it in terms of random walk on a weighted graph.
Recall (Section 3.2) that a reversible chain specifies a weighted
graph with edge-weights $(w_{ij})$, vertex-weights
$(w_i = \sum_j w_{ij})$, and total weight $w = \sum_i w_i$.
Lemma 3.29 (the direct comparison lemma)
Let $(w_{ij})$ and $(w'_{ij})$ be edge-weights on a graph,
let $(w_i)$ and $(w'_i)$ be the vertex-weights,
and let $\tau_2$ and $\tau'_2$ be the relaxation times for the
associated random walks. Then
\[
\frac{\min_i (w_i/w'_i)}{\max_{(i,j)} (w_{ij}/w'_{ij})}\;\tau'_2
\;\le\; \tau_2 \;\le\;
\frac{\max_i (w_i/w'_i)}{\min_{(i,j)} (w_{ij}/w'_{ij})}\;\tau'_2
\]
where in $\min_{(i,j)}$ and $\max_{(i,j)}$ we don’t count loops $(i,i)$.
Proof.
For any $g$,
by (3.77)
\[
w\,\mathcal{E}(g,g) = \sum_{\text{edges } (i,j)} w_{ij}\,(g(j)-g(i))^2
\le \max_{(i,j)}\Big(\frac{w_{ij}}{w'_{ij}}\Big)\; w'\,\mathcal{E}'(g,g).
\]
And since $w\,\|g\|_2^2 = \sum_i w_i\, g^2(i)$,
\[
w\,\|g\|_2^2 \ge \min_i\Big(\frac{w_i}{w'_i}\Big)\; w'\,\|g\|_{2'}^2 .
\]
So if $g$ has $\pi'$-mean $0$ and $\pi$-mean $b$ then
\[
\tau_2 \ge \frac{\|g-b\|_2^2}{\mathcal{E}(g,g)}
\ge \frac{\min_i (w_i/w'_i)}{\max_{(i,j)} (w_{ij}/w'_{ij})}
\cdot \frac{\|g\|_{2'}^2}{\mathcal{E}'(g,g)} .
\]
By considering the $g$ attaining the extremal characterization of $\tau'_2$,
\[
\tau_2 \ge \frac{\min_i (w_i/w'_i)}{\max_{(i,j)} (w_{ij}/w'_{ij})}\;\tau'_2 .
\]
This is the lower bound in the lemma,
and the upper bound follows by reversing the roles of $(w)$ and $(w')$.
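The two-sided bound can be illustrated numerically; below, $w'_{ij}$ is $w_{ij}$ perturbed by an edgewise factor in $[1,2]$ (our own hypothetical pair of weightings on the same graph).

```python
import numpy as np

rng = np.random.default_rng(5)

# Our own illustrative pair of weightings on the same graph: w'_ij is w_ij
# perturbed by an edgewise factor in [1, 2].
n = 6
W = rng.random((n, n)); W = W + W.T
np.fill_diagonal(W, 0.0)
F = np.triu(rng.uniform(1.0, 2.0, size=(n, n)), 1)
W2 = W * (F + F.T)

def tau2_of(Wmat):
    wv = Wmat.sum(axis=1)
    Pm = Wmat / wv[:, None]
    piv = wv / wv.sum()
    sv = np.sqrt(piv)
    lam = np.linalg.eigvalsh((sv[:, None] * Pm) / sv[None, :])
    return 1.0 / (1.0 - lam[-2])

w1, w2 = W.sum(axis=1), W2.sum(axis=1)
mask = np.triu(W, 1) > 0                      # edges; loops not counted
edge_ratio = np.triu(W, 1)[mask] / np.triu(W2, 1)[mask]
vert_ratio = w1 / w2

t1, t2 = tau2_of(W), tau2_of(W2)
lower = vert_ratio.min() / edge_ratio.max() * t2
upper = vert_ratio.max() / edge_ratio.min() * t2
print(lower, t1, upper)
```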
Remarks.
Sometimes $\tau_2$ is very sensitive to apparently-small changes
in the chain.
Consider random walk on an unweighted graph.
If we add extra edges, but keeping the total number of added edges
small relative to the number of original edges,
then we might guess that $\tau_2$ could not increase or decrease much.
But the examples outlined below show that $\tau_2$ may in fact
change substantially in either direction.
Example 3.30
Take two complete graphs on $n$ vertices and join them with a single edge.
Then the graph has $\Theta(n^2)$ edges and $\tau_2 = \Theta(n^2)$.
But if we extend the single join-edge to an $n$-edge matching of
the vertices in the original two complete graphs, then we have added
only $n-1$ further edges but $\tau_2 = \Theta(n)$.
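This example is easy to compute exactly for a moderate size; the instance below (unit edge-weights, $n = 20$) is our own small realization of the construction.

```python
import numpy as np

# Two complete graphs on m vertices joined by a single bridge-edge versus an
# m-edge matching (unit edge-weights; our own small instance of the example).
def tau2_of(W):
    wv = W.sum(axis=1)
    P = W / wv[:, None]
    pi = wv / wv.sum()
    s = np.sqrt(pi)
    lam = np.linalg.eigvalsh((s[:, None] * P) / s[None, :])
    return 1.0 / (1.0 - lam[-2])

def two_cliques(m, bridges):
    W = np.zeros((2 * m, 2 * m))
    W[:m, :m] = 1.0
    W[m:, m:] = 1.0
    np.fill_diagonal(W, 0.0)
    for k in range(bridges):
        W[k, m + k] = W[m + k, k] = 1.0
    return W

m = 20
t_single = tau2_of(two_cliques(m, 1))   # single join-edge: order m^2
t_match = tau2_of(two_cliques(m, m))    # m-edge matching: order m

print(t_single, t_match)
```

Already at this size the single-bridge relaxation time exceeds the matched one by an order of magnitude.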
Example 3.31
Take a complete graph on $n$ vertices.
Take $m$ new vertices and attach each to distinct vertices
of the original complete graph.
Then we have added only $m$ edges and $\tau_2$ is bounded.
But if we now add all edges within the $m$ new vertices,
$\tau_2$ grows to order $m$,
even though the number of added edges remains small relative to the number of
original edges provided $m = o(n)$.
As these examples suggest, comparison arguments are most
effective when the stationary distributions coincide.
Specializing Lemma 3.29 to this case, and rephrasing in terms
of (reversible) chains, gives
Lemma 3.32 (the direct comparison lemma)
For transition matrices $\mathbf{P}$ and $\tilde{\mathbf{P}}$ with the same stationary
distribution $\pi$, if
\[
p_{ij} \ge \delta\, \tilde p_{ij} \quad \text{for all } j \ne i
\]
then $\tau_2 \le \tilde\tau_2/\delta$.
Remarks.
The hypothesis can be rephrased as
$\mathbf{P} = \delta \tilde{\mathbf{P}} + (1-\delta)\mathbf{R}$,
where $\mathbf{R}$ is a (maybe not irreducible) reversible transition matrix
with stationary distribution $\pi$.
When $\mathbf{R} = \mathbf{I}$ we have $\tau_2 = \tilde\tau_2/\delta$ exactly,
so an interpretation of the lemma is that
“combining transitions of $\tilde{\mathbf{P}}$ with noise $\mathbf{R}$ can’t increase mixing time
any more than combining transitions with holds”.
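The exactness of the holding case is immediate from the eigenvalue relation $\lambda = \delta\tilde\lambda + 1 - \delta$, and easy to confirm numerically (our own illustrative base chain):

```python
import numpy as np

rng = np.random.default_rng(6)

# Our own base chain "P tilde" plus holds: P = delta*Pt + (1-delta)*I has the
# same stationary distribution and relaxation time exactly tau2~/delta.
n = 6
W = rng.random((n, n)); W = W + W.T
np.fill_diagonal(W, 0.0)
wv = W.sum(axis=1)
Pt = W / wv[:, None]             # the chain "P tilde"
pi = wv / wv.sum()

def tau2_of_P(Pm):
    sv = np.sqrt(pi)
    lam = np.linalg.eigvalsh((sv[:, None] * Pm) / sv[None, :])
    return 1.0 / (1.0 - lam[-2])

delta = 0.3
P_hold = delta * Pt + (1.0 - delta) * np.eye(n)

tau_base = tau2_of_P(Pt)
tau_hold = tau2_of_P(P_hold)
print(tau_hold, tau_base / delta)
```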
3.6.5 Quasistationarity
Given a subset $A$ of states in a discrete-time chain, let
$\mathbf{P}_{A^c}$ be $\mathbf{P}$ restricted
to $A^c$. Then $\mathbf{P}_{A^c}$ will be a substochastic matrix, i.e., the
row-sums are at
most $1$, and some row-sum is strictly less than $1$.
Suppose $\mathbf{P}_{A^c}$ is irreducible.
As a consequence of the Perron–Frobenius theorem
(e.g., [183] Theorem 8.4.4)
for the nonnegative matrix $\mathbf{P}_{A^c}$, there is a unique $0 < \lambda < 1$
(specifically, $\lambda$ is the largest eigenvalue of $\mathbf{P}_{A^c}$)
such that there is a probability distribution $\alpha$ on $A^c$ satisfying
\[
\alpha \mathbf{P}_{A^c} = \lambda\, \alpha . \tag{3.82}
\]
Writing $\lambda_A$ and $\alpha_A$ to emphasize dependence on $A$,
(3.82)
implies
that
under $P_\alpha$ the hitting time $T_A$ has
geometric distribution
\[
P_\alpha(T_A > n) = \lambda_A^n, \quad n \ge 0,
\]
whence
\[
E_\alpha T_A = \frac{1}{1-\lambda_A} .
\]
Call $\alpha$
the quasistationary distribution and $E_\alpha T_A$ the
quasistationary mean exit time.
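These facts are easy to verify numerically. In the sketch below (our own illustrative chain, $A = \{0\}$), $\alpha$ is recovered from the symmetrized restriction of $\mathbf{P}$ to $A^c$, and the geometric mean $1/(1-\lambda_A)$ is checked against the mean hitting times obtained from a linear solve.

```python
import numpy as np

rng = np.random.default_rng(7)

# Our own illustrative chain; A = {0}. The quasistationary distribution is
# computed from the symmetrized restriction of P to A^c.
n = 6
W = rng.random((n, n)); W = W + W.T
np.fill_diagonal(W, 0.0)
wv = W.sum(axis=1)
P = W / wv[:, None]
pi = wv / wv.sum()

A = [0]
Ac = [i for i in range(n) if i not in A]
R = P[np.ix_(Ac, Ac)]            # P restricted to A^c (substochastic)
sA = np.sqrt(pi[Ac])

lam, U = np.linalg.eigh((sA[:, None] * R) / sA[None, :])
lam_A = lam[-1]                  # largest eigenvalue of the restriction
alpha = np.abs(sA * U[:, -1])    # left Perron eigenvector of R ...
alpha /= alpha.sum()             # ... normalized to a probability vector

resid = np.max(np.abs(alpha @ R - lam_A * alpha))   # alpha P_{A^c} = lam_A alpha

# Mean hitting times E_i T_A solve (I - R) h = 1; averaging over alpha
# recovers the geometric mean 1/(1 - lam_A).
h = np.linalg.solve(np.eye(len(Ac)) - R, np.ones(len(Ac)))
mean_alpha, mean_geom = alpha @ h, 1.0 / (1.0 - lam_A)
print(resid, mean_alpha, mean_geom)
```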
Similarly, for a continuous-time chain let $\mathbf{Q}_{A^c}$ be $\mathbf{Q}$
restricted to $A^c$.
Assuming irreducibility of the substochastic chain with
generator $\mathbf{Q}_{A^c}$, there is a
unique $\lambda_A > 0$ such that there is a probability
distribution $\alpha$ on $A^c$ (called the quasistationary
distribution) satisfying
\[
\alpha \mathbf{Q}_{A^c} = -\lambda_A\, \alpha .
\]
This implies that
under $P_\alpha$ the hitting time $T_A$ has
exponential distribution
\[
P_\alpha(T_A > t) = e^{-\lambda_A t}, \quad t \ge 0,
\]
whence the quasistationary mean exit time is
\[
E_\alpha T_A = \frac{1}{\lambda_A} . \tag{3.83}
\]
Note that both $\alpha$ and $E_\alpha T_A$ are unaffected by continuization
of
a discrete-time chain.
The facts above do not depend on reversibility,
but invoking now our standing assumption that chains are reversible,
we will show in remark (b) following Theorem 3.33 that, for
continuous-time chains, $\lambda_A$ here agrees with the spectral
gap
discussed in Proposition 3.21, and we can also now prove our
second extremal
characterization.
Theorem 3.33
(Extremal characterization of quasistationary mean hitting time)
The quasistationary mean exit time satisfies
\[
E_\alpha T_A = \sup\Big\{\frac{\|g\|_2^2}{\mathcal{E}(g,g)} : g \ge 0,\ g = 0 \text{ on } A\Big\}. \tag{3.84}
\]
Proof.
As usual, we give the proof in discrete time. The matrix
$\operatorname{diag}(\pi^{1/2})\,\mathbf{P}_{A^c}\,\operatorname{diag}(\pi^{-1/2})$
is symmetric with largest eigenvalue $\lambda_A$.
Putting $x_i = \pi_i^{1/2}\, g(i)$ in the characterization (3.78) gives
\[
\lambda_A = \sup \frac{\sum_i\sum_j \pi_i\, g(i)\, p_{ij}\, g(j)}{\sum_i \pi_i\, g^2(i)} .
\]
Clearly the sup is attained by nonnegative $g$, and though
the sums above are technically over $A^c$ we can sum over
all states by setting $g = 0$ on $A$. So
\[
\lambda_A = \sup\Big\{\frac{\langle g, \mathbf{P} g\rangle_\pi}{\|g\|_2^2} : g \ge 0,\ g = 0 \text{ on } A\Big\}.
\]
As in the proof of Theorem 3.25 this rearranges to
\[
\frac{1}{1-\lambda_A} = \sup\Big\{\frac{\|g\|_2^2}{\mathcal{E}(g,g)} : g \ge 0,\ g = 0 \text{ on } A\Big\}
\]
and since $E_\alpha T_A = (1-\lambda_A)^{-1}$, this establishes Theorem 3.33.
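A hedged numerical illustration of (3.84), again on our own small chain: any nonnegative $g$ vanishing on $A$ gives a lower bound, and $g = \alpha/\pi$ on $A^c$ attains the sup.

```python
import numpy as np

rng = np.random.default_rng(8)

# Our own illustrative chain; checks (3.84): nonnegative g vanishing on A
# give lower bounds, and g = alpha/pi attains the sup.
n = 6
W = rng.random((n, n)); W = W + W.T
np.fill_diagonal(W, 0.0)
wv = W.sum(axis=1)
P = W / wv[:, None]
pi = wv / wv.sum()

A = [0]
Ac = [i for i in range(n) if i not in A]
R = P[np.ix_(Ac, Ac)]
sA = np.sqrt(pi[Ac])
lam, U = np.linalg.eigh((sA[:, None] * R) / sA[None, :])
lam_A = lam[-1]
alpha = np.abs(sA * U[:, -1]); alpha /= alpha.sum()
E_alpha_TA = 1.0 / (1.0 - lam_A)

def dirichlet(g):
    return 0.5 * np.sum(pi[:, None] * P * (g[None, :] - g[:, None]) ** 2)

g = np.zeros(n); g[Ac] = rng.random(len(Ac))          # g >= 0, g = 0 on A
ratio = np.dot(pi, g * g) / dirichlet(g)

gstar = np.zeros(n); gstar[Ac] = alpha / pi[Ac]       # the attaining function
ratio_star = np.dot(pi, gstar * gstar) / dirichlet(gstar)

print(ratio, ratio_star, E_alpha_TA)
```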
Remarks.
(a) These remarks closely parallel the remarks at the end of
Section 3.6.3.
The sup in Theorem 3.33 is attained by the function $g$
which is the right eigenvector associated with $\lambda_A$,
and by reversibility this is
\[
g(i) = \alpha_i/\pi_i,\ i \in A^c; \qquad g = 0 \text{ on } A . \tag{3.85}
\]
It easily follows from (3.82) that
\[
P_\alpha(X_n = j \mid T_A > n) = \alpha_j, \quad j \in A^c,
\]
which explains the name
quasistationary distribution
for $\alpha$.
A related interpretation of $\alpha$ is as the distribution of the Markov
chain conditioned on having been in $A^c$ for the infinite past.
More precisely, one can use Perron–Frobenius theory to
prove that
\[
\lim_{n\to\infty} P_i(X_n = j \mid T_A > n) = \alpha_j, \quad i, j \in A^c \tag{3.86}
\]
provided $\mathbf{P}_{A^c}$ is aperiodic
as well as irreducible.
(b) Relation (3.86) holds in continuous time as well
(assuming irreducibility
of the chain restricted to $A^c$), yielding
\[
\lim_{t\to\infty} \frac{P_\pi(T_A > t+s)}{P_\pi(T_A > t)}
= P_\alpha(T_A > s) = e^{-\lambda_A s}, \quad s > 0 .
\]
Since by Proposition 3.21 the distribution of $T_A$ for the
stationary chain is CM
with spectral gap (say) $\hat\lambda$, the limit here is $e^{-\hat\lambda s}$. Thus
$\hat\lambda = \lambda_A$, that is, our two uses of $\lambda_A$ refer to
the same quantity.
(c) We conclude from remark (b), (3.83), and the final
inequality in (3.69)
that,
††margin: This is needed at the bottom of page 27 (9/22/96 version)
in Chapter 5.
in either continuous or discrete time,
\[
E_\pi T_A \le E_\alpha T_A . \tag{3.87}
\]
Our
fundamental use of quasistationarity is the following.
Corollary 3.34
For any subset $A$,
the quasistationary mean hitting time satisfies
\[
E_\alpha T_A \le \frac{\tau_2}{\pi(A)} .
\]
Proof.
As at (3.85)
set $g(i) = \alpha_i/\pi_i$ on $A^c$ and $g = 0$ on $A$, so
\[
\|g\|_2^2 = \sum_{i\in A^c} \alpha_i^2/\pi_i, \qquad
\mathcal{E}(g,g) = \frac{\|g\|_2^2}{E_\alpha T_A} . \tag{3.88}
\]
Now $\sum_i \pi_i\, g(i) = 1$, so
applying the extremal characterization of relaxation time to $g - 1$,
\[
\tau_2 \ge \frac{\|g-1\|_2^2}{\mathcal{E}(g-1,g-1)}
= \frac{\|g\|_2^2 - 1}{\mathcal{E}(g,g)}
= \big(\|g\|_2^2 - 1\big)\,\frac{E_\alpha T_A}{\|g\|_2^2}, \tag{3.89}
\]
the last equality using (3.88).
Since $\alpha$ is a probability distribution on $A^c$ we have
\[
1 = \Big(\sum_{i\in A^c} \alpha_i\Big)^2
= \Big(\sum_{i\in A^c} \pi_i^{1/2}\cdot\frac{\alpha_i}{\pi_i^{1/2}}\Big)^2
\]
and so by Cauchy–Schwarz
\[
1 \le \pi(A^c)\,\sum_{i\in A^c} \frac{\alpha_i^2}{\pi_i}
= \big(1-\pi(A)\big)\,\|g\|_2^2 .
\]
Rearranging,
\[
\frac{\|g\|_2^2 - 1}{\|g\|_2^2} \ge \pi(A),
\]
and substituting into (3.89) gives the desired bound.
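A quick numerical check of the corollary, on our own illustrative chain with $A = \{0, 1\}$:

```python
import numpy as np

rng = np.random.default_rng(9)

# Our own illustrative chain; checks E_alpha T_A <= tau2 / pi(A).
n = 6
W = rng.random((n, n)); W = W + W.T
np.fill_diagonal(W, 0.0)
wv = W.sum(axis=1)
P = W / wv[:, None]
pi = wv / wv.sum()
s = np.sqrt(pi)

tau2 = 1.0 / (1.0 - np.linalg.eigvalsh((s[:, None] * P) / s[None, :])[-2])

A = [0, 1]
Ac = [i for i in range(n) if i not in A]
R = P[np.ix_(Ac, Ac)]
sA = np.sqrt(pi[Ac])
lam_A = np.linalg.eigvalsh((sA[:, None] * R) / sA[None, :])[-1]
E_alpha_TA = 1.0 / (1.0 - lam_A)

bound = tau2 / pi[A].sum()
print(E_alpha_TA, bound)
```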
Combining
Corollary 3.34 with (3.68) and (3.83) gives the
result below.
††margin: DA has deleted discrete-time claim in previous version.
It seems true but not worth sweating over.
Lemma 3.35
\[
P_\pi(T_A > t) \le \big(1-\pi(A)\big)\,\exp\Big(-\frac{t\,\pi(A)}{\tau_2}\Big), \quad t \ge 0 .
\]