One advantage of working in continuous time is to exploit
complete monotonicity properties.
Abstractly, call a function $f:(0,\infty)\to[0,\infty)$
completely monotone (CM)
if there is a nonnegative measure $\Gamma$ on
$[0,\infty)$ such that
$$f(t)\ =\ \int_{[0,\infty)}e^{-\theta t}\,\Gamma(d\theta),\qquad 0<t<\infty.\tag{3.44}$$
Our applications will use only
the special case of a finite sum
$$f(t)\ =\ \sum_m \gamma_m e^{-\theta_m t},\qquad \gamma_m>0,\ \theta_m\ge0,\tag{3.45}$$
but finiteness plays no essential role.
If $f$ is CM then (provided they exist) so are
$$-f'(t)\quad\text{and}\quad\int_t^\infty f(s)\,ds.\tag{3.46}$$
A probability distribution of a random variable $T$ on $(0,\infty]$ is called
CM if its tail distribution function
$t\mapsto P(T>t)$ is CM; equivalently, if its density function is
CM (except that here we must in the general case allow the possibility
that the density is infinite at $0+$). In more probabilistic language, $T$ is CM iff it can
be expressed as the distribution of $\xi/\Lambda$,
where $\xi$ and $\Lambda$ are independent random variables such that
$$\xi\ \text{has exponential(1) distribution};\qquad \Lambda\ge0\ \text{has arbitrary distribution}.\tag{3.47}$$
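The representation $T = \xi/\Lambda$ can be checked numerically in the finite-sum case (3.45). The sketch below is an illustration, not from the text; the weights and rates are arbitrary choices.

```python
# Numerical sketch: a CM tail is a mixture of exponentials, and such a T can be
# realized as xi/Lambda with xi ~ Exponential(1) independent of Lambda.
# The mixture (weights 0.5/0.5 on rates 1 and 5) is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])   # Gamma puts mass 0.5 on theta=1 and theta=5
rates = np.array([1.0, 5.0])

def tail(t):
    # f(t) = sum_m gamma_m exp(-theta_m t), the finite-sum form (3.45)
    return float(np.sum(weights * np.exp(-rates * t)))

# Monte Carlo: T = xi / Lambda with xi ~ Exp(1) and Lambda ~ Gamma, independent
n = 400_000
xi = rng.exponential(1.0, size=n)
lam = rng.choice(rates, p=weights, size=n)
T = xi / lam

t0 = 0.4
mc_estimate = float(np.mean(T > t0))
exact = tail(t0)
err = abs(mc_estimate - exact)
```

With 400,000 samples the Monte Carlo estimate agrees with the mixture tail to well within one percent.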
Given a CM function or distribution, the
spectral gap $\lambda$ can be defined consistently by
$$\lambda\ =\ \inf\{\theta>0:\ \theta\in\operatorname{support}(\Gamma)\}$$
(so that $\lambda=\infty$ when $\Gamma$ is concentrated at $0$).
This controls the behavior of $f(t)-f(\infty)$ as $t\to\infty$.
A key property of CM functions is that their value at a general
time $t$ can be bounded in terms of their behavior at $t=0$ and at $t=\infty$,
as follows.
Lemma 3.16
Let $f$ be a CM function with $f(0)<\infty$ and spectral gap $\lambda$. Then for $0\le t<\infty$,
$$f(0)\exp\left(t f'(0)/f(0)\right)\ \le\ f(t)\ \le\ f(\infty)+(f(0)-f(\infty))\,e^{-\lambda t}.$$
Proof.
By scaling we may suppose $f(0)=\Gamma[0,\infty)=1$.
So we can rewrite (3.44) as
$$f(t)\ =\ E\,e^{-\theta t}\tag{3.48}$$
where $\theta$ has distribution $\Gamma$.
Then
$f'(t) = -E\,\theta e^{-\theta t}$.
Because $\theta\mapsto e^{-\theta t}$ is decreasing, the random
variables $\theta$ and $e^{-\theta t}$ are
negatively correlated
(this fact is sometimes called “Chebyshev’s other inequality”,
and makes a nice
exercise [Hint: Symmetrize!])
and so
$E\,\theta e^{-\theta t}\le E\theta\cdot E\,e^{-\theta t}$.
This says
$-f'(t)\le -f'(0)\,f(t)$,
or in other words
$(\log f)'(t)\ge f'(0)$.
Integrating gives
$\log f(t)\ge t\,f'(0)$,
which is the leftmost inequality.
(Recall we scaled to make $f(0)=1$.)
For the second inequality,
$$f(t)-f(\infty)\ =\ \int_{(0,\infty)}e^{-\theta t}\,\Gamma(d\theta)\ \le\ e^{-\lambda t}\,\Gamma(0,\infty)\ =\ (f(0)-f(\infty))\,e^{-\lambda t}.$$
Finally, from the definition of the spectral gap
it is clear that $\Gamma$ puts no mass on $(0,\lambda)$.
But $f-f(\infty)$ has the same spectral gap as $f$.
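The two bounds above, the lower bound driven by $f'(0)/f(0)$ and the upper bound driven by the spectral gap, can be checked numerically in the finite-sum case (3.45). A minimal sketch, with arbitrarily chosen weights and rates:

```python
# Sketch for f(t) = 0.7 e^{-t} + 0.3 e^{-5t}: here f(0)=1, f'(0) = -2.2,
# the spectral gap is lambda = 1 (the smallest rate), and f(infinity) = 0.
# Checked:  f(0) exp(t f'(0)/f(0)) <= f(t) <= f(inf) + (f(0)-f(inf)) e^{-lambda t}.
import numpy as np

w = np.array([0.7, 0.3])
theta = np.array([1.0, 5.0])

t = np.linspace(0.0, 5.0, 501)
f = (w[:, None] * np.exp(-np.outer(theta, t))).sum(axis=0)

f0 = float(w.sum())                  # f(0) = 1
fprime0 = float(-(w * theta).sum())  # f'(0) = -2.2
gap = float(theta.min())             # spectral gap = 1

lower = f0 * np.exp(t * fprime0 / f0)
upper = f0 * np.exp(-gap * t)        # f(infinity) = 0 for this mixture

lower_ok = bool(np.all(lower <= f + 1e-12))
upper_ok = bool(np.all(f <= upper + 1e-12))
```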
Returning to the study of continuous-time reversible chains,
the spectral representation (3.40)
says that $p_{ii}(t)$ is a CM function of $t$.
It is often convenient to subtract the limit $\pi_i$ and say
$$p_{ii}(t)-\pi_i\ =\ \sum_{m=2}^n u_{im}^2\,e^{-\lambda_m t}\ \text{ is CM.}\tag{3.49}$$
More generally, given any function $g$ on the state space, the function
$$t\ \mapsto\ E_\pi\,g(X_0)g(X_t)\tag{3.50}$$
is CM for the stationary chain, because
by (3.33)
$$E_\pi\,g(X_0)g(X_t)\ =\ \sum_i\sum_j\pi_i g(i)\,p_{ij}(t)\,g(j)\ =\ \sum_{m=1}^n\Big(\sum_i\pi_i^{1/2}g(i)u_{im}\Big)^2 e^{-\lambda_m t}.\tag{3.51}$$
Specializing to the case $g=1_A$ and conditioning,
$$P_{\pi_A}(X_t\in A)\ =\ \frac{P_\pi(X_0\in A,\ X_t\in A)}{\pi(A)}\ \text{ is CM},\tag{3.52}$$
again assuming the stationary chain; here $\pi_A$ denotes the stationary distribution conditioned to $A$, that is $\pi_A(i)=\pi_i/\pi(A),\ i\in A$.
When $A$ is a singleton, this is (3.49).
Remark.
To study directly
discrete-time reversible chains, one would replace CM functions by
sequences of the form
$$f(k)\ =\ \int_{[-1,1]}\theta^k\,\Gamma(d\theta),\qquad k=0,1,2,\ldots
$$
But analogs of Lemma 3.16 and subsequent results (e.g., Proposition 3.22) become messier—so we prefer to derive
discrete-time results by continuization.
3.5.1 Lower bounds on mean hitting times
As a quick application, we give bounds on mean hitting times to
a single state from a stationary start.
Recall $q_i$ is the exit rate from state $i$,
and $\tau_2=1/\lambda_2$ is the relaxation time of the chain.
Lemma 3.17
For any state $i$ in a continuous-time chain,
$$\frac{(1-\pi_i)^2}{\pi_i q_i}\ \le\ E_\pi T_i\ \le\ \frac{(1-\pi_i)\,\tau_2}{\pi_i}.$$
By continuization, the Lemma holds in discrete time, replacing
$q_i$ by $1-p_{ii}$.
Proof.
The mean hitting time formula is
$$\pi_i\,E_\pi T_i\ =\ \int_0^\infty (p_{ii}(t)-\pi_i)\,dt.$$
Write $h(t)$ for the integrand.
We know $h$ is CM, and here $h(0)=1-\pi_i$ by (3.40),
and $h'(0)=-q_i$,
so the extreme bounds of Lemma 3.16 become
$$(1-\pi_i)\exp\left(-\frac{q_i t}{1-\pi_i}\right)\ \le\ p_{ii}(t)-\pi_i\ \le\ (1-\pi_i)\,e^{-t/\tau_2}.$$
Integrating these bounds gives the result.
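The two bounds discussed above can be verified exactly on a small example; the 3-state birth-and-death rates below are arbitrary illustrative choices.

```python
# Check the bounds  (1-pi_i)^2/(pi_i q_i) <= E_pi T_i <= (1-pi_i) tau2 / pi_i
# on a small reversible continuous-time chain (arbitrary birth-and-death rates).
import numpy as np

Q = np.array([[-1.0, 1.0, 0.0],
              [ 2.0, -3.0, 1.0],
              [ 0.0, 3.0, -3.0]])
n = Q.shape[0]

# stationary distribution: pi Q = 0 with sum(pi) = 1
A = np.vstack([Q.T, np.ones(n)])
pi = np.linalg.lstsq(A, np.concatenate([np.zeros(n), [1.0]]), rcond=None)[0]

# relaxation time tau2 = 1/lambda_2 via the symmetrized generator
D = np.diag(np.sqrt(pi))
Dinv = np.diag(1.0 / np.sqrt(pi))
S = D @ Q @ Dinv                       # symmetric because the chain is reversible
eig = np.sort(-np.linalg.eigvalsh((S + S.T) / 2))   # 0 = lam_1 < lam_2 <= ...
tau2 = 1.0 / eig[1]

ok = True
for i in range(n):
    others = [j for j in range(n) if j != i]
    # E_j T_i solves (Q t)_j = -1 for j != i, with t_i = 0
    t = np.zeros(n)
    t[others] = np.linalg.solve(Q[np.ix_(others, others)], -np.ones(n - 1))
    EpiT = float(pi @ t)
    qi = -Q[i, i]
    lower = (1 - pi[i]) ** 2 / (pi[i] * qi)
    upper = (1 - pi[i]) * tau2 / pi[i]
    ok = ok and (lower - 1e-12 <= EpiT <= upper + 1e-12)
```

For this generator one finds, e.g., $E_\pi T_2 = 3.3$ squeezed between the lower bound $2.7$ and the upper bound $4.5$.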
We can now give general lower bounds on some basic parameters
we will study in Chapter 4.
Proposition 3.18
For a discrete-time chain on $n$ states,
$$\sum_i\pi_i\,E_\pi T_i\ \ge\ \frac{(n-1)^2}{n}\tag{3.53}$$
$$\max_i E_\pi T_i\ \ge\ \frac{(n-1)^2}{n}\tag{3.54}$$
$$\max_{i,j} E_i T_j\ \ge\ \frac{(n-1)^2}{n}\tag{3.55}$$
$$\tau_2\ \ge\ \frac{n-1}{n}.\tag{3.56}$$
Remark.
These inequalities become equalities for random walk on the
complete graph (Chapter 5 ††margin: 4/22/96 version Example 9).
By examining the proof, it can be shown that this is the only chain
where an equality holds.
Proof.
We go to the continuized chain, which has the same stationary distribution and mean hitting times, and has $q_i = 1-p_{ii}\le1$.
Then
$$\sum_i\pi_i\,E_\pi T_i\ \ge\ \sum_i\pi_i\,\frac{(1-\pi_i)^2}{\pi_i q_i}\quad\text{by Lemma 3.17}$$
$$\ \ge\ \sum_i(1-\pi_i)^2\quad\text{because }q_i\le1$$
$$\ \ge\ n\left(1-\frac{1}{n}\right)^2\ =\ \frac{(n-1)^2}{n}\quad\text{because }\textstyle\sum_i\pi_i=1,$$
giving (3.53).
By the eigentime identity,
$$\sum_{m=2}^n\frac{1}{\lambda_m}\ =\ \sum_i\pi_i\,E_\pi T_i$$
and since the left side is at most $(n-1)\tau_2$, (3.56) follows from (3.53).
Now fix $i$ and write $\tau_0=\sum_j\pi_j\,E_iT_j$,
which (by the random target lemma) doesn’t depend on $i$. Then
$$\tau_0\ =\ \sum_j\pi_j\,E_\pi T_j.\tag{3.57}$$
If the right side were strictly less than $(n-1)^2/n$ term by term, that is if $E_\pi T_j<(n-1)^2/n$ for every $j$, then
$$\tau_0\ <\ \frac{(n-1)^2}{n}$$
which implies
$$\sum_i\pi_i\,E_\pi T_i\ <\ \frac{(n-1)^2}{n}$$
contradicting (3.53).
Therefore there exists an $i$ such that
$$E_\pi T_i\ \ge\ \frac{(n-1)^2}{n}.$$
This is (3.54); and since $E_\pi T_i=\sum_j\pi_j E_jT_i$, there exists $j$ such that
$E_jT_i\ge (n-1)^2/n$, so (3.55) follows immediately.
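The flavor of these bounds, and the equality case for random walk on the complete graph, can be checked numerically; the weighted chain below is an arbitrary illustrative choice.

```python
# Check  sum_i pi_i E_pi T_i >= (n-1)^2/n  for a randomly weighted reversible
# chain, and the equality case for random walk on the complete graph K_n.
import numpy as np

def hitting_matrix(P):
    # M[i, j] = E_i T_j for a discrete-time chain with transition matrix P
    n = P.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        others = [k for k in range(n) if k != j]
        A = np.eye(n - 1) - P[np.ix_(others, others)]
        M[others, j] = np.linalg.solve(A, np.ones(n - 1))
    return M

def tau0(P, pi):
    # tau0 = sum_i pi_i E_pi T_i
    EpiT = pi @ hitting_matrix(P)     # EpiT[j] = E_pi T_j
    return float(pi @ EpiT)

rng = np.random.default_rng(1)
n = 6
W = rng.random((n, n))
W = W + W.T                            # symmetric edge weights => reversible
np.fill_diagonal(W, 0.0)
P = W / W.sum(axis=1, keepdims=True)   # weighted random walk
pi = W.sum(axis=1) / W.sum()           # its stationary distribution

bound = (n - 1) ** 2 / n
tau0_random = tau0(P, pi)

# random walk on the complete graph: the equality case
Pc = (np.ones((n, n)) - np.eye(n)) / (n - 1)
pic = np.ones(n) / n
tau0_complete = tau0(Pc, pic)
```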
There are several other results in the spirit of Lemma 3.17 and
Proposition 3.18.
For instance, (22) in Chapter 2 ††margin: 9/10/99 version says that
for a general discrete-time chain,
$$E_\pi T_i\ =\ \frac{Z_{ii}}{\pi_i}.$$
Appealing to Lemma 3.17 gives, after a little algebra,
Corollary 3.19
For any state $i$ in a discrete-time chain,
$$Z_{ii}\ \ge\ \frac{(1-\pi_i)^2}{1-p_{ii}}.$$
Again, equality holds for random walk on the complete graph.
3.5.2 Smoothness of convergence
We’re going to build some vague discussion around the following simple
result.
Lemma 3.20
$$\frac{p_{ii}(2t)}{\pi_i}-1\ =\ \sum_k\frac{(p_{ik}(t)-\pi_k)^2}{\pi_k}\tag{3.58}$$
$$\left|\frac{p_{ij}(2t)}{\pi_j}-1\right|\ \le\ \left(\frac{p_{ii}(2t)}{\pi_i}-1\right)^{1/2}\left(\frac{p_{jj}(2t)}{\pi_j}-1\right)^{1/2}\tag{3.59}$$
$$\left|\frac{p_{ij}(t)}{\pi_j}-1\right|\ \le\ \frac{e^{-\lambda_2 t}}{(\pi_i\pi_j)^{1/2}}.\tag{3.60}$$
Proof.
$$p_{ij}(2t)\ =\ \sum_k p_{ik}(t)\,p_{kj}(t)\ =\ \pi_j\sum_k\frac{p_{ik}(t)\,p_{jk}(t)}{\pi_k}$$
by reversibility.
Putting $j=i$ gives (3.58).
Rewriting the above equality as
$$\frac{p_{ij}(2t)}{\pi_j}-1\ =\ \sum_k\frac{(p_{ik}(t)-\pi_k)(p_{jk}(t)-\pi_k)}{\pi_k}$$
and applying the Cauchy–Schwarz inequality,
we get the bound $a_i^{1/2}a_j^{1/2}$, where
$$a_i\ =\ \sum_k\frac{(p_{ik}(t)-\pi_k)^2}{\pi_k}\ =\ \frac{p_{ii}(2t)}{\pi_i}-1.$$
This proves (3.59).
The cruder bound (3.60) is sometimes easier to use
than (3.59) and
is proved similarly.
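The identity (3.58) and the Cauchy–Schwarz bound (3.59) can be checked numerically; the chain below has symmetric rates (so $\pi$ is uniform), an arbitrary illustrative choice.

```python
# Check  p_ii(2t)/pi_i - 1 = sum_k (p_ik(t)-pi_k)^2/pi_k  and the
# Cauchy-Schwarz bound on |p_ij(2t)/pi_j - 1| for a reversible chain.
import numpy as np

rng = np.random.default_rng(2)
n = 5
W = rng.random((n, n))
W = W + W.T
np.fill_diagonal(W, 0.0)
Q = W.copy()
np.fill_diagonal(Q, -W.sum(axis=1))   # symmetric generator => pi uniform
pi = np.ones(n) / n

lam, U = np.linalg.eigh(Q)            # Q symmetric, so eigh applies directly
def ptrans(t):
    # p(t) = exp(Q t) via the eigendecomposition
    return U @ np.diag(np.exp(lam * t)) @ U.T

t = 0.3
p1, p2 = ptrans(t), ptrans(2 * t)

lhs = p2[0, 0] / pi[0] - 1.0
rhs = float(np.sum((p1[0, :] - pi) ** 2 / pi))
identity_err = abs(lhs - rhs)

# Cauchy-Schwarz bound for the off-diagonal entry (i, j) = (0, 1)
off = abs(p2[0, 1] / pi[1] - 1.0)
cs = np.sqrt((p2[0, 0] / pi[0] - 1.0) * (p2[1, 1] / pi[1] - 1.0))
```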
Discussion.
Recalling from Chapter 2 ††margin: 9/10/99 version Section 4.2 the
definition of $L^2$ distance
between distributions, (3.58) says
$$\|P_i(X_t\in\cdot)-\pi\|_2^2\ =\ \frac{p_{ii}(2t)}{\pi_i}-1.\tag{3.61}$$
In continuous time,
we may regard the assertion
“$\|P_i(X_t\in\cdot)-\pi\|_2$
is decreasing in $t$”
as a consequence of the equality in (3.61) and
the CM property of $p_{ii}(\cdot)$.
This assertion in fact holds for general chains,
as pointed out in Chapter 2 ††margin: 9/10/99 version Lemma 35.
Loosely, the general result of Chapter 2 ††margin: 9/10/99 version Lemma 35
says that in a
general chain the
ratios $p_{ij}(t)/\pi_j$,
considered as an unordered set, tend to
smooth out as $t$ increases.
For a reversible chain, much more seems to be true.
There is some “intrinsic geometry” on the state space such that,
for the chain started at $i$, the probability distribution
$P_i(X_t\in\cdot)$, as time increases from $0$,
“spreads out smoothly” with respect to the geometry.
It’s hard to formalize that idea convincingly.
On the other hand, (3.61) does say convincingly that the
rate of convergence of the single probability $p_{ii}(t)$
to $\pi_i$ is connected to a rate of convergence of the
entire distribution $P_i(X_t\in\cdot)$ to $\pi$.
This intimate connection between the local and the global behavior
of reversible chains underlies many of the technical inequalities
concerning mixing times
in Chapter 4 ††margin: 10/11/94 version and subsequent chapters.
3.5.3 Inequalities for hitting time distributions on subsets
We mentioned in Chapter 2 ††margin: 9/10/99 version Section 2.2 that most of
the simple
identities there for mean hitting times on singletons have
no simple analogs
for hitting times on subsets.
One exception is Kac’s formula (Chapter 2 ††margin: 9/10/99 version Corollary 24), which says
that for a general discrete-time chain,
$$E_{\pi_A}T_A^+\ =\ \frac{1}{\pi(A)}\tag{3.62}$$
where $T_A^+$ is the first return time to $A$.
It turns out that for reversible chains there are useful inequalities
relating the distributions of $T_A$ under different initial distributions.
These are simplest in continuous time, as consequences of CM; as always, the conclusions may be applied to discrete-time
chains via continuization.
Recall $\pi_{A^c}$ is the stationary distribution conditioned to $A^c$:
$$\pi_{A^c}(j)\ =\ \frac{\pi_j}{\pi(A^c)},\qquad j\in A^c.$$
Trivially
$$P_\pi(T_A>t)\ =\ \pi(A^c)\,P_{\pi_{A^c}}(T_A>t)\tag{3.63}$$
$$E_\pi T_A\ =\ \pi(A^c)\,E_{\pi_{A^c}}T_A.\tag{3.64}$$
Define the
ergodic exit distribution $\rho_A$ from $A$ by
$$\rho_A(j)\ =\ \Phi(A)^{-1}\sum_{i\in A}\pi_i q_{ij},\qquad j\in A^c,\tag{3.65}$$
where $\Phi(A)$ is the ergodic flow rate out of $A$:
$$\Phi(A)\ =\ \sum_{i\in A}\sum_{j\in A^c}\pi_i q_{ij}.\tag{3.66}$$
By stationarity, $\Phi(A)=\Phi(A^c)$.
Proposition 3.21
Fix a subset $A$ in a continuous-time chain.
(i) $T_A$ has CM distribution when the initial distribution of the
chain is any of the three distributions
$\pi$ or $\pi_{A^c}$ or $\rho_A$.
(ii) The three hitting time distributions determine each other via (3.63) and
$$P_{\rho_A}(T_A>t)\ =\ \frac{-\frac{d}{dt}P_\pi(T_A>t)}{\Phi(A)}.\tag{3.67}$$
(iii) Write $\lambda_A$ for the spectral gap associated with $T_A$
(which is the same for each of the three initial distributions).
Then
$$\pi(A^c)\exp\left(-\frac{\Phi(A)\,t}{\pi(A^c)}\right)\ \le\ P_\pi(T_A>t)\ \le\ \pi(A^c)\,e^{-\lambda_A t}\tag{3.68}$$
and in particular
$$\pi(A^c)\,E_{\rho_A}T_A\ =\ \frac{\pi^2(A^c)}{\Phi(A)}\ \le\ E_\pi T_A\ \le\ \frac{\pi(A^c)}{\lambda_A}.\tag{3.69}$$
(iv)
$$E_\pi T_A\ \le\ \frac{\pi(A^c)\,\tau_2}{\pi(A)}.\tag{3.70}$$
Remarks.
(a) In
††margin: Concerning (b): The results cited from later require that the chain
restricted to $A^c$ be irreducible, but I think that requirement can be dropped
using a limiting argument.
discrete time we can define $\rho_A$ and $\Phi(A)$
by replacing $q_{ij}$ by $p_{ij}$ in (3.65)–(3.66),
and then (3.69) holds in discrete time.
The left equality of (3.69) is then a reformulation of
Kac’s formula (3.62), because
$$E_{\pi_A}T_A^+\ =\ 1+\frac{\Phi(A)}{\pi(A)}\,E_{\rho_A}T_A$$
so that (3.62) is equivalent to $E_{\rho_A}T_A=\pi(A^c)/\Phi(A)$.
(b) Equation (3.83) and Corollary 3.34 [together with remark (b)
following Theorem 3.33] later show that
$\lambda_A\ge\pi(A)/\tau_2$.
So (3.70) can be regarded as a consequence of (3.69).
Reverse inequalities will be studied in Chapter 4. ††margin: 10/11/94 version
Proof of Proposition 3.21.
First consider the case where $A$ is a singleton $\{i\}$.
Then (3.70) is an immediate consequence of Lemma 3.17.
The equalities in (3.69) and in (3.67) are
general identities for stationary processes [(24) and (23) in
Chapter 2].
††margin: 9/10/99 version We shall prove below that $T_i$ has CM distribution under the initial distribution $\pi_{\{i\}^c}$.
Then by (3.63), (3.67), and (3.46), $T_i$ is also CM
under the other two initial distributions.
Then the second inequality of (3.68) is the upper bound in
Lemma 3.16,
and the first is a consequence of (3.67) and Lemma 3.16.
And (3.69) follows from (3.68) by integrating over $0<t<\infty$.
To prove that $T_i$ is CM under $\pi_{\{i\}^c}$,
introduce a parameter $0<\varepsilon<1$
and consider the modified chain $\tilde X$ with transition rates
$$\tilde q_{jk}\ =\ q_{jk},\ j\ne i;\qquad \tilde q_{ik}\ =\ \varepsilon\,q_{ik}.$$
The modified chain remains reversible, and its stationary
distribution is of the form
$$\tilde\pi_j\ =\ c_\varepsilon\pi_j,\ j\ne i;\qquad \tilde\pi_i\ =\ c_\varepsilon\pi_i/\varepsilon$$
where the weights depend only on $\varepsilon$.
Now as $\varepsilon\downarrow0$ with $t$ fixed,
$$P_{\pi_{\{i\}^c}}(\tilde X_t\ne i)\ \to\ P_{\pi_{\{i\}^c}}(T_i>t)\tag{3.71}$$
because the chain gets “stuck” upon hitting $i$.
But the left side is CM by (3.52), so the right side
(which does not depend on $\varepsilon$) is CM,
because the class of CM distributions is closed under pointwise limits.
(The last assertion is in general the continuity theorem for Laplace
transforms [133] p. 83,
though for our purposes we need only the simpler fact that the set
of functions of the form (3.45) with at most $n$ summands is closed.)
This completes the proof when $A$ is a singleton.
We now claim that the case of general $A$ follows from the collapsing
principle (Chapter 2 ††margin: 9/10/99 version Section 7.3), i.e., by
applying the special
case to the chain in which the subset $A$ is collapsed into a single state.
This is clear for all the assertions of Proposition 3.21 except
for (3.70), for which we need the fact that the relaxation time
of the collapsed chain is at most $\tau_2$.
This fact is proved as Corollary 3.27 below.
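The differential identity (3.67) can be checked numerically via the generator of the chain killed on $A$. A sketch (symmetric rates for simplicity, so $\pi$ is uniform; an illustration, not the text's argument):

```python
# Check  -(d/dt) P_pi(T_A > t) = Phi(A) * P_{rho_A}(T_A > t)  on an
# arbitrary reversible chain (symmetric rates => uniform pi), with A = {0, 1}.
import numpy as np

rng = np.random.default_rng(3)
n = 5
W = rng.random((n, n))
W = W + W.T
np.fill_diagonal(W, 0.0)
Q = W.copy()
np.fill_diagonal(Q, -W.sum(axis=1))
pi = np.ones(n) / n

A_set = [0, 1]                            # the target subset A
B = [j for j in range(n) if j not in A_set]

QB = Q[np.ix_(B, B)]                      # generator of the chain killed on A
lam, U = np.linalg.eigh(QB)               # QB is symmetric here
ones = np.ones(len(B))

def survival_vec(t):
    # (e^{QB t} 1)_j = P_j(T_A > t) for states j outside A
    return U @ (np.exp(lam * t) * (U.T @ ones))

Phi = sum(pi[i] * Q[i, j] for i in A_set for j in B)   # ergodic flow out of A
rho = np.array([sum(pi[i] * Q[i, j] for i in A_set) for j in B]) / Phi

t = 0.7
lhs = float(-pi[B] @ (QB @ survival_vec(t)))   # -(d/dt) P_pi(T_A > t)
rhs = float(Phi * (rho @ survival_vec(t)))     # Phi(A) P_{rho_A}(T_A > t)
err = abs(lhs - rhs)
```

The identity holds exactly here because $-\sum_{k\in A^c}\pi_k \tilde q_{kj}$ equals $\Phi(A)\rho_A(j)$ row by row.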
Remark.
Note that the CM property implies a supermultiplicativity
property for hitting times from stationarity in a continuous-time
reversible chain:
$$\frac{P_\pi(T_A>s+t)}{\pi(A^c)}\ \ge\ \frac{P_\pi(T_A>s)}{\pi(A^c)}\cdot\frac{P_\pi(T_A>t)}{\pi(A^c)},\qquad s,t\ge0.$$
Contrast with the general submultiplicativity property
(Chapter 2 ††margin: 9/10/99 version Section 4.3) which holds when
$P_\pi(T_A>t)$ is replaced by
$\sup_i P_i(T_A>t)$.
3.5.4 Approximate exponentiality of hitting times
In many circumstances, the distribution of the first hitting time
$T_A$ on a subset $A$ of states with $\pi(A)$ small (equivalently,
with $E_\pi T_A$ large)
can be approximated by the exponential distribution with the same
mean.
As with the issue of convergence to the stationary distribution, such
approximations can be proved for general chains (see Notes), but
it is easier to get explicit bounds in the reversible setting.
If $T$ has a CM distribution, then [as at (3.47)]
we may suppose $T=\xi/\Lambda$ with $\xi$ and $\Lambda$ independent. We calculate
$$ET\ =\ E\Lambda^{-1},\qquad ET^2\ =\ 2\,E\Lambda^{-2}$$
and so
$$ET^2\ \ge\ 2(ET)^2$$
with equality iff $\Lambda$ is constant, i.e., iff $T$ has exponential
distribution.
This suggests that the difference $ET^2-2(ET)^2$
can be used as a measure of “deviation from exponentiality”.
Let us quote a result of Mark Brown ([72] Theorem 4.1(iii)) which
quantifies this idea in a very simple way.
Proposition 3.22
Let $T$ have CM distribution. Then
$$\sup_t\left|P(T>t)-\exp(-t/ET)\right|\ \le\ \frac{ET^2-2(ET)^2}{2(ET)^2}.$$
So we can use this bound for hitting times in a stationary reversible
chain.
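One can check numerically, for a toy CM distribution, that the deviation of the tail from the exponential with the same mean is controlled by the excess $ET^2-2(ET)^2$; the normalization below follows the bound as quoted above, and the mixture is an arbitrary illustrative choice.

```python
# Illustration for the CM mixture 0.5 Exp(1) + 0.5 Exp(5): compare the sup
# deviation of the tail from exp(-t/ET) with the excess (ET^2 - 2(ET)^2).
import numpy as np

w = np.array([0.5, 0.5])
theta = np.array([1.0, 5.0])

ET = float(np.sum(w / theta))             # E T   = sum w_m / theta_m
ET2 = float(2 * np.sum(w / theta ** 2))   # E T^2 = 2 sum w_m / theta_m^2
excess = ET2 - 2 * ET ** 2                # zero iff T is exponential

t = np.linspace(0.0, 20.0, 4001)
tail = (w[:, None] * np.exp(-np.outer(theta, t))).sum(axis=0)
dev = float(np.max(np.abs(tail - np.exp(-t / ET))))
bound = excess / (2 * ET ** 2)
```

For this mixture the sup deviation is about 0.12, comfortably below the bound.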
At first sight the bound seems useful only if we can estimate
$ET_A$ and $ET_A^2$ accurately.
But the following remarkable variation shows that for the hitting
time distribution to be approximately exponential it is sufficient
that the mean hitting time $E_\pi T_A$ be large compared to the relaxation time $\tau_2$.
Proposition 3.23
For a subset $A$ of a continuous-time chain,
$$\sup_t\left|P_\pi(T_A>t)-\exp(-t/E_\pi T_A)\right|\ \le\ \frac{\tau_2}{E_\pi T_A}.$$
Proof.
By the collapsing principle (Chapter 2 ††margin: 9/10/99 version Section 7.3)
we may suppose $A$ is a singleton $\{i\}$,
because (Corollary 3.27 below) collapsing cannot increase
the relaxation time.
Combining the mean hitting time formula with the expression (3.41)
for $E_\pi T_i$ in terms of the spectral representation (3.33),
$$E_\pi T_i\ =\ \pi_i^{-1}\sum_{m=2}^n\frac{u_{im}^2}{\lambda_m}.\tag{3.72}$$
A similar calculation, exhibited below, shows
$$E_\pi T_i^2-2(E_\pi T_i)^2\ =\ 2\pi_i^{-1}\sum_{m=2}^n\frac{u_{im}^2}{\lambda_m^2}.\tag{3.73}$$
But
$\lambda_m^{-1}\le\tau_2$ for $2\le m\le n$,
so the right side of (3.73) is bounded by $2\tau_2\,\pi_i^{-1}\sum_m u_{im}^2/\lambda_m$,
which by (3.72) equals $2\tau_2\,E_\pi T_i$.
Applying Proposition 3.22 gives Proposition 3.23.
We give a straightforward but tedious verification of (3.73)
(see also Notes).
The identity
$\tfrac12 E_\pi T_i^2=\int_0^\infty E_\pi(T_i-t)^+\,dt$
starts the
††margin: Chapter 2 reference in following display is
to 9/10/99 version.
calculation
$$\tfrac12 E_\pi T_i^2\ =\ \int_0^\infty\sum_j P_\pi(T_i>t,\ X_t=j)\,E_jT_i\,dt$$
$$=\ \int_0^\infty\sum_j\pi_j\,P_j(T_i>t)\,E_jT_i\,dt\quad\text{by reversibility}$$
$$=\ \sum_j\pi_j\,(E_jT_i)^2$$
$$=\ \sum_j\pi_j\left(\frac{Z_{ii}-Z_{ji}}{\pi_i}\right)^2\quad
\text{by Chapter 2 Lemmas 12 and 15 (continuous-time version).}$$
Expanding the square, the cross-term vanishes (because $\sum_j\pi_jZ_{ji}=0$) and the first term
becomes $(E_\pi T_i)^2$, so
$$\tfrac12 E_\pi T_i^2\ =\ (E_\pi T_i)^2+\pi_i^{-2}\sum_j\pi_jZ_{ji}^2.$$
To finish the calculation,
$$\sum_j\pi_jZ_{ji}^2\ =\ \sum_j\pi_j\left(\int_0^\infty(p_{ji}(t)-\pi_i)\,dt\right)^2$$
$$=\ \sum_j\pi_j\left((\pi_i/\pi_j)^{1/2}\sum_{m=2}^n\frac{u_{jm}u_{im}}{\lambda_m}\right)^2\quad\text{by (3.33)}$$
$$=\ \pi_i\sum_{m=2}^n\sum_{m'=2}^n\frac{u_{im}u_{im'}}{\lambda_m\lambda_{m'}}\sum_j u_{jm}u_{jm'}$$
$$=\ \pi_i\sum_{m=2}^n\frac{u_{im}^2}{\lambda_m^2}\quad\text{by orthonormality of the eigenvectors,}$$
and combining with the previous identity gives (3.73).
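The second-moment identity (3.73) and the resulting bound by $2\tau_2 E_\pi T_i$ can be checked numerically; the sketch below uses symmetric rates, so that $\pi$ is uniform and the spectral weights are squared entries of orthonormal eigenvectors.

```python
# Check  E_pi T_i^2 - 2 (E_pi T_i)^2 = (2/pi_i) sum_{m>=2} u_{im}^2/lambda_m^2
# and the bound by 2 tau2 E_pi T_i, on a reversible chain with symmetric rates.
import numpy as np

rng = np.random.default_rng(4)
n = 5
W = rng.random((n, n))
W = W + W.T
np.fill_diagonal(W, 0.0)
Q = W.copy()
np.fill_diagonal(Q, -W.sum(axis=1))        # symmetric generator => pi uniform
pi = np.ones(n) / n
i = 0

others = [j for j in range(n) if j != i]
Qr = Q[np.ix_(others, others)]
t1 = np.zeros(n)
t2 = np.zeros(n)
t1[others] = np.linalg.solve(Qr, -np.ones(n - 1))     # t1[j] = E_j T_i
t2[others] = np.linalg.solve(Qr, -2.0 * t1[others])   # t2[j] = E_j T_i^2
ET = float(pi @ t1)                                    # E_pi T_i
ET2 = float(pi @ t2)                                   # E_pi T_i^2

lam, Umat = np.linalg.eigh(-Q)             # 0 = lam[0] < lam[1] <= ...
tau2 = 1.0 / lam[1]
pos = lam > 1e-10                          # the eigenvalues lam_2, ..., lam_n
spectral = (2.0 / pi[i]) * float(np.sum(Umat[i, pos] ** 2 / lam[pos] ** 2))

identity_err = abs((ET2 - 2.0 * ET ** 2) - spectral)
bound_ok = bool(ET2 - 2.0 * ET ** 2 <= 2.0 * tau2 * ET + 1e-9)
```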
See the Notes for a related result, Theorem 3.43.