The parameter $\tau_2$ is the relaxation time, defined in terms
of the eigenvalue $\lambda_2$ (Chapter 3 section yyy) as
\[
\tau_2 := \frac{1}{\lambda_2} \ \mbox{ in continuous time}, \qquad
\tau_2 := \frac{1}{1-\lambda_2} \ \mbox{ in discrete time},
\]
where $0 = \lambda_1 < \lambda_2 \le \cdots \le \lambda_n$ are the eigenvalues of $-{\bf Q}$ in continuous time,
and $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n \ge -1$ are the eigenvalues of ${\bf P}$ in discrete time.
In Chapter 3 yyy we proved a lower bound
for an $n$-state discrete-time chain:
\[
\tau_2 \ge \frac{n-1}{n},
\]
which is attained by random walk on the complete graph.
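As a quick numerical sanity check (our own sketch, not part of the text; the function name is ours): for random walk on the complete graph $K_n$ the transition matrix is $P = (J-I)/(n-1)$, every centered vector is an eigenvector with eigenvalue $-1/(n-1)$, and so $\tau_2 = 1/(1-\lambda_2) = (n-1)/n$.

```python
# Sketch (ours): verify that random walk on the complete graph K_n
# attains the lower bound tau_2 = (n-1)/n.

def complete_graph_tau2(n):
    """Relaxation time of discrete-time random walk on K_n,
    computed by applying P to one centered test vector."""
    p = 1.0 / (n - 1)                    # off-diagonal entry of P
    v = [1.0, -1.0] + [0.0] * (n - 2)    # centered test vector e_0 - e_1
    Pv = [p * (sum(v) - v[i]) for i in range(n)]   # (Pv)_i = sum_{j != i} v_j/(n-1)
    lam2 = Pv[0] / v[0]                  # Pv = lam2 * v, so lam2 = -1/(n-1)
    return 1.0 / (1.0 - lam2)

for n in [2, 3, 10, 100]:
    assert abs(complete_graph_tau2(n) - (n - 1) / n) < 1e-12
```

The one test vector suffices here because $P$ acts as the scalar $-1/(n-1)$ on the whole orthogonal complement of the constants.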
We saw in Chapter 3 Theorem yyy the extremal characterization
\[
\tau_2 = \sup\left\{ \frac{\|g\|_2^2}{\mathcal{E}(g,g)} :\ \sum_i \pi_i g_i = 0,\ g \not\equiv 0 \right\}
\tag{4.22}
\]
where $\|g\|_2^2 := \sum_i \pi_i g_i^2$ and $\mathcal{E}(g,g) := \frac{1}{2} \sum_i \sum_j \pi_i p_{ij} (g_j - g_i)^2$ is the Dirichlet form.
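On a two-state chain the characterization (4.22) can be checked in closed form (a sketch of ours; the helper name is hypothetical): with $p_{01} = a$ and $p_{10} = b$, the second eigenvalue is $1 - a - b$, so $\tau_2 = 1/(a+b)$, and every $\pi$-centered $g$ attains the sup.

```python
# Sketch (ours): on the two-state chain, any pi-centered g attains the
# sup in (4.22), and the ratio ||g||_2^2 / E(g,g) equals tau_2 = 1/(a+b).

def dirichlet_ratio(a, b, t=3.7):
    pi = (b / (a + b), a / (a + b))          # stationary distribution
    g = (t * pi[1], -t * pi[0])              # pi-centered test function
    assert abs(pi[0] * g[0] + pi[1] * g[1]) < 1e-12
    norm2 = pi[0] * g[0] ** 2 + pi[1] * g[1] ** 2
    # Dirichlet form E(g,g) = (1/2) sum_ij pi_i p_ij (g_j - g_i)^2
    E = 0.5 * (pi[0] * a + pi[1] * b) * (g[1] - g[0]) ** 2
    return norm2 / E

assert abs(dirichlet_ratio(0.3, 0.5) - 1.0 / 0.8) < 1e-12
```

Here the sup is attained by every centered $g$ because on two states the centered functions form a one-dimensional space, namely the $\lambda_2$-eigenspace.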
The next three lemmas give inequalities between $\tau_2$ and
the parameters studied earlier in this chapter.
Write $\pi_* := \min_i \pi_i$.
Proof of Lemma 4.23.
Consider first the continuous time case.
By the spectral representation,
as $t \to \infty$ we have
\[
p_{ii}(t) - \pi_i \sim c_i\, e^{-t/\tau_2}
\]
with some $c_i \ge 0$, not all zero.
But by Lemma 4.5 we have an upper bound of the same exponential form.
This shows the left inequality.
For the right inequality,
the spectral representation gives
\[
p_{ii}(t) - \pi_i = \sum_{m \ge 2} u_{im}^2\, e^{-\lambda_m t} \le (1 - \pi_i)\, e^{-t/\tau_2}.
\tag{4.23}
\]
Recalling the definition (4.14), integrate (4.23) over $0 \le t < \infty$:
\[
\int_0^\infty \left( p_{ii}(t) - \pi_i \right) dt \le (1 - \pi_i)\, \tau_2 \le \tau_2,
\]
and the result follows.
The upper bound on $\tau_2$
holds in continuous time by Lemmas 4.11 and 4.12,
and so holds in discrete time because $\tau_2$ and
the hitting time parameters are unaffected by continuization.
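The exponential rate $1/\tau_2$ in these spectral-representation bounds can be seen explicitly on the two-state chain, where $p_{00}(t) - \pi_0 = \pi_1 \lambda^t$ exactly, with $\lambda = 1 - a - b$ (our own sketch, not from the text):

```python
# Sketch (ours): on the two-state chain the spectral representation is
# exact in closed form: p_00(t) - pi_0 = pi_1 * lam^t, lam = 1 - a - b,
# so the decay rate of p_ii(t) - pi_i is exactly 1/tau_2.

def step(row, a, b):
    """One step of the two-state chain applied to a distribution row."""
    return (row[0] * (1 - a) + row[1] * b, row[0] * a + row[1] * (1 - b))

a, b = 0.3, 0.5
pi0, pi1 = b / (a + b), a / (a + b)
lam = 1 - a - b
row = (1.0, 0.0)                      # chain started at state 0
for t in range(1, 20):
    row = step(row, a, b)
    # matrix powering agrees with the closed-form spectral term
    assert abs((row[0] - pi0) - pi1 * lam ** t) < 1e-12
```

Continuization replaces $\lambda^t$ by $e^{-(a+b)t}$ but leaves the gap, and hence $\tau_2$, unchanged, which is the invariance used above.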
Proof of Lemma 4.25.
Fix states $i, j$ attaining the max in the definition of the parameter in question,
and fix a function $g$ attaining the sup in the extremal
characterization (Chapter 3 Theorem yyy), so that
\[
g_i = 1, \qquad g_j = 0, \qquad \mathcal{E}(g,g) = \frac{1}{E_i T_j + E_j T_i}.
\]
Write $\bar{g} := \sum_k \pi_k g_k$.
Applying the extremal characterization of $\tau_2$ to the
centered function
$g - \bar{g}$,
\[
\tau_2 \ge \frac{\|g - \bar{g}\|_2^2}{\mathcal{E}(g,g)}.
\]
But
\[
\|g - \bar{g}\|_2^2 = {\rm var}_\pi\, g
= \frac{1}{2} \sum_k \sum_l \pi_k \pi_l (g_k - g_l)^2
\ge \pi_i \pi_j (g_i - g_j)^2
= \pi_i \pi_j,
\]
establishing the lemma.
Simple examples show that the bounds in these lemmas cannot
be much improved in general.
Specifically:
(a) on the complete graph (Chapter 5 Example yyy)
the two sides agree up to a constant factor;
(b) on the barbell (Chapter 5 Example yyy),
the two sides are asymptotic to each other;
(c) in the M/M/1/$n$ queue,
the ratio of the two sides remains bounded as $n \to \infty$.
In the context of Lemma 4.23,
if we want to relate $\tau_1$ itself to eigenvalues in discrete time
we need to take almost-periodicity into account and use
$\beta := \max(\lambda_2, |\lambda_n|)$
in place of $\lambda_2$.
Rephrasing the proof of Lemma 4.23 gives Lemma 4.26.
Regarding a discrete-time chain as random walk on a weighted
graph, let $\Delta$ be the diameter of the graph.
By considering the definition of the variation distance
and initial vertices at distance $\Delta$,
it is obvious that $d(t) = 1$ for $t < \Delta/2$,
and hence $\tau_1 \ge \Delta/2$.
Combining with the upper bound in Lemma 4.26
leads to a relationship between the diameter and the eigenvalues
of a weighted graph.
4.4.1 Correlations and variances for the stationary chain
Perhaps the most natural probabilistic interpretation of $\tau_2$
is as follows.
Recall that the correlation between random variables $Y, Z$ is
\[
{\rm cor}(Y, Z) := \frac{E(YZ) - (EY)(EZ)}{\sqrt{{\rm var}\, Y \ {\rm var}\, Z}}.
\]
For a stationary Markov chain $(X_t)$ define the
maximal correlation function
\[
\rho(t) := \max\left\{ {\rm cor}\left( f(X_0), g(X_t) \right) :\ f, g \right\}.
\]
This makes sense for general chains (see Notes for further comments),
but under our standing assumption of reversibility we have
Lemma 4.28
In continuous time,
\[
\rho(t) = e^{-t/\tau_2}, \qquad t \ge 0.
\]
In discrete time,
\[
\rho(t) = \beta^t, \qquad t = 0, 1, 2, \ldots
\]
where $\beta := \max(\lambda_2, |\lambda_n|)$.
This is a translation of the Rayleigh-Ritz characterization of
eigenvalues (Chapter 3 yyy) – we leave the details to the reader.
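Lemma 4.28 can be illustrated on the two-state chain (a sketch of ours, with hypothetical names): when $a + b < 1$ the indicator of state 0 attains the maximal correlation $\rho(t) = \beta^t$, $\beta = 1 - a - b$.

```python
# Sketch (ours): for the stationary two-state chain with a + b < 1,
# f = 1{state 0} attains the maximal correlation rho(t) = beta^t
# of Lemma 4.28, with beta = |1 - a - b|.

def correlation_at_lag(a, b, t):
    pi0, pi1 = b / (a + b), a / (a + b)
    row = (1.0, 0.0)                 # conditional law given X_0 = 0
    for _ in range(t):
        row = (row[0] * (1 - a) + row[1] * b, row[0] * a + row[1] * (1 - b))
    cov = pi0 * (row[0] - pi0)       # Cov(1{X_0=0}, 1{X_t=0}) under pi
    var = pi0 * pi1                  # Var of the indicator under pi
    return cov / var

a, b = 0.2, 0.3
beta = abs(1 - a - b)
for t in range(6):
    assert abs(correlation_at_lag(a, b, t) - beta ** t) < 1e-12
```

The computation uses reversibility only implicitly: on two states the centered indicator is an eigenfunction, so its autocorrelation decays at exactly the spectral rate.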
Now consider a function $f$ with $\sum_i \pi_i f_i = 0$
and $f \not\equiv 0$.
Write
\[
S_t := \int_0^t f(X_s)\, ds \quad \mbox{in continuous time}
\]
\[
S_t := \sum_{s=0}^{t-1} f(X_s) \quad \mbox{in discrete time.}
\]
Recall from Chapter 2 yyy
that for general chains there is a limit variance
$\sigma^2 := \lim_{t \to \infty} t^{-1}\, {\rm var}\, S_t$.
Reversibility gives extra qualitative and quantitative information.
The first result refers to the stationary chain.
Proposition 4.29
In continuous time,
${\rm var}\, S_t \le t \sigma^2$, where
\[
\sigma^2 = 2 \sum_{m \ge 2} \lambda_m^{-1} c_m^2 \le 2 \tau_2\, \|f\|_2^2.
\]
And
$t \sigma^2 - {\rm var}\, S_t \uparrow 2 \kappa$,
where
\[
\kappa := \sum_{m \ge 2} \lambda_m^{-2} c_m^2 \le \frac{1}{2}\, \tau_2\, \sigma^2.
\]
In discrete time,
\[
\sigma^2 = \sum_{m \ge 2} \frac{1 + \lambda_m}{1 - \lambda_m}\, c_m^2 \le 2 \tau_2\, \|f\|_2^2
\]
\[
t \sigma^2 - 2 \tau_2^2\, \|f\|_2^2 \ \le\ {\rm var}\, S_t \ \le\ 2 t \sum_{m \ge 2} \frac{c_m^2}{1 - \lambda_m}
\]
and so in particular
\[
{\rm var}\, S_t \le 2 t\, \tau_2\, \|f\|_2^2, \qquad t \ge 1.
\tag{4.24}
\]
Proof.
Consider first the continuous time case.
A brief calculation using
the spectral representation (Chapter 3 yyy) gives
\[
E f(X_0) f(X_s) = \sum_{m \ge 2} c_m^2\, e^{-\lambda_m s}
\tag{4.25}
\]
where $c_m := \sum_i \pi_i^{1/2} u_{im} f_i$.
So
\[
{\rm var}\, S_t = 2 \int_0^t (t - s)\, E f(X_0) f(X_s)\, ds
\tag{4.26}
\]
\[
= 2 \sum_{m \ge 2} c_m^2 \left( \frac{t}{\lambda_m} - \frac{1 - e^{-\lambda_m t}}{\lambda_m^2} \right)
\tag{4.27}
\]
by change of variables in the integral defining ${\rm var}\, S_t$.
The right side shows that the difference
\[
t \sigma^2 - {\rm var}\, S_t = 2 \sum_{m \ge 2} c_m^2\, \frac{1 - e^{-\lambda_m t}}{\lambda_m^2}
\]
increases with $t$ to
\[
2 \sum_{m \ge 2} \frac{c_m^2}{\lambda_m^2} = 2 \kappa
\tag{4.28}
\]
and the sum here is at most
$\tau_2 \sum_{m \ge 2} \lambda_m^{-1} c_m^2 = \frac{1}{2}\, \tau_2\, \sigma^2$.
On the other hand, $t^{-1}\, {\rm var}\, S_t$ is increasing, so
\[
{\rm var}\, S_t \le t \sigma^2.
\]
In discrete time the arguments are messier, and we will omit details of calculations. The analog of (4.26) becomes
\[
{\rm var}\, S_t = t\, E f^2(X_0) + 2 \sum_{s=1}^{t-1} (t - s)\, E f(X_0) f(X_s).
\]
In place of the change of variables argument for (4.27),
one needs an elementary calculation to get
\[
{\rm var}\, S_t = \sum_{m \ge 2} c_m^2 \left( t\, \frac{1 + \lambda_m}{1 - \lambda_m} - \frac{2 \lambda_m (1 - \lambda_m^t)}{(1 - \lambda_m)^2} \right).
\tag{4.29}
\]
This shows
\[
\sigma^2 = \lim_{t} t^{-1}\, {\rm var}\, S_t = \sum_{m \ge 2} \frac{1 + \lambda_m}{1 - \lambda_m}\, c_m^2
\]
and the sum is bounded above by
\[
2 \sum_{m \ge 2} \frac{c_m^2}{1 - \lambda_m} \le 2 \tau_2\, \|f\|_2^2.
\]
Next, rewrite (4.29) as
\[
{\rm var}\, S_t = \sum_{m \ge 2} c_m^2\, v(\lambda_m, t), \qquad
v(\lambda, t) := t\, \frac{1 + \lambda}{1 - \lambda} - \frac{2 \lambda (1 - \lambda^t)}{(1 - \lambda)^2}.
\]
Then the upper bound for ${\rm var}\, S_t$ follows by checking
\[
v(\lambda, t) \le \frac{2t}{1 - \lambda}, \qquad -1 \le \lambda < 1,\ t \ge 1.
\]
For the lower bound, one has to verify
\[
t \sigma^2 - {\rm var}\, S_t = 2 \sum_{m \ge 2} c_m^2\, \frac{\lambda_m (1 - \lambda_m^t)}{(1 - \lambda_m)^2}
\le 2 \tau_2^2\, \|f\|_2^2,
\]
where in the sequel we may assume $\lambda_m \ge 0$, the terms with $\lambda_m < 0$ being negative.
Then
\[
\lambda_m (1 - \lambda_m^t) \le 1
\]
and so
\[
t \sigma^2 - {\rm var}\, S_t \le 2 \sum_{m \ge 2} \frac{c_m^2}{(1 - \lambda_m)^2}.
\]
But
\[
(1 - \lambda_m)^{-2} \le \tau_2^2,
\]
giving the lower bound.
Note that even in discrete time it is $\lambda_2$ that matters
in Proposition 4.29. Eigenvalues near $-1$ are irrelevant,
except that for a periodic chain we have $\lambda_n = -1$ and hence $\sigma^2 = 0$
for one particular function $f$ (which?).
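The discrete-time assertions of Proposition 4.29 can be checked exactly on the two-state stationary chain (our own sketch; names and tolerances ours): with a single nontrivial eigenvalue $\lambda$, the covariances are $c_k = {\rm var}_\pi(f)\,\lambda^k$, $\sigma^2 = {\rm var}_\pi(f)\,(1+\lambda)/(1-\lambda)$, and (4.24) holds with $\tau_2 = 1/(1-\lambda)$.

```python
# Sketch (ours): exact variance of S_t = sum_{s<t} f(X_s) for the
# stationary two-state chain, compared with the limit sigma^2 and
# with the bound var S_t <= 2 t tau_2 ||f||_2^2 of (4.24).

def var_St(t, varf, lam):
    """var S_t from the covariance sum c_k = varf * lam^k."""
    total = t * varf
    for k in range(1, t):
        total += 2 * (t - k) * varf * lam ** k
    return total

a, b = 0.3, 0.5
lam = 1 - a - b                      # the one nontrivial eigenvalue
pi0, pi1 = b / (a + b), a / (a + b)
varf = pi0 * pi1                     # f = centered indicator of state 0
tau2 = 1.0 / (1 - lam)
sigma2 = varf * (1 + lam) / (1 - lam)
for t in [1, 10, 200]:
    assert var_St(t, varf, lam) <= 2 * t * tau2 * varf + 1e-12   # (4.24)
assert abs(var_St(200, varf, lam) / 200 - sigma2) < 1e-2         # t^{-1} var -> sigma^2
```

Since $f$ here is (a multiple of) the eigenfunction, the single-mode formulas of the proof apply verbatim.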
Continuing the study of $S_t := \int_0^t f(X_s)\, ds$
or its discrete analog
for a stationary chain,
standardize to the case where
$E_\pi f = 0$ and ${\rm var}_\pi f = 1$.
Proposition 4.29 provides finite-time bounds for the
asymptotic approximation of variance.
One would like a similar finite-time bound for the asymptotic
Normal approximation of the distribution of $S_t$.
Open Problem 4.30
Is there some explicit function $\psi(u) \to 0$
as $u \to \infty$, not depending on the chain, such that
for standardized $f$ and continuous-time chains,
\[
\sup_x \left| P\left( \frac{S_t}{\sqrt{t}} \le x \right) - P\left( \sigma Z \le x \right) \right| \le \psi(t/\tau_2),
\]
where $\sigma^2$ is as in Proposition 4.29
and $Z$ has Normal$(0,1)$ distribution?
See the Notes for further comments.
For the analogous result about large deviations see Chapter yyy.
4.4.2 Algorithmic issues
Suppose we want to estimate the
average $\bar{f} := \sum_i \pi_i f_i$ of a function $f$ defined
on state space.
If we could sample i.i.d. from $\pi$ we would need
order $\varepsilon^{-2}$ samples to get an estimator
with error about $\varepsilon\, \|f\|_2$.
Now consider the setting where we cannot directly sample from
but instead use the
“Markov Chain Monte Carlo” method of setting up a reversible chain
with stationary distribution .
How many steps of the chain do we need to get the same accuracy?
As in section 4.3.3, because we typically can’t quantify the closeness
to $\pi$ of a feasible initial distribution, we consider
bounds which hold for arbitrary initial states.
In assessing the number of steps required,
there are two opposite traps to avoid.
The first is to say (cf. Proposition 4.29) that
$\varepsilon^{-2} \tau_2$ steps suffice.
This is wrong because
the relaxation time bounds apply to the stationary chain and
cannot be directly applied to a non-stationary chain.
The second trap is to say that because it takes $\tau_1$ steps to
obtain one sample from the stationary distribution, we therefore
need order $\varepsilon^{-2} \tau_1$ steps in order to get $\varepsilon^{-2}$
independent samples.
This is wrong because we don’t need independent samples.
The correct answer is
order $\tau_1 + \varepsilon^{-2} \tau_2$ steps.
The conceptual idea (cf. the definition of $\tau_1$)
is to find a stopping time achieving distribution $\pi$ and use the
stopped state as the initial state for
simulating the stationary chain.
More feasible to implement is the following algorithm.
Algorithm.
For a specified real number $u > 0$ and an integer $t \ge 1$,
generate $U$ with Poisson($u$) distribution.
Simulate the chain from arbitrary initial distribution for $U + t$ steps and calculate
\[
\hat{A} := \frac{1}{t} \sum_{s=U+1}^{U+t} f(X_s).
\]
Corollary 4.31
\[
E \left( \hat{A} - \bar{f} \right)^2 \le \frac{2 \tau_2\, \|f - \bar{f}\|_2^2}{t} + s(u)\, \max_i \left( f_i - \bar{f} \right)^2,
\]
where $s(u)$ is separation (recall section 4.3.1) for the continuized chain.
To make the right side approximately $\varepsilon^2\, \|f - \bar{f}\|_2^2$ we may take
\[
t \ \mbox{of order}\ \varepsilon^{-2} \tau_2, \qquad u \ \mbox{of order}\ \tau_1 \log (1/\varepsilon).
\]
Since the mean number of steps is $u + t$,
this formalizes the idea that
we can estimate $\bar{f}$ to within $\varepsilon\, \|f - \bar{f}\|_2$ in
order $\tau_1 + \varepsilon^{-2} \tau_2$ steps.
Proof.
We may suppose $\bar{f} = 0$.
Since $X_U$ has the distribution of the continuized chain at
time $u$, we may use the definition of separation to write
\[
P(X_U = i) = (1 - s(u))\, \pi_i + s(u)\, \theta_i
\]
for some probability distribution $\theta$.
It follows that
\[
E \hat{A}^2 \le (1 - s(u))\, E_\pi \hat{A}^2 + s(u)\, \max_i f_i^2
\]
\[
\le \frac{{\rm var}\, S_t}{t^2} + s(u)\, \max_i f_i^2.
\]
Apply (4.24).
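A toy simulation illustrates the estimator (our own sketch, not the text's algorithm: we use a fixed burn-in in place of the Poisson randomization, and the chain, function, and tolerances are our choices). Lazy random walk on the 5-cycle is reversible with uniform $\pi$; a burn-in of order $\tau_1$ is discarded and order $\varepsilon^{-2}\tau_2$ further steps are averaged.

```python
# Sketch (ours): MCMC estimate of f-bar = sum_i pi_i f(i) by
# time-averaging a lazy random walk on the 5-cycle (uniform pi).
import random

def mcmc_average(f, n_states, burn_in, steps, seed=0):
    rng = random.Random(seed)
    x = 0                               # arbitrary initial state
    total = 0.0
    for step in range(burn_in + steps):
        r = rng.random()                # lazy walk: hold with prob. 1/2
        if r < 0.25:
            x = (x + 1) % n_states
        elif r < 0.5:
            x = (x - 1) % n_states
        if step >= burn_in:
            total += f(x)
    return total / steps

est = mcmc_average(lambda i: float(i), 5, burn_in=1000, steps=20000)
assert abs(est - 2.0) < 0.3             # true mean of f under uniform pi is 2
```

The loose tolerance reflects the point of the section: the mean-square error is of order $\tau_2 \|f - \bar f\|_2^2 / t$, not the i.i.d. value $\|f - \bar f\|_2^2 / t$.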
4.4.3 $\tau_2$ and distinguished paths
The extremal characterization (4.22) can be used to get lower
bounds on $\tau_2$ by considering a tractable test function $g$.
(xxx list examples).
As mentioned in Chapter 3, it is an open problem to give an extremal
characterization of $\tau_2$ as exactly an inf over flows
or similar constructs.
As an alternative, Theorem 4.32 gives a non-exact upper bound
on $\tau_2$ involving quantities derived from arbitrary choices of paths
between states.
An elegant exposition of this idea, expressed by the first inequality in Theorem 4.32, was given by
Diaconis and Stroock [122],
and Sinclair [307] noted the second inequality.
We copy their proofs.
We first state the result in the setting of random walk on a weighted
graph.
As in section 4.1,
consider a path
$x = v_0, v_1, \ldots, v_k = y$,
and call this path $\gamma_{xy}$.
This path has length $|\gamma_{xy}| = k$ and has “resistance”
\[
r(\gamma_{xy}) := \sum_{e \in \gamma_{xy}} \frac{1}{w_e}
\]
where here and below $e$ denotes a directed edge $(v, v')$, with weight $w_e = w_{vv'}$.
Theorem 4.32
For each ordered pair $(x, y)$ of vertices in a weighted graph, let
$\gamma_{xy}$ be a path from $x$ to $y$.
Then for discrete-time random walk,
\[
\tau_2 \le w\, \max_e \sum_{\gamma_{xy} \ni e} \pi_x \pi_y\, r(\gamma_{xy})
\]
\[
\tau_2 \le \max_e \frac{w}{w_e} \sum_{\gamma_{xy} \ni e} \pi_x \pi_y\, |\gamma_{xy}|.
\]
Note that the two inequalities coincide on an unweighted graph.
Proof.
For an edge $e = (v, v')$ write $\Delta g(e) := g_{v'} - g_v$.
The first equality below is the fact
${\rm var}\, \Theta = \frac{1}{2} E (\Theta_1 - \Theta_2)^2$
for i.i.d. $\Theta$’s,
and the first inequality is Cauchy-Schwarz.
\[
{\rm var}_\pi\, g = \frac{1}{2} \sum_x \sum_y \pi_x \pi_y (g_y - g_x)^2
\tag{4.30}
\]
\[
= \frac{1}{2} \sum_x \sum_y \pi_x \pi_y \Big( \sum_{e \in \gamma_{xy}} \Delta g(e) \Big)^2
\]
\[
\le \frac{1}{2} \sum_x \sum_y \pi_x \pi_y\, r(\gamma_{xy}) \sum_{e \in \gamma_{xy}} w_e\, (\Delta g(e))^2
\tag{4.31}
\]
\[
= \frac{1}{2} \sum_e w_e\, (\Delta g(e))^2 \sum_{(x,y):\ \gamma_{xy} \ni e} \pi_x \pi_y\, r(\gamma_{xy})
\]
\[
\le \frac{\kappa_1}{w} \cdot \frac{1}{2} \sum_e w_e\, (\Delta g(e))^2
= \kappa_1\, \mathcal{E}(g,g)
\tag{4.32}
\]
where $\kappa_1$ is the max in the first inequality in the statement of the Theorem.
The first inequality now follows from the extremal characterization
(4.22).
The second inequality makes a simpler use of the Cauchy-Schwarz
inequality, in which we replace (4.30,4.31,4.32) by
\[
{\rm var}_\pi\, g \le \frac{1}{2} \sum_x \sum_y \pi_x \pi_y\, |\gamma_{xy}| \sum_{e \in \gamma_{xy}} (\Delta g(e))^2
\tag{4.33}
\]
\[
= \frac{1}{2} \sum_e (\Delta g(e))^2 \sum_{(x,y):\ \gamma_{xy} \ni e} \pi_x \pi_y\, |\gamma_{xy}|
\le \kappa_2\, \mathcal{E}(g,g)
\]
where $\kappa_2$ is the max in the second inequality in the statement of the Theorem.
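For the complete graph the path bound is exact (our own illustration; the function name is ours): choosing the single-edge path $\gamma_{xy} = (x, y)$, each directed edge carries only its own path, and the bound evaluates to $w\,\pi_x\pi_y = n(n-1)/n^2 = (n-1)/n = \tau_2$.

```python
# Sketch (ours): on the unweighted complete graph K_n, single-edge
# paths make the Theorem's bound equal to tau_2 = (n-1)/n exactly.

def path_bound_complete_graph(n):
    w = n * (n - 1)                  # total weight: unit weights, all directed edges
    pi = 1.0 / n                     # uniform stationary distribution
    best = 0.0
    for x in range(n):
        for y in range(n):
            if x != y:
                # the only path through directed edge (x,y) is gamma_xy
                # itself, of length 1 and resistance 1
                load = pi * pi * 1.0
                best = max(best, load)
    return w * best

for n in [3, 7, 25]:
    assert abs(path_bound_complete_graph(n) - (n - 1) / n) < 1e-12
```

This matches the earlier observation that random walk on the complete graph attains the general lower bound $\tau_2 \ge (n-1)/n$, so no path bound could do better here.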
Remarks.
(a)
Theorem 4.32 applies to continuous-time
(reversible) chains by setting $w_{vv'} := \pi_v q_{vv'}$.
(b) One can replace the deterministic choice of paths
by random paths of the form
$x = V_0, V_1, \ldots, V_{|\Gamma_{xy}|} = y$
of random length $|\Gamma_{xy}|$.
The second inequality extends in the natural way,
by taking expectations in (4.33) to give
\[
{\rm var}_\pi\, g \le \frac{1}{2} \sum_e (\Delta g(e))^2 \sum_x \sum_y \pi_x \pi_y\, E \left[ |\Gamma_{xy}|\, 1_{(e \in \Gamma_{xy})} \right]
\]
and the conclusion is
Corollary 4.33
\[
\tau_2 \le \max_e \frac{w}{w_e} \sum_x \sum_y \pi_x \pi_y\, E \left[ |\Gamma_{xy}|\, 1_{(e \in \Gamma_{xy})} \right].
\]
(c) Inequalities in the style of Theorem 4.32 are often called
Poincaré inequalities because, to quote [122],
they are “the discrete analog of the classical method of Poincaré
for estimating the spectral gap of the Laplacian on a domain
(see e.g. Bandle [39])”.
I prefer the descriptive name
the distinguished path method.
This method has the same spirit as the coupling method
for bounding $\tau_1$ (see Chapter yyy), in that we
get to use our skill and judgement in making wise choices of
paths in specific examples.
xxx list examples.
Though its main utility is in studying hard examples, we give
some simple illustrations of its use below.
Write the conclusion of Corollary 4.33 as
$\tau_2 \le \max_e \frac{w}{w_e}\, \Lambda(e)$, where $\Lambda(e) := \sum_x \sum_y \pi_x \pi_y\, E \left[ |\Gamma_{xy}|\, 1_{(e \in \Gamma_{xy})} \right]$.
Consider a regular unweighted graph, and let $\Gamma_{xy}$ be chosen
uniformly from the set of minimum-length paths from $x$ to $y$.
Suppose that $\Lambda(e)$ takes the same value for every directed edge $e$.
A sufficient condition for this is that the graph be
arc-transitive (see Chapter 8 yyy).
Then, summing over directed edges in Corollary 4.33,
\[
\tau_2 \le \frac{w}{\vec{m}} \sum_x \sum_y \pi_x \pi_y\, E\, |\Gamma_{xy}|^2
\]
where $\vec{m}$ is the number of directed edges.
Now $w = \vec{m}$ for unit edge-weights, so we may reinterpret this inequality as follows.
Corollary 4.34
For random walk on an arc-transitive graph,
$\tau_2 \le E\, d^2(\xi, \xi')$,
where $d(\xi, \xi')$ is the distance between independent uniform
random vertices $\xi, \xi'$.
In the context of the $d$-dimensional torus $Z_N^d$,
the upper bound $E\, d^2(\xi, \xi')$
is asymptotic (as $N \to \infty$) to
$N^2\, E \big( \sum_i \min(U_i, 1 - U_i) \big)^2$,
where the $U_i$ are independent uniform on $[0,1]$.
This bound is of order $d^2 N^2$.
Here (Chapter 5 Example yyy) in fact
$\tau_2$ is of order $d N^2$, so for fixed $d$ the bound is off by only a constant.
On the $d$-cube (Chapter 5 Example yyy), $d(\xi, \xi')$ has Binomial$(d, \frac{1}{2})$ distribution and so the bound is
of order $d^2$, while in fact $\tau_2 = d/2$.
Intuitively one feels that the bound in Corollary 4.34 should hold
for more general graphs, but the following example illustrates
a difficulty.
Example 4.35
Consider the graph on $2n$ vertices obtained from two complete
graphs on $n$ vertices by adding edges comprising a matching
of the two vertex-sets.
Here a straightforward implementation of Theorem 4.32 gives
an upper bound of order $n$, while in fact $\tau_2 = \Theta(n)$.
On the other hand the conclusion of Corollary 4.34 would give
an $O(1)$ bound.
Thus even though this example has a strong symmetry property
(vertex-transitivity: Chapter 8 yyy)
no bound like Corollary 4.34 can hold.