Recall from Chapter 2 yyy that $\|\cdot\|$ denotes variation distance and

$d(t) \equiv \max_i \|P_i(X_t \in \cdot) - \pi(\cdot)\|, \qquad \bar{d}(t) \equiv \max_{i,j} \|P_i(X_t \in \cdot) - P_j(X_t \in \cdot)\|.$

We define the parameter

(4.11)  $\tau_1 \equiv \min\{t : \bar{d}(t) \le e^{-1}\}.$

The choice of the constant $e^{-1}$, and of using $\bar{d}$ instead of $d$, are rather arbitrary, but this choice makes the numerical constants work out nicely (in particular, it makes $\tau_1 \ge \tau_2$; see section 4.4). Submultiplicativity of $\bar{d}$ gives

$\bar{d}(t) \le e^{-\lfloor t/\tau_1 \rfloor}, \quad t \ge 0.$
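For concreteness, the definition (4.11) and the submultiplicativity bound can be checked numerically. The 3-state chain below is our own toy illustration (not an example from the text): we continuize it and scan a time grid for the first time $\bar{d}(t)$ drops below $e^{-1}$.

```python
import numpy as np
from scipy.linalg import expm

# A small reversible chain (our own toy example); Q is the continuizing generator.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
Q = P - np.eye(3)

def tv(p, q):
    # total variation distance between two distributions
    return 0.5 * np.abs(p - q).sum()

def d_bar(t):
    # d_bar(t) = max_{i,j} || P_i(X_t in .) - P_j(X_t in .) ||
    Pt = expm(t * Q)
    return max(tv(Pt[i], Pt[j]) for i in range(3) for j in range(3))

def tau1(grid=np.arange(0.0, 50.0, 0.01)):
    # first grid time with d_bar(t) <= 1/e, cf. (4.11)
    for t in grid:
        if d_bar(t) <= np.exp(-1.0):
            return t
    return np.inf
```

On the same chain one can also check submultiplicativity, e.g. that $\bar d(2) \le \bar d(1)^2$.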
The point of the parameter $\tau_1$ is to formalize the idea of "time to approach stationarity, from the worst starting place". The fact that variation distance is just one of several distances one could use may make $\tau_1$ seem a very arbitrary choice, but Theorem 4.6 below says that three other possible quantifications of this idea are equivalent. Here equivalent has a technical meaning: parameters $\tau$ and $\tau'$ are equivalent if their ratio is bounded above and below by numerical constants not depending on the chain. (Thus (4.2) says that the two parameters appearing there are equivalent.) More surprisingly, $\tau_1$ is also equivalent to two further parameters involving mean hitting times. We now define all these parameters.
xxx Warning. The parameters $\tau_1^{(1)}, \ldots, \tau_1^{(5)}$ in this draft were numbered differently in the previous draft.
The first idea is to measure distance from stationarity by using ratios of probabilities. Define separation from stationarity to be

$s(t) \equiv \max_{i,j} \left[ 1 - \frac{p_{ij}(t)}{\pi_j} \right].$

Then $s(\cdot)$ is submultiplicative, so we naturally define the separation threshold time to be

$\tau_1^{(1)} \equiv \min\{t : s(t) \le e^{-1}\}.$
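In the same numerical spirit, separation and the separation threshold time can be computed for a toy chain (our own example, not one from the text). The check at the end that variation distance is dominated by separation anticipates Lemma 4.7(a).

```python
import numpy as np
from scipy.linalg import expm

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])   # stationary distribution (detailed balance)
Q = P - np.eye(3)                  # generator of the continuized chain

def separation(t):
    # s(t) = max_{i,j} [1 - p_{ij}(t)/pi_j]
    Pt = expm(t * Q)
    return (1.0 - Pt / pi).max()

def tau1_sep(grid=np.arange(0.0, 100.0, 0.01)):
    # first grid time with s(t) <= 1/e
    for t in grid:
        if separation(t) <= np.exp(-1.0):
            return t
    return np.inf
```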
The second idea is to consider minimal random times at which the chain has exactly the stationary distribution. Let

$\tau_1^{(2)} \equiv \max_i \min_T E_i T,$

where the $\min$ is over stopping times $T$ such that $P_i(X_T \in \cdot) = \pi(\cdot)$. As a variation on this idea, let us temporarily write, for a probability distribution $\mu$ on the state space,

$\tau_1^{(2)}(\mu) \equiv \max_i \min_T E_i T,$

where the $\min$ is over stopping times $T$ such that $P_i(X_T \in \cdot) = \mu(\cdot)$. Then define

$\tau_1^{(3)} \equiv \min_\mu \tau_1^{(2)}(\mu).$
Turning to the parameters involving mean hitting times, we define

(4.12)  $\tau_1^{(4)} \equiv \max_{i,j} \sum_k \pi_k \, |E_i T_k - E_j T_k| = \max_{i,j} \sum_k |Z_{jk} - Z_{ik}|,$

where the equality involves the fundamental matrix $Z$ and holds by the mean hitting time formula. Parameter $\tau_1^{(4)}$ measures variability of mean hitting times as the starting place varies. The final parameter is

$\tau_1^{(5)} \equiv \max_i \max_A \pi(A) \, E_i T_A.$

Here we can regard the right side as the ratio of $E_i T_A$, the Markov chain mean hitting time on $A$, to $1/\pi(A)$, the mean hitting time under independent sampling from the stationary distribution.
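The equality in (4.12) and the mean hitting time formula behind it can be verified numerically. The chain below is our own toy example; mean hitting times are obtained by solving the standard linear systems, and $Z$ via $Z = (I - P + \Pi)^{-1} - \Pi$ with $\Pi = \mathbf{1}\pi^T$.

```python
import numpy as np

# Our own 3-state reversible example; any irreducible chain would do.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
n = len(pi)

# Mean hitting times E_i T_k from the linear equations
# E_i T_k = 1 + sum_m P[i, m] E_m T_k  (i != k),  E_k T_k = 0.
ET = np.zeros((n, n))
for k in range(n):
    idx = [i for i in range(n) if i != k]
    A = np.eye(n - 1) - P[np.ix_(idx, idx)]
    ET[idx, k] = np.linalg.solve(A, np.ones(n - 1))

# Fundamental matrix Z_{ij} = sum_t (p_{ij}(t) - pi_j).
Pi = np.outer(np.ones(n), pi)
Z = np.linalg.inv(np.eye(n) - P + Pi) - Pi

# Hitting-time side of (4.12): max_{i,j} sum_k pi_k |E_i T_k - E_j T_k|.
tau14 = max(sum(pi[k] * abs(ET[i, k] - ET[j, k]) for k in range(n))
            for i in range(n) for j in range(n))
```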
The definitions above make sense in either discrete or continuous time, but the following notational convention turns out to be convenient. For a discrete-time chain we define $\tau_1$ to be the value obtained by applying the definition (4.11) to the continuized chain, and write $\tau_1^{\mathrm{disc}}$ for the value obtained for the discrete-time chain itself. Define $\tau_1^{(1)}$ and $\tau_1^{(1),\mathrm{disc}}$ similarly. But the other parameters are defined directly in terms of the discrete-time chain. We now state the equivalence theorem, from Aldous [12].
Theorem 4.6. (a) In either discrete or continuous time, the parameters

$\tau_1, \ \tau_1^{(1)}, \ \tau_1^{(2)}, \ \tau_1^{(3)}, \ \tau_1^{(4)}$

and

$\tau_1^{(5)}$

are equivalent.
(b) In discrete time,
$\tau_1^{\mathrm{disc}}$ and $\tau_1^{(1),\mathrm{disc}}$ are equivalent,
and $\tau_1^{(2)}$ is bounded above by a numerical constant multiple of $\tau_1^{(1),\mathrm{disc}}$.
This will be (partially) proved in section 4.3.2, but let us first give a few remarks and examples. The parameter $\tau_1$ and total variation distance are closely related to the notion of coupling of Markov chains, discussed in Chapter 14. Analogously (see the Notes), the separation $s(t)$ and the parameter $\tau_1^{(1)}$ are closely related to the notion of strong stationary times $T$, for which

(4.13)  $P_i(X_T = j \mid T = t) = \pi_j \quad \text{for all } j, t.$
Under our standing assumption of reversibility there is a close connection between separation and variation distance, indicated by the next lemma.

Lemma 4.7. (a) $d(t) \le s(t)$.
(b) $s(2t) \le 1 - (1 - \bar{d}(t))^2$.

Proof. Part (a) is immediate from the definitions, since $\|P_i(X_t \in \cdot) - \pi(\cdot)\| = \sum_j (\pi_j - p_{ij}(t))^+ \le s(t) \sum_j \pi_j = s(t)$. For (b),

$\frac{p_{ij}(2t)}{\pi_j} = \sum_k \frac{p_{ik}(t) p_{kj}(t)}{\pi_j} = \sum_k \frac{p_{ik}(t) p_{jk}(t)}{\pi_k} \ge \left( \sum_k \min(p_{ik}(t), p_{jk}(t)) \right)^2 \ge (1 - \bar{d}(t))^2,$

where the middle equality uses reversibility, the next inequality is Cauchy--Schwarz, and the last uses $\sum_k \min(p_{ik}(t), p_{jk}(t)) = 1 - \|P_i(X_t \in \cdot) - P_j(X_t \in \cdot)\|$.
Note also that the definition of $s(t)$ involves lower bounds in the convergence $p_{ij}(t) \to \pi_j$. One can make a definition involving upper bounds

(4.14)  $\hat{s}(t) \equiv \max_{i,j} \left( \frac{p_{ij}(t)}{\pi_j} - 1 \right) = \max_i \left( \frac{p_{ii}(t)}{\pi_i} - 1 \right),$

where the equality (Chapter 3 Lemma yyy) requires in discrete time that $t$ be even. This yields the following one-sided inequalities, but Example 4.9 shows there can be no such reverse inequality.
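For a reversible chain the maximum of $p_{ij}(t)/\pi_j$ over all pairs $(i,j)$ is attained on the diagonal (this is the Chapter 3 fact invoked above); in continuous time it holds for every $t$. A quick numerical check on our own toy chain:

```python
import numpy as np
from scipy.linalg import expm

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
Q = P - np.eye(3)

def ratio_matrix(t):
    # R[i, j] = p_{ij}(t) / pi_j for the continuized chain
    return expm(t * Q) / pi
```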
(a)
.
(b)
Proof. Part (b) follows from part (a) and the definitions. Part (a) is essentially just a Cauchy--Schwarz inequality, but let's write it out bare-hands.
Consider a continuous-time -state chain with transition rates
Here . It is easy to check that is bounded as . But as , and so by considering state we have as for any fixed .
Remark. In the nice examples discussed in Chapter 5 we can usually find a pair of states such that
The next example shows this is false in general.
Consider random walk on the weighted graph
for suitably small . As we have , the attained by pairs or . But as we have where and where the max is attained by pairs .
As a final comment, one might wonder whether the minimizing distribution $\mu$ in the definition of $\tau_1^{(3)}$ were always $\pi$, i.e. whether $\tau_1^{(3)} = \tau_1^{(2)}$ always. But a counter-example is provided by random walk on the $n$-star (Chapter 5 yyy), where taking $\mu$ to be concentrated on the center vertex shows $\tau_1^{(3)} < \tau_1^{(2)}$.
We will prove
.
These lemmas hold in discrete and continuous time, interpreting $\tau_1$ and $\tau_1^{(1)}$ as $\tau_1^{\mathrm{disc}}$ and $\tau_1^{(1),\mathrm{disc}}$ in discrete time. Incidentally, Lemmas 4.12, 4.13 and 4.14 do not depend on reversibility. To complete the proof of Theorem 4.6 in continuous time we would need to show
(4.15) |
for some absolute constant . The proof I know is too lengthy to repeat here – see [12]. Note that (from its definition) , so that (4.15) and the lemmas above imply in continuous time. We shall instead give a direct proof of a result weaker than (4.15):
.
Turning to the assertions of Theorem 4.6 in discrete time, (b) is given by the discrete-time versions of Lemmas 4.11 and 4.12. To prove (a), it is enough to show that the numerical values of the parameters are unchanged by continuizing the discrete-time chain. For $\tau_1^{(4)}$ and $\tau_1^{(5)}$ this is clear, because continuization does not affect mean hitting times. For $\tau_1^{(2)}$ and $\tau_1^{(3)}$ it reduces to the following lemma.
Lemma 4.16. Let $X$ be a discrete-time chain and let $Y$ be its continuization, both started with the same distribution. Let $S$ be a randomized stopping time for $Y$. Then there exists a randomized stopping time $T$ for $X$ such that $P(X_T \in \cdot) = P(Y_S \in \cdot)$ and $ET = ES$.
Proof of Lemma 4.11. The left inequality is immediate from Lemma 4.7(a), and the right inequality holds because
Proof of Lemma 4.12. The left inequality is immediate from the definitions. For the right inequality, fix $i$. Write $t_0 \equiv \tau_1^{(1)}$, so that

$p_{jk}(t_0) \ge (1 - e^{-1}) \pi_k \quad \text{for all } j, k.$

We can construct a stopping time $T_1$ such that

$P_i(T_1 = t_0) = 1 - e^{-1}, \qquad P_i(X_{T_1} \in \cdot \mid T_1 = t_0) = \pi(\cdot),$

and then by induction on $k$ stopping times $T_k$ such that

$P_i(T_k = m t_0) = (1 - e^{-1}) e^{-(m-1)}, \ 1 \le m \le k, \qquad P_i(X_{T_k} \in \cdot \mid T_k = m t_0) = \pi(\cdot).$

Then $T \equiv \lim_k T_k$ satisfies $P_i(X_T \in \cdot) = \pi(\cdot)$ and $E_i T = t_0 \sum_{m \ge 1} m (1 - e^{-1}) e^{-(m-1)} = \frac{e}{e-1} t_0$. So $\tau_1^{(2)} \le \frac{e}{e-1} \tau_1^{(1)}$.
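The stopping-time construction in this proof is concrete enough to simulate. The sketch below (our own chain and code) stops at multiples of a time $t_0$ at which separation is at most $1/e$, carving mass $(1 - e^{-1})\pi$ out of the current law at each checkpoint; the stopped state is then exactly $\pi$-distributed and the number of rounds is geometric.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small reversible aperiodic chain (our own example, not from the text).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
a = 1.0 - np.exp(-1.0)

# Find t0 with p_{jk}(t0) >= (1 - 1/e) pi_k for all j, k, i.e. s(t0) <= 1/e.
t0, Pt = 1, P.copy()
while (1.0 - Pt / pi).max() > np.exp(-1.0):
    Pt = Pt @ P
    t0 += 1

def sample(i):
    # At each checkpoint (a multiple of t0), stop with a state-dependent
    # probability that carves mass (1 - 1/e) * pi out of the current law.
    mu = np.zeros(3); mu[i] = 1.0    # law of the chain given not yet stopped
    state, T = i, 0
    while True:
        for _ in range(t0):          # advance t0 steps
            state = rng.choice(3, p=P[state])
        T += t0
        q = mu @ Pt                  # law at this checkpoint, given not stopped
        if rng.random() < a * pi[state] / q[state]:
            return state, T          # stopped state is exactly pi-distributed
        mu = (q - a * pi) / np.exp(-1.0)   # law given that we continue

draws = [sample(0) for _ in range(20000)]
```

The expected stopping time is $t_0 \, e/(e-1)$, matching the bound in the proof.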
Remark. What the argument shows is that we can construct a strong stationary time $T$ (in the sense of (4.13)) such that

(4.16)  $P(T > m \tau_1^{(1)}) = e^{-m}, \quad m = 1, 2, \ldots.$
Proof of Lemma 4.13. Consider the probability distribution $\mu$ attaining the min in the definition of $\tau_1^{(3)}$, and the associated stopping times $T^{(i)}$ with $P_i(X_{T^{(i)}} \in \cdot) = \mu(\cdot)$ and $E_i T^{(i)} \le \tau_1^{(3)}$. Fix $i$ and $k$. Since $P_i(X_{T^{(i)}} \in \cdot) = \mu(\cdot)$,

$E_i T_k \le E_i T^{(i)} + E_\mu T_k \le \tau_1^{(3)} + E_\mu T_k.$

The random target lemma (4.7) says $\sum_k \pi_k E_i T_k = \sum_k \pi_k E_\mu T_k$ and so

$\sum_k \pi_k (E_i T_k - E_\mu T_k) = 0.$

Writing $a_k$ for $E_i T_k - E_\mu T_k$, we thus have $a_k \le \tau_1^{(3)}$ and $\sum_k \pi_k a_k = 0$, whence $\sum_k \pi_k |a_k| = 2 \sum_k \pi_k a_k^+ \le 2 \tau_1^{(3)}$. The definition of $\tau_1^{(4)}$ and the triangle inequality give $\sum_k \pi_k |E_i T_k - E_j T_k| \le 4 \tau_1^{(3)}$, and the Lemma follows.
Proof of Lemma 4.14. Fix a subset and a starting state . Then for any ,
where is the hitting place distribution . So
Proof of Lemma 4.15. For small to be specified later, define
Note that Markov’s inequality and the definition of give
(4.17) |
Next, for any
by monotonicity of . Thus for we have
and applying Chapter 3 Lemma yyy (b)
Now let be arbitrary and let . For any ,
and so
(4.18) |
Now
using Markov’s inequality and the definition of . And , the first inequality being Lemma 4.14 and the second being an easy consequence of the definitions. Combining (4.18) and the subsequent inequalities shows that, for and arbitrary
Applying this to arbitrary and we get
Putting makes the bound .
Remark. The ingredients of the proof above are complete monotonicity and conditioning on carefully chosen hitting times. The proof of (4.15) in [12] uses these ingredients, plus the minimal hitting time construction in the recurrent balayage theorem (Chapter 2 yyy).
Outline proof of Lemma 4.16. The observant reader will have noticed (Chapter 2 yyy) that we avoided writing down a careful definition of stopping times in the continuous setting. The definition involves measure-theoretic issues which I don't intend to engage, and giving a rigorous proof of the lemma is a challenging exercise in the measure-theoretic formulation of continuous-time chains. However, the underlying idea is very simple. Regard the continuous-time chain $Y$ as constructed from the discrete-time chain $X$ and exponential(1) holds. Define $T \equiv N(S)$, where $N(t)$ is the Poisson counting process of the holds. Then $X_T = Y_S$ by construction, and $ET = ES$ by the optional sampling theorem for the martingale $N(t) - t$.
Of course for period-$2$ chains we don't have convergence to stationarity in discrete time, so we regard $\tau_1^{\mathrm{disc}} = \tau_1^{(1),\mathrm{disc}} = \infty$. Such chains – random walks on bipartite weighted graphs – include several simple examples of unweighted graphs we will discuss in Chapter 5 (e.g. the $n$-path, the $n$-cycle for even $n$, and the $d$-cube) and Chapter 7 (e.g. card-shuffling by random transpositions, if we insist on transposing distinct cards).
As mentioned in Chapter 1 xxx, a topic of much recent interest has been "Markov chain Monte Carlo", where one constructs a discrete-time reversible chain with specified stationary distribution $\pi$, from which we wish to sample. We defer systematic discussion to xxx, but a few comments are appropriate here. We have to start a simulation somewhere. In practice one might use as initial distribution some distribution which is feasible to simulate and which looks intuitively "close" to $\pi$, but this idea is hard to formalize, and so in theoretical analysis we seek results which hold regardless of the initial distribution, i.e. "worst-case start" results. In this setting $\tau_1^{(2)}$ is, by definition, the minimum expected time to generate a sample with distribution $\pi$. But the definition of $\tau_1^{(2)}$ merely says a stopping time exists, and doesn't tell us how to implement it algorithmically. For algorithmic purposes we want rules which don't involve detailed structure of the chain. The most natural idea, stopping at a deterministic time, requires one to worry unnecessarily about near-periodicity. One way to avoid this worry is to introduce holds into the discrete-time chain. As an alternative, the distribution of the continuized chain at time $t$ can be obtained by simply running the discrete-time chain for a Poisson($t$) number of steps. "In practice" there is little difference between these alternatives. But the continuization method, as well as being mathematically less artificial, allows us to avoid the occasional messiness of discrete-time theory (see e.g. Proposition 4.29 below). In this sense our use of the continuized value $\tau_1$ for discrete-time chains is indeed sensible: it measures the accuracy of a natural algorithmic procedure applied to a discrete-time chain.
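The Poissonization recipe in the last paragraph is easy to implement. The sketch below (our own code, with an arbitrary 3-state chain) checks that running the discrete chain for a Poisson($t$) number of steps reproduces the continuized distribution at time $t$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

def sample_continuized(i, t):
    # Run the discrete-time chain a Poisson(t) number of steps: this has
    # exactly the distribution of the continuized chain at time t.
    for _ in range(rng.poisson(t)):
        i = rng.choice(3, p=P[i])
    return i

t = 2.0
empirical = np.bincount([sample_continuized(0, t) for _ in range(20000)],
                        minlength=3) / 20000.0
exact = expm(t * (P - np.eye(3)))[0]
```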
Returning to technical matters, the fact that a periodic (reversible, by our standing assumption) chain can only have period $2$ suggests that the discrete-time periodicity effect could be eliminated by averaging over times $t$ and $t+1$ only, as follows.
Show there exist as and as such that, for any discrete-time chain,
where refers to the continuized chain.
See the Notes for some comments on this problem.
If one does wish to study distributions of discrete-time chains at deterministic times, then in place of one needs to use
(4.19) |
The spectral representation then implies
(4.20) |
In general may be much smaller than or . For instance, random walk on the complete graph has while . So we cannot (without extra assumptions) hope to improve much on the following result.
Lemma 4.18. For an $n$-state chain, in discrete or continuous time,
Lemmas 4.24 and 4.25 later are essentially stronger, giving corresponding upper bounds in terms of instead of . But a proof of Lemma 4.18 is interesting for comparison with the cat-and-mouse game below.
Proof of Lemma 4.18. By definition of , for the chain started at we can find stopping times such that
So has , and so
where the second inequality is justified below. The second assertion of the lemma is now clear, and the first holds by averaging over .
The second inequality is justified by the following martingale result, which is a simple application of the optional sampling theorem. The “equality” assertion is sometimes called Wald’s equation for martingales.
Lemma 4.19. Let $(M_t)$ be adapted, with uniformly bounded increments and

$E(M_{t+1} - M_t \mid \mathcal{F}_t) = c$

for a constant $c$. Then for any stopping time $T$ with $ET < \infty$,

$E M_T = E M_0 + c \, ET.$

If in the hypothesis we replace "$=$" by "$\le$", then $E M_T \le E M_0 + c \, ET$.
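As a sanity check on this Wald-type identity, here is a minimal simulation (our own example): increments are i.i.d. $\{0,1\}$ with mean $c = 1/2$, and we stop when the sum first reaches $10$, so $EM_T = 10$ forces $ET = 20$.

```python
import numpy as np

rng = np.random.default_rng(1)

def run():
    # M_t = sum of i.i.d. {0, 1} increments, conditional mean c = 1/2;
    # T = first time the sum reaches 10, so M_T = 10 exactly.
    m = t = 0
    while m < 10:
        m += int(rng.integers(0, 2))
        t += 1
    return m, t

samples = [run() for _ in range(4000)]
mean_T = float(np.mean([t for _, t in samples]))
# Wald for martingales: E M_T = c * E T, so mean_T should be close to 20.
```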
Cat-and-Mouse Game. Here is another variation on the type of game described in Chapter 3 section yyy. Fix a graph. The cat starts at some vertex and follows a continuous-time simple random walk. The mouse starts at some vertex and is allowed an arbitrary strategy. Recall the mouse can't see the cat, so it must use a deterministic strategy, or a random strategy independent of the cat's moves. The mouse seeks to maximize $EM$, where $M$ is the time until meeting. Write $m^*$ for the $\max$ of $EM$ over all starting positions and all strategies for the mouse. So $m^*$ just depends on the graph. Clearly $m^* \ge \max_{i,j} E_i T_j$, since the mouse can just stand still.
Open Problem. Does $m^* = \max_{i,j} E_i T_j$? In other words, is it never better to run than to hide?
Here's a much weaker upper bound on $m^*$. Consider for simplicity a regular $n$-vertex graph. Then

(4.21)  $m^* \le K n \tau_1$ for an absolute constant $K$.

Because as remarked at (4.16), we can construct a strong stationary time $T$ with $ET \le K \tau_1$, say. So we can construct an increasing sequence of randomized times $0 < U_1 < U_2 < \cdots$ such that the increments $U_k - U_{k-1}$ are independent copies of $T$ and the positions $X_{U_k}$ are independent with the uniform stationary distribution. So regardless of the mouse's strategy, the cat has chance $1/n$ to meet the mouse at time $U_k$, independently as $k$ varies, so the meeting time satisfies $M \le U_G$, where $G$ is a stopping time with mean $n$, and (4.21) follows from Lemma 4.19. This topic will be pursued in Chapter 6 yyy.
Since discrete-time chains can be identified with random walks on weighted graphs, relating properties of the chain to properties of "flows" on the graph is a recurring theme. Thompson's principle (Chapter 3 yyy) identified mean commute times and mean hitting times from stationarity as infs, over flows, of certain quantities. Sinclair [307] noticed that $\tau_1$-type parameters could be related to "multicommodity flow" issues, and we give a streamlined version of his result (essentially Corollary 4.22) here. Recall from Chapter 3 section yyy the general notation of a unit flow from a vertex to the stationary distribution $\pi$, and the special flow induced by the Markov chain.
Consider a family , where, for each state , is a unit flow from to the stationary distribution . Define
in discrete time, and substitute for in continuous time. Let be the special flow induced by the chain. Then
where is the diameter of the transition graph.
Proof. We work in discrete time (the continuous case is similar). By Chapter 3 yyy
and so
Thus
The result now follows because by (4.12)
where and are not required to be neighbors.
.
Unfortunately it seems hard to get analogous upper bounds. In particular, it is not true that
To see why, consider first random walk on the -cycle (Chapter 5 Example yyy). Here and , so the upper bound in Lemma 4.21 is the right order of magnitude, since . Now modify the chain by allowing transitions between arbitrary pairs with equal chance . The new chain will still have , and by considering the special flow in the original chain we have , but now the diameter .
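For the $n$-cycle, $\tau_1$ grows like $n^2$ (cf. Chapter 5), which can be checked numerically. The helper below (our own code) continuizes simple random walk on the cycle and scans for the threshold (4.11); doubling $n$ should roughly quadruple $\tau_1$.

```python
import numpy as np

def tau1_cycle(n, tmax=400.0, dt=0.05):
    # Continuized simple random walk on the n-cycle.
    P = np.zeros((n, n))
    for i in range(n):
        P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5
    w, U = np.linalg.eigh(P - np.eye(n))      # symmetric generator
    for t in np.arange(0.0, tmax, dt):
        Pt = (U * np.exp(t * w)) @ U.T        # exp(tQ) via the spectrum
        dbar = max(0.5 * np.abs(Pt[i] - Pt[j]).sum()
                   for i in range(n) for j in range(n))
        if dbar <= np.exp(-1.0):              # first time d_bar(t) <= 1/e
            return t
    return np.inf
```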