We have said several times that the theory in this book is fundamentally a theory of inequalities. “Universal” or “a priori” inequalities for reversible chains on finite state space, such as those in Chapter 4, should extend unchanged to the continuous space setting. Giving proofs of this, or giving the rigorous setup for continuous-space chains, is outside the scope of our intermediate-level treatment. Instead we just mention a few specific processes which parallel or give insight into topics treated earlier.
Let be one-dimensional standard Brownian motion (BM). Mentally picture a particle moving along an erratic continuous random trajectory. Briefly, for the increment has Normal distribution, and for non-overlapping intervals the increments are independent. See Norris [270] section 4.4, Karlin and Taylor [208] Chapter 7, or Durrett [133] Chapter 7 for successively more detailed introductions. One can do explicit calculations, directly in the continuous setting, of distributions of many random quantities associated with BM. A particular calculation we need ([133] equation 7.8.12) is
(13.1)
where
(13.2)
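Though such distributions are computed exactly in the continuous setting, the defining properties of BM are easy to check by simulation. The following sketch (all names and parameters are ours) builds a path from independent Normal increments, each with variance equal to the time elapsed, and checks that the time-1 value has unit variance:

```python
import math
import random

def brownian_path(times, rng):
    """Sample standard BM at the given increasing times: independent
    Normal increments with variance t - s over each interval [s, t]."""
    path = [0.0]
    for s, t in zip(times, times[1:]):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(t - s)))
    return path

rng = random.Random(0)
times = [i / 100 for i in range(101)]        # grid on [0, 1]
samples = [brownian_path(times, rng)[-1] for _ in range(2000)]
var_B1 = sum(x * x for x in samples) / len(samples)   # should be near 1
```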
One can also regard BM as a limit of rescaled random walk, a result which generalizes the classical central limit theorem. If is simple symmetric random walk on , then the central limit theorem implies and the generalized result is
(13.3)
where the convergence here is weak convergence of processes (see e.g. Ethier and Kurtz [141] for detailed treatment). For more general random flights on , that is with independent and and , we have Donsker’s theorem ([133] Theorem 7.6.6)
(13.4)
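A quick numerical illustration of the rescaling in Donsker's theorem (the step distribution, uniform on {-1, 0, +1} with variance 2/3, is our choice): after dividing the position at step n by the step standard deviation times the square root of n, the result should have approximately unit variance.

```python
import random

def random_flight(n, rng):
    """n steps of a random flight with mean-zero increments
    uniform on {-1, 0, +1} (step variance 2/3)."""
    s = 0
    for _ in range(n):
        s += rng.choice((-1, 0, 1))
    return s

sigma = (2.0 / 3.0) ** 0.5        # step standard deviation
rng = random.Random(0)
n = 400
vals = [random_flight(n, rng) / (sigma * n ** 0.5) for _ in range(2000)]
var_hat = sum(x * x for x in vals) / len(vals)   # should be near 1
```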
Many asymptotic results for random walk on the integers or on the -cycle or on the -path, and their -dimensional counterparts, can be explained in terms of Brownian motion or its variants. The variants of interest to us take values in compact sets and have uniform stationary distributions.
Brownian motion on the circle can be defined by
and then random walk on the -cycle satisfies, by (13.3),
(13.5)
The process has eigenvalues with eigenfunction for and two eigenfunctions and for . In particular the relaxation time is
The result for random walk on the -cycle (Chapter 5 Example 7)
can therefore be viewed as a consequence of the time-rescaling in (13.5) which takes random walk on the -cycle to Brownian motion on the circle. This argument is a prototype for the weak convergence paradigm: proving size-asymptotic results for discrete structures in terms of some limiting continuous structure.
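The time-rescaling can be checked numerically. The sketch below (parameters ours) verifies the standard facts that cos(2*pi*k/n) is an eigenvalue of simple random walk on the n-cycle with eigenfunction i -> cos(2*pi*k*i/n), and that the resulting relaxation time 1/(1 - cos(2*pi/n)) is asymptotic to n^2/(2*pi^2), matching the BM-on-the-circle value after the n^2 rescaling:

```python
import math

def cycle_eigen_check(n, k):
    """Max residual of the eigenvector identity P v = cos(2*pi*k/n) v
    for v(i) = cos(2*pi*k*i/n), where P averages the two neighbours."""
    lam = math.cos(2 * math.pi * k / n)
    v = [math.cos(2 * math.pi * k * i / n) for i in range(n)]
    return max(abs(0.5 * (v[(i - 1) % n] + v[(i + 1) % n]) - lam * v[i])
               for i in range(n))

n = 1000
tau2 = 1.0 / (1.0 - math.cos(2 * math.pi / n))   # relaxation time of the walk
limit = n * n / (2 * math.pi ** 2)               # n^2 times the BM value
```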
Variation distance can be studied via coupling. Construct two Brownian motions on started from and as follows. Let be standard Brownian motion, and let
Then a.s. and we can define by
That is, the segment of over is the image of the corresponding segment of under the reflection which takes to . It is easy to see that is indeed Brownian motion started at . This is the reflection coupling for Brownian motion. We shall study analogous couplings for variant processes. Given Brownian motion on the circle started at , we can construct another Brownian motion on the circle started at via
where
Again, the segment of over is the image of the corresponding segment of under the reflection of the circle which takes to , so we call it the reflection coupling for Brownian motion on the circle. Because sample paths cannot cross without meeting, it is easy to see that the general coupling inequality (Chapter 4-3 section 1.1) becomes an equality:
The worst starting point is , and the hitting time in question can be written as the hitting time for standard Brownian motion, so
(13.6)
by Brownian scaling, that is the property
(13.7)
See the Notes for an alternative formula. Thus for Brownian motion on the circle
(13.8)
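The reflection coupling behind these estimates can be discretized (all parameters below are ours, and "within one step-length" is a crude stand-in for the continuous meeting time). One particle runs as Brownian motion on the circle; its mirror image through the starting pair is itself a Brownian motion on the circle, and the two meet in finite time:

```python
import math
import random

def circle_dist(x, y):
    """Arc-length distance on the circle of circumference 1."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

def meeting_time(x0, y0, dt, horizon, rng):
    """Discretized reflection coupling: run B from x0; the mirror
    particle (x0 + y0 - B) mod 1 starts from y0.  Return the first
    time the two come within one step-length of each other."""
    b = x0
    step = math.sqrt(dt)
    for k in range(horizon):
        b = (b + rng.gauss(0.0, step)) % 1.0
        mirror = (x0 + y0 - b) % 1.0
        if circle_dist(b, mirror) < step:
            return k * dt
    return None

rng = random.Random(0)
t_meet = meeting_time(0.0, 0.5, 1e-4, 200000, rng)  # worst (antipodal) start
```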
If simple random walk is replaced by aperiodic random flight with step variance then the asymptotic values of and are replaced by and ; this may be deduced using the local central limit theorem ([133] Theorem 2.5.2).
Reflecting Brownian motion on the interval is very similar. Intuitively, imagine that upon hitting an endpoint or the particle is instantaneously inserted an infinitesimal distance into the interval. Formally one can construct as for the concertina map
The process has eigenvalues with eigenfunctions . In particular the relaxation time is
The result for random walk on the -path (Chapter 5 Example 8)
is another instance of the weak convergence paradigm, a consequence of the time-rescaling which takes random walk on the -path to reflecting Brownian motion on the interval. The variation distance function for can be expressed in terms of the corresponding quantity (write as ) for . Briefly, it is easy to check
and then to deduce . Then using (13.8)
(13.9)
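One concrete choice of the concertina map (the explicit formula is ours) is f(x) = 1 - |1 - (x mod 2)|, which is the identity on [0,1], the reflection x -> 2 - x on [1,2], and has period 2; applying it to free BM produces a process confined to [0,1]:

```python
import math
import random

def fold(x):
    """Concertina map wrapping the real line onto [0, 1]:
    identity on [0, 1], reflection on [1, 2], period 2."""
    return 1.0 - abs(1.0 - (x % 2.0))

def reflecting_bm(t, dt, rng):
    """Reflecting BM on [0, 1] as the image of free BM under fold()."""
    b, path = 0.0, []
    for _ in range(int(round(t / dt))):
        b += rng.gauss(0.0, math.sqrt(dt))
        path.append(fold(b))
    return path

rng = random.Random(0)
path = reflecting_bm(5.0, 1e-3, rng)
```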
Standard -dimensional Brownian motion can be written as
where the component processes are independent one-dimensional standard Brownian motions. A useful property of is isotropy: its distribution is invariant under rotations of . In approximating simple random walk on one needs to be a little careful with scaling constants. The analog of (13.3) is
(13.10)
where the factor arises because the components of the random walk have variance — see (13.4). Analogous to (13.5), random walk on the discrete torus converges to Brownian motion on the continuous torus :
(13.11)
Fix a convex polyhedron . One can define reflecting Brownian motion in ; heuristically, when the particle hits a face it is replaced an infinitesimal distance inside , orthogonal to the face. As in the previous examples, the stationary distribution is uniform on . We will outline a proof of
Proposition 13.1. For Brownian motion in a convex polyhedron which is a subset of the ball of radius ,
(i)
(ii)
Proof. By the -dimensional version of Brownian scaling (13.7) we can reduce to the case . The essential fact is
Let be reflecting Brownian motion on started at , and let be its hitting time on . Versions of Brownian motion in started from arbitrary points of can be constructed jointly with such that
(13.12)
Granted this fact, for Brownian motion on satisfies
where the final equality holds because has the same distribution as the time for Brownian motion started at to exit the interval . This establishes (i) for . Then from the asymptotics of in (13.1) we have , implying by the corresponding lemma of Chapter 4 and establishing (ii).
Sketch proof of Lemma. Details require familiarity with stochastic calculus, but this outline provides the idea. For two Brownian motions in started from and from , one can define the reflection coupling by making the first coordinates evolve as the one-dimensional reflection coupling, and making the other coordinate processes be identical in the two motions. Use isotropy to extend the definition of reflection coupling to arbitrary starting points. Note that the distance between the processes evolves as times one-dimensional Brownian motion, until they meet. The desired joint distribution of is obtained by specifying that while both processes are in the interior of , they evolve as the reflection coupling (and each process reflects orthogonally at faces). As the figure illustrates, the effect of reflection can only be to decrease distance between the two Brownian particles.
For a motion hitting the boundary at , if the unreflected process is at or an infinitesimal time later then the reflected process is at or . By convexity, for any we have ; so reflection can only decrease distance between coupled particles. To argue the inequality carefully, let be the vector normal to the face. The projection satisfies . Further, , implying . Therefore by Pythagoras .
We can therefore write, in stochastic calculus notation,
where is a one-dimensional Brownian motion and is an increasing process (representing the contribution from reflections off faces) which increases only when one process is at a face. But we can construct reflecting Brownian motion in terms of the same underlying by
where (representing the contribution from reflections off the endpoint ) is increasing until . At time we have (because )
We have shown
If the desired inequality (13.12) fails then it fails at some first time , which can only be a time when is increasing, that is when , at which times the inequality holds a priori.
Proposition 13.1 suggests an approach to the algorithmic question of simulating a uniform random point in a convex set where is large, discussed in Chapter 9 section 5.1. If we could simulate the discrete-time chain defined as reflecting Brownian motion on examined at time intervals of for some small (so that the length of a typical step is of order ), then Proposition 13.1 implies that steps are enough to approach the stationary distribution. Since the convex set is available only via an oracle, one can attempt to do the simulation via acceptance/rejection. That is, from we propose a move to where has standard -variate Normal distribution, and accept the move iff . While this leads to a plausible heuristic argument, the rigorous difficulty is that it is not clear how close an acceptance/rejection step is to the true step of reflecting Brownian motion. No rigorous argument based directly on Brownian motion has yet been found, though the work of Bubley et al [78] on coupling of random walks has elements in common with reflection coupling.
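The acceptance/rejection step described above is easy to write down; in the sketch below the oracle is a stand-in (we use the unit ball, and all names and parameters are ours), and, as noted, the resulting chain is only heuristically close to discretized reflecting Brownian motion:

```python
import random

def inside_body(x):
    """Membership oracle for the convex body K; here K is the
    unit ball in R^d, standing in for an oracle-given set."""
    return sum(c * c for c in x) <= 1.0

def ball_walk(x0, delta, steps, rng):
    """Acceptance/rejection walk: propose x + delta * Z with Z a
    standard Normal vector; accept iff the proposal stays in K,
    otherwise stay put."""
    x = list(x0)
    for _ in range(steps):
        prop = [c + delta * rng.gauss(0.0, 1.0) for c in x]
        if inside_body(prop):
            x = prop
    return x

rng = random.Random(0)
d = 5
x = ball_walk([0.0] * d, 0.1, 2000, rng)   # never leaves K by construction
```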
Discrete-time, continuous-space chains arise in many settings, in particular (Chapter MCMC) in Markov Chain Monte Carlo sampling from a target distribution on . As discussed in that chapter, estimating mixing times for such chains with general target distributions is extremely difficult. The techniques in this book are more directly applicable to chains with (roughly) uniform stationary distribution. The next example is intended to give the flavor of how techniques might be adapted to the continuous setting: we will work through the details of a coupling argument.
A random walk on the simplex.
Fix and consider the simplex . Define the discrete-time Markov chain on by the following steps:
from state , pick 2 distinct coordinates uniformly at random, and replace the 2 entries by where is uniform on .
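The step above can be sketched as follows (a minimal simulation, names ours); note that each step exactly preserves the sum of the coordinates, so the chain stays on the simplex:

```python
import random

def simplex_step(x, rng):
    """One step of the chain: pick two distinct coordinates at random
    and redistribute their total uniformly between them (shares U, 1-U)."""
    i, j = rng.sample(range(len(x)), 2)
    s = x[i] + x[j]
    u = rng.random()
    x[i], x[j] = u * s, (1.0 - u) * s

rng = random.Random(0)
n = 10
x = [1.0 / n] * n                  # start at the centre of the simplex
for _ in range(500):
    simplex_step(x, rng)
total = sum(x)                     # each step preserves the sum
```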
The stationary distribution is the uniform distribution on . We will show that the mixing time satisfies
(13.13)
The process is somewhat reminiscent of card shuffling by random transpositions (Chapter 7 Example 18), so by analogy with that example we expect that in fact . What we show here is that the coupling analysis of that example (Chapter 4-3 section 1.7) extends fairly easily to the present example.
As a preliminary, let us specify two distinct couplings of the uniform and the uniform distributions. In the scaling coupling we take for with uniform distribution. In the greedy coupling we make have its maximal value, which is , and we say the coupling works if .
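Both couplings of uniform distributions on intervals (0,a) and (0,b) can be made concrete (implementation details ours). In the scaling coupling the same uniform variate drives both values; in the greedy coupling the two values agree with the maximal possible probability, which for these uniforms is min(a,b)/max(a,b):

```python
import random

def scaling_coupling(a, b, rng):
    """Same uniform V drives both values: (a*V, b*V)."""
    v = rng.random()
    return a * v, b * v

def greedy_coupling(a, b, rng):
    """Maximal coupling of Uniform(0,a) and Uniform(0,b): the values
    agree ("the coupling works") with probability min(a,b)/max(a,b)."""
    lo, hi = min(a, b), max(a, b)
    u = lo * rng.random()                 # Uniform(0, lo)
    if rng.random() < lo / hi:
        w = u                             # coupling works
    else:
        w = lo + (hi - lo) * rng.random() # Uniform(lo, hi); overall Uniform(0, hi)
    return (u, w) if a <= b else (w, u)

rng = random.Random(0)
pair = scaling_coupling(1.0, 2.0, rng)
works = 0
trials = 20000
for _ in range(trials):
    u, w = greedy_coupling(1.0, 2.0, rng)
    works += (u == w)
rate = works / trials                     # should be near 1/2
```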
Fix . We now specify a coupling of the chains started with and with having the uniform distribution. (This is an atypical coupling argument, in that it matters that one version is the stationary version.)
From state , choose the same random pair for each process, and link the new values and (which are uniform on different intervals) via the scaling coupling for the first steps, then via the greedy coupling for the next steps.
We shall show that, for any fixed constant ,
(13.14)
establishing (13.13).
Consider the effect on distance of a step of the scaling coupling using coordinates . The change is
Thus
()
because . So
Because , it follows that after steps using the scaling coupling,
So by taking , after steps we have
(13.15)
Now consider the greedy coupling. If a step works, the distance cannot increase. The chance that a step from involving coordinates works is
So unconditionally
(13.16)
Now the uniform distribution on the simplex has the property (use [133] Exercise 2.6.10 and the fact that the uniform distribution on the simplex is the joint distribution of spacings between uniform variables and the endpoint )
Since is the stationary chain and ,
and since this bound is . In other words
Combining this with (13.15,13.16) and the non-increase of distance, we deduce
(13.17)
Now consider the number of unmatched coordinates at time , that is, the number of with . Provided the greedy coupling works, this number cannot increase, and decreases by at least each time two unmatched coordinates are chosen. So we can compare with the chain with and
As in the analysis of the shuffling example, the time has . When the number goes strictly below it must become , and so
Parallel to random flights on finite groups one can discuss discrete-time random flights on classical (continuous) compact groups such as the orthogonal group of real orthogonal matrices. For instance, specify a reflection to be an automorphism which fixes the points in some hyperplane, so that a reflection matrix can be written as
where is the identity matrix and is a unit-length vector in . Assigning to the Haar measure on the -sphere creates a uniform random reflection, and a sequence of uniform random reflections defines a random flight on . Porod [285] shows that the variation threshold satisfies
and that the cut-off phenomenon occurs. The result, and its proof via group representation theory, are reminiscent of card-shuffling via random transpositions (Chapter 7 Example 18).
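A uniform random reflection is straightforward to construct (the normalized-Gaussian recipe for a uniform point on the sphere, and the helper names, are ours): take a unit vector u uniform on the sphere and form I - 2uu^T, which sends u to -u, fixes the hyperplane orthogonal to u, and is its own inverse:

```python
import math
import random

def random_reflection(n, rng):
    """R = I - 2 u u^T with u uniform on the unit (n-1)-sphere,
    obtained by normalizing a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in g))
    u = [c / norm for c in g]
    R = [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j]
          for j in range(n)] for i in range(n)]
    return R, u

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

rng = random.Random(0)
R, u = random_reflection(4, rng)
Ru = mat_vec(R, u)                    # reflection sends u to -u
v = [1.0, 2.0, 3.0, 4.0]
RRv = mat_vec(R, mat_vec(R, v))       # R is an involution: R(Rv) = v
```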
Constructions and properties of analogs of Brownian motion taking values in fractal subsets of have been studied in great detail over the last 15 years. Since these processes are most easily viewed as limits of random walks on graphs, we shall say a little about the simplest example. The figure illustrates the first two stages of the construction of the well-known Sierpinski gasket.
In the topological setting one may regard as a closed subset of , that is as a set of line-segments, and then the closure of is the Sierpinski gasket (this is equivalent to the usual construction by “cutting out middle triangles”). In the graph setting, regard as a graph and write for discrete-time random walk on started at point . Let be the number of steps of until first hitting point or . Symmetry properties of the graphs yield a simple relationship between the distributions of and . For the walk on , the length of the time segment until first hitting or is distributed as ; successive segments (periods until next hitting one of other than the current one) are like successive steps of the walk on , so the number of segments is distributed as . Using the same argument for general gives
is distributed as the ’th generation size in a Galton-Watson branching process with individual in generation and offspring distributed as .
It is easy to calculate ; indeed the distribution of is determined by its generating function, which can be calculated to be . So . This suggests the existence of a limit process on after rescaling time, that is a limit
In fact we can be more constructive. Branching process theory ([133] Example 4.4.1) shows that where and where has the self-consistency property
(13.18)
where are independent, and . Now in the topological setting, the vertices of are a subset of . Let be the process on whose sequence of jumps is as the jumps of the discrete-time walk but where the times between jumps are independent with distribution . Using (13.18) we can construct the processes jointly for all such that the process , watched only at the times of hitting (successively distinct) points of , is exactly the process . These coupled processes specify a process on at a random subset of times . It can be shown that this random subset is dense and that sample paths extend continuously to all , and it is natural to call Brownian motion on the Sierpinski gasket.
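The mean offspring number driving this rescaling can be computed directly from the first-stage graph (the symmetry reduction and variable labels below are ours): solving the harmonic equations for the expected time, started at one corner, to hit either of the other two corners gives the value 5.

```python
def gasket_mean_hitting_time(iters=1000):
    """Expected steps for random walk on the first-stage gasket graph,
    started at one corner, to hit either of the other two corners.
    By symmetry there are three unknowns: z at the starting corner
    (two neighbours, both of type x), x at the two midpoints adjacent
    to it (neighbours: z, one absorbing corner, the other x, and y),
    y at the opposite midpoint (two absorbing corners and the two x).
    Iterate h = 1 + (mean over neighbours), absorbing corners = 0."""
    z = x = y = 0.0
    for _ in range(iters):
        z, x, y = 1.0 + x, 1.0 + (z + x + y) / 4.0, 1.0 + x / 2.0
    return z

t1 = gasket_mean_hitting_time()    # fixed point z = 5, x = 4, y = 3
```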