# 13.1 Continuous state space

We have said several times that the theory in this book is fundamentally a theory of inequalities. “Universal” or “a priori” inequalities for reversible chains on finite state space, such as those in Chapter 4, should extend unchanged to the continuous space setting. Giving proofs of this, or giving the rigorous setup for continuous-space chains, is outside the scope of our intermediate-level treatment. Instead we just mention a few specific processes which parallel or give insight into topics treated earlier.

## 13.1.1 One-dimensional Brownian motion and variants

Let $(B_{t},0\leq t<\infty)$ be one-dimensional standard Brownian motion (BM). Mentally picture a particle moving along an erratic continuous random trajectory. Briefly, for $s the increment $B_{t}-B_{s}$ has Normal$(0,t-s)$ distribution, and for non-overlapping intervals $(s_{i},t_{i})$ the increments $B_{t_{i}}-B_{s_{i}}$ are independent. See Norris [270] section 4.4, Karlin and Taylor [208] Chapter 7, or Durrett [133] Chapter 7 for successively more detailed introductions. One can do explicit calculations, directly in the continuous setting, of distributions of many random quantities associated with BM. A particular calculation we need ([133] equation 7.8.12) is

 $G(t):=P\left(\sup_{0\leq s\leq t}|B_{s}|<1\right)=\frac{4}{\pi}\sum_{m=0}^{% \infty}\frac{(-1)^{m}}{2m+1}\exp(-(2m+1)^{2}\pi^{2}t/8)$ (13.1)

where

 $G^{-1}(1/e)=1.006.$ (13.2)

One can also regard BM as a limit of rescaled random walk, a result which generalizes the classical central limit theorem. If $(X_{m},m=0,1,2,\ldots)$ is simple symmetric random walk on $Z$, then the central limit theorem implies $m^{-1/2}X_{m}\ \stackrel{d}{\rightarrow}\ B_{1}$ and the generalized result is

 $(m^{-1/2}X_{\lfloor mt\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ (B_% {t},0\leq t<\infty)$ (13.3)

where the convergence here is weak convergence of processes (see e.g. Ethier and Kurtz [141] for detailed treatment). For more general random flights on $Z$, that is $X_{m}=\sum_{j=1}^{m}\xi_{j}$ with $\xi_{1},\xi_{2},\ldots$ independent and $E\xi=0$ and ${\rm var}\ \xi=\sigma^{2}<\infty$, we have Donsker’s theorem ([133] Theorem 7.6.6)

 $(m^{-1/2}X_{\lfloor mt\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ (% \sigma B_{t},0\leq t<\infty).$ (13.4)

Many asymptotic results for random walk on the integers or on the $n$-cycle or on the $n$-path, and their $d$-dimensional counterparts, can be explained in terms of Brownian motion or its variants. The variants of interest to us take values in compact sets and have uniform stationary distributions.

Brownian motion on the circle can be defined by

 $B^{\circ}_{t}:=B_{t}\bmod 1$

and then random walk $(X^{(n)}_{m},m=0,1,2,\ldots)$ on the $n$-cycle $\{0,1,2,\ldots,n-1\}$ satisfies, by (13.3),

 $n^{-1}(X^{(n)}_{\lfloor n^{2}t\rfloor},0\leq t<\infty)\ \stackrel{d}{% \rightarrow}\ (B^{\circ}_{t},0\leq t<\infty)\mbox{ as }n\to\infty.$ (13.5)

The process $B^{\circ}$ has eigenvalues $\{2\pi^{2}j^{2},\quad 0\leq j<\infty\}$ with eigenfunction $\equiv 1$ for $j=0$ and two eigenfunctions $\cos(2\pi jx)$ and $\sin(2\pi jx)$ for $j\geq 1$. In particular the relaxation time is

 $\tau_{2}={\textstyle\frac{1}{2\pi^{2}}}.$

The result for random walk on the $n$-cycle (Chapter 5 Example 7)

 $\tau_{2}\sim{\textstyle\frac{n^{2}}{2\pi^{2}}}\mbox{ as }n\to\infty$

can therefore be viewed as a consequence of the $n^{2}$ time-rescaling in (13.5) which takes random walk on the $n$-cycle to Brownian motion on the circle. This argument is a prototype for the weak convergence paradigm: proving size-asymptotic results for discrete structures in terms of some limiting continuous structure.

Variation distance can be studied via coupling. Construct two Brownian motions on $R$ started from $0$ and $x>0$ as follows. Let $B^{(1)}$ be standard Brownian motion, and let

 $T_{x/2}:=\inf\{t:B^{(1)}_{t}=x/2\}.$

Then $T_{x/2}<\infty$ a.s. and we can define $B^{(2)}$ by

 $\displaystyle B^{(2)}_{t}$ $\displaystyle=$ $\displaystyle x-B^{(1)}_{t},\quad 0\leq t\leq T_{x/2}$ $\displaystyle=$ $\displaystyle B^{(1)}_{t},\quad T_{x/2}\leq t<\infty.$

That is, the segment of $B^{(2)}$ over $0\leq t\leq T_{x/2}$ is the image of the corresponding segment of $B^{(1)}$ under the reflection which takes $0$ to $x$. It is easy to see that $B^{(2)}$ is indeed Brownian motion started at $x$. This is the reflection coupling for Brownian motion. We shall study analogous couplings for variant processes. Given Brownian motion on the circle $B^{\circ 1}$ started at $0$, we can construct another Brownian motion on the circle $B^{\circ 2}$ started at $0 via

 $\displaystyle B^{\circ 2}_{t}$ $\displaystyle=$ $\displaystyle x-B^{\circ 1}_{t}\bmod 1,\quad 0\leq t\leq T_{\{\frac{x}{2},% \frac{x}{2}+\frac{1}{2}\}}$ $\displaystyle=$ $\displaystyle B^{\circ 1}_{t},\quad T_{\{\frac{x}{2},\frac{x}{2}+\frac{1}{2}\}% }\leq t<\infty$

where

 $T_{\{\frac{x}{2},\frac{x}{2}+\frac{1}{2}\}}:=\inf\{t:B^{\circ 1}_{t}={% \textstyle\frac{x}{2}}\mbox{ or }{\textstyle\frac{x}{2}}+{\textstyle\frac{1}{2% }}\}.$

Again, the segment of $B^{\circ 2}$ over $0\leq t\leq T_{\{\frac{x}{2},\frac{x}{2}+\frac{1}{2}\}}$ is the image of the corresponding segment of $B^{\circ 1}$ under the reflection of the circle which takes $0$ to $x$, so we call it the reflection coupling for Brownian motion on the circle. Because sample paths cannot cross without meeting, it is easy to see that the general coupling inequality (Chapter 4-3 section 1.1) becomes an equality:

 $||P_{0}(B^{\circ}_{t}\in\cdot)-P_{x}(B^{\circ}_{t}\in\cdot)||=P(T_{\{\frac{x}{% 2},\frac{x}{2}+\frac{1}{2}\}}>t).$

The worst starting point is $x=1/2$, and the hitting time in question can be written as the hitting time $T_{\{-1/4,1/4\}}$ for standard Brownian motion, so

 $\bar{d}(t)=P(T_{\{-1/4,1/4\}}>t)=G(16t)$ (13.6)

by Brownian scaling, that is the property

 $(B_{c^{2}t},0\leq t<\infty)\ \stackrel{d}{=}\ (cB(t),0\leq t<\infty).$ (13.7)

See the Notes for an alternative formula. Thus for Brownian motion on the circle

 $\tau_{1}={\textstyle\frac{1}{16}}G^{-1}(1/e)=0.063.$ (13.8)

If simple random walk is replaced by aperiodic random flight with step variance $\sigma^{2}$ then the asymptotic values of $\tau_{2}$ and $\tau_{1}$ are replaced by $\tau_{2}/\sigma^{2}$ and $\tau_{1}/\sigma^{2}$; this may be deduced using the local central limit theorem ([133] Theorem 2.5.2).

Reflecting Brownian motion $\bar{B}$ on the interval $[0,1]$ is very similar. Intuitively, imagine that upon hitting an endpoint $0$ or $1$ the particle is instantaneously inserted an infinitesimal distance into the interval. Formally one can construct $\bar{B}_{t}$ as $\bar{B}_{t}:=\phi(B_{t})$ for the concertina map

 $\phi(2j+x)=x,\quad\phi(2j+1+x)=1-x;\quad 0\leq x\leq 1,\ j=\ldots-2,-1,0,1,2,\ldots.$

The process $\bar{B}$ has eigenvalues $\{\pi^{2}j^{2}/2,\quad 0\leq j<\infty\}$ with eigenfunctions $\cos(\pi jx)$. In particular the relaxation time is

 $\tau_{2}={\textstyle\frac{2}{\pi^{2}}}.$

The result for random walk on the $n$-path (Chapter 5 Example 8)

 $\tau_{2}\sim{\textstyle\frac{2n^{2}}{\pi^{2}}}\mbox{ as }n\to\infty$

is another instance of the weak convergence paradigm, a consequence of the $n^{2}$ time-rescaling which takes random walk on the $n$-path to reflecting Brownian motion on the interval. The variation distance function $\bar{d}(t)$ for $\bar{B}$ can be expressed in terms of the corresponding quantity (write as $d^{\circ}(t)$) for $B^{\circ}$. Briefly, it is easy to check

 $(\bar{B}_{t},0\leq t<\infty)\ \stackrel{d}{=}\ (2\min(B^{\circ}_{t/4},1-B^{% \circ}_{t/4}),0\leq t<\infty)$

and then to deduce $\bar{d}(t)=d^{\circ}(t/4)$. Then using (13.8)

 $\tau_{1}={\textstyle\frac{1}{4}}G^{-1}(1/e)=0.252.$ (13.9)

## 13.1.2 $d$-dimensional Brownian motion

Standard $d$-dimensional Brownian motion can be written as

 ${\bf B}_{t}=(B^{(1)}_{t},\ldots,B^{(d)}_{t})$

where the component processes $(B^{(i)}_{t},i=1,\ldots,d)$ are independent one-dimensional standard Brownian motions. A useful property of ${\bf B}$ is isotropy: its distribution is invariant under rotations of $R^{d}$. In approximating simple random walk $(X_{m},m=0,1,2,\ldots)$ on $Z^{d}$ one needs to be a little careful with scaling constants. The analog of (13.3) is

 $(m^{-1/2}X_{\lfloor mt\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ (d^% {-1/2}{\bf B}_{t},0\leq t<\infty)$ (13.10)

where the factor $d^{-1/2}$ arises because the components of the random walk have variance $1/d$ — see (13.4). Analogous to (13.5), random walk $(X^{(n)}_{m},m=0,1,2,\ldots)$ on the discrete torus $Z^{d}_{n}$ converges to Brownian motion ${\bf B}^{\circ}$ on the continuous torus $[0,1)^{d}$:

 $n^{-1}(X^{(n)}_{\lfloor n^{2}t\rfloor},0\leq t<\infty)\ \stackrel{d}{% \rightarrow}\ (d^{-1/2}{\bf B}^{\circ}_{t},0\leq t<\infty)\mbox{ as }n\to\infty.$ (13.11)

## 13.1.3 Brownian motion in a convex set

Fix a convex polyhedron $K\subset R^{d}$. One can define reflecting Brownian motion in $K$; heuristically, when the particle hits a face it is replaced an infinitesimal distance inside $K$, orthogonal to the face. As in the previous examples, the stationary distribution is uniform on $K$. We will outline a proof of

###### Proposition 13.1

For Brownian motion ${\bf B}$ in a convex polyhedron $K$ which is a subset of the ball of radius $r$,

(i) $\tau_{1}\leq G^{-1}(1/e)\ r^{2}$

(ii) $\tau_{2}\leq 8\pi^{-2}r^{2}$.

Proof. By the $d$-dimensional version of Brownian scaling (13.7) we can reduce to the case $r=1$. The essential fact is

###### Lemma 13.2

Let $\bar{B}_{t}$ be reflecting Brownian motion on $[0,1]$ started at $1$, and let $T_{0}$ be its hitting time on $0$. Versions ${\bf B}^{(1)},{\bf B}^{(2)}$ of Brownian motion in $K$ started from arbitrary points of $K$ can be constructed jointly with $\bar{B}$ such that

 $|{\bf B}^{(1)}_{t}-{\bf B}^{(2)}_{t}|\leq 2\bar{B}_{\min(t,T_{0})},\quad 0\leq t% <\infty.$ (13.12)

Granted this fact, $\bar{d}(t)$ for Brownian motion on $K$ satisfies

 $\bar{d}(t)=\max_{{\rm startingpoints}}P({\bf B}^{(1)}_{t}\neq{\bf B}^{(2)}_{t}% )\leq P(T_{0}>t)=G(t)$

where the final equality holds because $T_{0}$ has the same distribution as the time for Brownian motion stared at $1$ to exit the interval $(0,2)$. This establishes (i) for $r=1$. Then from the $t\to\infty$ asymptotics of $G(t)$ in (13.1) we have $\bar{d}(t)=O(\exp(-\pi^{2}t/8))$, implying $\tau_{2}\leq 8/\pi^{2}$ by Lemma LABEL:L-Chap4 and establishing (ii).

Sketch proof of Lemma. Details require familiarity with stochastic calculus, but this outline provides the idea. For two Brownian motions in $R^{d}$ started from $(0,0,\ldots,0)$ and from $(x,0,\ldots,0)$, one can define the reflection coupling by making the first coordinates evolve as the one-dimensional reflection coupling, and making the other coordinate processes be identical in the two motions. Use isotropy to entend the definition of reflection coupling to arbitrary starting points. Note that the distance between the processes evolves as $2$ times one-dimensional Brownian motion, until they meet. The desired joint distribution of $(({\bf B}^{(1)}_{t},{\bf B}^{(2)}_{t}),\quad 0\leq t<\infty)$ is obtained by specifying that while both processes are in the interior of $K$, they evolve as the reflection coupling (and each process reflects orthogonally at faces). As the figure illustrates, the effect of reflection can only be to decrease distance between the two Brownian particles.

For a motion hitting the boundary at $a$, if the unreflected process is at $b$ or $c$ an infinitesimal time later then the reflected process is at $b^{\prime}$ or $c^{\prime}$. By convexity, for any $x\in K$ we have $|b^{\prime}-x|\leq|b-x|$; so reflection can only decrease distance between coupled particles. To argue the inequality carefully, let $\alpha$ be the vector normal to the face. The projection ${\cal P}_{\alpha}$ satisfies $|{\cal P}_{\alpha}(b^{\prime}-x)|\leq|{\cal P}_{\alpha}(b-x)|$. Further, $b-b^{\prime}\perp\alpha$, implying $P_{\alpha^{\perp}}(b^{\prime}-x)=P_{\alpha^{\perp}}(b-x)$. Therefore by Pythagoras $|b^{\prime}-x|\leq|b-x|$.

We can therefore write, in stochastic calculus notation,

 $d|{\bf B}^{(1)}_{t}-{\bf B}^{(2)}_{t}|=d(2B_{t})-dA_{t}$

where $B_{t}$ is a one-dimensional Brownian motion and $A_{t}$ is an increasing process (representing the contribution from reflections off faces) which increases only when one process is at a face. But we can construct reflecting Brownian motion $\bar{B}$ in terms of the same underlying $B_{t}$ by

 $d(2\bar{B}_{t})=d(2B_{t})-dC_{t}$

where $C_{t}$ (representing the contribution from reflections off the endpoint $1$) is increasing until $T_{0}$. At time $0$ we have (because $r=1$)

 $|{\bf B}^{(1)}_{0}-{\bf B}^{(2)}_{0}|\leq 2=2\bar{B}_{0}.$

We have shown

 $d(|{\bf B}^{(1)}_{t}-{\bf B}^{(2)}_{t}|-2\bar{B}_{t})=-dA_{t}+dC_{t}.$

If the desired inequality (13.12 fails then it fails at some first time $t$, which can only be a time when $dC_{t}$ is increasing, that is when $\bar{B}_{t}=1$, at which times the inequality holds a priori. $\ \ \rule{4.3pt}{4.3pt}$.

Proposition 13.1 suggests an approach to the algorithmic question of simulating a uniform random point in a convex set $K\subset R^{d}$ where $d$ is large, discussed in Chapter 9 section 5.1. If we could simulate the discrete-time chain defined as reflecting Brownian motion ${\bf B}$ on $K$ examined at time intervals of $h^{2}/d$ for some small $h$ (so that the length of a typical step is of order $\sqrt{(h^{2}/d)\times d}=h$), then Proposition 13.1 implies that $O(d/h^{2})$ steps are enough to approach the stationary distribution. Since the convex set is available only via an oracle, one can attempt to do the simulation via acceptance/rejection. That is, from ${\bf x}$ we propose a move to ${\bf x}^{\prime}={\bf x}+\sqrt{h^{2}/d}\ Z$ where $Z$ has standard $d$-variate Normal distribution, and accept the move iff ${\bf x}^{\prime}\in K$. While this leads to a plausible heuristic argument, the rigorous difficulty is that it is not clear how close an acceptance/rejection step is to the true step of reflecting Brownian motion. No rigorous argument based directly on Brownian motion has yet been found, though the work of Bubley et al [78] on coupling of random walks has elements in common with reflection coupling.

## 13.1.4 Discrete-time chains: an example on the simplex

Discrete-time, continuous-space chains arise in many settings, in particular (Chapter MCMC) in Markov Chain Monte Carlo sampling from a target distribution on $R^{d}$. As discussed in that chapter, estimating mixing times for such chains with general target distributions is extremely difficult. The techniques in this book are more directly applicable to chains with (roughly) uniform stationary distribution. The next example is intended to give the flavor of how techniques might be adapted to the continuous setting: we will work through the details of a coupling argument.

###### Example 13.3

A random walk on the simplex.

Fix $d$ and consider the simplex $\Delta=\{{\bf x}=(x_{1},\ldots,x_{d}\}:x_{i}\geq 0,\sum_{i}x_{i}=1\}$. Consider the discrete-time Markov chain $({\bf X}(t),t=0,1,2,\ldots)$ on $\Delta$ with steps:

from state ${\bf x}$, pick 2 distinct coordinates $\{i,j\}$ uniformly at random, and replace the 2 entries $\{x_{i},x_{j}\}$ by $\{U,x_{i}+x_{j}-U\}$ where $U$ is uniform on $(0,x_{i}+x_{j})$.

The stationary distribution $\pi$ is the uniform distribution on $\Delta$. We will show that the mixing time $\tau_{1}$ satisfies

 $\tau_{1}=O(d^{2}\log d)\mbox{ as }d\rightarrow\infty.$ (13.13)

The process is somewhat reminiscent of card shuffling by random transpositions (Chapter 7 Example 18), so by analogy with that example we expect that in fact $\tau_{1}=\Theta(d\log d)$. What we show here is that the coupling analysis of that example (Chapter 4-3 section 1.7) extends fairly easily to the present example.

As a preliminary, let us specify two distinct couplings $(A,B)$ of the uniform$(0,a)$ and the uniform$(0,b)$ distributions. In the scaling coupling we take $(A,B)=(aU,bU)$ for $U$ with uniform$(0,1)$ distribution. In the greedy coupling we make $P(A=B)$ have its maximal value, which is $\min(a,b)/\max(a,b)$, and we say the coupling works if $A=B$.

Fix ${\bf x}(0)\in\Delta$. We now specify a coupling $({\bf X}(t),{\bf Y}(t))$ of the chains started with ${\bf X}(0)={\bf x}(0)$ and with ${\bf Y}(0)$ having the uniform distribution. (This is an atypical couplig argument, in that it matters that one version is the stationary version).

From state $({\bf x},{\bf y})$, choose the same random pair $\{i,j\}$ for each process, and link the new values $x_{i}^{\prime}$ and $y_{i}^{\prime}$ (which are uniform on different intervals) via the scaling coupling for the first $t_{1}=3d^{2}\log d$ steps, then via the greedy coupling for the next $t_{2}=Cd^{2}$ steps.

We shall show that, for any fixed constant $C>0$,

 $P({\bf X}(t_{1}+t_{2})={\bf Y}(t_{1}+t_{2}))\geq 1-C^{-1}-o(1)\mbox{ as }d\to\infty$ (13.14)

establishing (13.13).

Consider the effect on $l_{1}$ distance $||{\bf x}-{\bf y}||:=\sum_{i}|x_{i}-y_{i}|$ of a step of the scaling coupling using coordinates $\{i,j\}$. The change is

 $|U(x_{i}+x_{j})-U(y_{i}+y_{j})|+|(1-U)(x_{i}+x_{j})-(1-U)(y_{i}+y_{j})|-|x_{i}% -y_{i}|-|x_{j}-y_{j}|$
 $=|(x_{i}+x_{j})-(y_{i}+y_{j})|-|x_{i}-y_{i}|-|x_{j}-y_{j}|\quad\quad\quad$
 $=\left\{\begin{array}[]{ll}0&\mbox{ if }{\rm sgn}\ (x_{i}-y_{i})={\rm sgn}\ (x% _{j}-y_{j})\\ -2\min(|x_{i}-y_{i}|,|x_{j}-y_{j}|)&\mbox{ if not }.\end{array}\right.$

Thus

 $E_{({\bf x},{\bf y})}\left(||{\bf X}(1)-{\bf Y}(1)||-||{\bf x}-{\bf y}||\right)$
 $=\frac{-2}{d(d-1)}\sum_{i}\sum_{j\neq i}\min(|x_{i}-y_{i}|,|x_{j}-y_{j}|)1_{({% \rm sgn}\ (x_{i}-y_{i})\neq{\rm sgn}\ (x_{j}-y_{j}))}$
 $=\frac{-4}{d(d-1)}\sum_{i\in A}\sum_{j\in B}\min(c_{i},d_{j})$

($\mbox{where }c_{i}:=x_{i}-y_{i}\mbox{ on }A:=\{i:x_{i}>y_{i}\};\ d_{j}:=y_{j}-% x_{j}\mbox{ on }B:=\{j:y_{j}>x_{j}\}$)

 $\displaystyle=$ $\displaystyle\frac{-4}{d(d-1)}\sum_{i\in A}\sum_{j\in B}\frac{c_{i}d_{j}}{\max% (c_{i},d_{j})}$ $\displaystyle\leq$ $\displaystyle\frac{-4}{d(d-1)}\sum_{i\in A}\sum_{j\in B}\frac{c_{i}d_{j}}{||{% \bf x}-{\bf y}||/2}$ $\displaystyle=$ $\displaystyle\frac{-2}{d(d-1)}||{\bf x}-{\bf y}||$

because $\sum_{i\in A}c_{i}=\sum_{j\in B}d_{j}=||{\bf x}-{\bf y}||/2$. So

 $E_{({\bf x},{\bf y})}||{\bf X}(1)-{\bf Y}(1)||\leq\left(1-\frac{2}{d(d-1)}% \right)||{\bf x}-{\bf y}||.$

Because $||{\bf X}(0)-{\bf Y}(0)||\leq 2$, it follows that after $t$ steps using the scaling coupling,

 $E||{\bf X}(t)-{\bf Y}(t)||\leq 2\left(1-\frac{2}{d(d-1)}\right)^{t}.$

So by taking $t_{1}\sim 3d^{2}\log d$, after $t_{1}$ steps we have

 $P(||{\bf X}(t_{1})-{\bf Y}(t_{1})||\leq d^{-5})=1-o(1).$ (13.15)

Now consider the greedy coupling. If a step works, the $l_{1}$ distance $||{\bf X}(t)-{\bf Y}(t)||$ cannot increase. The chance that a step from $({\bf x},{\bf y})$ involving coordinates $\{i,j\}$ works is

 $\displaystyle\frac{\min(x_{i}+x_{j},y_{i}+y_{j})}{\max(x_{i}+x_{j},y_{i}+y_{j})}$ $\displaystyle\geq$ $\displaystyle\frac{y_{i}+y_{j}-||{\bf x-\bf y}||}{\max(x_{i}+x_{j},y_{i}+y_{j})}$ $\displaystyle\geq$ $\displaystyle\frac{y_{i}+y_{j}-||{\bf x-\bf y}||}{y_{i}+y_{j}+||{\bf x-\bf y}||}$ $\displaystyle\geq$ $\displaystyle\frac{y_{i}+y_{j}-2||{\bf x-\bf y}||}{y_{i}+y_{j}}$ $\displaystyle\geq$ $\displaystyle 1-\frac{||{\bf x-\bf y}||}{\min(y_{i},y_{j})}.$

So unconditionally

 $P_{({\bf x},{\bf y})}(\mbox{greedy coupling works on first step})\geq 1-\frac{% ||{\bf x-\bf y}||}{\min_{k}y_{k}}.$ (13.16)

Now the uniform distribution $(Y^{(d)}_{1},\ldots,Y^{(d)}_{d})$ on the simplex has the property (use [133] Exercise 2.6.10 and the fact that the uniform distribution on the simplex is the joint distribution of spacings between $d-1$ uniform$(0,1)$ variables and the endpoint $1$)

 $a_{d}>0$

Since $({\bf Y}(t))$ is the stationary chain and $Y^{(d)}_{i}\ \stackrel{d}{=}\ Y^{(d)}_{1}$,

 $P(\min_{1\leq k\leq d}Y_{k}(t)\leq d^{-4.5}\mbox{ for some }t_{1}

and since $t_{2}=O(d^{2})$ this bound is $o(1)$. In other words

 $P(\min_{1\leq k\leq d}Y_{k}(t)\geq d^{-4.5}\mbox{ for all }t_{1}

Combining this with (13.15,13.16) and the non-increase of $l_{1}$ distance, we deduce

 $P(\mbox{greedy coupling works for all}t_{1} (13.17)

Now consider the number $M(t)$ of unmatched coordinates $i$ at time $t\geq t_{1}$, that is, the number of $i$ with $X_{i}(t)\neq Y_{i}(t)$. Provided the greedy coupling works, this number $M(t)$ cannot increase, and decreases by at least $1$ each time two unmatched coordinates are chosen. So we can compare $(M(t_{1}+t),t\geq 0)$ with the chain $(N(t),t\geq 0)$ with $N(0)=d$ and

 $P(N(t+1)=m-1|N(t)=m)=\frac{m(m-1)}{d(d-1)}=1-P(N(t+1)=m|N(t)=m).$

As in the analysis of the shuffling example, the time $T=\min\{t:N(t)=1\}$ has $ET=\sum_{m=2}^{d}\frac{d(d-1)}{m(m-1)}\leq d^{2}$. When the number $M(t)$ goes strictly below $2$ it must become $0$, and so

 $\displaystyle P(\mbox{greedy coupling works for all }t_{1} $\displaystyle=$ $\displaystyle P(\mbox{greedy coupling works for all }t_{1}1)$ $\displaystyle\leq$ $\displaystyle P(T>t_{2})\leq 1/C_{2}.$

This and (13.17) establish (13.14).

## 13.1.5 Compact groups

Parallel to random flights on finite groups one can discuss discrete-time random flights on classical (continuous) compact groups such as the orthogonal group $O(d)$ of $d\times d$ real orthogonal matrices. For instance, specify a reflection to be an automorphism which fixes the points in some hyperplane, so that a reflection matrix can be written as

 $A=I-2xx^{T}$

where $I$ is the $d\times d$ identity matrix and $x$ is a unit-length vector in $R^{d}$. Assigning to $x$ the Haar measure on the $(d-1)$-sphere creates a uniform random reflection, and a sequence of uniform random reflections define a random flight on $O(d)$. Porod [285] shows that the variation threshold satisfies

 $\tau_{1}\sim{\textstyle\frac{1}{2}}d\log d$

and that the cut-off phenomenon occurs. The result, and its proof via group representation theory, are reminiscent of card-shuffling via random transpositions (Chapter 7 Example 18).

## 13.1.6 Brownian motion on a fractal set

Constructions and properties of analogs of Brownian motion taking values in fractal subsets of $R^{d}$ have been studied in great detail over the last 15 years. Since these processes are most easily viewed as limits of random walks on graphs, we shall say a little about the simplest example. The figure illustrates the first two stages of the construction of the well-known Sierpinski gasket.

In the topology setting one may regard $G_{d}$ as a closed subset of $R^{2}$, that is as a set of line-segments, and then the closure of $\cup_{d=1}^{\infty}G_{d}$ is the Sierpinski gasket $G$ (this is equivalent to the usual construction by “cutting out middle triangles”). In the graph setting, regard $G_{d}$ as a graph and write $(X^{(d)}_{t},t=0,1,2,\ldots)$ for discrete-time random walk on $G_{d}$ started at point $0$. Let $M_{d}$ be the number of steps of $X^{(d)}$ until first hitting point $a_{1}$ or $a_{2}$. Using symmetry properties of the graphs, there is a simple relationship between the distributions of $M_{1}$ and $M_{2}$. For the walk on $G_{2}$, the length of the time segment until first hitting $b_{1}$ or $b_{2}$ is distributed as $M_{1}$; successive segments (periods until next hitting one of $\{0,a_{1},a_{2},b_{1},b_{2},b_{3}\}$ other than the current one) are like successive steps of the walk on $G_{1}$, so the number of segments is distributed as $M_{1}$. Using the same argument for general $d$ gives

$M_{d}$ is distributed as the $d$’th generation size in a Galton-Watson branching process with $1$ individual in generation $0$ and offspring distributed as $M_{1}$.

It is easy to calculate $EM_{1}=5$; indeed the distribution of $M_{1}$ is determined by its generating function, which can be calculated to be $Ez^{M_{1}}=z^{2}/(4-3z)$. So $EM_{d}=5^{d}$. This suggests the existence of a limit process on $G$ after rescaling time, that is a limit

 $(X^{(d)}_{\lfloor 5^{-d}t\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ % (X^{(\infty)}_{t},0\leq t<\infty).$

In fact we can be more constructive. Branching process theory ([133] Example 4.4.1) shows that $M_{d}/5^{d}\ \stackrel{d}{\rightarrow}\ W$ where $EW=1$ and where $W$ has the self-consistency property

 $\sum_{i=1}^{M}W_{i}\ \stackrel{d}{=}\ 5W$ (13.18)

where $(M;W_{1},W_{2},\ldots)$ are independent, $M\ \stackrel{d}{=}\ M_{1}$ and $W_{i}\ \stackrel{d}{=}\ W$. Now in the topological setting, the vertices of $G_{d}$ are a subset of $G$. Let $\tilde{X}^{(d)}_{t}$ be the process on $G_{d}\subset G$ whose sequence of jumps is as the jumps of the discrete-time walk $X^{(d)}$ but where the times between jumps are independent with distribution $5^{-d}W$. Using (13.18) we can construct the processes $\tilde{X}^{(d)}$ jointly for all $d$ such that the process $\tilde{X}^{(d)}$, watched only at the times of hitting (successively distinct) points of $G_{d-1}$, is exactly the process $\tilde{X}^{(d-1)}$. These coupled processes specify a process $X^{(\infty)}_{t}$ on $G$ at a random subset of times $t$. It can be shown that this random subset is dense and that sample paths extend continuously to all $t$, and it is natural to call $X^{(\infty)}$ Brownian motion on the Sierpinski gasket.