13 Continuous State, Infinite State and Random Environment (June 23, 2001)

13.1 Continuous state space

We have said several times that the theory in this book is fundamentally a theory of inequalities. “Universal” or “a priori” inequalities for reversible chains on finite state space, such as those in Chapter 4, should extend unchanged to the continuous space setting. Giving proofs of this, or giving the rigorous setup for continuous-space chains, is outside the scope of our intermediate-level treatment. Instead we just mention a few specific processes which parallel or give insight into topics treated earlier.

13.1.1 One-dimensional Brownian motion and variants

Let $(B_{t},0\leq t<\infty)$ be one-dimensional standard Brownian motion (BM). Mentally picture a particle moving along an erratic continuous random trajectory. Briefly, for $s<t$ the increment $B_{t}-B_{s}$ has Normal$(0,t-s)$ distribution, and for non-overlapping intervals $(s_{i},t_{i})$ the increments $B_{t_{i}}-B_{s_{i}}$ are independent. See Norris [270] section 4.4, Karlin and Taylor [208] Chapter 7, or Durrett [133] Chapter 7 for successively more detailed introductions. One can do explicit calculations, directly in the continuous setting, of distributions of many random quantities associated with BM. A particular calculation we need ([133] equation 7.8.12) is

G(t):=P\left(\sup_{0\leq s\leq t}|B_{s}|<1\right)=\frac{4}{\pi}\sum_{m=0}^{\infty}\frac{(-1)^{m}}{2m+1}\exp(-(2m+1)^{2}\pi^{2}t/8)   (13.1)

where

G^{-1}(1/e)=1.006.   (13.2)
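As a quick numerical check of (13.1) and (13.2), here is a minimal Python sketch; the 200-term truncation of the series and the bisection bracket are arbitrary choices.

```python
import math

def G(t, terms=200):
    """Truncated series (13.1): P(sup_{0<=s<=t} |B_s| < 1)."""
    return (4 / math.pi) * sum(
        ((-1) ** m / (2 * m + 1)) * math.exp(-(2 * m + 1) ** 2 * math.pi ** 2 * t / 8)
        for m in range(terms)
    )

def G_inverse(p, lo=1e-6, hi=10.0):
    """Solve G(t) = p by bisection, using the fact that G is decreasing in t."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if G(mid) > p else (lo, mid)
    return 0.5 * (lo + hi)

print(G_inverse(1 / math.e))  # approximately 1.006, as in (13.2)
```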

One can also regard BM as a limit of rescaled random walk, a result which generalizes the classical central limit theorem. If $(X_{m},m=0,1,2,\ldots)$ is simple symmetric random walk on $Z$, then the central limit theorem implies $m^{-1/2}X_{m}\ \stackrel{d}{\rightarrow}\ B_{1}$, and the generalized result is

(m^{-1/2}X_{\lfloor mt\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ (B_{t},0\leq t<\infty)   (13.3)

where the convergence here is weak convergence of processes (see e.g. Ethier and Kurtz [141] for detailed treatment). For more general random flights on $Z$, that is $X_{m}=\sum_{j=1}^{m}\xi_{j}$ with $\xi_{1},\xi_{2},\ldots$ independent and $E\xi=0$ and ${\rm var}\ \xi=\sigma^{2}<\infty$, we have Donsker's theorem ([133] Theorem 7.6.6)

(m^{-1/2}X_{\lfloor mt\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ (\sigma B_{t},0\leq t<\infty).   (13.4)
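As a concrete, non-rigorous illustration of (13.4), the following sketch simulates the rescaled endpoint $m^{-1/2}X_{m}$ of a simple symmetric walk and checks that its sample mean and variance are close to those of $\sigma B_{1}$ with $\sigma=1$; the walk length and replication count are arbitrary choices.

```python
import random, statistics

def rescaled_endpoint(m, rng=random):
    """m^{-1/2} X_m for simple symmetric random walk X on Z."""
    return sum(rng.choice((-1, 1)) for _ in range(m)) / m ** 0.5

random.seed(0)
m, reps = 2000, 1000
samples = [rescaled_endpoint(m) for _ in range(reps)]
# For simple symmetric walk sigma = 1, so by (13.4) the limit is B_1 ~ Normal(0,1):
print(statistics.mean(samples), statistics.variance(samples))
```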

Many asymptotic results for random walk on the integers or on the $n$-cycle or on the $n$-path, and their $d$-dimensional counterparts, can be explained in terms of Brownian motion or its variants. The variants of interest to us take values in compact sets and have uniform stationary distributions.

Brownian motion on the circle can be defined by

B^{\circ}_{t}:=B_{t}\bmod 1

and then random walk $(X^{(n)}_{m},m=0,1,2,\ldots)$ on the $n$-cycle $\{0,1,2,\ldots,n-1\}$ satisfies, by (13.3),

n^{-1}(X^{(n)}_{\lfloor n^{2}t\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ (B^{\circ}_{t},0\leq t<\infty)\mbox{ as }n\to\infty.   (13.5)

The process $B^{\circ}$ has eigenvalues $\{2\pi^{2}j^{2},\ 0\leq j<\infty\}$, with eigenfunction $\equiv 1$ for $j=0$ and two eigenfunctions $\cos(2\pi jx)$ and $\sin(2\pi jx)$ for each $j\geq 1$. In particular the relaxation time is

\tau_{2}=\frac{1}{2\pi^{2}}.

The result for random walk on the $n$-cycle (Chapter 5 Example 7)

\tau_{2}\sim\frac{n^{2}}{2\pi^{2}}\mbox{ as }n\to\infty

can therefore be viewed as a consequence of the $n^{2}$ time-rescaling in (13.5) which takes random walk on the $n$-cycle to Brownian motion on the circle. This argument is a prototype for the weak convergence paradigm: proving size-asymptotic results for discrete structures in terms of some limiting continuous structure.

Variation distance can be studied via coupling. Construct two Brownian motions on $R$ started from $0$ and $x>0$ as follows. Let $B^{(1)}$ be standard Brownian motion, and let

T_{x/2}:=\inf\{t:B^{(1)}_{t}=x/2\}.

Then $T_{x/2}<\infty$ a.s. and we can define $B^{(2)}$ by

B^{(2)}_{t}=\begin{cases}x-B^{(1)}_{t},& 0\leq t\leq T_{x/2}\\ B^{(1)}_{t},& T_{x/2}\leq t<\infty.\end{cases}

That is, the segment of $B^{(2)}$ over $0\leq t\leq T_{x/2}$ is the image of the corresponding segment of $B^{(1)}$ under the reflection which takes $0$ to $x$. It is easy to see that $B^{(2)}$ is indeed Brownian motion started at $x$. This is the reflection coupling for Brownian motion. We shall study analogous couplings for variant processes. Given Brownian motion on the circle $B^{\circ 1}$ started at $0$, we can construct another Brownian motion on the circle $B^{\circ 2}$ started at $0<x\leq 1/2$ via

B^{\circ 2}_{t}=\begin{cases}x-B^{\circ 1}_{t}\bmod 1,& 0\leq t\leq T_{\{\frac{x}{2},\frac{x}{2}+\frac{1}{2}\}}\\ B^{\circ 1}_{t},& T_{\{\frac{x}{2},\frac{x}{2}+\frac{1}{2}\}}\leq t<\infty\end{cases}

where

T_{\{\frac{x}{2},\frac{x}{2}+\frac{1}{2}\}}:=\inf\{t:B^{\circ 1}_{t}=\frac{x}{2}\mbox{ or }\frac{x}{2}+\frac{1}{2}\}.

Again, the segment of $B^{\circ 2}$ over $0\leq t\leq T_{\{\frac{x}{2},\frac{x}{2}+\frac{1}{2}\}}$ is the image of the corresponding segment of $B^{\circ 1}$ under the reflection of the circle which takes $0$ to $x$, so we call it the reflection coupling for Brownian motion on the circle. Because sample paths cannot cross without meeting, it is easy to see that the general coupling inequality (Chapter 4-3 section 1.1) becomes an equality:

||P_{0}(B^{\circ}_{t}\in\cdot)-P_{x}(B^{\circ}_{t}\in\cdot)||=P(T_{\{\frac{x}{2},\frac{x}{2}+\frac{1}{2}\}}>t).

The worst starting point is $x=1/2$, and the hitting time in question can then be written as the hitting time $T_{\{-1/4,1/4\}}$ for standard Brownian motion, so

\bar{d}(t)=P(T_{\{-1/4,1/4\}}>t)=G(16t)   (13.6)

by Brownian scaling, that is the property

(B_{c^{2}t},0\leq t<\infty)\ \stackrel{d}{=}\ (cB_{t},0\leq t<\infty).   (13.7)

See the Notes for an alternative formula. Thus for Brownian motion on the circle

\tau_{1}=\frac{1}{16}G^{-1}(1/e)=0.063.   (13.8)
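The identity (13.6) and the value (13.8) can be checked by Monte Carlo: by the reflection coupling, from the worst starting pair the coupling time is distributed as the exit time of standard Brownian motion from $(-1/4,1/4)$. A rough sketch, with an Euler step and sample size that are arbitrary choices:

```python
import math, random

def exit_time(a=0.25, dt=1e-4, rng=random):
    """Euler-discretized time for standard BM started at 0 to exit (-a, a).
    For a = 1/4 this is the coupling time from the worst starting pair on the circle."""
    b, t = 0.0, 0.0
    while abs(b) < a:
        b += rng.gauss(0.0, math.sqrt(dt))
        t += dt
    return t

random.seed(1)
times = [exit_time() for _ in range(4000)]
tau1 = 1.006 / 16   # (13.8): tau_1 = G^{-1}(1/e)/16 = 0.063
frac = sum(1 for s in times if s > tau1) / len(times)
print(frac)         # should be roughly 1/e = 0.37, up to discretization and sampling error
```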

If simple random walk is replaced by aperiodic random flight with step variance $\sigma^{2}$, then the asymptotic values of $\tau_{2}$ and $\tau_{1}$ are replaced by $\tau_{2}/\sigma^{2}$ and $\tau_{1}/\sigma^{2}$; this may be deduced using the local central limit theorem ([133] Theorem 2.5.2).

Reflecting Brownian motion $\bar{B}$ on the interval $[0,1]$ is very similar. Intuitively, imagine that upon hitting an endpoint $0$ or $1$ the particle is instantaneously inserted an infinitesimal distance into the interval. Formally one can construct $\bar{B}_{t}$ as $\bar{B}_{t}:=\phi(B_{t})$ for the concertina map

\phi(2j+x)=x,\quad\phi(2j+1+x)=1-x;\quad 0\leq x\leq 1,\ j=\ldots,-2,-1,0,1,2,\ldots.

The process $\bar{B}$ has eigenvalues $\{\pi^{2}j^{2}/2,\ 0\leq j<\infty\}$ with eigenfunctions $\cos(\pi jx)$. In particular the relaxation time is

\tau_{2}=\frac{2}{\pi^{2}}.

The result for random walk on the $n$-path (Chapter 5 Example 8)

\tau_{2}\sim\frac{2n^{2}}{\pi^{2}}\mbox{ as }n\to\infty

is another instance of the weak convergence paradigm, a consequence of the $n^{2}$ time-rescaling which takes random walk on the $n$-path to reflecting Brownian motion on the interval. The variation distance function $\bar{d}(t)$ for $\bar{B}$ can be expressed in terms of the corresponding quantity (written $d^{\circ}(t)$) for $B^{\circ}$. Briefly, it is easy to check

(\bar{B}_{t},0\leq t<\infty)\ \stackrel{d}{=}\ (2\min(B^{\circ}_{t/4},1-B^{\circ}_{t/4}),0\leq t<\infty)

and then to deduce $\bar{d}(t)=d^{\circ}(t/4)$. Then using (13.8),

\tau_{1}=\frac{1}{4}G^{-1}(1/e)=0.252.   (13.9)

13.1.2 $d$-dimensional Brownian motion

Standard dd-dimensional Brownian motion can be written as

{\bf B}_{t}=(B^{(1)}_{t},\ldots,B^{(d)}_{t})

where the component processes $(B^{(i)}_{t},i=1,\ldots,d)$ are independent one-dimensional standard Brownian motions. A useful property of ${\bf B}$ is isotropy: its distribution is invariant under rotations of $R^{d}$. In approximating simple random walk $(X_{m},m=0,1,2,\ldots)$ on $Z^{d}$ one needs to be a little careful with scaling constants. The analog of (13.3) is

(m^{-1/2}X_{\lfloor mt\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ (d^{-1/2}{\bf B}_{t},0\leq t<\infty)   (13.10)

where the factor $d^{-1/2}$ arises because the components of the random walk have variance $1/d$; see (13.4). Analogous to (13.5), random walk $(X^{(n)}_{m},m=0,1,2,\ldots)$ on the discrete torus $Z^{d}_{n}$ converges to Brownian motion ${\bf B}^{\circ}$ on the continuous torus $[0,1)^{d}$:

n^{-1}(X^{(n)}_{\lfloor n^{2}t\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ (d^{-1/2}{\bf B}^{\circ}_{t},0\leq t<\infty)\mbox{ as }n\to\infty.   (13.11)

13.1.3 Brownian motion in a convex set

Fix a convex polyhedron $K\subset R^{d}$. One can define reflecting Brownian motion in $K$; heuristically, when the particle hits a face it is replaced an infinitesimal distance inside $K$, orthogonal to the face. As in the previous examples, the stationary distribution is uniform on $K$. We will outline a proof of

Proposition 13.1

For Brownian motion ${\bf B}$ in a convex polyhedron $K$ which is a subset of the ball of radius $r$,

(i) $\tau_{1}\leq G^{-1}(1/e)\ r^{2}$

(ii) $\tau_{2}\leq 8\pi^{-2}r^{2}$.

Proof. By the $d$-dimensional version of Brownian scaling (13.7) we can reduce to the case $r=1$. The essential fact is

Lemma 13.2

Let $\bar{B}_{t}$ be reflecting Brownian motion on $[0,1]$ started at $1$, and let $T_{0}$ be its hitting time on $0$. Versions ${\bf B}^{(1)},{\bf B}^{(2)}$ of Brownian motion in $K$ started from arbitrary points of $K$ can be constructed jointly with $\bar{B}$ such that

|{\bf B}^{(1)}_{t}-{\bf B}^{(2)}_{t}|\leq 2\bar{B}_{\min(t,T_{0})},\quad 0\leq t<\infty.   (13.12)

Granted this fact, $\bar{d}(t)$ for Brownian motion on $K$ satisfies

\bar{d}(t)=\max_{\mbox{starting points}}P({\bf B}^{(1)}_{t}\neq{\bf B}^{(2)}_{t})\leq P(T_{0}>t)=G(t)

where the final equality holds because $T_{0}$ has the same distribution as the time for Brownian motion started at $1$ to exit the interval $(0,2)$. This establishes (i) for $r=1$. Then from the $t\to\infty$ asymptotics of $G(t)$ in (13.1) we have $\bar{d}(t)=O(\exp(-\pi^{2}t/8))$, implying $\tau_{2}\leq 8/\pi^{2}$ by Lemma LABEL:L-Chap4 and establishing (ii).

Sketch proof of Lemma. Details require familiarity with stochastic calculus, but this outline provides the idea. For two Brownian motions in $R^{d}$ started from $(0,0,\ldots,0)$ and from $(x,0,\ldots,0)$, one can define the reflection coupling by making the first coordinates evolve as the one-dimensional reflection coupling, and making the other coordinate processes be identical in the two motions. Use isotropy to extend the definition of reflection coupling to arbitrary starting points. Note that the distance between the processes evolves as $2$ times one-dimensional Brownian motion, until they meet. The desired joint distribution of $(({\bf B}^{(1)}_{t},{\bf B}^{(2)}_{t}),\ 0\leq t<\infty)$ is obtained by specifying that while both processes are in the interior of $K$, they evolve as the reflection coupling (and each process reflects orthogonally at faces). As the figure illustrates, the effect of reflection can only be to decrease the distance between the two Brownian particles.

[Figure: reflection off a face of $K$ at the point $a$: unreflected positions $b,c$ correspond to reflected positions $b^{\prime},c^{\prime}$, and $x$ is an arbitrary point of $K$.]

For a motion hitting the boundary at $a$, if the unreflected process is at $b$ or $c$ an infinitesimal time later, then the reflected process is at $b^{\prime}$ or $c^{\prime}$. By convexity, for any $x\in K$ we have $|b^{\prime}-x|\leq|b-x|$; so reflection can only decrease distance between coupled particles. To argue the inequality carefully, let $\alpha$ be the vector normal to the face. The projection ${\cal P}_{\alpha}$ onto the direction $\alpha$ satisfies $|{\cal P}_{\alpha}(b^{\prime}-x)|\leq|{\cal P}_{\alpha}(b-x)|$. Further, $b-b^{\prime}$ is parallel to $\alpha$, implying ${\cal P}_{\alpha^{\perp}}(b^{\prime}-x)={\cal P}_{\alpha^{\perp}}(b-x)$. Therefore by Pythagoras $|b^{\prime}-x|\leq|b-x|$.

We can therefore write, in stochastic calculus notation,

d|{\bf B}^{(1)}_{t}-{\bf B}^{(2)}_{t}|=d(2B_{t})-dA_{t}

where $B_{t}$ is a one-dimensional Brownian motion and $A_{t}$ is an increasing process (representing the contribution from reflections off faces) which increases only when one process is at a face. But we can construct reflecting Brownian motion $\bar{B}$ in terms of the same underlying $B_{t}$ by

d(2\bar{B}_{t})=d(2B_{t})-dC_{t}

where $C_{t}$ (representing the contribution from reflections off the endpoint $1$) is increasing until $T_{0}$. At time $0$ we have (because $r=1$)

|{\bf B}^{(1)}_{0}-{\bf B}^{(2)}_{0}|\leq 2=2\bar{B}_{0}.

We have shown

d(|{\bf B}^{(1)}_{t}-{\bf B}^{(2)}_{t}|-2\bar{B}_{t})=-dA_{t}+dC_{t}.

If the desired inequality (13.12) fails, then it fails at some first time $t$, which can only be a time when $dC_{t}$ is increasing, that is when $\bar{B}_{t}=1$, at which times the inequality holds a priori. $\Box$

Proposition 13.1 suggests an approach to the algorithmic question of simulating a uniform random point in a convex set $K\subset R^{d}$ where $d$ is large, discussed in Chapter 9 section 5.1. If we could simulate the discrete-time chain defined as reflecting Brownian motion ${\bf B}$ on $K$ examined at time intervals of $h^{2}/d$ for some small $h$ (so that the length of a typical step is of order $\sqrt{(h^{2}/d)\times d}=h$), then Proposition 13.1 implies that $O(d/h^{2})$ steps are enough to approach the stationary distribution. Since the convex set is available only via an oracle, one can attempt to do the simulation via acceptance/rejection. That is, from ${\bf x}$ we propose a move to ${\bf x}^{\prime}={\bf x}+\sqrt{h^{2}/d}\ Z$ where $Z$ has the standard $d$-variate Normal distribution, and accept the move iff ${\bf x}^{\prime}\in K$. While this leads to a plausible heuristic argument, the rigorous difficulty is that it is not clear how close an acceptance/rejection step is to the true step of reflecting Brownian motion. No rigorous argument based directly on Brownian motion has yet been found, though the work of Bubley et al [78] on coupling of random walks has elements in common with reflection coupling.
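The following is a minimal sketch of the acceptance/rejection dynamic just described, with the unit cube standing in as a hypothetical membership oracle; it is the heuristic sampler, not reflecting Brownian motion itself, and the parameters are arbitrary choices.

```python
import random

def in_unit_cube(x):
    """Hypothetical membership oracle: here K is the unit cube [0,1]^d."""
    return all(0.0 <= xi <= 1.0 for xi in x)

def ball_walk(x0, oracle, h=0.5, steps=10000, rng=random):
    """From x, propose x' = x + sqrt(h^2/d) Z with Z standard d-variate Normal;
    accept the move iff x' is in K, otherwise stay put."""
    x = list(x0)
    scale = (h * h / len(x)) ** 0.5
    for _ in range(steps):
        proposal = [xi + scale * rng.gauss(0.0, 1.0) for xi in x]
        if oracle(proposal):
            x = proposal
    return x

random.seed(2)
print(ball_walk([0.5] * 10, in_unit_cube))
```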

13.1.4 Discrete-time chains: an example on the simplex

Discrete-time, continuous-space chains arise in many settings, in particular (Chapter MCMC) in Markov Chain Monte Carlo sampling from a target distribution on $R^{d}$. As discussed in that chapter, estimating mixing times for such chains with general target distributions is extremely difficult. The techniques in this book are more directly applicable to chains with (roughly) uniform stationary distribution. The next example is intended to give the flavor of how techniques might be adapted to the continuous setting: we will work through the details of a coupling argument.

Example 13.3

A random walk on the simplex.

Fix $d$ and consider the simplex $\Delta=\{{\bf x}=(x_{1},\ldots,x_{d}):x_{i}\geq 0,\sum_{i}x_{i}=1\}$. Consider the discrete-time Markov chain $({\bf X}(t),t=0,1,2,\ldots)$ on $\Delta$ with steps:

from state ${\bf x}$, pick 2 distinct coordinates $\{i,j\}$ uniformly at random, and replace the 2 entries $\{x_{i},x_{j}\}$ by $\{U,x_{i}+x_{j}-U\}$ where $U$ is uniform on $(0,x_{i}+x_{j})$.
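For concreteness, here is a minimal Python sketch of one step of this chain; the starting point and run length in the usage lines are arbitrary.

```python
import random

def step(x, rng=random):
    """One step: pick distinct coordinates i, j and redistribute x_i + x_j uniformly."""
    x = list(x)
    i, j = rng.sample(range(len(x)), 2)
    s = x[i] + x[j]
    u = rng.uniform(0.0, s)
    x[i], x[j] = u, s - u
    return x

random.seed(3)
d = 5
x = [1.0 / d] * d            # an arbitrary starting point in the simplex
for _ in range(1000):
    x = step(x)
print(x, sum(x))             # coordinates stay nonnegative and still sum to 1
```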

The stationary distribution $\pi$ is the uniform distribution on $\Delta$. We will show that the mixing time $\tau_{1}$ satisfies

\tau_{1}=O(d^{2}\log d)\mbox{ as }d\rightarrow\infty.   (13.13)

The process is somewhat reminiscent of card shuffling by random transpositions (Chapter 7 Example 18), so by analogy with that example we expect that in fact $\tau_{1}=\Theta(d\log d)$. What we show here is that the coupling analysis of that example (Chapter 4-3 section 1.7) extends fairly easily to the present example.

As a preliminary, let us specify two distinct couplings $(A,B)$ of the uniform$(0,a)$ and the uniform$(0,b)$ distributions. In the scaling coupling we take $(A,B)=(aU,bU)$ for $U$ with uniform$(0,1)$ distribution. In the greedy coupling we make $P(A=B)$ have its maximal value, which is $\min(a,b)/\max(a,b)$, and we say the coupling works if $A=B$.
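In code, the two couplings might be sketched as follows; this is a minimal illustration of the definitions, and the construction of the greedy coupling below is one standard way to realize the maximal coupling, not taken from the text.

```python
import random

def scaling_coupling(a, b, rng=random):
    """(A, B) = (aU, bU) for a single uniform(0,1) variable U."""
    u = rng.random()
    return a * u, b * u

def greedy_coupling(a, b, rng=random):
    """Maximal coupling of uniform(0,a) and uniform(0,b): P(A=B) = min(a,b)/max(a,b).
    The coupling 'works' when the two returned values are equal."""
    lo, hi = min(a, b), max(a, b)
    u = rng.uniform(0.0, hi)          # value taken by the longer-range variable
    if u < lo:
        return u, u                   # the coupling works
    other = rng.uniform(0.0, lo)      # resample the shorter-range variable
    return (other, u) if a <= b else (u, other)
```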

Fix ${\bf x}(0)\in\Delta$. We now specify a coupling $({\bf X}(t),{\bf Y}(t))$ of the chains started with ${\bf X}(0)={\bf x}(0)$ and with ${\bf Y}(0)$ having the uniform distribution. (This is an atypical coupling argument, in that it matters that one version is the stationary version.)

From state $({\bf x},{\bf y})$, choose the same random pair $\{i,j\}$ for each process, and link the new values $x_{i}^{\prime}$ and $y_{i}^{\prime}$ (which are uniform on different intervals) via the scaling coupling for the first $t_{1}=3d^{2}\log d$ steps, then via the greedy coupling for the next $t_{2}=Cd^{2}$ steps.

We shall show that, for any fixed constant $C>0$,

P({\bf X}(t_{1}+t_{2})={\bf Y}(t_{1}+t_{2}))\geq 1-C^{-1}-o(1)\mbox{ as }d\to\infty   (13.14)

establishing (13.13).

Consider the effect on the $l_{1}$ distance $||{\bf x}-{\bf y}||:=\sum_{i}|x_{i}-y_{i}|$ of a step of the scaling coupling using coordinates $\{i,j\}$. The change is

|U(x_{i}+x_{j})-U(y_{i}+y_{j})|+|(1-U)(x_{i}+x_{j})-(1-U)(y_{i}+y_{j})|-|x_{i}-y_{i}|-|x_{j}-y_{j}|
=|(x_{i}+x_{j})-(y_{i}+y_{j})|-|x_{i}-y_{i}|-|x_{j}-y_{j}|
=\left\{\begin{array}{ll}0&\mbox{ if }{\rm sgn}(x_{i}-y_{i})={\rm sgn}(x_{j}-y_{j})\\ -2\min(|x_{i}-y_{i}|,|x_{j}-y_{j}|)&\mbox{ if not.}\end{array}\right.

Thus

E_{({\bf x},{\bf y})}\left(||{\bf X}(1)-{\bf Y}(1)||-||{\bf x}-{\bf y}||\right)
=\frac{-2}{d(d-1)}\sum_{i}\sum_{j\neq i}\min(|x_{i}-y_{i}|,|x_{j}-y_{j}|)1_{({\rm sgn}(x_{i}-y_{i})\neq{\rm sgn}(x_{j}-y_{j}))}
=\frac{-4}{d(d-1)}\sum_{i\in A}\sum_{j\in B}\min(c_{i},d_{j})

(where $c_{i}:=x_{i}-y_{i}$ on $A:=\{i:x_{i}>y_{i}\}$ and $d_{j}:=y_{j}-x_{j}$ on $B:=\{j:y_{j}>x_{j}\}$)

=\frac{-4}{d(d-1)}\sum_{i\in A}\sum_{j\in B}\frac{c_{i}d_{j}}{\max(c_{i},d_{j})}
\leq\frac{-4}{d(d-1)}\sum_{i\in A}\sum_{j\in B}\frac{c_{i}d_{j}}{||{\bf x}-{\bf y}||/2}
=\frac{-2}{d(d-1)}||{\bf x}-{\bf y}||

because $\sum_{i\in A}c_{i}=\sum_{j\in B}d_{j}=||{\bf x}-{\bf y}||/2$. So

E_{({\bf x},{\bf y})}||{\bf X}(1)-{\bf Y}(1)||\leq\left(1-\frac{2}{d(d-1)}\right)||{\bf x}-{\bf y}||.

Because $||{\bf X}(0)-{\bf Y}(0)||\leq 2$, it follows that after $t$ steps using the scaling coupling,

E||{\bf X}(t)-{\bf Y}(t)||\leq 2\left(1-\frac{2}{d(d-1)}\right)^{t}.

So by taking $t_{1}\sim 3d^{2}\log d$, after $t_{1}$ steps we have

P(||{\bf X}(t_{1})-{\bf Y}(t_{1})||\leq d^{-5})=1-o(1).   (13.15)
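As a numerical illustration of the one-step contraction used to reach (13.15), the following sketch (dimension, second starting point and sample size all arbitrary) runs the coupled scaling step many times from a fixed pair of states and compares the average one-step $l_{1}$ distance with the bound $(1-\frac{2}{d(d-1)})||{\bf x}-{\bf y}||$.

```python
import random

def coupled_scaling_step(x, y, rng=random):
    """Same pair {i, j} and same uniform U for both chains (the scaling coupling)."""
    x, y = list(x), list(y)
    i, j = rng.sample(range(len(x)), 2)
    u = rng.random()
    sx, sy = x[i] + x[j], y[i] + y[j]
    x[i], x[j] = u * sx, (1 - u) * sx
    y[i], y[j] = u * sy, (1 - u) * sy
    return x, y

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

random.seed(4)
d = 20
x = [1.0 / d] * d
w = [random.random() for _ in range(d)]
s = sum(w)
y = [v / s for v in w]                          # an arbitrary second point in the simplex
before = l1(x, y)
after = sum(l1(*coupled_scaling_step(x, y)) for _ in range(20000)) / 20000
print(after / before, 1 - 2 / (d * (d - 1)))    # empirical one-step ratio vs. the bound
```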

Now consider the greedy coupling. If a step works, the $l_{1}$ distance $||{\bf X}(t)-{\bf Y}(t)||$ cannot increase. The chance that a step from $({\bf x},{\bf y})$ involving coordinates $\{i,j\}$ works is

\frac{\min(x_{i}+x_{j},y_{i}+y_{j})}{\max(x_{i}+x_{j},y_{i}+y_{j})}\geq\frac{y_{i}+y_{j}-||{\bf x}-{\bf y}||}{\max(x_{i}+x_{j},y_{i}+y_{j})}\geq\frac{y_{i}+y_{j}-||{\bf x}-{\bf y}||}{y_{i}+y_{j}+||{\bf x}-{\bf y}||}\geq\frac{y_{i}+y_{j}-2||{\bf x}-{\bf y}||}{y_{i}+y_{j}}\geq 1-\frac{||{\bf x}-{\bf y}||}{\min(y_{i},y_{j})}.

So unconditionally

P_{({\bf x},{\bf y})}(\mbox{greedy coupling works on first step})\geq 1-\frac{||{\bf x}-{\bf y}||}{\min_{k}y_{k}}.   (13.16)

Now the uniform distribution $(Y^{(d)}_{1},\ldots,Y^{(d)}_{d})$ on the simplex has the property (use [133] Exercise 2.6.10 and the fact that the uniform distribution on the simplex is the joint distribution of spacings between $d-1$ uniform$(0,1)$ variables and the endpoint $1$)

if constants $a_{d}>0$ satisfy $da_{d}\to 0$, then $P(Y^{(d)}_{1}\leq a_{d})\sim da_{d}$.

Since $({\bf Y}(t))$ is the stationary chain and $Y^{(d)}_{i}\ \stackrel{d}{=}\ Y^{(d)}_{1}$,

P(\min_{1\leq k\leq d}Y_{k}(t)\leq d^{-4.5}\mbox{ for some }t_{1}<t\leq t_{1}+t_{2})\leq t_{2}\,d\,P(Y^{(d)}_{1}<d^{-4.5})

and since $t_{2}=O(d^{2})$ this bound is $o(1)$. In other words,

P(\min_{1\leq k\leq d}Y_{k}(t)\geq d^{-4.5}\mbox{ for all }t_{1}<t\leq t_{1}+t_{2})=1-o(1)\mbox{ as }d\to\infty.

Combining this with (13.15), (13.16) and the non-increase of the $l_{1}$ distance, we deduce

P(\mbox{greedy coupling works for all }t_{1}<t\leq t_{1}+t_{2})=1-o(1).   (13.17)

Now consider the number $M(t)$ of unmatched coordinates $i$ at time $t\geq t_{1}$, that is, the number of $i$ with $X_{i}(t)\neq Y_{i}(t)$. Provided the greedy coupling works, this number $M(t)$ cannot increase, and it decreases by at least $1$ each time two unmatched coordinates are chosen. So we can compare $(M(t_{1}+t),t\geq 0)$ with the chain $(N(t),t\geq 0)$ with $N(0)=d$ and

P(N(t+1)=m-1|N(t)=m)=\frac{m(m-1)}{d(d-1)}=1-P(N(t+1)=m|N(t)=m).

As in the analysis of the shuffling example, the time $T=\min\{t:N(t)=1\}$ has $ET=\sum_{m=2}^{d}\frac{d(d-1)}{m(m-1)}\leq d^{2}$. When the number $M(t)$ goes strictly below $2$ it must become $0$, and so

P(\mbox{greedy coupling works for all }t_{1}<t\leq t_{1}+t_{2},\ {\bf X}(t_{1}+t_{2})\neq{\bf Y}(t_{1}+t_{2}))
=P(\mbox{greedy coupling works for all }t_{1}<t\leq t_{1}+t_{2},\ M(t_{1}+t_{2})>1)
\leq P(T>t_{2})\leq 1/C.

This and (13.17) establish (13.14).
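As a quick check of the coupling-time bound used in the comparison above: the sum $\sum_{m=2}^{d}\frac{1}{m(m-1)}$ telescopes to $1-\frac{1}{d}$, so in fact $ET=(d-1)^{2}\leq d^{2}$. A two-line verification:

```python
def expected_coupling_steps(d):
    """E T = sum_{m=2}^d d(d-1)/(m(m-1)), which telescopes to (d-1)^2."""
    return sum(d * (d - 1) / (m * (m - 1)) for m in range(2, d + 1))

for d in (10, 100, 1000):
    print(d, expected_coupling_steps(d), (d - 1) ** 2)
```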

13.1.5 Compact groups

Parallel to random flights on finite groups, one can discuss discrete-time random flights on classical (continuous) compact groups such as the orthogonal group $O(d)$ of $d\times d$ real orthogonal matrices. For instance, specify a reflection to be an automorphism which fixes the points in some hyperplane, so that a reflection matrix can be written as

A=I-2xx^{T}

where $I$ is the $d\times d$ identity matrix and $x$ is a unit-length vector in $R^{d}$. Assigning to $x$ the Haar measure on the $(d-1)$-sphere creates a uniform random reflection, and a sequence of uniform random reflections defines a random flight on $O(d)$. Porod [285] shows that the variation threshold satisfies

\tau_{1}\sim\frac{1}{2}d\log d

and that the cut-off phenomenon occurs. The result, and its proof via group representation theory, are reminiscent of card-shuffling via random transpositions (Chapter 7 Example 18).
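A minimal numpy sketch of this random flight (the dimension and number of steps below are arbitrary choices): a uniform random reflection is $I-2xx^{T}$ with $x$ a normalized standard Gaussian vector, which is uniform on the unit sphere.

```python
import numpy as np

def random_reflection(d, rng):
    """A = I - 2 x x^T for x uniform on the unit (d-1)-sphere."""
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    return np.eye(d) - 2.0 * np.outer(x, x)

def random_flight(d, steps, seed=0):
    """Product of independent uniform random reflections: a random flight on O(d)."""
    rng = np.random.default_rng(seed)
    M = np.eye(d)
    for _ in range(steps):
        M = random_reflection(d, rng) @ M
    return M

M = random_flight(d=5, steps=50)
print(np.allclose(M @ M.T, np.eye(5)))   # the product stays orthogonal
```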

13.1.6 Brownian motion on a fractal set

Constructions and properties of analogs of Brownian motion taking values in fractal subsets of $R^{d}$ have been studied in great detail over the last 15 years. Since these processes are most easily viewed as limits of random walks on graphs, we shall say a little about the simplest example. The figure illustrates the first two stages of the construction of the well-known Sierpinski gasket.

[Figure: the graphs $G_{1}$ and $G_{2}$, the first two stages of the Sierpinski gasket construction, with vertices labeled $0,a_{1},a_{2},b_{1},b_{2},b_{3}$.]

In the topological setting one may regard $G_{d}$ as a closed subset of $R^{2}$, that is as a set of line segments, and then the closure of $\cup_{d=1}^{\infty}G_{d}$ is the Sierpinski gasket $G$ (this is equivalent to the usual construction by “cutting out middle triangles”). In the graph setting, regard $G_{d}$ as a graph and write $(X^{(d)}_{t},t=0,1,2,\ldots)$ for discrete-time random walk on $G_{d}$ started at point $0$. Let $M_{d}$ be the number of steps of $X^{(d)}$ until first hitting point $a_{1}$ or $a_{2}$. Using symmetry properties of the graphs, there is a simple relationship between the distributions of $M_{1}$ and $M_{2}$. For the walk on $G_{2}$, the length of the time segment until first hitting $b_{1}$ or $b_{2}$ is distributed as $M_{1}$; successive segments (periods until next hitting one of $\{0,a_{1},a_{2},b_{1},b_{2},b_{3}\}$ other than the current one) are like successive steps of the walk on $G_{1}$, so the number of segments is distributed as $M_{1}$. Using the same argument for general $d$ gives

$M_{d}$ is distributed as the $d$'th generation size in a Galton-Watson branching process with $1$ individual in generation $0$ and offspring distributed as $M_{1}$.

It is easy to calculate $EM_{1}=5$; indeed the distribution of $M_{1}$ is determined by its generating function, which can be calculated to be $Ez^{M_{1}}=z^{2}/(4-3z)$. So $EM_{d}=5^{d}$. This suggests the existence of a limit process on $G$ after rescaling time, that is a limit

(X^{(d)}_{\lfloor 5^{d}t\rfloor},0\leq t<\infty)\ \stackrel{d}{\rightarrow}\ (X^{(\infty)}_{t},0\leq t<\infty).
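The values $EM_{1}=5$ and $Ez^{M_{1}}=z^{2}/(4-3z)$ can be checked by simulating the walk on $G_{1}$. In the sketch below the adjacency structure of $G_{1}$ is hard-coded; which midpoint carries which label $b_{i}$ is an assumption, but by symmetry it does not affect $M_{1}$.

```python
import random

# Graph G1: corners 0, a1, a2 and side midpoints b1, b2, b3 of the first-stage gasket.
G1 = {
    "0":  ["b1", "b2"],
    "a1": ["b1", "b3"],
    "a2": ["b2", "b3"],
    "b1": ["0", "a1", "b2", "b3"],
    "b2": ["0", "a2", "b1", "b3"],
    "b3": ["a1", "a2", "b1", "b2"],
}

def M1(rng=random):
    """Steps of the walk on G1, started at 0, until it first hits a1 or a2."""
    v, steps = "0", 0
    while v not in ("a1", "a2"):
        v = rng.choice(G1[v])
        steps += 1
    return steps

random.seed(5)
samples = [M1() for _ in range(20000)]
print(sum(samples) / len(samples))            # near E M1 = 5
z = 0.5
print(sum(z ** s for s in samples) / len(samples), z ** 2 / (4 - 3 * z))  # both near 0.1
```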

In fact we can be more constructive. Branching process theory ([133] Example 4.4.1) shows that $M_{d}/5^{d}\ \stackrel{d}{\rightarrow}\ W$ where $EW=1$ and where $W$ has the self-consistency property

\sum_{i=1}^{M}W_{i}\ \stackrel{d}{=}\ 5W   (13.18)

where $(M;W_{1},W_{2},\ldots)$ are independent, $M\ \stackrel{d}{=}\ M_{1}$ and $W_{i}\ \stackrel{d}{=}\ W$. Now in the topological setting, the vertices of $G_{d}$ are a subset of $G$. Let $\tilde{X}^{(d)}_{t}$ be the process on $G_{d}\subset G$ whose sequence of jumps is as the jumps of the discrete-time walk $X^{(d)}$, but where the times between jumps are independent with distribution $5^{-d}W$. Using (13.18) we can construct the processes $\tilde{X}^{(d)}$ jointly for all $d$ such that the process $\tilde{X}^{(d)}$, watched only at the times of hitting (successively distinct) points of $G_{d-1}$, is exactly the process $\tilde{X}^{(d-1)}$. These coupled processes specify a process $X^{(\infty)}_{t}$ on $G$ at a random subset of times $t$. It can be shown that this random subset is dense and that sample paths extend continuously to all $t$, and it is natural to call $X^{(\infty)}$ Brownian motion on the Sierpinski gasket.