12 Coupling Theory and Examples (October 11, 1999)

12.1 Using coupling to bound variation distance

Recall from Chapter 2 section 4.1 (yyy 9/10/99 version) several ways in which variation distance is used to measure the deviation from stationarity of the time-$t$ distribution of a Markov chain:

\begin{align*}
d_i(t) &:= \|P_i(X_t=\cdot)-\pi(\cdot)\| \\
d(t) &:= \max_i d_i(t) \\
\bar{d}(t) &:= \max_{i,j}\|P_i(X_t=\cdot)-P_j(X_t=\cdot)\|.
\end{align*}

Recall also from yyy the definition of the variation threshold time

\[ \tau_1 := \min\{t : \bar{d}(t)\leq e^{-1}\}. \]

Since $\tau_1$ is affected by continuization, when using the above definition with a discrete-time chain we write $\tau_1^{\rm disc}$ for emphasis.

Coupling provides a methodology for seeking to upper bound $\bar{d}(t)$ and hence $\tau_1$. After giving the (very simple) theory in section 12.1.1 and some discussion in section 12.1.2, we proceed to give a variety of examples of its use. We shall say more about the theory and applications of coupling in Chapter 8 (where we discuss the related idea of coupling from the past) and in Chapter 10 (on interacting random walks).

12.1.1 The coupling inequality

Consider a finite-state chain in discrete or continuous time. Fix states $i,j$. Suppose we construct a coupling, that is, a joint process $((X^{(i)}_t,X^{(j)}_t),\ t\geq 0)$ such that

\begin{align}
&\mbox{$(X^{(i)}_t,\ t\geq 0)$ is distributed as the chain started at $i$,} \nonumber\\
&\mbox{$(X^{(j)}_t,\ t\geq 0)$ is distributed as the chain started at $j$.} \tag{12.1}
\end{align}

And suppose there is a random time $T^{ij}\leq\infty$ such that

\[ X^{(i)}_t = X^{(j)}_t, \qquad T^{ij}\leq t<\infty. \tag{12.2} \]

Call such a $T^{ij}$ a coupling time. Then the coupling inequality is

\[ \|P_i(X_t\in\cdot)-P_j(X_t\in\cdot)\| \leq P(T^{ij}>t), \qquad 0\leq t<\infty. \tag{12.3} \]

The inequality holds because

\begin{align*}
\|P_i(X_t\in\cdot)-P_j(X_t\in\cdot)\| &= \|P(X^{(i)}_t\in\cdot)-P(X^{(j)}_t\in\cdot)\| \\
&\leq P(X^{(i)}_t\neq X^{(j)}_t) \\
&\leq P(T^{ij}>t).
\end{align*}
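The argument above can be checked numerically. The following sketch (a small 3-state kernel chosen arbitrarily for illustration) evolves the "independent until they meet, then move together" coupling exactly, and verifies the coupling inequality (12.3) at each step.

```python
# Numerical check of the coupling inequality (12.3) on a small
# discrete-time chain, using the "independent until they meet, then
# move together" coupling.  The 3-state kernel P is arbitrary.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4]]
n = 3

def step_dist(dist):
    # one step of the chain applied to a distribution (row vector times P)
    return [sum(dist[k] * P[k][m] for k in range(n)) for m in range(n)]

def tv(p, q):
    # total variation distance between two distributions
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))

mu = {(0, 1): 1.0}            # joint law, started at (i, j) = (0, 1)
di = [1.0, 0.0, 0.0]          # marginal law of the copy started at 0
dj = [0.0, 1.0, 0.0]          # marginal law of the copy started at 1

for t in range(10):
    new = {}
    for (x, y), w in mu.items():
        if x == y:            # already coupled: move together
            for m in range(n):
                new[(m, m)] = new.get((m, m), 0.0) + w * P[x][m]
        else:                 # not yet coupled: move independently
            for k in range(n):
                for l in range(n):
                    new[(k, l)] = new.get((k, l), 0.0) + w * P[x][k] * P[y][l]
    mu = new
    di, dj = step_dist(di), step_dist(dj)
    p_uncoupled = sum(w for (x, y), w in mu.items() if x != y)
    assert tv(di, dj) <= p_uncoupled + 1e-12   # the coupling inequality
```

Both sides here are computed exactly (up to floating-point error), so the check exercises the inequality itself rather than a simulation of it.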

12.1.2 Comments on coupling methodology

The coupling inequality provides a method of bounding the variation distance $\bar{d}(t)$: if we can construct a coupling for an arbitrary pair $(i,j)$ of initial states, then

\[ \bar{d}(t)\leq\max_{i,j} P(T^{ij}>t). \]

The reader may wish to look at a few of the examples before reading this section in detail.

In applying coupling methodology there are two issues: first we need to specify the coupling, then we need to analyze the coupling time. The most common strategy for constructing couplings is via Markov couplings, as follows. Suppose the underlying chain has state space $I$ and (to take the continuous-time case) transition rate matrix ${\bf Q}=(q(i,k))$. Consider a transition rate matrix $\widetilde{\bf Q}$ on the product space $I\times I$. Write the entries of $\widetilde{\bf Q}$ as $\tilde{q}(i,j;k,l)$ instead of the logical-but-fussy $\tilde{q}((i,j),(k,l))$. Suppose that, for each pair $(i,j)$ with $j\neq i$,

\[ \tilde{q}(i,j;\cdot,\cdot)\mbox{ has marginals } q(i,\cdot)\mbox{ and } q(j,\cdot); \tag{12.4} \]

in other words, $\sum_l \tilde{q}(i,j;k,l)=q(i,k)$ and $\sum_k \tilde{q}(i,j;k,l)=q(j,l)$. And suppose that

\begin{align*}
\tilde{q}(i,i;k,k) &= q(i,k) \mbox{ for all } k \\
\tilde{q}(i,i;k,l) &= 0 \mbox{ for } l\neq k.
\end{align*}

Take $(X^{(i)}_t,X^{(j)}_t)$ to be the chain on $I\times I$ with transition rate matrix $\widetilde{\bf Q}$ and initial position $(i,j)$. Then (12.1) must hold, and $T^{ij}:=\min\{t: X^{(i)}_t=X^{(j)}_t\}$ is a coupling time. This construction gives a natural Markov coupling, and all the examples where we use the coupling inequality will be of this form. In practice it is much more understandable to define the joint process in words, and we usually do so.
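The marginal conditions (12.4) are easy to verify mechanically. Here is a minimal discrete-time sketch, using the simplest admissible joint kernel (independent moves off the diagonal, simultaneous moves on it); the kernel $p$ below is an arbitrary example.

```python
# Check the discrete-time analogue of the marginal conditions (12.4):
# the "independent" joint kernel p~(i,j;k,l) = p(i,k) p(j,l) for i != j,
# with diagonal moves once the two copies agree.
p = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
n = 3

def p_joint(i, j, k, l):
    if i != j:
        return p[i][k] * p[j][l]     # independent moves
    return p[i][k] if k == l else 0.0  # coupled: move together

for i in range(n):
    for j in range(n):
        for k in range(n):
            # first marginal: summing over l recovers p(i, k)
            assert abs(sum(p_joint(i, j, k, l) for l in range(n)) - p[i][k]) < 1e-12
        for l in range(n):
            # second marginal: summing over k recovers p(j, l)
            assert abs(sum(p_joint(i, j, k, l) for k in range(n)) - p[j][l]) < 1e-12
```

Any other joint kernel satisfying the same two marginal sums (for example a "greedy" one that maximizes diagonal mass) would pass the identical check.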

In constructing and analyzing couplings, we often exploit (explicitly or implicitly) some integer-valued metric $\rho(i,j)$ on the state space $I$. Then with a Markovian coupling,

\[ P(T^{ij}>t)=P(\rho(X^{(i)}_t,X^{(j)}_t)\geq 1) \]

and it is enough to study the integer-valued process $Z_t:=\rho(X^{(i)}_t,X^{(j)}_t)$. Typically $(Z_t)$ is not Markov, but one can try to compare it with some integer-valued Markov process $(Z^*_t)$. Indeed, in defining the coupling one has in mind trying to make such a comparison possible. Often one shows that for any initial $(i,j)$ the random time $T^{ij}$ is stochastically smaller than the hitting time $T^*_{a0}$ for the comparison chain $(Z^*_t)$ to reach $0$ starting from $a:=\max_{i,j}\rho(i,j)$. This would imply

\[ \bar{d}(t)\leq P(T^*_{a0}>t). \]

Finally, one does calculations with the integer-valued chain $(Z^*_t)$, either bounding the tail probability $P(T^*_{a0}>t)$ directly or (what is often simpler) just bounding the expectation $ET^*_{a0}$, so that by Markov's inequality and the submultiplicativity property (Chapter 2 Lemma 20) (yyy 9/10/99 version) we have in continuous time

\[ \tau_1\leq e\,ET^*_{a0}; \qquad \bar{d}(t)\leq\exp\left(1-\frac{t}{\tau_1}\right). \]

Here is perhaps the simplest comparison lemma, whose proof is left to the reader.

Lemma 12.1 (Decreasing functional lemma)

Let $(Y_t)$ be a Markov chain on $S$ and $f:S\to\{0,1,2,\ldots,\Delta\}$ a function. Suppose that for each $1\leq i\leq\Delta$ and each initial state $y$ with $f(y)=i$,

(i) $f(Y_1)\leq i$;

(ii) $P(f(Y_1)\leq i-1)\geq a_i>0$.

Then

\[ \max_{y\in S} E_y T_A \leq \sum_{i=1}^{\Delta} 1/a_i \]

where $A:=\{y: f(y)=0\}$.
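As a sanity check of Lemma 12.1, consider the pure "death chain" on $\{0,\ldots,\Delta\}$ which moves from $i$ to $i-1$ with probability $a_i$ and otherwise stays put; here $E_iT_0=\sum_{j\leq i}1/a_j$ exactly, so the lemma's bound is attained. A short exact computation (the $a_i$ values below are arbitrary test values):

```python
from fractions import Fraction

# Lemma 12.1 check on a pure "death chain" on {0,...,D}: from state i
# the chain moves to i-1 with probability a_i, else stays.
D = 5
a = [None] + [Fraction(1, 2), Fraction(1, 3), Fraction(2, 5),
              Fraction(1, 4), Fraction(3, 4)]
# E_i T_0 solves E_i = 1 + (1 - a_i) E_i + a_i E_{i-1},
# i.e. E_i = E_{i-1} + 1/a_i
E = [Fraction(0)] * (D + 1)
for i in range(1, D + 1):
    E[i] = E[i - 1] + 1 / a[i]
bound = sum(1 / a[i] for i in range(1, D + 1))
assert max(E) == E[D] == bound   # the lemma's bound, attained here
```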

We now start a series of examples. Note that when presenting a coupling proof we don’t need to explicitly check irreducibility, because the conclusion of a bound on coupling time obviously implies irreducibility.

12.1.3 Random walk on a dense regular graph

(Chapter 5 Example 16).

Consider an $r$-regular $n$-vertex graph. Write ${\cal N}(v)$ for the set of neighbors of $v$. For any pair $v,w$ we can define a $1$-$1$ map $\theta_{v,w}:{\cal N}(v)\to{\cal N}(w)$ such that $\theta_{v,w}(x)=x$ for $x\in{\cal N}(v)\cap{\cal N}(w)$. Consider discrete-time random walk on the graph. We define a "greedy coupling" by specifying the joint transition matrix

\[ \tilde{p}(v,w;x,\theta_{v,w}(x)) = 1/r, \qquad x\in{\cal N}(v). \]

That is, from vertices $v$ and $w$, if the first chain jumps to $x$ then the second chain jumps to $\theta_{v,w}(x)$; this maximizes the chance of the two chains meeting after a single step. In general one cannot get useful bounds on the coupling time. But consider the dense case, where $r>n/2$. As observed in Chapter 5 Example 16, here $|{\cal N}(v)\cap{\cal N}(w)|\geq 2r-n$ and so the coupled process $(X_t,Y_t)$ has the property that for $w\neq v$

\[ P(X_{t+1}=Y_{t+1}\,|\,X_t=v,\,Y_t=w) = \frac{|{\cal N}(v)\cap{\cal N}(w)|}{r} \geq \frac{2r-n}{r} \]

implying that the coupling time $T$ (for any initial pair of states) satisfies

\[ P(T>t)\leq\left(\frac{n-r}{r}\right)^t. \]

So the coupling inequality implies $\bar{d}(t)\leq\left(\frac{n-r}{r}\right)^t$. In particular the variation threshold satisfies

\[ \tau_1^{\rm disc} = O(1) \mbox{ as } n\to\infty,\ r/n\to\alpha>1/2. \]
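The geometric bound can be turned into an explicit threshold: the smallest $t$ with $((n-r)/r)^t\leq e^{-1}$ is $\lceil 1/\log(r/(n-r))\rceil$. A quick check with hypothetical values $n=100$, $r=55$:

```python
import math

# Turning the geometric bound into a threshold: with hypothetical
# n = 100, r = 55 (so r > n/2), the bound ((n-r)/r)^t falls below
# 1/e once t >= 1/log(r/(n-r)).
n, r = 100, 55
t = math.ceil(1 / math.log(r / (n - r)))
assert ((n - r) / r) ** t <= math.e ** -1
```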

12.1.4 Continuous-time random walk on the $d$-cube

(Chapter 5 Example 15).

For ${\bf i}=(i_1,\ldots,i_d)$ and ${\bf j}=(j_1,\ldots,j_d)$ in $I=\{0,1\}^d$, let ${\cal D}({\bf i},{\bf j})$ be the set of coordinates $u$ where ${\bf i}$ and ${\bf j}$ differ. Write ${\bf i}^u$ for the state obtained by changing the $u$'th coordinate of ${\bf i}$. Recall that in continuous time the components move independently as $2$-state chains with transition rates $1/d$. Define a coupling in words as "run unmatched coordinates independently until they match, and then run them together". Formally, the non-zero transitions of the joint process are

\begin{align*}
\tilde{q}({\bf i},{\bf j};{\bf i}^u,{\bf j}^u) &= 1/d \ \mbox{ if } u\notin{\cal D}({\bf i},{\bf j}) \\
\tilde{q}({\bf i},{\bf j};{\bf i}^u,{\bf j}) &= 1/d \ \mbox{ if } u\in{\cal D}({\bf i},{\bf j}) \\
\tilde{q}({\bf i},{\bf j};{\bf i},{\bf j}^u) &= 1/d \ \mbox{ if } u\in{\cal D}({\bf i},{\bf j}).
\end{align*}

Each coordinate which is initially unmatched takes an exponential (rate $2/d$) time to become matched, and so the coupling time $T=T^{{\bf i}{\bf j}}$ satisfies

\[ T \ \stackrel{d}{=}\ \max(\xi_1,\ldots,\xi_{d_0}) \]

where the $(\xi_u)$ are independent exponential (rate $2/d$) and $d_0=|{\cal D}({\bf i},{\bf j})|$ is the initial number of unmatched coordinates. So

\[ P(T\leq t)=(1-\exp(-2t/d))^{d_0} \]

and the coupling inequality bounds variation distance as

\[ \bar{d}(t)\leq 1-(1-\exp(-2t/d))^d. \]

This leads to an upper bound on the variation threshold time

\[ \tau_1\leq\left({\textstyle\frac{1}{2}}+o(1)\right)d\log d \ \mbox{ as } d\to\infty. \]

This example is discussed in more detail in Chapter 5 Example 15 (yyy 4/23/96 version), where it is shown that

\[ \tau_1\sim{\textstyle\frac{1}{4}}\,d\log d \ \mbox{ as } d\to\infty, \]

so the coupling bound is off by a factor of $2$.
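The factor-of-2 gap can be seen numerically: locate the first time the coupling bound $1-(1-e^{-2t/d})^d$ falls below $e^{-1}$ and compare it with $\frac{1}{2}d\log d$. A sketch for $d=200$:

```python
import math

# Locate (on a grid of step 0.5) the first t at which the d-cube
# coupling bound 1 - (1 - exp(-2t/d))^d drops below 1/e, for d = 200,
# and compare it with (1/2) d log d.
d = 200

def bound(t):
    return 1 - (1 - math.exp(-2 * t / d)) ** d

t = 0.0
while bound(t) > math.e ** -1:
    t += 0.5

# the crossing time sits just above (1/2) d log d, consistent with the
# (1/2 + o(1)) d log d upper bound
assert 0.5 * d * math.log(d) <= t <= 0.6 * d * math.log(d)
```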

12.1.5 The graph-coloring chain

Fix an $n$-vertex graph $G$ with maximal degree $r$. Fix an integer $c\geq r+2$ and consider $[c]:=\{1,2,\ldots,c\}$ as a set of $c$ colors. Let ${\rm col}(G,c)$ be the set of $c$-colorings of $G$, where a $c$-coloring is an assignment of a color to each vertex in such a way that no two adjacent vertices have the same color. One can put a natural "product graph" structure on ${\rm col}(G,c)$, in which two colorings are adjacent if they differ at only one vertex. It is not hard to check that the condition $c\geq r+2$ ensures that ${\rm col}(G,c)$ is non-empty and the associated graph is connected. There is a natural discrete-time Markov chain on ${\rm col}(G,c)$:

Pick a vertex $v$ of $G$ uniformly at random, pick a color $\gamma$ uniformly at random, and assign color $\gamma$ to vertex $v$ if feasible (i.e. if no neighbor of $v$ has color $\gamma$); else retain the existing color of $v$.

Under certain conditions a simple coupling analysis succeeds in bounding the mixing time. (The bound is far from sharp; see the Notes.)

Proposition 12.2

If $c>4r$ then $\bar{d}(t)\leq n\exp\left(-\frac{(c-4r)t}{cn}\right)$ and so $\tau_1^{\rm disc}\leq 1+\frac{cn}{c-4r}(1+\log n)$.

Proof. We couple two versions of the chain by simply using the same $v$ and $\gamma$ in both chains at each step. Write $D_t$ for the number of vertices at which the colors in the two chains differ. Then $D_{t+1}-D_t\in\{-1,0,1\}$ and the key estimate is the following.

Lemma 12.3

Conditional on the state of the coupled process at time $t$,

\begin{align}
P(D_{t+1}=D_t+1) &\leq \frac{2rD_t}{cn} \tag{12.5}\\
P(D_{t+1}=D_t-1) &\geq \frac{(c-2r)D_t}{cn}. \tag{12.6}
\end{align}

Proof. In order that $D_{t+1}=D_t+1$ it is necessary that the chosen pair $(v,\gamma)$ is such that

(*) there exists a neighbor $w$, say, of $v$ such that $w$ has color $\gamma$ in one chain but not in the other chain.

But the total number of pairs $(v,\gamma)$ equals $nc$, while the number of pairs satisfying (*) is at most $D_t\cdot 2r$. This establishes (12.5). Similarly, for $D_{t+1}=D_t-1$ it is sufficient that $v$ is currently unmatched and that no neighbor of $v$ in either chain has color $\gamma$; the number of such pairs $(v,\gamma)$ is at least $D_t\cdot(c-2r)$. $\Box$

Lemma 12.3 implies $E(D_{t+1}-D_t\,|\,D_t)\leq -(c-4r)D_t/(cn)$ and so

\[ ED_{t+1}\leq\kappa\,ED_t; \qquad \kappa:=1-\frac{c-4r}{cn}. \]

Since $D_0\leq n$ we have, for any initial pair of states,

\[ P(D_t\geq 1)\leq ED_t\leq\kappa^t n\leq n\exp\left(-\frac{(c-4r)t}{cn}\right) \]

and the coupling inequality establishes the Proposition. $\Box$
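Plugging hypothetical numbers into Proposition 12.2 (an 8-regular graph on $n=1000$ vertices with $c=40$ colors, so $c>4r$ holds) shows how the pieces fit: the contraction factor $\kappa$ is just below $1$, and by time $\tau_1^{\rm disc}$ the bound $n\exp(-(c-4r)t/(cn))$ has indeed dropped below $e^{-1}$.

```python
import math

# Proposition 12.2 with hypothetical parameters: max degree r = 8,
# n = 1000 vertices, c = 40 colors (c > 4r).
n, r, c = 1000, 8, 40
kappa = 1 - (c - 4 * r) / (c * n)     # contraction factor from (12.5)-(12.6)
assert 0 < kappa < 1

tau1 = 1 + (c * n / (c - 4 * r)) * (1 + math.log(n))
# at t = tau1 the variation-distance bound is below 1/e:
assert n * math.exp(-(c - 4 * r) * tau1 / (c * n)) <= math.e ** -1
```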

12.1.6 Permutations and words

The examples in sections 12.1.4 and 12.1.5 were simple prototypes of interacting particle systems (more examples of which appear in Chapter 10), whose characteristic property is that a step of the chain involves only "local" change. Chains making "global" changes are often hard to analyze, but here is a simple example.

Fix a finite alphabet $A$ of size $|A|$. Fix $m$, and consider the set $A^m$ of "words" ${\bf x}=(x_1,\ldots,x_m)$ with each $x_i\in A$. Consider the discrete-time Markov chain on $A^m$ in which a step ${\bf x}\to{\bf y}$ is specified by the following two-stage procedure.

Stage 1. Pick a permutation $\sigma$ of $\{1,2,\ldots,m\}$ uniformly at random from the set of permutations satisfying $x_{\sigma(i)}=x_i\ \forall i$.

Stage 2. Let $(c_j(\sigma);\ j\geq 1)$ be the cycles of $\sigma$. For each $j$, and independently as $j$ varies, pick uniformly an element $\alpha_j$ of $A$, and define $y_i=\alpha_j$ for every $i\in c_j(\sigma)$.

Here is an alternative description. Write $\Pi$ for the set of permutations of $\{1,\ldots,m\}$. Consider the bipartite graph on vertices $A^m\cup\Pi$ with edge-set $\{({\bf x},\sigma): x_{\sigma(i)}=x_i\ \forall i\}$. Then the chain is random walk on this bipartite graph, watched every second step (that is, when it is in $A^m$).

From the second description, it is clear that the stationary probabilities $\pi({\bf x})$ are proportional to the degree of ${\bf x}$ in the bipartite graph, giving

\[ \pi({\bf x})\propto\prod_a n_a({\bf x})! \]

where $n_a({\bf x})=|\{i: x_i=a\}|$. We shall use a coupling argument to establish the following bound on variation distance:

\[ \bar{d}(t)\leq m\left(1-\frac{1}{|A|}\right)^t \tag{12.7} \]

implying that the variation threshold satisfies

\[ \tau_1^{\rm disc}\leq 1+\frac{1+\log m}{-\log\left(1-\frac{1}{|A|}\right)}\leq 1+(1+\log m)|A|. \]
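Both inequalities in this threshold bound are easy to check numerically for hypothetical sizes (say $m=500$ and $|A|=26$): the exact threshold makes $m(1-1/|A|)^t\leq e^{-1}$, and it is at most $1+(1+\log m)|A|$ since $-\log(1-x)\geq x$.

```python
import math

# Threshold bound check with hypothetical sizes m = 500, |A| = 26.
m, A = 500, 26
t = 1 + (1 + math.log(m)) / (-math.log(1 - 1 / A))
# by this time the coupling bound (12.7) is below 1/e:
assert m * (1 - 1 / A) ** t <= math.e ** -1
# and the exact threshold is dominated by the simpler form:
assert t <= 1 + (1 + math.log(m)) * A
```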

The construction of the coupling depends on the following lemma.

Lemma 12.4

Given finite sets $F^1,F^2$ we can construct (for $u=1,2$) a uniform random permutation $\sigma^u$ of $F^u$ with cycles $(C^u_j;\ j\geq 1)$, where the cycles are labeled such that

\[ C^1_j\cap F^1\cap F^2 = C^2_j\cap F^1\cap F^2 \ \mbox{ for all } j. \]

In the equality we interpret the $C^u_j$ as sets.

Proof. Given a permutation $\sigma$ of a finite set $G$, there is an induced permutation on a subset $G'$, obtained by deleting from the cycle representation of $\sigma$ those elements not in $G'$. It is easy to check that, for a uniform random permutation of $G$, the induced random permutation of $G'$ is also uniform. In the setting of the lemma, take a uniform random permutation $\sigma$ of $F^1\cup F^2$, and let $\sigma^u$ be the induced random permutation of $F^u$. Then the equality holds because each side represents the cycles of the induced permutation on $F^1\cap F^2$. $\Box$

We construct a step $({\bf x}^1,{\bf x}^2)\to({\bf Y}^1,{\bf Y}^2)$ of the coupled processes as follows. For each $a\in A$, set $F^{1,a}=\{i: x^1_i=a\}$, $F^{2,a}=\{i: x^2_i=a\}$. Take random permutations $\sigma^{1,a},\sigma^{2,a}$ as in the lemma, with cycles $C^{1,a}_j,C^{2,a}_j$. Then $(\sigma^{1,a},\ a\in A)$ defines a uniform random permutation $\sigma^1$ of $\{1,\ldots,m\}$, and similarly for $\sigma^2$. This completes stage 1. For stage 2, for each pair $(a,j)$ pick a uniform random element $\alpha^a_j$ of $A$ and set

\begin{align*}
Y^1_i &= \alpha^a_j \ \mbox{ for every } i\in C^{1,a}_j \\
Y^2_i &= \alpha^a_j \ \mbox{ for every } i\in C^{2,a}_j.
\end{align*}

This specifies a Markov coupling. By construction

\begin{align*}
\mbox{if } x^1_i=x^2_i &\mbox{ then } Y^1_i=Y^2_i \\
\mbox{if } x^1_i\neq x^2_i &\mbox{ then } P(Y^1_i=Y^2_i)=1/|A|
\end{align*}

because in the latter case $Y^1_i$ and $Y^2_i$ are independent uniform choices from $A$. So the coupled processes $({\bf X}^1(t),{\bf X}^2(t))$ satisfy

\[ P(X^1_i(t)\neq X^2_i(t)) = \left(1-\frac{1}{|A|}\right)^t P(X^1_i(0)\neq X^2_i(0)). \]

In particular $P({\bf X}^1(t)\neq{\bf X}^2(t))\leq m(1-1/|A|)^t$, and the coupling inequality (12.3) gives (12.7).

12.1.7 Card-shuffling by random transpositions

We mentioned in Chapter 1 section 1.4 (yyy 7/20/99 version) that card-shuffling questions provided a natural extrinsic motivation for the study of mixing times. The examples here and in section 12.1.9 give a first study of mathematically (if not physically) simple random shuffles, and these discrete-time chains are prototypes for more complex chains arising in other contexts.

Consider a $d$-card deck. The random transpositions shuffle is:

Make two independent uniform choices of cards, and interchange them.

With chance $1/d$ the two choices are the same card, so no change results. To make a coupling analysis, we first give an equivalent reformulation.

Pick a label $a$ and a position $i$ uniformly at random; interchange the label-$a$ card with the card in position $i$.

This reformulation suggests the coupling in which the same choice of $(a,i)$ is used for each chain. In the coupled process (with two arbitrary starting states) let $D_t$ be the number of unmatched cards (that is, cards whose positions in the two decks are different) after $t$ steps. Then

(i) $D_{t+1}\leq D_t$.

(ii) $P(D_{t+1}\leq j-1\,|\,D_t=j)\geq j^2/d^2$.

Here (i) is clear, and (ii) holds because whenever the card labeled $a$ and the card in position $i$ are both unmatched, the step of the coupled chain creates at least one new match (of the card labeled $a$).

Noting that $D_t$ cannot take the value $1$, we can use the decreasing functional lemma (Lemma 12.1) to show that the coupling time $T:=\min\{t: D_t=0\}$ satisfies

\[ ET\leq\sum_{j=2}^d d^2/j^2 \leq d^2\left({\textstyle\frac{\pi^2}{6}}-1\right). \]

In particular, the coupling inequality implies $\tau_1^{\rm disc}=O(d^2)$.
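The coupling itself is easy to simulate. The sketch below (deck encoding and parameters are our own illustration) runs the same-$(a,i)$ coupling until the two decks agree, and separately verifies the arithmetic step $\sum_{j=2}^d d^2/j^2\leq d^2(\pi^2/6-1)$ used for $ET$.

```python
import math
import random

# Simulation sketch of the random-transpositions coupling: both decks
# use the same (label a, position i) at every step.
# Deck encoding: deck[pos] = label; pos[label] = position.
def coupling_time(d, rng):
    deck1 = list(range(d))
    deck2 = rng.sample(range(d), d)      # an arbitrary second starting order
    pos1 = {lab: p for p, lab in enumerate(deck1)}
    pos2 = {lab: p for p, lab in enumerate(deck2)}
    t = 0
    while deck1 != deck2:
        a, i = rng.randrange(d), rng.randrange(d)
        for deck, pos in ((deck1, pos1), (deck2, pos2)):
            j = pos[a]                   # swap label a with card at position i
            deck[i], deck[j] = deck[j], deck[i]
            pos[deck[i]], pos[deck[j]] = i, j
        t += 1
    return t

d = 20
t = coupling_time(d, random.Random(0))
assert t > 0                             # the chains did couple

# arithmetic check of the bound on ET:
analytic = sum(d * d / (j * j) for j in range(2, d + 1))
assert analytic <= d * d * (math.pi ** 2 / 6 - 1)
```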

We revisit this example in Chapter 7 Example 18 (yyy 1/31/94 version), where it is observed that in fact

\[ \tau_1\sim{\textstyle\frac{1}{2}}\,d\log d. \tag{12.8} \]

An analogous continuous-space chain on the simplex is studied in Chapter 13-4 Example 3 (yyy 7/29/99 version).

12.1.8 Reflection coupling on the $n$-cycle

Consider continuous-time random walk on the $n$-cycle $I=\{0,1,2,\ldots,n-1\}$. That is, the transition rates are

\[ i\stackrel{1/2}{\longrightarrow}i+1; \qquad i\stackrel{1/2}{\longrightarrow}i-1 \]

where here and below $\pm 1$ is interpreted modulo $n$. One can define a coupling by specifying the following transition rates for the bivariate process:

\begin{align}
(i,i) &\stackrel{1/2}{\longrightarrow} (i+1,i+1); \qquad (i,i)\stackrel{1/2}{\longrightarrow}(i-1,i-1) \nonumber\\
(i,j) &\stackrel{1/2}{\longrightarrow} (i+1,j-1); \qquad (i,j)\stackrel{1/2}{\longrightarrow}(i-1,j+1) \qquad \mbox{(if $|j-i|>1$)} \nonumber\\
(i,i+1) &\stackrel{1/2}{\longrightarrow} (i,i); \qquad (i,i+1)\stackrel{1/2}{\longrightarrow}(i+1,i+1); \qquad (i,i+1)\stackrel{1/2}{\longrightarrow}(i-1,i+2) \tag{12.9}
\end{align}

and symmetrically for $(i+1,i)$. The joint process $((X^{(0)}_t,X^{(k)}_t),\ t\geq 0)$ started at $(0,k)$ can be visualized as follows. Let $\phi(i):=k-i \bmod n$. Picture the operation of $\phi$ as reflection in a mirror which passes through the points $\{x_1,x_2\}=\{k/2,\ k/2+n/2 \bmod n\}$, each of which is either a vertex or the middle of an edge. In the simplest case, where $x_1$ and $x_2$ are vertices, let $(X^{(0)}_t)$ be the chain started at vertex $0$, let $T^{0k}=\min\{t: X^{(0)}_t\in\{x_1,x_2\}\}$ and define

\begin{align*}
X^{(k)}_t &= \phi(X^{(0)}_t), \quad t\leq T^{0k} \\
&= X^{(0)}_t, \quad t>T^{0k}.
\end{align*}

This constructs a bivariate process with the transition rates specified above, with coupling time $T^{0k}$; the pre-$T^{0k}$ path of $X^{(k)}$ is just the reflection of the pre-$T^{0k}$ path of $X^{(0)}$. In the case where a mirror point is the middle of an edge $(j,j+1)$ and the two moving particles are at $j$ and $j+1$, we don't want simultaneous jumps across that edge; instead (12.9) specifies that attempted jumps occur at independent times, and the process is coupled at the time of the first such jump.

It is noteworthy that in this example the coupling inequality

\[ \|P_0(X_t\in\cdot)-P_k(X_t\in\cdot)\|\leq P(X^{(0)}_t\neq X^{(k)}_t) \]

is in fact an equality. Indeed this assertion, at a given time $t$, is equivalent to the assertion that

\[ P(X^{(0)}_t\in\cdot,\,T>t)\ \mbox{ and }\ P(X^{(k)}_t\in\cdot,\,T>t)\ \mbox{ have disjoint supports.} \]

But the support $A^0$ of the first measure is the set of vertices which can be reached from $0$ without meeting or crossing any mirror point (and similarly for $A^k$); and $A^0$ and $A^k$ are indeed disjoint.

It is intuitively clear that the stochastic maximum over $k$ of $T^{0k}$ is attained by $k=\lfloor n/2\rfloor$: we leave the reader to find the simple non-computational proof. It follows, taking e.g. the simplest case where $n$ is a multiple of $4$, that we can write

\[ \bar{d}(t)=P(T_{\{-n/4,n/4\}}>t) \tag{12.10} \]

where $T_{\{-n/4,n/4\}}$ is the hitting time on $\{-n/4,n/4\}$, starting from $0$, for continuous-time random walk on the integers.

Parallel results hold in discrete time, but only when the chains are suitably lazy. The point is that (12.9) is not allowable as transition probabilities. However, if we fix $0<a\leq 1/3$ then the chain with transition probabilities

\[ i\stackrel{a}{\longrightarrow}i+1; \qquad i\stackrel{a}{\longrightarrow}i-1 \]

(and which holds with the remaining probability $1-2a$) permits a coupling of the form (12.9) with all transition probabilities equal to $a$ instead of $1/2$. The analysis goes through as above, leading to (12.10), where $T$ now refers to the discrete-time lazy walk on the integers.
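For the lazy discrete-time case the equality (12.10) can be verified exactly in rational arithmetic. The sketch below takes $n=8$ and $a=1/3$, computes the laws of the walk started at $0$ and at $n/2=4$ on the cycle, and the survival probability of the lazy walk on the integers before it hits the mirror points $\pm n/4=\pm 2$, and checks that they agree at every step.

```python
from fractions import Fraction

# Exact check (rational arithmetic) that the coupling bound is an
# equality, as in (12.10): lazy walk with a = 1/3 on the n-cycle,
# n = 8, worst starting pair (0, n/2).
n = 8
a = Fraction(1, 3)

def step_cycle(dist):
    # one step of the lazy walk on the n-cycle
    new = [Fraction(0)] * n
    for i, w in enumerate(dist):
        new[(i - 1) % n] += w * a
        new[i] += w * (1 - 2 * a)
        new[(i + 1) % n] += w * a
    return new

def tv(p, q):
    return sum(abs(x - y) for x, y in zip(p, q)) / 2

d0 = [Fraction(0)] * n; d0[0] = Fraction(1)        # walk started at 0
dk = [Fraction(0)] * n; dk[n // 2] = Fraction(1)   # walk started at n/2

# lazy walk on the integers started at 0, absorbed on hitting -2 or 2
surv = {0: Fraction(1)}
for t in range(1, 13):
    d0, dk = step_cycle(d0), step_cycle(dk)
    new = {}
    for i, w in surv.items():
        for j, pr in ((i - 1, a), (i, 1 - 2 * a), (i + 1, a)):
            if abs(j) < n // 4:        # still strictly between the mirrors
                new[j] = new.get(j, Fraction(0)) + w * pr
    surv = new
    # variation distance equals the survival probability: an equality
    assert tv(d0, dk) == sum(surv.values())
```

Because everything is a `Fraction`, the assertion tests exact equality, not a floating-point approximation.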

Similar results hold for random walk on the $n$-path (Chapter 5 Example 8) (yyy 4/23/96 version), and we call couplings of this form reflection couplings. They are simpler in the context of continuous-path Brownian motion; see Chapter 13-4 section 1 (yyy 7/29/99 version).

12.1.9 Card-shuffling by random adjacent transpositions

As in section 12.1.7 we take a $d$-card deck; here we define a (lazy) shuffle by

With probability $1/2$ make no change; else pick a uniform random position $i\in\{1,2,\ldots,d\}$ and interchange the cards in positions $i$ and $i+1$ (interpreting $d+1$ as $1$).

To study this by coupling, consider two decks. In some positions $i$ the decks match (the label on the card in position $i$ is the same in both decks). Write ${\cal D}$ for the set of $i$ such that position $i$ or position $i+1$ (or both) match. Specify a step of the coupled chain by:

\begin{align*}
P(\mbox{interchange $i$ and $i+1$ in each deck}) &= {\textstyle\frac{1}{2d}}, \quad i\in{\cal D} \\
P(\mbox{interchange $i$ and $i+1$ in first deck, no change in second deck}) &= {\textstyle\frac{1}{2d}}, \quad i\notin{\cal D} \\
P(\mbox{interchange $i$ and $i+1$ in second deck, no change in first deck}) &= {\textstyle\frac{1}{2d}}, \quad i\notin{\cal D} \\
P(\mbox{no change in either deck}) &= {\textstyle\frac{|{\cal D}|}{2d}}.
\end{align*}

Consider a particular card $a$. From the coupling description we see

(a) if the card gets matched then it stays matched;

(b) while unmatched, at each step the card can move in at most one of the decks.

It follows that the "clockwise" distance $D(t):=X^1_a(t)-X^2_a(t) \bmod d$ between the positions of card $a$ in the two decks behaves exactly as a lazy random walk on the $d$-cycle:

\[ p_{j,j+1}=p_{j,j-1}=1/d, \qquad 1\leq j\leq d \]

until $D(t)$ hits $0$. By the elementary formula for mean hitting times on the cycle (Chapter 5 eq. (24)) (yyy 4/23/96 version), the time $T^{(a)}$ until card $a$ becomes matched has mean satisfying

\[ ET^{(a)}\leq{\textstyle\frac{d}{2}}\cdot{\textstyle\frac{d^2}{4}} \]

uniformly over initial configurations; note $2\,ET^{(a)}\leq d^3/4$. By Markov's inequality and submultiplicativity (Chapter 2 section 4.3) (yyy 9/10/99 version),

\[ P(T^{(a)}>md^3/4)\leq 2^{-m}, \qquad m=1,2,\ldots. \]

The chains couple at time $T:=\max_a T^{(a)}$ and so

\[ \bar{d}(md^3/4)\leq P(T>md^3/4)\leq d\,2^{-m}. \]

In particular

\[ \tau_1^{\rm disc}=O(d^3\log d). \]
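The tail bound converts to an explicit threshold: $d\,2^{-m}\leq e^{-1}$ once $m\geq(1+\log d)/\log 2$, so $\tau_1^{\rm disc}\leq\lceil(1+\log d)/\log 2\rceil\,d^3/4=O(d^3\log d)$. A quick check with a hypothetical deck size $d=52$:

```python
import math

# Converting d 2^{-m} <= 1/e into a threshold, for d = 52.
d = 52
m = math.ceil((1 + math.log(d)) / math.log(2))
assert d * 2 ** -m <= math.e ** -1          # bound below 1/e at this m
tau1_bound = m * d ** 3 / 4                 # the resulting tau_1 bound
assert tau1_bound <= d ** 3 * (2 + math.log2(d))  # crude O(d^3 log d) form
```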

In this example it turns out that coupling does give the correct order of magnitude; the corresponding lower bound

\[ \tau_1^{\rm disc}=\Omega(d^3\log d) \]

was proved by Wilson [338]. Different generalizations of this example appear in section 12.1.13 and in Chapter 14 section 5 (yyy 3/10/94 version), where we discuss relaxation times.

A generalization of this example, the interchange process, is studied in Chapter 14 section 5 (yyy 3/10/94 version).

12.1.10 Independent sets

Fix a graph $G$ on $n$ vertices with maximal degree $r$. An independent set is a set of vertices which does not contain any pair of adjacent vertices. Fix $m$ and consider the space $I$ of all independent sets of size $m$ in $G$. Picture an independent set ${\bf x}$ as a configuration of $m$ particles at distinct vertices, with no two particles at adjacent vertices. A natural discrete-time chain $({\bf X}_t)$ on $I$ is:

Pick a uniform random particle $a$ and a uniform random vertex $v$; move particle $a$ to vertex $v$ if feasible, else make no move.

To study mixing times, we can define a coupling $({\bf X}_t,{\bf Y}_t)$ by simply making the same choice of $(a,v)$ in each of the two coupled chains, where at each time we invoke a matching of particles in the two realizations which is arbitrary except for matching particles at the same vertex. To analyze the coupling, let $\rho$ be the natural metric on $I$: $\rho({\bf x},{\bf y})=$ the number of vertices occupied by particles of ${\bf x}$ but not by particles of ${\bf y}$. Clearly $D_t:=\rho({\bf X}_t,{\bf Y}_t)$ can change by at most $1$ on each step. Let us show that, for initial states with $\rho({\bf x},{\bf y})=d>0$,

\begin{align}
P_{({\bf x},{\bf y})}(D_1=d+1) &\leq \frac{m-d}{m}\cdot\frac{2d(r+1)}{n} \tag{12.11}\\
P_{({\bf x},{\bf y})}(D_1=d-1) &\geq \frac{d}{m}\cdot\frac{n-(m+d-2)(r+1)}{n}. \tag{12.12}
\end{align}

For in order that $D_1=d+1$ we must first choose a matched particle $a$ (chance $(m-d)/m$) and then choose a vertex $v$ which is a neighbor of (or the same as) some vertex $v'$ which is in exactly one of $\{{\bf x},{\bf y}\}$: there are $2d$ such vertices $v'$ and hence at most $2d(r+1)$ possibilities for $v$. This establishes (12.11). Similarly, in order that $D_1=d-1$ it is sufficient that we pick an unmatched particle $a$ (chance $d/m$) and then choose a vertex $v$ which is not a neighbor of (or the same as) any vertex $v'$ which is occupied in one or both realizations by some particle other than $a$: there are $m+d-2$ such forbidden vertices $v'$ and hence at most $(m+d-2)(r+1)$ forbidden positions for $v$. This establishes (12.12).

From (12.11)-(12.12) a brief calculation gives

\begin{align*}
E_{({\bf x},{\bf y})}(D_1-d) &\leq \frac{-d}{mn}\left(n-(3m-d-2)(r+1)\right) \\
&\leq \frac{-d}{mn}\left(n-3(m-1)(r+1)\right).
\end{align*}

In other words

\[ E_{({\bf x},{\bf y})}D_1\leq\kappa d; \qquad \kappa:=1-\frac{n-3(m-1)(r+1)}{mn}. \]

If $m<1+\frac{n}{3(r+1)}$ then $\kappa<1$. In this case, by copying the end of the analysis of the graph-coloring chain (section 12.1.5),

\[ \bar{d}(t)\leq m\kappa^t; \qquad \tau_1=O\left(\frac{\log m}{1-\kappa}\right). \]

To clarify the size-asymptotics, suppose $m,n\to\infty$ with $m/n\to\rho<\frac{1}{3(r+1)}$. Then for fixed $\rho$

\[ \tau_1=O(n\log n). \]
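A concrete instance of the contraction estimate, with hypothetical parameters $n=1000$, $r=4$, $m=60$ (which satisfy $m<1+\frac{n}{3(r+1)}\approx 67.7$):

```python
import math

# Contraction estimate for the independent-sets chain with hypothetical
# parameters n = 1000 vertices, max degree r = 4, m = 60 particles.
n, r, m = 1000, 4, 60
kappa = 1 - (n - 3 * (m - 1) * (r + 1)) / (m * n)
assert 0 < kappa < 1                      # contraction holds

# first time at which m * kappa^t falls below 1/e:
t = math.ceil((1 + math.log(m)) / -math.log(kappa))
assert m * kappa ** t <= math.e ** -1
```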

12.1.11 Two base chains for genetic algorithms

One way of motivating the study of Markov chains on combinatorial sets with uniform stationary distributions is as "base chains" for Markov chain Monte Carlo, that is, as chains from which one creates other chains designed to have some specified distribution as their stationary distribution. Here is a typical base chain underlying genetic algorithms.

Fix integers $K,L\geq 1$ with $K$ even. A state of the chain is a family of words $({\bf x}^k,\ 1\leq k\leq K)$, where each word is a binary $L$-tuple ${\bf x}^k=(x^k_l,\ 1\leq l\leq L)$. A step of the chain is defined as follows.

Use a uniform random permutation $\pi$ of $\{1,2,\ldots,K\}$ to partition the words into $K/2$ pairs $\{{\bf x}^{\pi(1)},{\bf x}^{\pi(2)}\},\ \{{\bf x}^{\pi(3)},{\bf x}^{\pi(4)}\},\ldots$. Create a new pair $\{{\bf y}^1,{\bf y}^2\}$ from $\{{\bf x}^{\pi(1)},{\bf x}^{\pi(2)}\}$ by setting, independently for each $1\leq l\leq L$,

\[ P((y^1_l,y^2_l)=(x^{\pi(1)}_l,x^{\pi(2)}_l)) = P((y^1_l,y^2_l)=(x^{\pi(2)}_l,x^{\pi(1)}_l)) = 1/2. \tag{12.13} \]

Repeat independently for $1\leq i\leq K/2$ to create new pairs $\{{\bf y}^{2i-1},{\bf y}^{2i}\}$ from $\{{\bf x}^{\pi(2i-1)},{\bf x}^{\pi(2i)}\}$. The new state is the family of words $({\bf y}^k)$.

Associated with an initial state $({\bf x}^k)$ is a vector of column sums ${\bf m}=(m_l,\ 1\leq l\leq L)$, where $m_l=\sum_k x^k_l$. These sums are preserved by the chain, so the proper state space is the space $I_{\bf m}$ of families with column sums ${\bf m}$. The transition matrix is symmetric, and so the chain is reversible with uniform stationary distribution on $I_{\bf m}$.

To describe the coupling, first rephrase (12.13) in words as "$(y^{1}_{l},y^{2}_{l})$ is $\{x^{\pi(1)}_{l},x^{\pi(2)}_{l}\}$ in random order, either forwards or backwards". Now specify the coupling as follows.

(i) Use the same random permutation $\pi$ for both chains.

(ii) For each $i$ and each $l$, in creating the new letters $(y^{2i-1}_{l},y^{2i}_{l})$ from the old letters $(x^{\pi(2i-1)}_{l},x^{\pi(2i)}_{l})$, use the same choice (forwards or backwards) in both chains, except when $(x^{\pi(2i-1)}_{l},x^{\pi(2i)}_{l})=(1,0)$ for one chain and $=(0,1)$ for the other chain, in which case use opposite choices of (forwards, backwards) in the two chains.

To study the coupled processes $({\bf X}(t),\hat{{\bf X}}(t))$, fix $l$ and consider the number $W(t):=\sum_{k=1}^{K}|X^{k}_{l}(t)-\hat{X}^{k}_{l}(t)|$ of words in which the $l$'th letter is not matched in the two realizations. Suppose $W(0)=w$. Consider the creation of the first two new words in each chain. The only way the number of matches can change is when we use opposite choices of (forwards, backwards) in the two chains, in which case two new matches are created. The chance that the $l$'th letter in the two chains is $1$ and $0$ in the $\pi(1)$'th word and is $0$ and $1$ in the $\pi(2)$'th word equals $\frac{w/2}{K}\times\frac{w/2}{K-1}$, and so (taking into account the symmetric case) the mean number of new matches at $l$ in these two words equals $\frac{w^{2}}{K(K-1)}$. Summing over the $K/2$ pairs,

E(W(1)\,|\,W(0)=w)=w-{\textstyle\frac{w^{2}}{2(K-1)}}.
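As a sanity check on this drift formula, one can simulate the coupled step restricted to a single letter position. The following sketch follows our reading of rule (ii) above; the function name and the representation of a column as a binary list are our own choices:

```python
import random

def coupled_column_step(x, xh):
    """One coupled step, seen at a single letter position l (sketch).

    x, xh are binary lists of length K: column l of the two coupled
    realizations, assumed to have equal column sums.  The same random
    pairing and the same forwards/backwards choices are used in both
    chains, except in the (1,0)-versus-(0,1) case, where opposite
    choices are used (rule (ii))."""
    K = len(x)
    perm = list(range(K))
    random.shuffle(perm)                  # same pairing in both chains
    nx, nxh = [0] * K, [0] * K
    for i in range(0, K, 2):
        p, q = perm[i], perm[i + 1]
        fwd = random.random() < 0.5
        # opposite choices exactly when the pair reads (1,0) in one
        # chain and (0,1) in the other
        opposite = {(x[p], x[q]), (xh[p], xh[q])} == {(0, 1), (1, 0)}
        fwd2 = (not fwd) if opposite else fwd
        nx[i], nx[i + 1] = (x[p], x[q]) if fwd else (x[q], x[p])
        nxh[i], nxh[i + 1] = (xh[p], xh[q]) if fwd2 else (xh[q], xh[p])
    return nx, nxh
```

For $K=4$ and $w=2$ the formula predicts $E(W(1))=2-4/6=4/3$, which Monte Carlo estimation reproduces.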

We can now apply a comparison lemma (Chapter 2, Lemma 32) (yyy 9/10/99 version), which shows that the hitting time $T^{l}$ of $W(t)$ to $0$ satisfies

ET^{l}\leq\sum_{w=2}^{K}{\textstyle\frac{2(K-1)}{w^{2}}}\leq 2K.
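The final inequality holds because $\sum_{w\geq 2}w^{-2}=\pi^{2}/6-1<1$, so the sum is at most $2(K-1)\cdot(\pi^{2}/6-1)<2K$. A quick numerical check of the bound:

```python
# Verify numerically that sum_{w=2}^{K} 2(K-1)/w^2 <= 2K (illustrative K)
K = 50
bound = sum(2 * (K - 1) / w ** 2 for w in range(2, K + 1))
assert bound <= 2 * K
```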

Since $T:=\max_{l}T^{l}$ is a coupling time, a now-familiar argument shows that for $u=1,2,\ldots$

\bar{d}(4uK)\leq P(T>4uK)\leq L\,P(T^{l}>4uK)\leq L\,2^{-u}

and so

\tau_{1}^{{\rm disc}}=O(K\log L).
Open Problem 12.5

Show that $\tau_{1}^{{\rm disc}}=O(\log K\times\log L)$.

We expect this bound by analogy with the "random transpositions" shuffle (section 12.1.7). Loosely speaking, the action of the chain on a single position in the words is like the random transpositions chain sped up by a factor $K/2$, so from (12.8) we expect its mixing time to be $\Theta(\log K)$. It would be interesting to study this example via the group representation or strong stationary time techniques which have proved successful for the random transpositions chain.

To make a metaphor involving biological genetics, the letters represent chromosomes and the words represent the chromosomal structure of a gamete; the process is "sexual reproduction from the viewpoint of gametes". If instead we want a word to represent a particular chromosome and the letters to represent genes within that chromosome, then instead of flipping bits independently it is more natural to model crossover. That is, consider a chain in which the rule for creating a new pair $\{{\bf y}^{2i-1},{\bf y}^{2i}\}$ from $\{{\bf x}^{\pi(2i-1)},{\bf x}^{\pi(2i)}\}$ becomes:

Take $U_{i}$ uniform on $\{1,2,\ldots,L,L+1\}$. Define

(y^{2i-1}_{l},y^{2i}_{l}) = (x^{\pi(2i-1)}_{l},x^{\pi(2i)}_{l}),\quad l<U_{i}
(y^{2i-1}_{l},y^{2i}_{l}) = (x^{\pi(2i)}_{l},x^{\pi(2i-1)}_{l}),\quad l\geq U_{i}.
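The crossover rule for a single pair can be sketched as follows (the function name is ours; words again as Python lists of 0/1):

```python
import random

def crossover_pair(a, b):
    """Crossover rule (sketch): pick U uniform on {1,...,L+1}; the new
    pair agrees with (a, b) at positions l < U and with (b, a) at
    positions l >= U.  U = L+1 means no swap; U = 1 swaps everything."""
    L = len(a)
    U = random.randint(1, L + 1)      # crossover point, 1-indexed
    y1 = a[:U - 1] + b[U - 1:]
    y2 = b[:U - 1] + a[U - 1:]
    return y1, y2
```

As with bit-flipping, the unordered pair of letters at each position is preserved, so the column sums $\bf m$ are again invariant.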

As an exercise (hint in Notes), find a coupling argument to show that for this chain

\tau_{1}^{{\rm disc}}=O(KL^{2}).\qquad(12.14)

12.1.12 Path coupling

In certain complicated settings it is useful to know that it is enough to couple versions of the chain which start in "nearby" states. To say this carefully, let $I$ be finite and consider a $\{0,1,2,\ldots\}$-valued function $\rho(i,j)$ defined on some symmetric subset ${\cal E}\subset I\times I$. Call $\rho$ a pre-metric if

(i) $\rho(i,j)=0$ iff $i=j$;

(ii) $\rho(i,j)=\rho(j,i)$;

(iii) $\rho(i_{0},i_{k})\leq\sum_{u=0}^{k-1}\rho(i_{u},i_{u+1})$ whenever $(i_{0},i_{k})$ and each $(i_{u},i_{u+1})$ are in ${\cal E}$.

Clearly a pre-metric extends to a metric $\bar{\rho}$ by defining

\bar{\rho}(i,j):=\min\left\{\sum_{u}\rho(i_{u},i_{u+1})\right\}\qquad(12.15)

the minimum over all paths $i=i_{0},i_{1},\ldots,i_{k}=j$ with each $(i_{u},i_{u+1})\in{\cal E}$. Note that $\bar{\rho}(i,j)\leq\rho(i,j)$ for $(i,j)\in{\cal E}$.
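The extension (12.15) is exactly an all-pairs shortest-path computation over the graph $(I,{\cal E})$ with edge lengths $\rho$. A sketch via Floyd–Warshall (names and data representation are ours; the graph is assumed connected):

```python
def extend_premetric(states, rho):
    """Extend a pre-metric rho (a dict keyed by the pairs in the edge
    set E, symmetric) to the metric rho_bar of (12.15) by all-pairs
    shortest paths (Floyd-Warshall).  Sketch; assumes (I, E) connected."""
    INF = float("inf")
    d = {(i, j): (0 if i == j else rho.get((i, j), INF))
         for i in states for j in states}
    for k in states:                      # allow paths through k
        for i in states:
            for j in states:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d
```

On the path graph $0-1-2$ with $\rho(0,1)=1$, $\rho(1,2)=2$, this returns $\bar{\rho}(0,2)=3$.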

Lemma 12.6

Let $S$ be a state space. Let $(\mu_{i,i+1},\ 0\leq i\leq d-1)$ be probability distributions on $S\times S$ such that the second marginal of $\mu_{i,i+1}$ coincides with the first marginal of $\mu_{i+1,i+2}$ for $0\leq i\leq d-2$. Then there exists an $S$-valued random sequence $(V_{i},0\leq i\leq d)$ such that ${\rm dist}(V_{i},V_{i+1})=\mu_{i,i+1}$ for $0\leq i\leq d-1$.

Proof. Just take $(V_{i})$ to be the non-homogeneous Markov chain whose transition probabilities $P(V_{i+1}\in\cdot\,|\,V_{i}=v)$ are the conditional probabilities determined by the specified joint distribution $\mu_{i,i+1}$. $\Box$
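The proof's construction can be sketched directly: sample $(V_{0},V_{1})$ from $\mu_{0,1}$, then each $V_{i+1}$ from the conditional law of $\mu_{i,i+1}$ given $V_{i}$. Here distributions are dicts mapping pairs to probabilities, and the names are ours:

```python
import random

def sample_chain(mus):
    """Sample (V_0, ..., V_d) as in the proof of Lemma 12.6 (sketch).

    mus[i] is mu_{i,i+1}, a dict {(a, b): prob}.  Consecutive marginals
    are assumed consistent, as the lemma requires."""
    def draw(dist):                      # sample from {outcome: prob}
        r, acc = random.random(), 0.0
        for outcome, p in dist.items():
            acc += p
            if r < acc:
                return outcome
        return outcome                   # guard against float round-off
    v0, v1 = draw(mus[0])                # (V_0, V_1) ~ mu_{0,1}
    seq = [v0, v1]
    for mu in mus[1:]:                   # V_{i+1} | V_i = v ~ conditional
        v = seq[-1]
        cond = {b: p for (a, b), p in mu.items() if a == v}
        z = sum(cond.values())
        seq.append(draw({b: p / z for b, p in cond.items()}))
    return seq
```

With two consistent bivariate distributions, the empirical law of $(V_{1},V_{2})$ matches $\mu_{1,2}$, as the lemma asserts.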

Lemma 12.7 (Path-coupling lemma)

Take a discrete-time Markov chain $(X_{t})$ with finite state space $I$. Write $X^{(i)}_{1}$ for the time-$1$ value of the chain started at state $i$. Let $\rho$ be a pre-metric defined on some subset ${\cal E}\subset I\times I$. Suppose that for each pair $(i,j)$ in ${\cal E}$ we can construct a joint law $(X^{(i)}_{1},X^{(j)}_{1})$ such that

E\bar{\rho}(X^{(i)}_{1},X^{(j)}_{1})\leq\kappa\rho(i,j)\qquad(12.16)

for some constant $0<\kappa<1$. Then

\bar{d}(t)\leq\Delta_{\rho}\kappa^{t}\qquad(12.17)

where $\Delta_{\rho}:=\max_{i,j\in I}\bar{\rho}(i,j)$.

See the Notes for comments on the case κ=1\kappa=1.

Proof. Fix states $i,j$ and consider a path $(i_{u})$ attaining the minimum in (12.15). For each $u$ let $(X^{(i_{u})}_{1},X^{(i_{u+1})}_{1})$ have a joint distribution satisfying (12.16). By Lemma 12.6 there exists a random sequence $(X^{(i)}_{1}=X^{(i_{0})}_{1},X^{(i_{1})}_{1},\ldots,X^{(j)}_{1})$ consistent with these bivariate distributions. In particular, there is a joint distribution $(X^{(i)}_{1},X^{(j)}_{1})$ such that

E\bar{\rho}(X^{(i)}_{1},X^{(j)}_{1})\leq E\sum_{u}\bar{\rho}(X^{(i_{u})}_{1},X^{(i_{u+1})}_{1})\leq\kappa\sum_{u}\rho(i_{u},i_{u+1})=\kappa\bar{\rho}(i,j).

This construction gives one step of a coupling of two copies of the chain started at arbitrary states, and so extends to a coupling $((X^{(i)}_{t},X^{(j)}_{t}),\ t=0,1,2,\ldots)$ of the two entire processes. The inequality above implies

E\left(\bar{\rho}(X^{(i)}_{t+1},X^{(j)}_{t+1})\,|\,X^{(i)}_{t},X^{(j)}_{t}\right)\leq\kappa\,\bar{\rho}(X^{(i)}_{t},X^{(j)}_{t})

and hence

P(X^{(i)}_{t}\neq X^{(j)}_{t})\leq E\bar{\rho}(X^{(i)}_{t},X^{(j)}_{t})\leq\kappa^{t}\bar{\rho}(i,j)\leq\kappa^{t}\Delta_{\rho}

establishing (12.17). $\Box$
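To turn (12.17) into a mixing-time estimate, note that $\bar{d}(t)\leq e^{-1}$ as soon as $t\geq(1+\log\Delta_{\rho})/\log(1/\kappa)$. Illustrative arithmetic, with made-up values of $\kappa$ and $\Delta_{\rho}$ (not taken from any example in the text):

```python
import math

# Hypothetical contraction rate and diameter, chosen only to illustrate
kappa, Delta = 0.9, 10_000
# Smallest t with Delta * kappa^t <= 1/e
t = math.ceil((1 + math.log(Delta)) / math.log(1 / kappa))
assert Delta * kappa ** t <= math.exp(-1)
```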

Bubley and Dyer [79] introduced Lemma 12.7 and the name path-coupling. It has proved useful in extending the range of applicability of coupling methods in settings such as graph-coloring (Bubley et al [77], Vigoda [333]) and independent sets (Luby and Vigoda [244]). These are too intricate for presentation here, but the following example will serve to illustrate the use of path-coupling.

12.1.13 Extensions of a partial order

Fix a partial order $\preceq$ on an $n$-element set, and let $I_{m}$ be the set of linear extensions of $\preceq$, that is to say total orders consistent with the given partial order. We can define a discrete-time Markov chain on $I_{m}$ by re-using the idea in the "random adjacent transpositions" example (section 12.1.9). Let $w(\cdot)$ be a probability distribution on $\{1,2,\ldots,n-1\}$. Define a step of the chain as follows.

Pick position $i$ with probability $w(i)$, and independently pick one of $\{$stay, move$\}$ with probability $1/2$ each. If "move" is picked, then interchange the elements in positions $i$ and $i+1$ if feasible (i.e. if consistent with the partial order); else make no change.

The transition matrix is symmetric, so the stationary distribution is uniform on $I_{m}$.
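One step of this chain can be sketched as follows, with the partial order given as a set of strict relations $(a,b)$ meaning $a\prec b$ (the representation and names are ours):

```python
import random

def lin_ext_step(order, less_than, w):
    """One step of the chain on linear extensions (sketch).

    order:     list of the n elements, a current linear extension.
    less_than: set of pairs (a, b) with a strictly below b in the
               partial order.
    w:         weights w(1),...,w(n-1) as a list indexed 0..n-2."""
    i = random.choices(range(len(order) - 1), weights=w)[0]
    if random.random() < 0.5:                    # picked "move"
        a, b = order[i], order[i + 1]
        if (a, b) not in less_than:              # swap is feasible
            order = order[:i] + [b, a] + order[i + 2:]
    return order
```

A swap is refused exactly when the two adjacent elements are comparable, so every reachable state remains a linear extension.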

To analyze by coupling, define one step of a bivariate coupled process as follows.

Make the same choice of $i$ in both chains. Also make the same choice of $\{$move, stay$\}$, except in the case where the elements in positions $i$ and $i+1$ are the same two elements in opposite order in the two realizations, in which case use opposite choices of $\{$stay, move$\}$.

The coupling is similar (but not identical) to that in section 12.1.9, where the underlying chain corresponds to the "null" partial order. For a general partial order, the coupling started from an arbitrary pair of states seems hard to analyze directly. For instance, an element in the same position in both realizations at time $t$ may not remain so at time $t+1$. Instead we use path-coupling, following an argument of Bubley and Dyer [80]. Call two states ${\bf x}$ and ${\bf y}$ adjacent if they differ by only one (not necessarily adjacent) transposition; if the transposed elements are in positions $i<j$ then let $\rho({\bf x},{\bf y})=j-i$. We want to study the increment $\Phi:=\rho(X_{1},Y_{1})-\rho({\bf x},{\bf y})$, where $(X_{1},Y_{1})$ is the coupled chain after one step from $({\bf x},{\bf y})$. The diagram shows a typical pair of adjacent states.

\begin{array}{ccccccccccc}
 & a & b & c & \alpha & d & e & f & \beta & g & h\\
 & a & b & c & \beta & d & e & f & \alpha & g & h\\
\mbox{position} & \cdot & \cdot & \cdot & i & \cdot & \cdot & \cdot & j & \cdot & \cdot
\end{array}

Observe first that any choice of position other than $i-1,i,j-1,j$ has no effect on $\Phi$. If position $i$ and "move" are chosen, then $\{\alpha,d\}$ are interchanged in the first chain and $\{\beta,d\}$ in the second; both moves are feasible, as one checks by examining the relative orders in the other chain's previous configuration. This has chance $w(i)/2$ and leads to $\Phi=-1$. If position $i-1$ and "move" are chosen (chance $w(i-1)/2$), then if either or both moves are feasible $\Phi=1$, while if neither is feasible then $\Phi=0$. Arguing similarly for choices $j-1,j$ leads to

E(\Phi)\leq{\textstyle\frac{1}{2}}\left(w(i-1)-w(i)-w(j-1)+w(j)\right).

This estimate remains true if $j=i+1$, because in that case choosing position $i$ (chance $w(i)$) always creates a match. Now specify

w(i):={\textstyle\frac{i(n-i)}{w_{n}}},\qquad w_{n}:=\sum_{j=1}^{n-1}j(n-j)

and then $E\Phi\leq-\frac{j-i}{w_{n}}$. This leads to

E_{({\bf x},{\bf y})}\rho(X_{1},Y_{1})\leq\left(1-{\textstyle\frac{1}{w_{n}}}\right)\rho({\bf x},{\bf y})

for adjacent $({\bf x},{\bf y})$. We are thus in the setting of Lemma 12.7, which shows

\bar{d}(t)\leq\Delta_{n}\exp(-t/w_{n}).

Since $\Delta_{n}=O(n^{2})$ and $w_{n}\sim n^{3}/6$, we obtain

\tau_{1}^{{\rm disc}}\leq({\textstyle\frac{1}{3}}+o(1))\ n^{3}\log n.
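The asymptotics for $w_{n}$ come from the closed form $w_{n}=\sum_{j=1}^{n-1}j(n-j)=n(n^{2}-1)/6\sim n^{3}/6$, which is easy to verify numerically:

```python
# Check the closed form w_n = n(n^2 - 1)/6 for an illustrative n
n = 200
w_n = sum(j * (n - j) for j in range(1, n))
assert w_n == n * (n * n - 1) // 6
```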