If $X$ and $Y$ are random variables with Binomial $(n,p_{1})$ and $(n,p_{2})$ distributions respectively, and if $p_{1}\leq p_{2}$, then it is intuitively obvious that

$P(X\geq x)\leq P(Y\geq x)\mbox{ for all }x.$  (14.1)

One could verify this from the exact formulas, but there is a more elegant non-computational proof. For $1\leq i\leq n$ define disjoint events $(A_{i},B_{i},C_{i})$, with the triples independent as $i$ varies and with $P(A_{i})=p_{1}$, $P(B_{i})=p_{2}-p_{1}$, $P(C_{i})=1-p_{2}$. And define

$X^{\prime}=\sum_{i}1_{A_{i}}$ = number of $A$'s which occur

$Y^{\prime}=\sum_{i}1_{A_{i}\cup B_{i}}$ = number of $A$'s and $B$'s which occur.

Then $X^{\prime}\leq Y^{\prime}$, so (14.1) holds for $X^{\prime}$ and $Y^{\prime}$, but then because $X^{\prime}\ \stackrel{d}{=}\ X$ and $Y^{\prime}\ \stackrel{d}{=}\ Y$ we have proved that (14.1) holds for $X$ and $Y$. This is the prototype of a coupling argument, which (in its wide sense) means

to prove some distributional inequality relating two random processes $X,Y$ by constructing versions $X^{\prime},Y^{\prime}$ which satisfy some sample path inequality.
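This prototype construction is easy to check by simulation. The sketch below (with a hypothetical helper name `coupled_binomials`) drives both indicator sums with one shared uniform per trial, which realizes the partition $(A_i,B_i,C_i)$: the events $\{U<p_1\}$ and $\{U<p_2\}$ play the roles of $A_i$ and $A_i\cup B_i$.

```python
import random

def coupled_binomials(n, p1, p2, rng):
    """Return (X', Y') with X' ~ Bin(n, p1), Y' ~ Bin(n, p2) and X' <= Y'.

    One uniform U per trial drives both indicators:
    A_i = {U < p1} and A_i ∪ B_i = {U < p2}, so X' <= Y' pathwise.
    """
    x = y = 0
    for _ in range(n):
        u = rng.random()
        x += u < p1
        y += u < p2
    return x, y

rng = random.Random(0)
samples = [coupled_binomials(20, 0.3, 0.6, rng) for _ in range(1000)]
assert all(x <= y for x, y in samples)  # the pathwise inequality X' <= Y'
```

The marginals are the correct binomials because each indicator is a Bernoulli($p_1$) or Bernoulli($p_2$) variable, independent across trials; the coupling lives entirely in the shared uniforms.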

Our first “process” example is a somewhat analogous proof of part (a) of the following result, which abstracts slightly a result stated for random walk on distance-regular graphs (Chapter 7 Proposition yyy).

Proposition 14.1 Let $(X_{t})$ be an irreducible continuous-time birth-and-death chain on states $\{0,1,\ldots,\Delta\}$.

(a) For fixed $t$, $P_{0}(X_{t}=i)/\pi_{i}$ is non-increasing in $i$.

(b) For fixed $i$, $P_{0}(X_{t}=i)/P_{0}(X_{t}=0)$ is non-decreasing in $t$.

Proof. Fix $i_{1}\leq i_{2}$. Suppose we can construct processes $Y_{t}$ and $Z_{t}$, distributed as the given chain started at $i_{1}$ and $i_{2}$ respectively, such that

$Y_{t}\leq Z_{t}\mbox{ for all }t.$  (14.2)

Then

$P_{i_{1}}(X_{t}=0)=P(Y_{t}=0)\geq P(Z_{t}=0)=P_{i_{2}}(X_{t}=0).$

But by reversibility

$P_{i_{1}}(X_{t}=0)=\frac{\pi_{0}}{\pi_{i_{1}}}\ P_{0}(X_{t}=i_{1})$

and similarly for $i_{2}$, establishing (a).

Existence of processes satisfying (14.2) is a consequence of the Doeblin coupling discussed below. The proof of part (b) involves a different technique and is deferred to section 14.1.3.

Consider a finite-state chain in discrete or continuous time. Fix states $i,j$. Suppose we construct a joint process $(X^{(i)}_{t},X^{(j)}_{t};t\geq 0)$ such that

$(X^{(i)}_{t},t\geq 0)$ is distributed as the chain started at $i$

$(X^{(j)}_{t},t\geq 0)$ is distributed as the chain started at $j$.  (14.3)

And suppose there is a random time $T\leq\infty$ such that

$X^{(i)}_{t}=X^{(j)}_{t},\ T\leq t<\infty.$  (14.4)

Call such a $T$ a coupling time. Then the coupling inequality is

$||P_{i}(X_{t}\in\cdot)-P_{j}(X_{t}\in\cdot)||\leq P(T>t),\ 0\leq t<\infty.$  (14.5)

The inequality is clear once we observe $P(X_{t}^{(i)}\in\cdot,T\leq t)=P(X_{t}^{(j)}\in\cdot,T\leq t)$. The coupling inequality provides a method of bounding the variation distance $\bar{d}(t)$ of Chapter 2 section yyy.
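A concrete check of (14.5) can be carried out exactly in discrete time, where the inequality reads the same way: run two copies of a small chain independently until they meet and together afterwards, and compare the exact variation distance with the exact off-diagonal mass $P(T>t)$ of the pair chain. The three-state transition matrix below is an arbitrary illustrative choice, not one from the text.

```python
# Discrete-time illustration of the coupling inequality (14.5): variation
# distance is dominated by the tail P(T > t) of the coupling time of a
# "move independently until meeting, then stick" coupling.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
n = len(P)

def step_marginal(mu):
    """One step of the underlying chain, acting on a distribution."""
    return [sum(mu[a] * P[a][k] for a in range(n)) for k in range(n)]

def step_pair(nu):
    """One step of the coupled pair chain: stick on the diagonal,
    move independently off it."""
    out = {}
    for (a, b), w in nu.items():
        for k in range(n):
            if a == b:
                out[(k, k)] = out.get((k, k), 0.0) + w * P[a][k]
            else:
                for l in range(n):
                    out[(k, l)] = out.get((k, l), 0.0) + w * P[a][k] * P[b][l]
    return out

mu_i, mu_j = [1.0, 0, 0], [0, 0, 1.0]   # chains started at i = 0, j = 2
nu = {(0, 2): 1.0}                      # pair chain started at (i, j)
for t in range(1, 21):
    mu_i, mu_j, nu = step_marginal(mu_i), step_marginal(mu_j), step_pair(nu)
    tv = 0.5 * sum(abs(a - b) for a, b in zip(mu_i, mu_j))
    tail = sum(w for (a, b), w in nu.items() if a != b)  # exactly P(T > t)
    assert tv <= tail + 1e-12            # the coupling inequality (14.5)
```

Once the pair chain hits the diagonal it stays there, so the off-diagonal mass at time $t$ is exactly $P(T>t)$, and the assertion is the coupling inequality itself.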

The most common strategy for constructing a coupling satisfying (14.3) is via Markov couplings, as follows. Suppose the underlying chain has state space $I$ and (to take the continuous-time case) transition rate matrix ${\bf Q}=(q(i,j))$. Consider a transition rate matrix $\tilde{{\bf Q}}$ on the product space $I\times I$. Write the entries of $\tilde{{\bf Q}}$ as $\tilde{q}(i,j;k,l)$ instead of the logical-but-fussy $\tilde{q}((i,j),(k,l))$. Suppose that, for each pair $(i,j)$ with $j\neq i$,

$\tilde{q}(i,j;\cdot,\cdot)\mbox{ has marginals }q(i,\cdot)\mbox{ and }q(j,\cdot)$  (14.6)

in other words $\sum_{l}\tilde{q}(i,j;k,l)=q(i,k)$ and $\sum_{k}\tilde{q}(i,j;k,l)=q(j,l)$. And suppose that

$\tilde{q}(i,i;k,k)=q(i,k)\mbox{ for all }k$

$\tilde{q}(i,i;k,l)=0\mbox{ for }l\neq k.$

Take $(X^{(i)}_{t},X^{(j)}_{t})$ to be the chain on $I\times I$ with transition rate matrix $\tilde{{\bf Q}}$ and initial position $(i,j)$. Then (14.3) must hold, and $T\equiv\min\{t:X^{(i)}_{t}=X^{(j)}_{t}\}$ is a coupling time. This construction gives a Markov coupling, and all the examples where we use the coupling inequality will be of this form. In practice it is much more understandable to define the joint process in words.

xxx red and black particles.

A particular choice of $\tilde{{\bf Q}}$ is

$\tilde{q}(i,j;k,l)=q(i,k)q(j,l),\ j\neq i$  (14.7)

in which case the joint process is called the Doeblin coupling. In words, the Doeblin coupling consists of starting one particle at $i$ and the other particle at $j$, and letting the two particles move independently until they meet, at time $M_{i,j}$ say, and thereafter letting them stick together. In the particular case of a birth-and-death process, the particles cannot cross without meeting (in continuous time), and so if $i<j$ then $X^{(i)}_{t}\leq X^{(j)}_{t}$ for all $t$, the property we used at (14.2).
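To see the birth-and-death no-crossing property concretely, here is a small event-by-event simulation sketch (all birth and death rates set to $1$, a purely illustrative choice): off the diagonal the two particles jump independently, so only one moves per event in continuous time, and the lower copy can meet, but never overtake, the upper one.

```python
import random

def doeblin_birth_death(i, j, t_max, delta, rng):
    """Doeblin coupling of a birth-and-death chain on {0,...,delta} with
    all birth and death rates equal to 1 (illustrative choice): particles
    move independently until they meet, then stick together.
    Returns (X_t, Y_t) at time t_max, started from (i, j) with i <= j."""
    x, y, t = i, j, 0.0
    while True:
        movers = ((x, "x"), (y, "y")) if x != y else ((x, "both"),)
        moves = []  # every feasible single jump, each at rate 1
        for pos, who in movers:
            if pos + 1 <= delta:
                moves.append((who, +1))
            if pos - 1 >= 0:
                moves.append((who, -1))
        t += rng.expovariate(len(moves))   # time to the next event
        if t > t_max:
            return x, y
        who, d = rng.choice(moves)         # all rates equal: uniform choice
        if who in ("x", "both"):
            x += d
        if who in ("y", "both"):
            y += d

rng = random.Random(1)
pairs = [doeblin_birth_death(0, 5, 3.0, 9, rng) for _ in range(500)]
assert all(x <= y for x, y in pairs)       # the ordering (14.2) holds
```

Because only one particle jumps at each event, the ordering can only be destroyed by a crossing, and a nearest-neighbor jump onto the other particle's site is a meeting, after which the particles move together.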

Use of the coupling inequality has nothing to do with reversibility. In fact it finds more use in the irreversible setting, where fewer alternative methods are available for quantifying convergence to stationarity. In the reversible setting, coupling provides a quick way to get bounds which usually (but not always) can be improved by other methods. Here are two examples we have seen before.

Random walk on the $d$-cube (Chapter 5 Example yyy).

For ${\bf i}=(i_{1},\ldots,i_{d})$ and ${\bf j}=(j_{1},\ldots,j_{d})$ in $I=\{0,1\}^{d}$, let $D({\bf i},{\bf j})$ be the set of coordinates $u$ where ${\bf i}$ and ${\bf j}$ differ. Write ${\bf i}^{u}$ for the state obtained by changing the $u$'th coordinate of ${\bf i}$. Recall that in continuous time the components move independently as $2$-state chains with transition rates $1/d$. In words, the coupling is “run unmatched coordinates independently until they match, and then run them together”. Formally, the non-zero transitions of the joint process are

$\tilde{q}({\bf i},{\bf j};{\bf i}^{u},{\bf j}^{u})=1/d\mbox{ if }i_{u}=j_{u}$

$\tilde{q}({\bf i},{\bf j};{\bf i}^{u},{\bf j})=1/d\mbox{ if }i_{u}\neq j_{u}$

$\tilde{q}({\bf i},{\bf j};{\bf i},{\bf j}^{u})=1/d\mbox{ if }i_{u}\neq j_{u}.$

For each coordinate which is initially unmatched, it takes exponential (rate $2/d$) time until it is matched, and so the coupling time $T$ satisfies

$T\ \stackrel{d}{=}\ \max(\xi_{1},\ldots,\xi_{d_{0}})$

where the $(\xi_{u})$ are independent exponential (rate $2/d$) and $d_{0}=d({\bf i},{\bf j})$ is the initial number of unmatched coordinates. So

$P(T\leq t)=(1-\exp(-2t/d))^{d_{0}}$

and the coupling inequality bounds variation distance as

$\bar{d}(t)\leq 1-(1-\exp(-2t/d))^{d}.$

This leads to an upper bound on the variation threshold time

$\tau_{1}\leq(\frac{1}{2}+o(1))d\log d\mbox{ as }d\rightarrow\infty.$

In this example we saw in Chapter 5 that in fact

$\tau_{1}\sim\frac{1}{4}d\log d\mbox{ as }d\rightarrow\infty$

so the coupling bound is off by a factor of $2$.
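A quick numerical look at the tail $P(T>t)=1-(1-e^{-2t/d})^{d_{0}}$ in the worst case $d_{0}=d$ shows the threshold emerging near $\frac{1}{2}d\log d$; the constants $0.4$ and $0.6$ below are arbitrary points on either side of $\frac{1}{2}$.

```python
import math

def coupling_tail(t, d):
    """P(T > t) = 1 - (1 - exp(-2t/d))^d: the coupling bound on
    variation distance for random walk on the d-cube, worst start."""
    return 1.0 - (1.0 - math.exp(-2.0 * t / d)) ** d

d = 1000
# Just above (1/2) d log d the bound is already small;
# just below it the bound is still near 1.
assert coupling_tail(0.6 * d * math.log(d), d) < 0.3
assert coupling_tail(0.4 * d * math.log(d), d) > 0.9
```

At $t=c\,d\log d$ we have $e^{-2t/d}=d^{-2c}$, so the bound behaves like $1-\exp(-d^{1-2c})$, which tends to $0$ for $c>\frac12$ and to $1$ for $c<\frac12$, matching the stated threshold bound.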

Random walk on a dense regular graph (Chapter 5 Example yyy).

Consider an $r$-regular $n$-vertex graph. Write ${\cal N}(v)$ for the set of neighbors of $v$. For any pair $v,w$ we can define a one-to-one map $\theta_{v,w}:{\cal N}(v)\rightarrow{\cal N}(w)$ such that $\theta_{v,w}(x)=x$ for $x\in{\cal N}(v)\cap{\cal N}(w)$; such a map exists because $|{\cal N}(v)|=|{\cal N}(w)|=r$. We can now define a “greedy coupling” by

$\tilde{q}(v,w;x,\theta_{v,w}(x))=1/r,\ \ x\in{\cal N}(v).$

In general one cannot get useful bounds on the coupling time $T$. But consider the dense case, where $r>n/2$. As observed in Chapter 5 Example yyy, here $|\mbox{${\cal N}$}(v)\cap\mbox{${\cal N}$}(w)|\geq 2r-n$ and so the coupled processes $(X_{t},Y_{t})$ have the property that for $w\neq v$

$P(X_{t+dt}=Y_{t+dt}|X_{t}=v,Y_{t}=w)=\frac{|{\cal N}(v)\cap{\cal N}(w)|}{r}\,dt\geq\frac{2r-n}{r}\,dt$

implying that $T$ satisfies

$P(T>t)\leq\exp(-(2r-n)t/r).$

So the coupling inequality implies $\bar{d}(t)\leq\exp(-(2r-n)t/r)$, and in particular the variation threshold satisfies

$\tau_{1}\leq\frac{r}{2r-n}.$
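The key inequality $|{\cal N}(v)\cap{\cal N}(w)|\geq 2r-n$ is just inclusion-exclusion inside the $n$-element vertex set, and can be confirmed on a concrete dense regular graph; the complete graph minus a perfect matching used below ($r=n-2$) is a hypothetical example, not one from the text.

```python
from itertools import combinations

# Complete graph on n vertices minus a perfect matching: (n-2)-regular,
# dense (r > n/2 once n > 4). Check |N(v) ∩ N(w)| >= 2r - n for all pairs.
n = 10
nbrs = {v: {u for u in range(n) if u != v and u != v ^ 1} for v in range(n)}
# (v ^ 1 pairs up 0-1, 2-3, ...: these are the removed matching edges)
r = n - 2
assert all(len(nbrs[v]) == r for v in range(n))
assert all(len(nbrs[v] & nbrs[w]) >= 2 * r - n
           for v, w in combinations(range(n), 2))
```

The general fact behind the assertion: $|{\cal N}(v)\cap{\cal N}(w)|=|{\cal N}(v)|+|{\cal N}(w)|-|{\cal N}(v)\cup{\cal N}(w)|\geq r+r-n$.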

We now give two examples of coupling in the wide sense, to compare different processes. The first is a technical result (inequality (14.8) below) which we needed in Chapter 6 yyy. The second is the proof of Proposition 14.1(b).

Exit times for constrained random walk.

Let $(X_{t})$ be discrete-time random walk on a graph $G$, let $A$ be a subset of the vertices of $G$ and let $(Y_{t})$ be random walk on the subgraph induced by $A$. Given $B\subset A$, let $S$ be the first hitting time of $(Y_{t})$ on $B$, and let $T$ be the first hitting time of $(X_{t})$ on $B\cup A^{c}$. Then

$E_{i}T\leq E_{i}S,\ i\in A.$  (14.8)

This is “obvious”, and the reason it’s obvious is by coupling. We can construct coupled processes $(X^{\prime},Y^{\prime})$ with the property that, if both particles are at the same position $a$ in $A$, and if $X$ jumps to another state $b$ in $A$, then $Y$ jumps to the same state $b$. This property immediately implies that, for the coupled processes started at the same state in $A$, we have $T^{\prime}\leq S^{\prime}$ and hence (14.8).

In words, here is the coupling $(X^{\prime},Y^{\prime})$. When the particles are at different positions they jump independently. When they are at the same position, first let $X^{\prime}$ jump; if $X^{\prime}$ jumps to a vertex in $A$ let $Y^{\prime}$ jump to the same vertex, and otherwise let $Y^{\prime}$ jump to a uniform random neighbor in $A$. Formally, the coupled process moves according to the transition matrix $\tilde{{\bf P}}$ on $G\times A$ defined by

$\tilde{p}(x,a;y,b)=p_{G}(x,y)\,p_{A}(a,b),\ x\neq a$

$\tilde{p}(a,a;b,b)=p_{G}(a,b),\ b\in A$

$\tilde{p}(a,a;y,b)=p_{G}(a,y)\,p_{A}(a,b),\ b\in A,\ y\in A^{c}$

where $p_{A}$ and $p_{G}$ refer to transition probabilities for the original random walks on $A$ and $G$.
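The two properties that make the argument work, that each row of $\tilde{\bf P}$ is a probability distribution and that the first coordinate moves exactly as the $G$-walk, can be verified mechanically. In the sketch below $G$ is a path on $\{0,\ldots,5\}$ and $A=\{0,\ldots,3\}$, chosen purely for illustration.

```python
# G = path 0-1-2-3-4-5, A = {0,1,2,3} (induced subgraph: path 0-1-2-3).
G_nbrs = {v: [u for u in (v - 1, v + 1) if 0 <= u <= 5] for v in range(6)}
A = [0, 1, 2, 3]
A_nbrs = {a: [b for b in (a - 1, a + 1) if b in A] for a in A}

def pG(x, y):
    return 1 / len(G_nbrs[x]) if y in G_nbrs[x] else 0.0

def pA(a, b):
    return 1 / len(A_nbrs[a]) if b in A_nbrs[a] else 0.0

def pt(x, a, y, b):
    """Coupled transition probability p~(x, a; y, b) on G x A."""
    if x != a:                       # different positions: independent
        return pG(x, y) * pA(a, b)
    if y in A:                       # together, X' stays in A: move jointly
        return pG(x, y) if b == y else 0.0
    return pG(x, y) * pA(a, b)       # X' leaves A: Y' moves on its own

for x in range(6):
    for a in A:
        # rows sum to 1, and the first marginal is the G-walk
        total = sum(pt(x, a, y, b) for y in range(6) for b in A)
        assert abs(total - 1.0) < 1e-12
        for y in range(6):
            assert abs(sum(pt(x, a, y, b) for b in A) - pG(x, y)) < 1e-12
```

Under this $\tilde{\bf P}$, paired particles in $A$ stay paired as long as $X^{\prime}$ moves within $A$, which is exactly the pathwise property $T^{\prime}\leq S^{\prime}$ used above.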

Proof of Proposition 14.1(b). Fix $i\geq 1$. By reversibility it is sufficient to prove

$P_{i}(X_{t}=0)/P_{0}(X_{t}=0)\mbox{ is non-decreasing in }t.$

Consider the Doeblin coupling $(X^{(0)}_{t},X^{(i)}_{t})$ of the processes started at $0$ and at $i$, with coupling time $T$. Since $X^{(0)}_{t}\leq X^{(i)}_{t}$ we have

$P(X^{(i)}_{t}=0)=P(X^{(0)}_{t}=0,T\leq t)$

and so we have to prove

$P(T\leq t|X^{(0)}_{t}=0)\mbox{ is non-decreasing in }t.$

It suffices to show that, for $t_{1}>t$,

$P(T\leq t|X^{(0)}_{t_{1}}=0)\geq P(T\leq t|X^{(0)}_{t}=0)$

and thus, by considering the conditional distribution of $X^{(0)}_{t}$ given $X^{(0)}_{t_{1}}=0$, it suffices to show that

$P(T\leq t|X^{(0)}_{t}=j)\geq P(T\leq t|X^{(0)}_{t}=0)$  (14.9)

for $j\geq 0$. So fix $j$ and $t$. Write $(X^{(0,j)}_{s},0\leq s\leq t)$ for the process conditioned on $X_{0}=0,X_{t}=j$. By considering time running backwards from $t$ to $0$, the processes $X^{(0,0)}$ and $X^{(0,j)}$ are the same non-homogeneous Markov chain started at the different states $0$ and $j$, and we can use the Doeblin coupling in this non-homogeneous setting to construct versions of these processes with

$X^{(0,0)}_{s}\leq X^{(0,j)}_{s},\ 0\leq s\leq t.$

Now introduce an independent copy of the original process, started at time $0$ in state $i$. If this process meets $X^{(0,0)}$ before time $t$ then it must also meet $X^{(0,j)}$ before time $t$, establishing (14.9).
