9 A Second Look at General Markov Chains (April 21, 1995)

9.1 Minimal constructions and mixing times

Chapter 4 Theorem yyy involved three mixing time parameters: $\tau_1$, related to variation distance to stationarity; $\tau_1^{(1)}$, related to “separation” from stationarity; and $\tau_1^{(2)}$, related to stationary times (see below). In Chapter 4 these parameters were defined under worst-case initial distributions, and our focus was on “equivalence” of these parameters for reversible chains. Here we discuss the underlying “exact” results. Fix an initial distribution $\mu$. Then associated with each notion of mixing there is a corresponding construction of a minimal random time $T$, stated in Theorems 9.1–9.3 below.

xxx randomized stopping times

Call a stopping time $T$ a strong stationary time if

$$P_{\mu}(X_t=j,\,T=t)=\pi_j P_{\mu}(T=t)\mbox{ for all }j,t \qquad (9.1)$$

i.e. if $X_T$ has distribution $\pi$ and is independent of $T$. Call a stopping time $T$ a stationary time if

$$P_{\mu}(X_T=j)=\pi_j\mbox{ for all }j. \qquad (9.2)$$

Call a random time $T$ a coupling time if we can construct a joint distribution $((X_t,Y_t);\ t\geq 0)$ such that $(X_t)$ is the chain with initial distribution $\mu$, $(Y_t)$ is the stationary chain, and $X_t=Y_t$ for all $t\geq T$. (A coupling time need not be a stopping time, even w.r.t. the joint process; this is almost the only instance of a random time which is not a stopping time that we encounter in this book.)

Recall from yyy the notion of separation of $\theta$ from $\pi$:

$$\mbox{sep}(\theta)\equiv\min\{u:\theta_j\geq(1-u)\pi_j\ \forall j\}.$$

Write $\mbox{sep}_{\mu}(t)$ for the separation at time $t$ when the initial distribution is $\mu$:

$$\mbox{sep}_{\mu}(t)=\min\{u:P_{\mu}(X_t=j)\geq(1-u)\pi_j\ \forall j\}.$$

Similarly, write $\mbox{vd}_{\mu}(t)$ for the variation distance from stationarity at time $t$:

$$\mbox{vd}_{\mu}(t)=\frac{1}{2}\sum_j|P_{\mu}(X_t=j)-\pi_j|.$$
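For a small chain both quantities are easy to tabulate numerically. Below is a minimal sketch (illustrative code, not from the text: the 3-state chain, the function name, and the NumPy usage are ours) that iterates $P_\mu(X_t\in\cdot)$ and records $\mbox{sep}_\mu(t)$ and $\mbox{vd}_\mu(t)$.

```python
# Illustrative sketch: tabulate sep_mu(t) and vd_mu(t) for a finite chain.
# P, mu, pi below are made-up example data, not from the text.
import numpy as np

def sep_and_vd(P, mu, pi, tmax):
    """Return ([sep_mu(t)], [vd_mu(t)]) for t = 0, ..., tmax."""
    dist = mu.copy()                               # P_mu(X_t = .), starting at t = 0
    seps, vds = [], []
    for _ in range(tmax + 1):
        seps.append(1.0 - np.min(dist / pi))       # sep_mu(t) = 1 - min_j P_mu(X_t=j)/pi_j
        vds.append(0.5 * np.sum(np.abs(dist - pi)))
        dist = dist @ P                            # advance one step
    return seps, vds

# Example: a doubly stochastic 3-state chain, so pi is uniform.
P = np.array([[0.0, 0.7, 0.3],
              [0.3, 0.0, 0.7],
              [0.7, 0.3, 0.0]])
pi = np.ones(3) / 3
mu = np.array([1.0, 0.0, 0.0])                     # start at state 0
seps, vds = sep_and_vd(P, mu, pi, 10)              # note vd_mu(t) <= sep_mu(t) for all t
```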
Theorem 9.1

Let $T$ be any strong stationary time for the $\mu$-chain. Then

$${\rm sep}_{\mu}(t)\leq P_{\mu}(T>t)\mbox{ for all }t\geq 0. \qquad (9.3)$$

Moreover there exists a minimal strong stationary time $T$ for which

$${\rm sep}_{\mu}(t)=P_{\mu}(T>t)\mbox{ for all }t\geq 0. \qquad (9.4)$$
Theorem 9.2

For any coupling time $T$,

$${\rm vd}_{\mu}(t)\leq P_{\mu}(T>t)\mbox{ for all }t\geq 0.$$

Moreover there exists a minimal coupling time $T$ for which

$${\rm vd}_{\mu}(t)=P_{\mu}(T>t)\mbox{ for all }t\geq 0.$$
Theorem 9.3

For any stationary time $T$,

$$E_{\mu}T\geq\max_j(E_{\mu}T_j-E_{\pi}T_j). \qquad (9.5)$$

Moreover there exist mean-minimal stationary times $T$ for which

$$E_{\mu}T=\max_j(E_{\mu}T_j-E_{\pi}T_j). \qquad (9.6)$$

In each case the first assertion is immediate from the definitions, and the issue is to carry out a construction of the required $T$. Despite the similar appearance of the results, attempts to place them all in a common framework have not been fruitful. We prove Theorems 9.1 and 9.3 below and illustrate them with examples; these two proofs involve only rather simple “greedy” constructions. We won’t give the proof of Theorem 9.2 (the construction is usually called the maximal coupling; see Lindvall [234]), because the construction is a little more elaborate and the existence of the minimal coupling time is seldom useful; on the other hand, the coupling inequality in Theorem 9.2 will be used extensively in Chapter 14. In the context of Theorems 9.1 and 9.2 the minimal times $T$ are clearly unique in distribution, but in Theorem 9.3 there will generically be many mean-minimal stationary times $T$ with different distributions.

9.1.1 Strong stationary times

For any stopping time $T$, define

$$\theta_j(t)=P_{\mu}(X_t=j,\,T\geq t),\qquad\sigma_j(t)=P_{\mu}(X_t=j,\,T=t). \qquad (9.7)$$

Clearly these vectors satisfy

$$0\leq\sigma(t)\leq\theta(t),\quad(\theta(t)-\sigma(t)){\bf P}=\theta(t+1)\ \forall t;\quad\theta(0)=\mu. \qquad (9.8)$$

Conversely, given $(\theta(t),\sigma(t);\ t\geq 0)$ satisfying (9.8), we can construct a randomized stopping time $T$ satisfying (9.7) by declaring that $P(T=t\,|\,X_t=j,\ T\geq t,\ X_s,s<t)=\sigma_j(t)/\theta_j(t)$. The proofs of Theorems 9.1 and 9.3 use different definitions of vectors satisfying (9.8).

Proof of Theorem 9.1. A particular sequence $(\theta(t),\sigma(t);\ t\geq 0)$ can be specified inductively by (9.8) and

$$\sigma(t)=r_t\pi,\quad\mbox{where }r_t=\min_j\theta_j(t)/\pi_j. \qquad (9.9)$$

The associated stopping time satisfies

$$P_{\mu}(X_t=j,\,T=t)=\sigma_j(t)=r_t\pi_j$$

and so is a strong stationary time with $P_{\mu}(T=t)=r_t$. One can now verify inductively that

$$P_{\mu}(X_t\in\cdot)=\theta(t)+P_{\mu}(T\leq t-1)\,\pi$$

and so the separation is

$${\rm sep}_{\mu}(t)=1-\min_j\frac{P_{\mu}(X_t=j)}{\pi_j}=P_{\mu}(T\geq t)-r_t=P_{\mu}(T>t). \qquad \Box$$
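The proof is constructive and translates directly into a computation of the law of the minimal strong stationary time. Here is a minimal sketch (illustrative code; the example chain is made up) implementing recursion (9.8) with the greedy choice (9.9); the returned tail probabilities $P_\mu(T>t)$ can be checked against a direct computation of $\mbox{sep}_\mu(t)$, per (9.4).

```python
# Illustrative sketch of the greedy construction (9.9): sigma(t) = r_t * pi,
# where r_t = min_j theta_j(t)/pi_j, so P(T = t) = r_t and P(T > t) = sep_mu(t).
import numpy as np

def minimal_sst_tail(P, mu, pi, tmax):
    """Return [P(T > t)] for t = 0..tmax, T the minimal strong stationary time."""
    theta = mu.copy()                 # theta_j(t) = P(X_t = j, T >= t); theta(0) = mu
    tail, survive = [], 1.0
    for _ in range(tmax + 1):
        r = np.min(theta / pi)        # r_t: the stationary mass we may stop with now
        survive -= r                  # P(T > t) = 1 - sum_{s <= t} r_s
        tail.append(survive)
        theta = (theta - r * pi) @ P  # the recursion (9.8) with sigma(t) = r_t * pi
    return tail

# Made-up 3-state doubly stochastic chain started at state 0.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
pi = np.ones(3) / 3
mu = np.array([1.0, 0.0, 0.0])
print(minimal_sst_tail(P, mu, pi, 5))  # matches sep_mu(t), illustrating (9.4)
```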

9.1.2 Stopping times attaining a specified distribution

For comparison with the other two results, we stated Theorem 9.3 in terms of stopping times at which the stationary distribution is attained, but the underlying result (amplified as Theorem 9.4) holds for an arbitrary target distribution $\rho$. So fix $\rho$ as well as the initial distribution $\mu$. Call a stopping time $T$ admissible if $P_{\mu}(X_T\in\cdot)=\rho$. Write $\bar{t}(\mu,\rho)$ for the infimum of $E_{\mu}T$ over all admissible stopping times $T$.

Theorem 9.4

(a) $\bar{t}(\mu,\rho)=\max_j(E_{\mu}T_j-E_{\rho}T_j)$.

(b) The “filling scheme” below defines an admissible stopping time $\bar{T}$ such that $E_{\mu}\bar{T}=\bar{t}(\mu,\rho)$.

(c) Any admissible stopping time $T$ with the property

$$\exists\,k\mbox{ such that }P_{\mu}(T\leq T_k)=1 \qquad (9.10)$$

satisfies $E_{\mu}T=\bar{t}(\mu,\rho)$.

Part (c) is rather remarkable, and can be rephrased as follows. Call a state $k$ with property (9.10) a halting state for the stopping time $T$. In words, the chain must stop if and when it hits a halting state. Then part (c) asserts that, to verify that an admissible time $T$ attains the minimum $\bar{t}(\mu,\rho)$, it suffices to show that there exists some halting state. In the next section we shall see this is very useful in simple examples.

Proof. The greedy construction used here is called a filling scheme. Recall from (9.7) the definitions

$$\theta_j(t)=P_{\mu}(X_t=j,\,T\geq t),\qquad\sigma_j(t)=P_{\mu}(X_t=j,\,T=t).$$

Write also $\Sigma_j(t)=P_{\mu}(X_T=j,\,T\leq t)$. We now define $(\theta(t),\sigma(t);\ t\geq 0)$ and the associated stopping time $\bar{T}$ inductively via (9.8) and

$$\sigma_j(t)=\begin{cases}0&\mbox{if }\Sigma_j(t-1)=\rho_j\\ \theta_j(t)&\mbox{if }\Sigma_j(t-1)+\theta_j(t)\leq\rho_j\\ \rho_j-\Sigma_j(t-1)&\mbox{otherwise.}\end{cases}$$

In words, we stop at the current state ($j$, say) provided our “quota” $\rho_j$ for the chance of stopping at $j$ has not yet been filled. Clearly

$$\Sigma_j(t)\leq\rho_j\ \forall j\ \forall t. \qquad (9.11)$$

We now claim that $\bar{T}$ satisfies property (9.10). To see this, consider

$$t_j\equiv\min\{t:\Sigma_j(t)=\rho_j\}\leq\infty.$$

Then (9.10) holds by construction for any $k$ such that $t_k=\max_j t_j\leq\infty$. In particular $\bar{T}\leq T_k<\infty$ a.s., and then by (9.11) $P_{\mu}(X_{\bar{T}}\in\cdot)=\lim_{t\rightarrow\infty}\Sigma(t)=\rho$. So $\bar{T}$ is an admissible stopping time.

Remark. Generically we expect $t_j=\infty$ for exactly one state $j$, though other possibilities may occur, e.g. in the presence of symmetry.
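The filling scheme is equally mechanical to implement. The sketch below (illustrative code; the helper name filling_scheme and the truncation tolerances are ours) propagates $\theta(t)$ and $\Sigma(t)$ and accumulates the occupation measure introduced just below, from which $E_\mu\bar{T}$ and a halting state can be read off.

```python
# Illustrative sketch of the filling scheme: stop mass at state j only until
# the quota rho_j is filled; an argmin of the returned occupation measure
# x-bar is a halting state, and E(T-bar) = x-bar.sum().
import numpy as np

def filling_scheme(P, mu, rho, tol=1e-12, tmax=100000):
    theta = np.array(mu, dtype=float)         # theta_j(t) = P(X_t = j, T >= t)
    Sigma = np.zeros_like(theta)              # Sigma_j(t) = P(X_T = j, T <= t) <= rho_j
    x = np.zeros_like(theta)                  # x_j = E(visits to j during times 0..T-1)
    for _ in range(tmax):
        sigma = np.minimum(theta, np.maximum(rho - Sigma, 0.0))  # the three cases above
        Sigma += sigma
        x += theta - sigma                    # unstopped mass spends this step at j ...
        theta = (theta - sigma) @ P           # ... and then moves, per (9.8)
        if theta.sum() < tol:                 # P(T > t) negligible: quotas are filled
            break
    return x
```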

Now consider an arbitrary admissible stopping time $T$, and consider the associated occupation measure ${\bf x}=(x_j)$:

$$x_j\equiv E_{\mu}(\mbox{number of visits to $j$ during times }0,1,\ldots,T-1).$$

We shall show

$$x_j+\rho_j=\mu_j+\sum_i x_i p_{ij}\ \forall j. \qquad (9.12)$$

Indeed, by counting the number of visits during $0,1,\ldots,T-1,T$ in two ways,

$$x_j+\rho_j=\mu_j+E_{\mu}(\mbox{number of visits to $j$ during }1,2,\ldots,T).$$

Chapter 2 Lemma yyy showed the (intuitively obvious) fact

$$x_i p_{ij}=E_{\mu}(\mbox{number of transitions $i\rightarrow j$ starting before time $T$}).$$

So summing over $i$,

$$\sum_i x_i p_{ij}=E_{\mu}(\mbox{number of visits to $j$ during }1,2,\ldots,T)$$

and (9.12) follows.

Write $\bar{{\bf x}}$ for the occupation measure associated with the stopping time $\bar{T}$ produced by the filling scheme. By (9.10), $\min_k\bar{x}_k=0$. If ${\bf x}$ and ${\bf x}^{\prime}$ are solutions of (9.12), then the difference ${\bf d}={\bf x}-{\bf x}^{\prime}$ satisfies ${\bf d}={\bf d}{\bf P}$ and so is a multiple of the stationary distribution $\pi$. In particular, if ${\bf x}$ is the occupation measure for some arbitrary admissible time $T$, then

$${\bf x}\geq\bar{{\bf x}},\mbox{ with equality iff }\min_k x_k=0.$$

Since $E_{\mu}T=\sum_i x_i$, we have established parts (b) and (c) of the theorem, and

$$\bar{t}(\mu,\rho)=\sum_i\bar{x}_i.$$

To prove (a), choose a state $k$ such that $\bar{x}_k=0$, that is, such that $\bar{T}\leq T_k$. Then $E_{\mu}T_k=E_{\mu}\bar{T}+E_{\rho}T_k$, and hence $\bar{t}(\mu,\rho)\leq\max_j(E_{\mu}T_j-E_{\rho}T_j)$. But for any admissible stopping time $T$ and any state $j$,

$$E_{\mu}T_j\leq E_{\mu}T+E_{\rho}T_j,$$

giving the reverse inequality $\bar{t}(\mu,\rho)\geq\max_j(E_{\mu}T_j-E_{\rho}T_j)$. $\Box$
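For a concrete chain, part (a) is easy to check numerically: the mean hitting times $E_iT_j$ solve a linear system, and $\max_j(E_\mu T_j-E_\rho T_j)$ should agree with the mean of the filling-scheme time. A sketch (illustrative code, reusing filling_scheme from above; the example chain is made up):

```python
# Illustrative numerical check of part (a), reusing filling_scheme from above.
import numpy as np

def mean_hitting_times(P):
    """h[i, j] = E_i T_j, solving (I - P off state j) h(., j) = 1 for each j."""
    n = P.shape[0]
    h = np.zeros((n, n))
    for j in range(n):
        idx = [i for i in range(n) if i != j]
        A = np.eye(n - 1) - P[np.ix_(idx, idx)]
        h[idx, j] = np.linalg.solve(A, np.ones(n - 1))
    return h

# Made-up 3-state chain, uniform target rho.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
mu = np.array([1.0, 0.0, 0.0])
rho = np.ones(3) / 3
h = mean_hitting_times(P)
lhs = filling_scheme(P, mu, rho).sum()   # E(T-bar), part (b)
rhs = np.max(mu @ h - rho @ h)           # max_j (E_mu T_j - E_rho T_j), part (a)
assert abs(lhs - rhs) < 1e-8
```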

Corollary 9.5

The minimal strong stationary time has mean $\bar{t}(\mu,\pi)$, i.e. is mean-minimal amongst all not-necessarily-strong stationary times, iff there exists a state $k$ such that

$$P_{\mu}(X_t=k)/\pi_k=\min_j P_{\mu}(X_t=j)/\pi_j\ \forall t.$$

Proof. From the construction of the minimal strong stationary time, this is the condition for $k$ to be a halting state. $\Box$

9.1.3 Examples

Example 9.6

Patterns in coin-tossing.

Recall Chapter 2 Example yyy: $(X_t)$ is the chain on the set $\{H,T\}^n$ of $n$-tuples $i=(i_1,\ldots,i_n)$. Start at some arbitrary initial state $j=(j_1,\ldots,j_n)$. Here the deterministic stopping time $T=n$ is a strong stationary time. Now a state $k=(k_1,\ldots,k_n)$ will be a halting state provided it does not overlap $j$, that is, provided there is no $1\leq u\leq n$ such that $(j_u,\ldots,j_n)=(k_1,\ldots,k_{n-u+1})$. But the number of overlapping states is at most $1+2+2^2+\cdots+2^{n-1}=2^n-1$, so there exists a non-overlapping state, i.e. a halting state. So $ET$ attains the minimum $\bar{t}(j,\pi)=n$ over all stationary times (and not just over all strong stationary times).
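The overlap bound is easy to confirm by enumeration, as in the sketch below (illustrative code; the word $j$ is made up).

```python
# Illustrative check of the overlap bound in Example 9.6: k overlaps j iff
# some suffix of j equals a prefix of k (the u = 1 case being k = j).
from itertools import product

def overlaps(j, k):
    n = len(j)
    return any(j[u:] == k[:n - u] for u in range(n))   # u = 0, ..., n-1

n = 5
j = ('H', 'T', 'H', 'H', 'T')                          # made-up initial state
non_overlapping = [k for k in product('HT', repeat=n) if not overlaps(j, k)]
assert non_overlapping                                 # a halting state exists
# At most 1 + 2 + ... + 2**(n-1) = 2**n - 1 states overlap j:
print(2**n - len(non_overlapping))                     # number of overlapping states
```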

Example 9.7

Top-to-random card shuffling.

Consider the following scheme for shuffling an $n$-card deck: the top card is removed and inserted in one of the $n$ possible positions, chosen uniformly at random. Start in some arbitrary order. Let $T$ be the first time that the card which was originally second from the bottom has reached the top of the deck. Then it is not hard to show (Diaconis [123] p. 177) that $T+1$ is a strong stationary time. Now any configuration in which the originally-bottom card is the top card will be a halting state, and so $T+1$ is mean-minimal over all stationary times. Here $E(T+1)=1+\sum_{m=2}^{n-1}\frac{n}{m}=n(h_n-1)$, where $h_n$ is the harmonic number $\sum_{m=1}^n 1/m$.
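A quick simulation confirms the formula for $E(T+1)$ (illustrative Monte Carlo sketch; the deck size and trial count are arbitrary).

```python
# Illustrative Monte Carlo check of E(T + 1) = n(h_n - 1) in Example 9.7.
import random

def time_to_top(n):
    """Steps until the card that started second from the bottom is on top."""
    deck = list(range(n))          # deck[0] is the top; card n-2 starts second from bottom
    t = 0
    while deck[0] != n - 2:
        card = deck.pop(0)                       # remove the top card
        deck.insert(random.randrange(n), card)   # reinsert uniformly among n positions
        t += 1
    return t

n, trials = 10, 20000
est = sum(time_to_top(n) + 1 for _ in range(trials)) / trials
exact = n * (sum(1.0 / m for m in range(1, n + 1)) - 1)   # n (h_n - 1)
print(est, exact)   # should agree to Monte Carlo accuracy
```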

Example 9.8

The winning streak chain.

In a series of games which you win or lose independently with chance $0<c<1$, let $\hat{X}_t$ be your current “winning streak”, i.e. the number of games won since your last loss. For fixed $n$, the truncated process $X_t=\min(\hat{X}_t,n-1)$ is the Markov chain on states $\{0,1,2,\ldots,n-1\}$ with transition probabilities

$$p(i,0)=1-c,\quad p(i,\min(i+1,n-1))=c;\qquad 0\leq i\leq n-1$$

and stationary distribution

$$\pi_i=(1-c)c^i,\ 0\leq i\leq n-2;\qquad\pi_{n-1}=c^{n-1}.$$

We present this chain, started at $0$, as an example where it is easy to see there are different mean-minimal stationary times $T$. We’ll leave the simplest construction until last – can you guess it now? First consider $T_J$, where $J$ has the stationary distribution. This is a stationary time, and $n-1$ is a halting state, so it is mean-minimal. Now it is easy to show

$$E_0T_j=\frac{1}{(1-c)c^j}-\frac{1}{1-c},\qquad 1\leq j\leq n-1.$$

Slick proof: in the untruncated chain, Chapter 2 Lemma yyy says

$$1=E_j(\mbox{number of visits to $j$ before }T_0)=\pi_j(E_jT_0+E_0T_j)=\pi_j\bigl(1/(1-c)+E_0T_j\bigr).$$

So

$$\bar{t}(0,\pi)=E_0T_J=\sum_{j\geq 1}\pi_j E_0T_j=n-2+\frac{\pi_{n-1}}{(1-c)c^{n-1}}-\frac{1-\pi_0}{1-c}=n-1.$$

Here is another stopping time $T$ which is easily checked to attain the stationary distribution, for the chain started at $0$. With chance $1-c$ stop at time $0$. Otherwise, run the chain until either hitting $n-1$ (in which case, stop) or returning to $0$. In the latter case, the return to $0$ occurs as a transition to $0$ from some state $M\geq 0$; continue until first hitting $M+1$, then stop. Again $n-1$ is a halting state, so this stationary time is also mean-minimal. Of course, the simplest construction is the deterministic time $T=n-1$. This is a strong stationary time (the winning streak chain is a function of the patterns-in-coin-tossing chain), and again $n-1$ is clearly a halting state. Thus $\bar{t}(0,\pi)=n-1$ without needing the calculation above.
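Both computations of $\bar{t}(0,\pi)$ are easy to reproduce numerically with the helpers above (illustrative sketch, reusing filling_scheme and mean_hitting_times; $n$ and $c$ are arbitrary).

```python
# Illustrative check of Example 9.8 via filling_scheme and mean_hitting_times.
import numpy as np

n, c = 6, 0.3
P = np.zeros((n, n))
for i in range(n):
    P[i, 0] = 1 - c                       # lose: streak resets to 0
    P[i, min(i + 1, n - 1)] += c          # win: streak grows (+= handles i = n-1)
pi = np.array([(1 - c) * c**i for i in range(n - 1)] + [c**(n - 1)])
mu = np.zeros(n); mu[0] = 1.0

h = mean_hitting_times(P)
print(np.max(mu @ h - pi @ h))            # = n - 1, by Theorem 9.4(a)
print(filling_scheme(P, mu, pi).sum())    # = n - 1 again
```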

Remark. One could alternatively use Corollary 9.5 to show that the strong stationary times in Examples 9.6 and 9.7 are mean-minimal stationary times. The previous examples are atypical: here is a more typical example, in which the hypothesis of Corollary 9.5 is not satisfied and so no mean-minimal stationary time is a strong stationary time.

Example 9.9

xxx needs a name!

Chapter 2 Example yyy can be rewritten as follows. Let $(U_t)$ be independent uniform on $\{0,1,\ldots,n-1\}$ and let $(A_t)$ be independent events with $P(A_t)=a$. Define a chain $X$ on $\{0,1,\ldots,n-1\}$ by

$$X_{t+1}=\begin{cases}U_{t+1}&\mbox{on }A_t^c\\ X_t+1\mbox{ mod }n&\mbox{on }A_t.\end{cases}$$

The stationary distribution is the uniform distribution. Take $X_0=0$. Clearly $T\equiv\min\{t\geq 1:A_{t-1}^c\mbox{ occurs}\}$ is a strong stationary time, with $ET=1/(1-a)$, and it is easy to see that $T$ is the minimal strong stationary time. But $T$ is not a mean-minimal stationary time. The occupation measure ${\bf x}$ associated with $T$ is such that $\min_j x_j=x_{n-1}=a^{n-1}+a^{2n-1}+\cdots=a^{n-1}/(1-a^n)$; the occupation measure $\bar{{\bf x}}$ associated with a mean-minimal stationary time is therefore $\bar{{\bf x}}={\bf x}-\frac{na^{n-1}}{1-a^n}\pi$, giving $\bar{t}(0,\pi)=\frac{1}{1-a}-\frac{na^{n-1}}{1-a^n}$.
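Once more this is easy to check numerically with the filling scheme (illustrative sketch, reusing filling_scheme from above; $n$ and $a$ are arbitrary).

```python
# Illustrative check of Example 9.9, reusing filling_scheme from above.
import numpy as np

n, a = 5, 0.4
S = np.roll(np.eye(n), 1, axis=1)            # S[i, (i+1) mod n] = 1: the "+1 mod n" move
P = (1 - a) * np.ones((n, n)) / n + a * S    # uniform refresh w.p. 1-a, else +1 mod n
pi = np.ones(n) / n
mu = np.zeros(n); mu[0] = 1.0

print(filling_scheme(P, mu, pi).sum())              # t-bar(0, pi)
print(1 / (1 - a) - n * a**(n - 1) / (1 - a**n))    # the closed form above
print(1 / (1 - a))                                  # mean of the minimal strong stationary time
```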