4 Hitting and Convergence Time, and Flow Rate, Parameters for Reversible Markov Chains (October 11, 1994)

4.3 The variation threshold τ1\tau_{1}.

4.3.1 Definitions

Recall from Chapter 2 yyy that ||  ||||\ \ \ \ || denotes variation distance and

d(t)d¯(t)2d(t)d(t)\leq\bar{d}(t)\leq 2d(t)

We define the parameter

τ1min{t:d¯(t)e-1}.\tau_{1}\equiv\min\{t:\bar{d}(t)\leq e^{-1}\}. (4.11)

The choice of constant e-1e^{-1}, and of using d¯(t)\bar{d}(t) instead of d(t)d(t), are rather arbitrary, but this choice makes the numerical constants work out nicely (in particular, makes τ2τ1\tau_{2}\leq\tau_{1} – see section 4.4). Submultiplicativity gives

Lemma 4.5

d(t)d¯(t)exp(-t/τ1)exp(1-t/τ1),  t0d(t)\leq\bar{d}(t)\leq\exp(-\lfloor t/\tau_{1}\rfloor)\leq\exp(1-t/\tau_{1}),% \ t\geq 0.

The point of parameter τ1\tau_{1} is to formalize the idea of “time to approach stationarity, from worst starting-place”. The fact that variation distance is just one of several distances one could use may make τ1\tau_{1} seem a very arbitrary choice, but Theorem 4.6 below says that three other possible quantifications of this idea are equivalent. Here equivalent has a technical meaning: parameters τa\tau_{a} and τb\tau_{b} are equivalent if their ratio is bounded above and below by numerical constants not depending on the chain. (Thus (4.2) says τ*\tau^{*} and maxjEπTj\max_{j}E_{\pi}T_{j} are equivalent parameters). More surprisingly, τ1\tau_{1} is also equivalent to two more parameters involving mean hitting times. We now define all these parameters.

xxx Warning. Parameters τ1(4),τ1(5)\tau_{1}^{(4)},\tau_{1}^{(5)} in this draft were parameters τ1(3),τ1(4)\tau_{1}^{(3)},\tau_{1}^{(4)} in the previous draft.

The first idea is to measure distance from stationarity by using ratios of probabilities. Define separation from stationarity to be

s(t)min{s:pij(t)(1-s)πj for all i,j}.s(t)\equiv\min\{s:p_{ij}(t)\geq(1-s)\pi_{j}\mbox{ for all }i,j\}.

Then s()s(\cdot) is submultiplicative, so we naturally define the separation threshold time to be

τ1(1)min{t:s(t)e-1}.\tau_{1}^{(1)}\equiv\min\{t:s(t)\leq e^{-1}\}.

The second idea is to consider minimal random times at which the chain has exactly the stationary distribution. Let


where the minmin is over stopping times UiU_{i} such that Pi(X(Ui))=π()P_{i}(X(U_{i})\in\cdot)=\pi(\cdot). As a variation on this idea, let us temporarily write, for a probability distribution μ\mu on the state space,


where the minmin is over stopping times UiU_{i} such that Pi(X(Ui))=μ()P_{i}(X(U_{i})\in\cdot)=\mu(\cdot). Then define


Turning to the parameters involving mean hitting times, we define

τ1(4)maxi,kjπj|EiTj-EkTj|=maxi,kj|Zij-Zkj|\tau_{1}^{(4)}\equiv\max_{i,k}\sum_{j}\pi_{j}|E_{i}T_{j}-E_{k}T_{j}|=\max_{i,k% }\sum_{j}|Z_{ij}-Z_{kj}| (4.12)

where the equality involves the fundamental matrix 𝐙{\bf Z} and holds by the mean hitting time formula. Parameter τ1(4)\tau_{1}^{(4)} measures variability of mean hitting times as the starting place varies. The final parameter is


Here we can regard the right side as the ratio of EiTAE_{i}T_{A}, the Markov chain mean hitting time on AA, to 1/π(A)1/\pi(A), the mean hitting time under independent sampling from the stationary distribution.

The definitions above make sense in either discrete or continuous time, but the following notational convention turns out to be convenient. For a discrete-time chain we define τ1\tau_{1} to be the value obtained by applying the definition (4.11) to the continuized chain, and write τ1disc\tau_{1}^{{\mbox{disc}}} for the value obtained for the discrete-time chain itself. Define similarly τ1(1)\tau_{1}^{(1)} and τ11,disc\tau_{1}^{1,{\mbox{disc}}}. But the other parameters τ1(2)-τ1(5)\tau_{1}^{(2)}-\tau_{1}^{(5)} are defined directly in terms of the discrete-time chain. We now state the equivalence theorem, from Aldous [12].

Theorem 4.6

(a) In either discrete or continuous time, the parameters

τ1,τ1(1),τ1(2),τ1(3),τ1(4)\tau_{1},\tau_{1}^{(1)},\tau_{1}^{(2)},\tau_{1}^{(3)},\tau_{1}^{(4)} and τ1(5)\tau_{1}^{(5)} are equivalent.

(b) In discrete time, τ1𝑑𝑖𝑠𝑐\tau_{1}^{{\mbox{disc}}} and τ11,𝑑𝑖𝑠𝑐\tau_{1}^{1,{\mbox{disc}}} are equivalent, and τ1(2)ee-1τ11,𝑑𝑖𝑠𝑐\tau_{1}^{(2)}\leq\frac{e}{e-1}\tau_{1}^{1,{\mbox{disc}}} .

This will be (partially) proved in section 4.3.2, but let us first give a few remarks and examples. The parameter τ1\tau_{1} and total variation distance are closely related to the notion of coupling of Markov chains, discussed in Chapter 14. Analogously (see the Notes), the separation s(t)s(t) and the parameter τ1(1)\tau_{1}^{(1)} are closely related to the notion of strong stationary times ViV_{i} for which

Pi(X(Vi)|Vi=t)=π() for all t.P_{i}(X(V_{i})\in\cdot\ |V_{i}=t)=\pi(\cdot)\mbox{ for all }t. (4.13)

Under our standing assumption of reversibility there is a close connection between separation and variation distance, indicated by the next lemma.

Lemma 4.7

(a) d¯(t)s(t)\bar{d}(t)\leq s(t).

(b) s(2t)1-(1-d¯(t))2s(2t)\leq 1-(1-\bar{d}(t))^{2}.

Proof. Part (a) is immediate from the definitions. For (b),

pik(2t)πk\displaystyle\frac{p_{ik}(2t)}{\pi_{k}} =\displaystyle= jpij(t)pjk(t)πk\displaystyle\sum_{j}\frac{p_{ij}(t)p_{jk}(t)}{\pi_{k}}
=\displaystyle= jπjpij(t)pkj(t)πj2 by reversibility\displaystyle\sum_{j}\pi_{j}\frac{p_{ij}(t)p_{kj}(t)}{\pi^{2}_{j}}\mbox{ by reversibility}
\displaystyle\geq (jπjpij1/2(t)pkj1/2(t)πj)2 by EZ(EZ1/2)2\displaystyle\left(\sum_{j}\pi_{j}\frac{p_{ij}^{1/2}(t)p_{kj}^{1/2}(t)}{\pi_{j% }}\right)^{2}\mbox{ by }EZ\geq(EZ^{1/2})^{2}
\displaystyle\geq (jmin(pij(t),pkj(t)))2\displaystyle\left(\sum_{j}\min(p_{ij}(t),p_{kj}(t))\right)^{2}
=\displaystyle= (1-||Pi(Xt)-Pk(Xt)||)2\displaystyle\left(1-||P_{i}(X_{t}\in\cdot)-P_{k}(X_{t}\in\cdot)||\right)^{2}
\displaystyle\geq (1-d¯(t))2.  \displaystyle(1-\bar{d}(t))^{2}.\ \ \ \ \Box

Note also that the definition of s(t)s(t) involves lower bounds in the convergence pij(t)πj1\frac{p_{ij}(t)}{\pi_{j}}\rightarrow 1. One can make a definition involving upper bounds

d^(t)maxi,jpij(t)πj-1=maxipii(t)πi-10\hat{d}(t)\equiv\max_{i,j}\frac{p_{ij}(t)}{\pi_{j}}-1=\max_{i}\frac{p_{ii}(t)}% {\pi_{i}}-1\geq 0 (4.14)

where the equality (Chapter 3 Lemma yyy) requires in discrete time that tt be even. This yields the following one-sided inequalities, but Example 4.9 shows there can be no such reverse inequality.

Lemma 4.8

(a) 4||Pi(Xt)-π()||2pii(2t)πi-1,t04||P_{i}(X_{t}\in\cdot)-\pi(\cdot)||^{2}\leq\frac{p_{ii}(2t)}{\pi_{i}}\ -1,\ t\geq 0.

(b) d(t)12d^(2t),  t0d(t)\leq\frac{1}{2}\sqrt{\hat{d}(2t)},\ t\geq 0

Proof. Part (b) follows from part (a) and the definitions. Part (a) is essentially just the “||||1||||2||\ \ ||_{1}\leq||\ \ ||_{2}” inequality, but let’s write it out bare-hands.

4||Pi(Xt)-π()||2\displaystyle 4||P_{i}(X_{t}\in\cdot)-\pi(\cdot)||^{2} =\displaystyle= (j|pij(t)-πj|)2\displaystyle\left(\sum_{j}|p_{ij}(t)-\pi_{j}|\right)^{2}
=\displaystyle= (jπj1/2|pij(t)-πjπj1/2|)2\displaystyle\left(\sum_{j}\pi^{1/2}_{j}\left|\frac{p_{ij}(t)-\pi_{j}}{\pi^{1/% 2}_{j}}\right|\right)^{2}
\displaystyle\leq j(pij(t)-πj)2πj by Cauchy-Schwarz\displaystyle\sum_{j}\frac{(p_{ij}(t)-\pi_{j})^{2}}{\pi_{j}}\mbox{ by Cauchy-Schwarz}
=\displaystyle= -1+jpij2(t)πj\displaystyle-1+\sum_{j}\frac{p_{ij}^{2}(t)}{\pi_{j}}
=\displaystyle= -1+pii(2t)πi by Chapter 3 Lemma yyy.\displaystyle-1+\frac{p_{ii}(2t)}{\pi_{i}}\mbox{ by Chapter 3 Lemma yyy.}
Example 4.9

Consider a continuous-time 33-state chain with transition rates


Here πa=ε2+ε,  πb=πc=12+ε\pi_{a}=\frac{\varepsilon}{2+\varepsilon},\ \pi_{b}=\pi_{c}=\frac{1}{2+\varepsilon}. It is easy to check that τ1\tau_{1} is bounded as ε0\varepsilon\rightarrow 0. But paa(t)e-tp_{aa}(t)\rightarrow e^{-t} as ε0\varepsilon\rightarrow 0, and so by considering state aa we have d^(t)\hat{d}(t)\rightarrow\infty as ε0\varepsilon\rightarrow 0 for any fixed tt.

Remark. In the nice examples discussed in Chapter 5 we can usually find a pair of states (i0,j0)(i_{0},j_{0}) such that

d¯(t)=||Pi0(Xt)-Pj0(Xt)|| for all t.\bar{d}(t)=||P_{i_{0}}(X_{t}\in\cdot)-P_{j_{0}}(X_{t}\in\cdot)||\ \mbox{ for % all }t.

The next example shows this is false in general.

Example 4.10

Consider random walk on the weighted graph


for suitably small ε\varepsilon. As t0t\rightarrow 0 we have 1-d¯(t)cεt21-\bar{d}(t)\sim c_{\varepsilon}t^{2}, the maxmax attained by pairs (0,2)(0,2) or (1,3)(1,3). But as tt\rightarrow\infty we have d¯(t)aεexp(-t/τ2(ε))\bar{d}(t)\sim a_{\varepsilon}\exp(-t/\tau_{2}(\varepsilon)) where τ2(ε)=Θ(1/ε)\tau_{2}(\varepsilon)=\Theta(1/\varepsilon) and where the max is attained by pairs (i, *)(i,*). \Box

As a final comment, one might wonder whether the minimizing distribution μ\mu in the definition of τ1(3)\tau_{1}^{(3)} were always π\pi, i.e. whether τ1(3)=τ1(2)\tau_{1}^{(3)}=\tau_{1}^{(2)} always. But a counter-example is provided by random walk on the nn-star (Chapter 5 yyy) where τ1(3)=1\tau_{1}^{(3)}=1 (by taking μ\mu to be concentrated on the center vertex) but τ1(2)3/2\tau_{1}^{(2)}\rightarrow 3/2.

4.3.2 Proof of Theorem 6

We will prove

Lemma 4.11

τ1τ1(1)4τ1\tau_{1}\leq\tau_{1}^{(1)}\leq 4\tau_{1}

Lemma 4.12


Lemma 4.13

τ1(4)4τ1(3)\tau_{1}^{(4)}\leq 4\tau_{1}^{(3)}

Lemma 4.14


These lemmas hold in discrete and continuous time, interpreting tau1,τ1(1)tau_{1},\tau_{1}^{(1)} as τ1disc,τ11,disc\tau_{1}^{{\mbox{disc}}},\tau_{1}^{1,{\mbox{disc}}} is discrete time. Incidentally, Lemmas 4.12, 4.13 and 4.14 do not depend on reversibility. To complete the proof of Theorem 4.6 in continuous time we would need to show

τ1Kτ1(5) in continuous time \tau_{1}\leq K\tau^{(5)}_{1}\mbox{ in continuous time } (4.15)

for some absolute constant KK. The proof I know is too lengthy to repeat here – see [12]. Note that (from its definition) τ1(2)τ0\tau_{1}^{(2)}\leq\tau_{0}, so that (4.15) and the lemmas above imply τ12Kτ0\tau_{1}\leq 2K\tau_{0} in continuous time. We shall instead give a direct proof of a result weaker than (4.15):

Lemma 4.15

τ166τ0\tau_{1}\leq 66\tau_{0}.

Turning to the assertions of Theorem 4.6 is discrete time, (b) is given by the discrete-time versions of Lemmas 4.11 and 4.12. To prove (a), it is enough to show that the numerical values of the parameters τ1(2)--τ1(5)\tau_{1}^{(2)}--\tau_{1}^{(5)} are unchanged by continuizing the discrete-time chain. For τ1(5)\tau_{1}^{(5)} and τ1(4)\tau_{1}^{(4)} this is clear, because continuization doesn’t affect mean hitting times. For τ1(3)\tau_{1}^{(3)} and τ1(2)\tau_{1}^{(2)} it reduces to the following lemma.

Lemma 4.16

Let XtX_{t} be a discrete-time chain and YtY_{t} be its continuization, both started with the same distribution. Let TT be a randomized stopping time for YY. Then there exists a randomized stopping time T^\hat{T} for XX such that P(X(T^))=P(Y(T))P(X(\hat{T})\in\cdot)=P(Y(T)\in\cdot) and ET^=ETE\hat{T}=ET.

Proof of Lemma 4.11. The left inequality is immediate from Lemma 4.7(a), and the right inequality holds because

s(4τ1)\displaystyle s(4\tau_{1}) \displaystyle\leq 1-(1-d¯(2τ1))2 by Lemma 4.7(b)\displaystyle 1-(1-\bar{d}(2\tau_{1}))^{2}\mbox{ by Lemma }\ref{dsrel}(b)
\displaystyle\leq 1-(1-e-2)2 by Lemma 4.5\displaystyle 1-(1-e^{-2})^{2}\mbox{ by Lemma \ref{dsub}}
\displaystyle\leq e-1.\displaystyle e^{-1}.

Proof of Lemma 4.12. The left inequality is immediate from the definitions. For the right inequality, fix ii. Write u=τ1(1)u=\tau_{1}^{(1)}, so that

pjk(u)(1-e-1)πk for all j,k.p_{jk}(u)\geq(1-e^{-1})\pi_{k}\mbox{ for all }j,k.

We can construct a stopping time Ui{u,2u,3u,}U_{i}\in\{u,2u,3u,\ldots\} such that


and then by induction on mm such that

Pi(XUi,Ui=mu)=e-(m-1)(1-e-1)π(),m1.P_{i}(X_{U_{i}}\in\cdot,U_{i}=mu)=e^{-(m-1)}(1-e^{-1})\pi(\cdot),\ m\geq 1.

Then Pi(XUi)=π()P_{i}(X_{U_{i}}\in\cdot)=\pi(\cdot) and EiUi=u(1-e-1)-1E_{i}U_{i}=u(1-e^{-1})^{-1}. So τ1(2)(1-e-1)-1τ1(1)\tau_{1}^{(2)}\leq(1-e^{-1})^{-1}\tau_{1}^{(1)}.

Remark. What the argument shows is that we can construct a strong stationary time ViV_{i} (in the sense of (4.13)) such that

EiVi=(1-e-1)-1τ1(1).E_{i}V_{i}=(1-e^{-1})^{-1}\tau_{1}^{(1)}. (4.16)

Proof of Lemma 4.13. Consider the probability distribution μ\mu attaining the min in the definition of τ1(3)\tau_{1}^{(3)}, and the associated stopping times UiU_{i}. Fix ii. Since Pi(X(Ui))=μ()P_{i}(X(U_{i})\in\cdot)=\mu(\cdot),

EiTjEiUi+EμTjτ1(3)+EμTj.E_{i}T_{j}\leq E_{i}U_{i}+E_{\mu}T_{j}\leq\tau_{1}^{(3)}+E_{\mu}T_{j}.

The random target lemma (4.7) says jEiTjπj=jEμTjπj\sum_{j}E_{i}T_{j}\pi_{j}=\sum_{j}E_{\mu}T_{j}\pi_{j} and so

jπj|EiTj-EμTj|=2jπj(EiTj-EμTj)+2τ1(3).\sum_{j}\pi_{j}|E_{i}T_{j}-E_{\mu}T_{j}|=2\sum_{j}\pi_{j}(E_{i}T_{j}-E_{\mu}T_% {j})^{+}\leq 2\tau_{1}^{(3)}.

Writing b(i)b(i) for the left sum, the definition of τ1(4)\tau_{1}^{(4)} and the triangle inequality give τ1(4)maxi,k(b(i)+b(k))\tau_{1}^{(4)}\leq\max_{i,k}(b(i)+b(k)), and the Lemma follows.

Proof of Lemma 4.14. Fix a subset AA and a starting state iAi\not\in A. Then for any jAj\in A,


where ρ\rho is the hitting place distribution Pi(XTA)P_{i}(X_{T_{A}}\in\cdot). So

π(A)EiTA=jAπjEiTA=jAπj(EiTj-EρTj)\pi(A)E_{i}T_{A}=\sum_{j\in A}\pi_{j}E_{i}T_{A}=\sum_{j\in A}\pi_{j}(E_{i}T_{j% }-E_{\rho}T_{j})
maxkjAπj(EiTj-EkTj)τ1(4).\leq\max_{k}\sum_{j\in A}\pi_{j}(E_{i}T_{j}-E_{k}T_{j})\leq\tau_{1}^{(4)}.

Proof of Lemma 4.15. For small δ>0\delta>0 to be specified later, define


Note that Markov’s inequality and the definition of τ0\tau_{0} give

π(Ac)=π{j:EπTj>τ0/δ}jπjEπTjτ0/δ=τ0τ0/δ=δ.\pi(A^{c})=\pi\{j:E_{\pi}T_{j}>\tau_{0}/\delta\}\leq\frac{\sum_{j}\pi_{j}E_{% \pi}T_{j}}{\tau_{0}/\delta}=\frac{\tau_{0}}{\tau_{0}/\delta}=\delta. (4.17)

Next, for any jj

EπTj\displaystyle E_{\pi}T_{j} =\displaystyle= 0(pjj(s)πj-1)ds by Chapter 2 Lemma yyy\displaystyle\int_{0}^{\infty}\left(\frac{p_{jj}(s)}{\pi_{j}}-1\right)ds\mbox{% by Chapter 2 Lemma yyy }
\displaystyle\geq t(pjj(t)πj-1) for any t\displaystyle t\left(\frac{p_{jj}(t)}{\pi_{j}}-1\right)\mbox{ for any }t

by monotonicity of pjj(t)p_{jj}(t). Thus for jAj\in A we have

pjj(t)πj-1EπTjtτ0δt\frac{p_{jj}(t)}{\pi_{j}}-1\leq\frac{E_{\pi}T_{j}}{t}\leq\frac{\tau_{0}}{% \delta t}

and applying Chapter 3 Lemma yyy (b)

pjk(t)πk1-τ0δt,  j,kA.\frac{p_{jk}(t)}{\pi_{k}}\geq 1-\frac{\tau_{0}}{\delta t},\ j,k\in A.

Now let ii be arbitrary and let kAk\in A. For any 0su0\leq s\leq u,

Pi(Xu+t=k|TA=s)πkminjAPj(Xu+t-s=k)πk1-τ0δ(u+t-s)1-τ0δt\frac{P_{i}(X_{u+t}=k|T_{A}=s)}{\pi_{k}}\geq\min_{j\in A}\frac{P_{j}(X_{u+t-s}% =k)}{\pi_{k}}\geq 1-\frac{\tau_{0}}{\delta(u+t-s)}\geq 1-\frac{\tau_{0}}{% \delta t}

and so

pik(u+t)πk(1-τ0δt)+Pi(TAu).\frac{p_{ik}(u+t)}{\pi_{k}}\geq\left(1-\frac{\tau_{0}}{\delta t}\right)^{+}P_{% i}(T_{A}\leq u). (4.18)



using Markov’s inequality and the definition of τ1(5)\tau_{1}^{(5)}. And τ1(5)τ1(4)2τ0\tau_{1}^{(5)}\leq\tau_{1}^{(4)}\leq 2\tau_{0}, the first inequality being Lemma 4.14 and the second being an easy consequence of the definitions. Combining (4.18) and the subsequent inequalities shows that, for kAk\in A and arbitrary ii

pik(u+t)πk(1-τ0δt)+(1-2τ0uπ(A))+η, say.\frac{p_{ik}(u+t)}{\pi_{k}}\geq\left(1-\frac{\tau_{0}}{\delta t}\right)^{+}% \left(1-\frac{2\tau_{0}}{u\pi(A)}\right)^{+}\equiv\eta,\mbox{ say}.

Applying this to arbitrary ii and jj we get

d¯(u+t)1-ηπ(A)1-(1-τ0δt)+(π(A)-2τ0u)+\bar{d}(u+t)\leq 1-\eta\pi(A)\leq 1-\left(1-\frac{\tau_{0}}{\delta t}\right)^{% +}\left(\pi(A)-\frac{2\tau_{0}}{u}\right)^{+}
1-(1-τ0δt)+(1-δ-2τ0u)+ by (4.17).\leq 1-\left(1-\frac{\tau_{0}}{\delta t}\right)^{+}\left(1-\delta-\frac{2\tau_% {0}}{u}\right)^{+}\mbox{ by }(\ref{p50}).

Putting t=49τ0,u=17τ0,δ=1/7t=49\tau_{0},u=17\tau_{0},\delta=1/7 makes the bound =305833<e-1=\frac{305}{833}<e^{-1}.

Remark. The ingredients of the proof above are complete monotonicity and conditioning on carefully chosen hitting times. The proof of (4.15) in [12] uses these ingredients, plus the minimal hitting time construction in the recurrent balayage theorem (Chapter 2 yyy).

Outline proof of Lemma 4.16. The observant reader will have noticed (Chapter 2 yyy) that we avoided writing down a careful definition of stopping times in the continuous setting. The definition involves measure-theoretic issues which I don’t intend to engage, and giving a rigorous proof of the lemma is a challenging exercise in the measure-theoretic formulation of continuous-time chains. However, the underlying idea is very simple. Regard the chain YtY_{t} as constructed from the chain (X0,X1,X2,)(X_{0},X_{1},X_{2},\ldots) and exponential(11) holds (ξi)(\xi_{i}). Define T^=N(T)\hat{T}=N(T), where N(t)N(t) is the Poisson counting process N(t)=max{m:ξ1++ξmt}N(t)=\max\{m:\xi_{1}+\ldots+\xi_{m}\leq t\}. Then X(T^)=Y(T)X(\hat{T})=Y(T) by construction and ET^=ETE\hat{T}=ET by the optional sampling theorem for the martingale N(t)-tN(t)-t. \Box

4.3.3 τ1\tau_{1} in discrete time, and algorithmic issues

Of course for period-22 chains we don’t have convergence to stationarity in discrete time, so we regard τ1disc=τ11,disc=\tau_{1}^{{\mbox{disc}}}=\tau_{1}^{1,{\mbox{disc}}}=\infty. Such chains – random walks on bipartite weighted graphs – include several simple examples of unweighted graphs we will discuss in Chapter 5 (e.g. the nn-path and nn-cycle for even nn, and the dd-cube) and Chapter 7 (e.g. card-shuffling by random transpositions, if we insist on transposing distinct cards).

As mentioned in Chapter 1 xxx, a topic of much recent interest has been “Markov Chain Monte Carlo”, where one constructs a discrete-time reversible chain with specified stationary distribution π\pi and we wish to use the chain to sample from π\pi. We defer systematic discussion to xxx, but a few comments are appropriate here. We have to start a simulation somewhere. In practice one might use as initial distribution some distribution which is feasible to simulate and which looks intuitively “close” to π\pi, but this idea is hard to formalize and so in theoretical analysis we seek results which hold regardless of the initial distribution, i.e. “worst-case start” results. In this setting τ1(2)\tau_{1}^{(2)} is, by definition, the minimum expected time to generate a sample with distribution π\pi. But the definition of τ1(2)\tau_{1}^{(2)} merely says a stopping time exists, and doesn’t tell us how to implement it algorithmically. For algorithmic purposes we want rules which don’t involve detailed structure of the chain. The most natural idea – stopping at a deterministic time – requires one to worry unnecessarily about near-periodicity. One way to avoid this worry is to introduce holds into the discrete-time chain, i.e. simulate (𝐏+I)/2({\bf P}+I)/2 instead of 𝐏{\bf P}. As an alternative, the distribution of the continuized chain at time tt can be obtained by simply running the discrete-time chain for a Poisson(tt) number of steps. “In practice” there is little difference between these alternatives. But the continuization method, as well as being mathematically less artificial, allows us to avoid the occasional messiness of discrete-time theory (see e.g. Proposition 4.29 below). In this sense our use of τ1\tau_{1} for discrete-time chains as the value for continuous-time chains is indeed sensible: it measures the accuracy of a natural algorithmic procedure applied to a discrete-time chain.

Returning to technical matters, the fact that a periodic (reversible, by our standing assumption) chain can only have period 22 suggests that the discrete-time periodicity effect could be eliminated by averaging over times tt and t+1t+1 only, as follows.

Open Problem 4.17

Show there exist ψ(x)0\psi(x)\downarrow 0 as x0x\downarrow 0 and ϕ(t)t\phi(t)\sim t as tt\rightarrow\infty such that, for any discrete-time chain,

maxi||Pi(Xt)+Pi(Xt+1)2-π()||ψ(d(ϕ(t))),   t=0,1,2,\max_{i}\left|\left|\frac{P_{i}(X_{t}\in\cdot)+P_{i}(X_{t+1}\in\cdot)}{2}-\pi(% \cdot)\right|\right|\leq\psi(d(\phi(t))),\ \ \ t=0,1,2,\ldots

where d()d(\cdot) refers to the continuized chain.

See the Notes for some comments on this problem.

If one does wish to study distributions of discrete-time chains at deterministic times, then in place of τ2\tau_{2} one needs to use

βmax(|λm|:2mn)=max(λ2,-λn).\beta\equiv\max(|\lambda_{m}|:2\leq m\leq n)=\max(\lambda_{2},-\lambda_{n}). (4.19)

The spectral representation then implies

|Pi(Xt=i)-πi|βt,t=0,1,2,.|P_{i}(X_{t}=i)-\pi_{i}|\leq\beta^{t},\ t=0,1,2,\ldots. (4.20)

4.3.4 τ1\tau_{1} and mean hitting times

In general τ1\tau_{1} may be much smaller than τ*\tau^{*} or τ0\tau_{0}. For instance, random walk on the complete graph has τ0n\tau_{0}\sim n while τ11\tau_{1}\rightarrow 1. So we cannot (without extra assumptions) hope to improve much on the following result.

Lemma 4.18

For an nn-state chain, in discrete or continuous time,

τ0nτ1(2)\tau_{0}\leq n\tau^{(2)}_{1}

Lemmas 4.24 and 4.25 later are essentially stronger, giving corresponding upper bounds in terms of τ2\tau_{2} instead of τ1\tau_{1}. But a proof of Lemma 4.18 is interesting for comparison with the cat-and-mouse game below.

Proof of Lemma 4.18. By definition of τ1(2)\tau^{(2)}_{1}, for the chain started at i0i_{0} we can find stopping times U1,U2,U_{1},U_{2},\ldots such that

E(Us+1-Us|Xu,uUs)τ1(2)E(U_{s+1}-U_{s}|X_{u},u\leq U_{s})\leq\tau^{(2)}_{1}
(X(Us);s1) are independent with distribution π.(X(U_{s});s\geq 1)\mbox{ are independent with distribution }\pi.

So Sjmin{s:X(Us)=j}S_{j}\equiv\min\{s:X(U_{s})=j\} has Ei0Sj=1/πjE_{i_{0}}S_{j}=1/\pi_{j}, and so

Ei0TjEi0USjτ1(2)πjE_{i_{0}}T_{j}\leq E_{i_{0}}U_{S_{j}}\leq\frac{\tau^{(2)}_{1}}{\pi_{j}}

where the second inequality is justified below. The second assertion of the lemma is now clear, and the first holds by averaging over jj.

The second inequality is justified by the following martingale result, which is a simple application of the optional sampling theorem. The “equality” assertion is sometimes called Wald’s equation for martingales.

Lemma 4.19

Let 0=Y0Y1Y20=Y_{0}\leq Y_{1}\leq Y_{2}\ldots be such that

E(Yi+1-Yi|Yj,ji)c,i0E(Y_{i+1}-Y_{i}|Y_{j},j\leq i)\leq c,\ i\geq 0

for a constant cc. Then for any stopping time TT,

EYTcET.EY_{T}\leq cET.

If in the hypothesis we replace “c\leq c” by “=c=c”, then EYT=cETEY_{T}=cET.

Cat-and-Mouse Game. Here is another variation on the type of game described in Chapter 3 section yyy. Fix a graph. The cat starts at some vertex vcv_{c} and follows a continuous-time simple random walk. The mouse starts at some vertex vmv_{m} and is allowed an arbitrary strategy. Recall the mouse can’t see the cat, so it must use a deterministic strategy, or a random strategy independent of the cat’s moves. The mouse seeks to maximize EMEM, the time until meeting. Write m*m^{*} for the supsup of EMEM over all starting positions vc,vmv_{c},v_{m} and all strategies for the mouse. So m*m^{*} just depends on the graph. Clearly m*maxi,jEiTjm^{*}\geq\max_{i,j}E_{i}T_{j}, since the mouse can just stand still.

Open Problem 4.20

Does m*=maxi,jEiTjm^{*}=\max_{i,j}E_{i}T_{j}? In other words, is it never better to run than to hide?

Here’s a much weaker upper bound on m*m^{*}. Consider for simplicity a regular nn-vertex graph. Then

m*enτ1(1)e-1.m^{*}\leq\frac{en\tau^{(1)}_{1}}{e-1}. (4.21)

Because as remarked at (4.16), we can construct a strong stationary time VV such that EV=eτ1(1)e-1=cEV=\frac{e\tau_{1}^{(1)}}{e-1}=c, say. So we can construct 0=V0<V1<V20=V_{0}<V_{1}<V_{2}\ldots such that

E(Vi+1-Vi|Vj,ji)c,i0E(V_{i+1}-V_{i}|V_{j},j\leq i)\leq c,\ i\geq 0
(X(Vi),i1) are independent with the uniform distribution π(X(V_{i}),i\geq 1)\mbox{ are independent with the uniform distribution }\pi
(X(Vi),i1) are independent of (Vi,i1).(X(V_{i}),i\geq 1)\mbox{ are independent of }(V_{i},i\geq 1).

So regardless of the mouse’s strategy, the cat has chance 1/n1/n to meet the mouse at time ViV_{i}, independently as ii varies, so the meeting time MM satisfies MVTM\leq V_{T} where TT is a stopping time with mean nn, and (4.21) follows from Lemma 4.19. This topic will be pursued in Chapter 6 yyy.

4.3.5 τ1\tau_{1} and flows

Since discrete-time chains can be identified with random walks on weighted graphs, relating properties of the chain to properties of “flows” on the graph is a recurring theme. Thompson’s principle (Chapter 3 yyy) identified mean commute times and mean hitting times from stationarity as infs over flows of certain quantities. Sinclair [307] noticed that τ1\tau_{1} could be related to “multicommodity flow” issues, and we give a streamlined version of his result (essentially Corollary 4.22) here. Recall from Chapter 3 section yyy the general notation of a unit flow from aa to π\pi, and the special flow 𝐟aπ{\bf f}^{a\rightarrow\pi} induced by the Markov chain.

Lemma 4.21

Consider a family 𝐟=(𝐟(a)){\bf f}=({\bf f}^{(a)}), where, for each state aa, 𝐟(a){\bf f}^{(a)} is a unit flow from aa to the stationary distribution π\pi. Define

ψ(𝐟)=maxedges (i,j)aπa|fij(a)|πipij\psi({\bf f})=\max_{\mbox{edges }(i,j)}\sum_{a}\pi_{a}\frac{|f_{ij}^{(a)}|}{% \pi_{i}p_{ij}}

in discrete time, and substitute qijq_{ij} for pijp_{ij} in continuous time. Let 𝐟aπ{\bf f}^{a\rightarrow\pi} be the special flow induced by the chain. Then

ψ(𝐟aπ)τ1(4)Δψ(𝐟aπ)\psi({\bf f}^{a\rightarrow\pi})\leq\tau_{1}^{(4)}\leq\Delta\psi({\bf f}^{a% \rightarrow\pi})

where Δ\Delta is the diameter of the transition graph.

Proof. We work in discrete time (the continuous case is similar). By Chapter 3 yyy


and so

aπa|fijaπ|πipij=a|Zia-Zja|.\sum_{a}\pi_{a}\frac{|f_{ij}^{a\rightarrow\pi}|}{\pi_{i}p_{ij}}=\sum_{a}|Z_{ia% }-Z_{ja}|.


ψ(𝐟aπ)=maxedge(i,j)a|Zia-Zja|.\psi({\bf f}^{a\rightarrow\pi})=\max_{{\rm edge}(i,j)}\sum_{a}|Z_{ia}-Z_{ja}|.

The result now follows because by (4.12)


where ii and kk are not required to be neighbors. \Box

Using Lemmas 4.11 - 4.13 to relate τ1(4)\tau_{1}^{(4)} to τ1\tau_{1}, we can deduce a lower bound on τ1\tau_{1} in terms of flows.

Corollary 4.22

τ1e-116einf𝐟ψ(𝐟)\tau_{1}\geq\frac{e-1}{16e}\ \inf_{{\bf f}}\psi({\bf f}).

Unfortunately it seems hard to get analogous upper bounds. In particular, it is not true that

τ1=O(Δinf𝐟ψ(𝐟)).\tau_{1}=O\left(\Delta\inf_{{\bf f}}\psi({\bf f})\right).

To see why, consider first random walk on the nn-cycle (Chapter 5 Example yyy). Here τ1=Θ(n2)\tau_{1}=\Theta(n^{2}) and ψ(𝐟aπ)=Θ(n)\psi({\bf f}^{a\rightarrow\pi})=\Theta(n), so the upper bound in Lemma 4.21 is the right order of magnitude, since Δ=Θ(n)\Delta=\Theta(n). Now modify the chain by allowing transitions between arbitrary pairs (i,j)(i,j) with equal chance o(n-3)o(n^{-3}). The new chain will still have τ1=Θ(n2)\tau_{1}=\Theta(n^{2}), and by considering the special flow in the original chain we have inf𝐟ψ(𝐟)=O(n)\inf_{{\bf f}}\psi({\bf f})=O(n), but now the diameter Δ=1\Delta=1.