# 4.3 The variation threshold $\tau_{1}$.

## 4.3.1 Definitions

Recall from Chapter 2 yyy that $||\ \ \ \ ||$ denotes variation distance and

 $d(t)\equiv\max_{i}||P_{i}(X_{t}\in\cdot)-\pi(\cdot)||$
 $\bar{d}(t)\equiv\max_{ij}||P_{i}(X_{t}\in\cdot)-P_{j}(X_{t}\in\cdot)||$
 $d(t)\leq\bar{d}(t)\leq 2d(t)$
 $\bar{d}(s+t)\leq\bar{d}(s)\bar{d}(t)$

We define the parameter

 $\tau_{1}\equiv\min\{t:\bar{d}(t)\leq e^{-1}\}.$ (4.11)

The choices of the constant $e^{-1}$, and of using $\bar{d}(t)$ instead of $d(t)$, are rather arbitrary, but they make the numerical constants work out nicely (in particular, they make $\tau_{2}\leq\tau_{1}$ – see section 4.4). Submultiplicativity gives

###### Lemma 4.5

$d(t)\leq\bar{d}(t)\leq\exp(-\lfloor t/\tau_{1}\rfloor)\leq\exp(1-t/\tau_{1}),\ t\geq 0$.
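These inequalities are easy to check numerically. The following sketch (an illustration added here, assuming NumPy; the chain $\bf P$ is an arbitrary choice) verifies $d(t)\leq\bar{d}(t)\leq 2d(t)$ and submultiplicativity of $\bar{d}$ for small $t$ in discrete time:

```python
import numpy as np

# An arbitrary illustrative 3-state discrete-time chain.
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.5, 0.2],
              [0.4, 0.4, 0.2]])

def tv(mu, nu):
    # variation distance ||mu - nu|| = (1/2) sum_j |mu_j - nu_j|
    return 0.5 * np.abs(mu - nu).sum()

# stationary distribution: left eigenvector of P for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

def d(t):
    Pt = np.linalg.matrix_power(P, t)
    return max(tv(Pt[i], pi) for i in range(3))

def dbar(t):
    Pt = np.linalg.matrix_power(P, t)
    return max(tv(Pt[i], Pt[j]) for i in range(3) for j in range(3))

for s in range(1, 5):
    for t in range(1, 5):
        assert d(t) <= dbar(t) + 1e-12            # d(t) <= dbar(t)
        assert dbar(t) <= 2 * d(t) + 1e-12        # dbar(t) <= 2 d(t)
        assert dbar(s + t) <= dbar(s) * dbar(t) + 1e-12   # submultiplicativity
```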

The point of parameter $\tau_{1}$ is to formalize the idea of “time to approach stationarity, from worst starting-place”. The fact that variation distance is just one of several distances one could use may make $\tau_{1}$ seem a very arbitrary choice, but Theorem 4.6 below says that three other possible quantifications of this idea are equivalent. Here equivalent has a technical meaning: parameters $\tau_{a}$ and $\tau_{b}$ are equivalent if their ratio is bounded above and below by numerical constants not depending on the chain. (Thus (4.2) says $\tau^{*}$ and $\max_{j}E_{\pi}T_{j}$ are equivalent parameters). More surprisingly, $\tau_{1}$ is also equivalent to two more parameters involving mean hitting times. We now define all these parameters.

xxx Warning. Parameters $\tau_{1}^{(4)},\tau_{1}^{(5)}$ in this draft were parameters $\tau_{1}^{(3)},\tau_{1}^{(4)}$ in the previous draft.

The first idea is to measure distance from stationarity by using ratios of probabilities. Define separation from stationarity to be

 $s(t)\equiv\min\{s:p_{ij}(t)\geq(1-s)\pi_{j}\mbox{ for all }i,j\}.$

Then $s(\cdot)$ is submultiplicative, so we naturally define the separation threshold time to be

 $\tau_{1}^{(1)}\equiv\min\{t:s(t)\leq e^{-1}\}.$
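Concretely, $s(t)=\max_{i,j}\left(1-p_{ij}(t)/\pi_{j}\right)^{+}$, and $\tau_{1}^{(1)}$ can be found by direct scan. A numerical sketch (added illustration, assuming NumPy; the chain is an arbitrary choice), which also checks submultiplicativity of $s(\cdot)$:

```python
import numpy as np

# An arbitrary illustrative 3-state discrete-time chain.
P = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

def s(t):
    # separation s(t) = max_{i,j} (1 - p_ij(t)/pi_j), floored at 0
    Pt = np.linalg.matrix_power(P, t)
    return max(0.0, float((1 - Pt / pi).max()))

# s(.) is submultiplicative
for a in range(1, 4):
    for b in range(1, 4):
        assert s(a + b) <= s(a) * s(b) + 1e-12

# separation threshold time
tau1_sep = next(t for t in range(1, 100) if s(t) <= np.exp(-1))
print("tau_1^(1) (discrete) =", tau1_sep)
```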

The second idea is to consider minimal random times at which the chain has exactly the stationary distribution. Let

 $\tau_{1}^{(2)}\equiv\max_{i}\min_{U_{i}}E_{i}U_{i}$

where the min is over stopping times $U_{i}$ such that $P_{i}(X(U_{i})\in\cdot)=\pi(\cdot)$. As a variation on this idea, let us temporarily write, for a probability distribution $\mu$ on the state space,

 $\tau(\mu)\equiv\max_{i}\min_{U_{i}}E_{i}U_{i}$

where the min is over stopping times $U_{i}$ such that $P_{i}(X(U_{i})\in\cdot)=\mu(\cdot)$. Then define

 $\tau_{1}^{(3)}=\min_{\mu}\tau(\mu).$

Turning to the parameters involving mean hitting times, we define

 $\tau_{1}^{(4)}\equiv\max_{i,k}\sum_{j}\pi_{j}|E_{i}T_{j}-E_{k}T_{j}|=\max_{i,k}\sum_{j}|Z_{ij}-Z_{kj}|$ (4.12)

where the equality involves the fundamental matrix ${\bf Z}$ and holds by the mean hitting time formula. Parameter $\tau_{1}^{(4)}$ measures variability of mean hitting times as the starting place varies. The final parameter is

 $\tau_{1}^{(5)}\equiv\max_{i,A}\pi(A)E_{i}T_{A}.$

Here we can regard the right side as the ratio of $E_{i}T_{A}$, the Markov chain mean hitting time on $A$, to $1/\pi(A)$, the mean hitting time under independent sampling from the stationary distribution.
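For small chains both parameters can be computed exactly: $E_{i}T_{j}$ by a linear solve, $\tau_{1}^{(4)}$ also via $\bf Z$ using $Z_{jj}-Z_{ij}=\pi_{j}E_{i}T_{j}$, and $\tau_{1}^{(5)}$ by brute force over subsets $A$. A sketch (added illustration, assuming NumPy; the chain is an arbitrary choice):

```python
import numpy as np
from itertools import combinations

# An arbitrary illustrative 4-state chain.
P = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.3, 0.4, 0.2, 0.1],
              [0.2, 0.2, 0.4, 0.2],
              [0.1, 0.1, 0.2, 0.6]])
n = len(P)

w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

# mean hitting times M[i, j] = E_i T_j: solve (I - P restricted) m = 1
M = np.zeros((n, n))
for j in range(n):
    idx = [i for i in range(n) if i != j]
    A = np.eye(n - 1) - P[np.ix_(idx, idx)]
    M[idx, j] = np.linalg.solve(A, np.ones(n - 1))

# tau_1^(4) from hitting times, and again from the fundamental matrix
# Z = (I - P + 1 pi^T)^{-1} - 1 pi^T, using Z_jj - Z_ij = pi_j E_i T_j
tau14 = max(np.sum(pi * np.abs(M[i] - M[k]))
            for i in range(n) for k in range(n))
Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi)) \
    - np.outer(np.ones(n), pi)
tau14_Z = max(np.sum(np.abs(Z[i] - Z[k])) for i in range(n) for k in range(n))
assert np.allclose(tau14, tau14_Z)   # the two sides of (4.12) agree

# tau_1^(5) = max_{i, A} pi(A) E_i T_A, brute-forced over nonempty subsets A
def hit_time_A(A):
    out = np.zeros(n)
    idx = [i for i in range(n) if i not in A]
    if idx:
        B = np.eye(len(idx)) - P[np.ix_(idx, idx)]
        out[idx] = np.linalg.solve(B, np.ones(len(idx)))
    return out

tau15 = max(pi[list(A)].sum() * hit_time_A(set(A)).max()
            for r in range(1, n + 1) for A in combinations(range(n), r))
assert tau15 <= tau14 + 1e-12   # consistent with Lemma 4.14 below
```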

The definitions above make sense in either discrete or continuous time, but the following notational convention turns out to be convenient. For a discrete-time chain we define $\tau_{1}$ to be the value obtained by applying the definition (4.11) to the continuized chain, and write $\tau_{1}^{{\mbox{disc}}}$ for the value obtained for the discrete-time chain itself. Define $\tau_{1}^{(1)}$ and $\tau_{1}^{1,{\mbox{disc}}}$ similarly. The other parameters $\tau_{1}^{(2)}-\tau_{1}^{(5)}$ are defined directly in terms of the discrete-time chain. We now state the equivalence theorem, from Aldous [12].

###### Theorem 4.6

(a) In either discrete or continuous time, the parameters

$\tau_{1},\tau_{1}^{(1)},\tau_{1}^{(2)},\tau_{1}^{(3)},\tau_{1}^{(4)}$ and $\tau_{1}^{(5)}$ are equivalent.

(b) In discrete time, $\tau_{1}^{{\mbox{disc}}}$ and $\tau_{1}^{1,{\mbox{disc}}}$ are equivalent, and $\tau_{1}^{(2)}\leq\frac{e}{e-1}\tau_{1}^{1,{\mbox{disc}}}$ .

This will be (partially) proved in section 4.3.2, but let us first give a few remarks and examples. The parameter $\tau_{1}$ and total variation distance are closely related to the notion of coupling of Markov chains, discussed in Chapter 14. Analogously (see the Notes), the separation $s(t)$ and the parameter $\tau_{1}^{(1)}$ are closely related to the notion of strong stationary times $V_{i}$ for which

 $P_{i}(X(V_{i})\in\cdot\ |V_{i}=t)=\pi(\cdot)\mbox{ for all }t.$ (4.13)

Under our standing assumption of reversibility there is a close connection between separation and variation distance, indicated by the next lemma.

###### Lemma 4.7

(a) $\bar{d}(t)\leq s(t)$.

(b) $s(2t)\leq 1-(1-\bar{d}(t))^{2}$.

Proof. Part (a) is immediate from the definitions. For (b),

 $\frac{p_{ik}(2t)}{\pi_{k}}=\sum_{j}\frac{p_{ij}(t)p_{jk}(t)}{\pi_{k}}=\sum_{j}\pi_{j}\frac{p_{ij}(t)p_{kj}(t)}{\pi_{j}^{2}}\ \mbox{ by reversibility}$
 $\geq\left(\sum_{j}\pi_{j}\frac{p_{ij}^{1/2}(t)p_{kj}^{1/2}(t)}{\pi_{j}}\right)^{2}\ \mbox{ by }EZ\geq(EZ^{1/2})^{2}$
 $\geq\left(\sum_{j}\min(p_{ij}(t),p_{kj}(t))\right)^{2}$
 $=\left(1-||P_{i}(X_{t}\in\cdot)-P_{k}(X_{t}\in\cdot)||\right)^{2}$
 $\geq(1-\bar{d}(t))^{2}.\ \ \ \ \Box$

Note also that the definition of $s(t)$ involves lower bounds in the convergence $\frac{p_{ij}(t)}{\pi_{j}}\rightarrow 1$. One can make a definition involving upper bounds

 $\hat{d}(t)\equiv\max_{i,j}\frac{p_{ij}(t)}{\pi_{j}}-1=\max_{i}\frac{p_{ii}(t)}{\pi_{i}}-1\geq 0$ (4.14)

where the equality (Chapter 3 Lemma yyy) requires in discrete time that $t$ be even. This yields the following one-sided inequalities, but Example 4.9 shows there can be no such reverse inequality.

###### Lemma 4.8

(a) $4||P_{i}(X_{t}\in\cdot)-\pi(\cdot)||^{2}\leq\frac{p_{ii}(2t)}{\pi_{i}}\ -1,\ t\geq 0$.

(b) $d(t)\leq\frac{1}{2}\sqrt{\hat{d}(2t)},\ t\geq 0$.

Proof. Part (b) follows from part (a) and the definitions. Part (a) is essentially just the “$||\ \ ||_{1}\leq||\ \ ||_{2}$” inequality, but let’s write it out bare-hands.

 $4||P_{i}(X_{t}\in\cdot)-\pi(\cdot)||^{2}=\left(\sum_{j}|p_{ij}(t)-\pi_{j}|\right)^{2}=\left(\sum_{j}\pi_{j}^{1/2}\left|\frac{p_{ij}(t)-\pi_{j}}{\pi_{j}^{1/2}}\right|\right)^{2}$
 $\leq\sum_{j}\frac{(p_{ij}(t)-\pi_{j})^{2}}{\pi_{j}}\ \mbox{ by Cauchy-Schwarz}$
 $=-1+\sum_{j}\frac{p_{ij}^{2}(t)}{\pi_{j}}$
 $=-1+\frac{p_{ii}(2t)}{\pi_{i}}\ \mbox{ by Chapter 3 Lemma yyy.}$

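Lemma 4.8(a) is straightforward to verify numerically for a reversible chain. A sketch (added illustration, assuming NumPy), using random walk on an arbitrary weighted graph so that reversibility holds by construction:

```python
import numpy as np

# Symmetric edge weights (loops allowed): random walk on this weighted
# graph is reversible, with pi_i proportional to the total weight at i.
W = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.0, 1.0],
              [0.5, 1.0, 3.0]])
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()

# Lemma 4.8(a): 4 ||P_i(X_t in .) - pi||^2 <= p_ii(2t)/pi_i - 1
for t in range(1, 6):
    Pt = np.linalg.matrix_power(P, t)
    P2t = np.linalg.matrix_power(P, 2 * t)
    for i in range(3):
        lhs = 4 * (0.5 * np.abs(Pt[i] - pi).sum()) ** 2
        rhs = P2t[i, i] / pi[i] - 1
        assert lhs <= rhs + 1e-12
```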
###### Example 4.9

Consider a continuous-time $3$-state chain with transition rates

Here $\pi_{a}=\frac{\varepsilon}{2+\varepsilon},\ \pi_{b}=\pi_{c}=\frac{1}{2+\varepsilon}$. It is easy to check that $\tau_{1}$ is bounded as $\varepsilon\rightarrow 0$. But $p_{aa}(t)\rightarrow e^{-t}$ as $\varepsilon\rightarrow 0$, and so by considering state $a$ we have $\hat{d}(t)\rightarrow\infty$ as $\varepsilon\rightarrow 0$ for any fixed $t$.

Remark. In the nice examples discussed in Chapter 5 we can usually find a pair of states $(i_{0},j_{0})$ such that

 $\bar{d}(t)=||P_{i_{0}}(X_{t}\in\cdot)-P_{j_{0}}(X_{t}\in\cdot)||\ \mbox{ for % all }t.$

The next example shows this is false in general.

###### Example 4.10

Consider random walk on the weighted graph

for suitably small $\varepsilon$. As $t\rightarrow 0$ we have $1-\bar{d}(t)\sim c_{\varepsilon}t^{2}$, the max attained by pairs $(0,2)$ or $(1,3)$. But as $t\rightarrow\infty$ we have $\bar{d}(t)\sim a_{\varepsilon}\exp(-t/\tau_{2}(\varepsilon))$ where $\tau_{2}(\varepsilon)=\Theta(1/\varepsilon)$ and where the max is attained by pairs $(i,*)$. $\Box$

As a final comment, one might wonder whether the minimizing distribution $\mu$ in the definition of $\tau_{1}^{(3)}$ is always $\pi$, i.e. whether $\tau_{1}^{(3)}=\tau_{1}^{(2)}$ always. But a counter-example is provided by random walk on the $n$-star (Chapter 5 yyy) where $\tau_{1}^{(3)}=1$ (by taking $\mu$ to be concentrated on the center vertex) but $\tau_{1}^{(2)}\rightarrow 3/2$.

## 4.3.2 Proof of Theorem 4.6

We will prove

###### Lemma 4.11

$\tau_{1}\leq\tau_{1}^{(1)}\leq 4\tau_{1}$

###### Lemma 4.12

$\tau_{1}^{(3)}\leq\tau_{1}^{(2)}\leq\frac{e}{e-1}\tau_{1}^{(1)}$.

###### Lemma 4.13

$\tau_{1}^{(4)}\leq 4\tau_{1}^{(3)}$

###### Lemma 4.14

$\tau_{1}^{(5)}\leq\tau_{1}^{(4)}$

These lemmas hold in discrete and continuous time, interpreting $\tau_{1},\tau_{1}^{(1)}$ as $\tau_{1}^{{\mbox{disc}}},\tau_{1}^{1,{\mbox{disc}}}$ in discrete time. Incidentally, Lemmas 4.12, 4.13 and 4.14 do not depend on reversibility. To complete the proof of Theorem 4.6 in continuous time we would need to show

 $\tau_{1}\leq K\tau^{(5)}_{1}\mbox{ in continuous time }$ (4.15)

for some absolute constant $K$. The proof I know is too lengthy to repeat here – see [12]. Note that (from its definition) $\tau_{1}^{(2)}\leq\tau_{0}$, so that (4.15) and the lemmas above imply $\tau_{1}\leq 2K\tau_{0}$ in continuous time. We shall instead give a direct proof of a result weaker than (4.15):

###### Lemma 4.15

$\tau_{1}\leq 66\tau_{0}$.

Turning to the assertions of Theorem 4.6 in discrete time, (b) is given by the discrete-time versions of Lemmas 4.11 and 4.12. To prove (a), it is enough to show that the numerical values of the parameters $\tau_{1}^{(2)}-\tau_{1}^{(5)}$ are unchanged by continuizing the discrete-time chain. For $\tau_{1}^{(5)}$ and $\tau_{1}^{(4)}$ this is clear, because continuization doesn’t affect mean hitting times. For $\tau_{1}^{(3)}$ and $\tau_{1}^{(2)}$ it reduces to the following lemma.

###### Lemma 4.16

Let $X_{t}$ be a discrete-time chain and $Y_{t}$ be its continuization, both started with the same distribution. Let $T$ be a randomized stopping time for $Y$. Then there exists a randomized stopping time $\hat{T}$ for $X$ such that $P(X(\hat{T})\in\cdot)=P(Y(T)\in\cdot)$ and $E\hat{T}=ET$.

Proof of Lemma 4.11. The left inequality is immediate from Lemma 4.7(a), and the right inequality holds because

 $s(4\tau_{1})\leq 1-(1-\bar{d}(2\tau_{1}))^{2}\ \mbox{ by Lemma 4.7(b)}$
 $\leq 1-(1-e^{-2})^{2}\ \mbox{ by Lemma 4.5}$
 $\leq e^{-1}.$

Proof of Lemma 4.12. The left inequality is immediate from the definitions. For the right inequality, fix $i$. Write $u=\tau_{1}^{(1)}$, so that

 $p_{jk}(u)\geq(1-e^{-1})\pi_{k}\mbox{ for all }j,k.$

We can construct a stopping time $U_{i}\in\{u,2u,3u,\ldots\}$ such that

 $P_{i}(X_{U_{i}}\in\cdot,U_{i}=u)=(1-e^{-1})\pi(\cdot)$

and then by induction on $m$ such that

 $P_{i}(X_{U_{i}}\in\cdot,U_{i}=mu)=e^{-(m-1)}(1-e^{-1})\pi(\cdot),\ m\geq 1.$

Then $P_{i}(X_{U_{i}}\in\cdot)=\pi(\cdot)$ and $E_{i}U_{i}=u(1-e^{-1})^{-1}$. So $\tau_{1}^{(2)}\leq(1-e^{-1})^{-1}\tau_{1}^{(1)}$.

Remark. What the argument shows is that we can construct a strong stationary time $V_{i}$ (in the sense of (4.13)) such that

 $E_{i}V_{i}=(1-e^{-1})^{-1}\tau_{1}^{(1)}.$ (4.16)

Proof of Lemma 4.13. Consider the probability distribution $\mu$ attaining the min in the definition of $\tau_{1}^{(3)}$, and the associated stopping times $U_{i}$. Fix $i$. Since $P_{i}(X(U_{i})\in\cdot)=\mu(\cdot)$,

 $E_{i}T_{j}\leq E_{i}U_{i}+E_{\mu}T_{j}\leq\tau_{1}^{(3)}+E_{\mu}T_{j}.$

The random target lemma (4.7) says $\sum_{j}E_{i}T_{j}\pi_{j}=\sum_{j}E_{\mu}T_{j}\pi_{j}$ and so

 $\sum_{j}\pi_{j}|E_{i}T_{j}-E_{\mu}T_{j}|=2\sum_{j}\pi_{j}(E_{i}T_{j}-E_{\mu}T_{j})^{+}\leq 2\tau_{1}^{(3)}.$

Writing $b(i)$ for the left sum, the definition of $\tau_{1}^{(4)}$ and the triangle inequality give $\tau_{1}^{(4)}\leq\max_{i,k}(b(i)+b(k))$, and the Lemma follows.

Proof of Lemma 4.14. Fix a subset $A$ and a starting state $i\not\in A$. Then for any $j\in A$,

 $E_{i}T_{j}=E_{i}T_{A}+E_{\rho}T_{j}$

where $\rho$ is the hitting place distribution $P_{i}(X_{T_{A}}\in\cdot)$. So

 $\pi(A)E_{i}T_{A}=\sum_{j\in A}\pi_{j}E_{i}T_{A}=\sum_{j\in A}\pi_{j}(E_{i}T_{j}-E_{\rho}T_{j})$
 $\leq\max_{k}\sum_{j\in A}\pi_{j}(E_{i}T_{j}-E_{k}T_{j})\leq\tau_{1}^{(4)}.$

Proof of Lemma 4.15. For small $\delta>0$ to be specified later, define

 $A=\{j:E_{\pi}T_{j}\leq\tau_{0}/\delta\}.$

Note that Markov’s inequality and the definition of $\tau_{0}$ give

 $\pi(A^{c})=\pi\{j:E_{\pi}T_{j}>\tau_{0}/\delta\}\leq\frac{\sum_{j}\pi_{j}E_{\pi}T_{j}}{\tau_{0}/\delta}=\frac{\tau_{0}}{\tau_{0}/\delta}=\delta.$ (4.17)

Next, for any $j$

 $E_{\pi}T_{j}=\int_{0}^{\infty}\left(\frac{p_{jj}(s)}{\pi_{j}}-1\right)ds\ \mbox{ by Chapter 2 Lemma yyy}$
 $\geq t\left(\frac{p_{jj}(t)}{\pi_{j}}-1\right)\ \mbox{ for any }t$

by monotonicity of $p_{jj}(t)$. Thus for $j\in A$ we have

 $\frac{p_{jj}(t)}{\pi_{j}}-1\leq\frac{E_{\pi}T_{j}}{t}\leq\frac{\tau_{0}}{\delta t}$

and applying Chapter 3 Lemma yyy (b)

 $\frac{p_{jk}(t)}{\pi_{k}}\geq 1-\frac{\tau_{0}}{\delta t},\ j,k\in A.$

Now let $i$ be arbitrary and let $k\in A$. For any $0\leq s\leq u$,

 $\frac{P_{i}(X_{u+t}=k|T_{A}=s)}{\pi_{k}}\geq\min_{j\in A}\frac{P_{j}(X_{u+t-s}=k)}{\pi_{k}}\geq 1-\frac{\tau_{0}}{\delta(u+t-s)}\geq 1-\frac{\tau_{0}}{\delta t}$

and so

 $\frac{p_{ik}(u+t)}{\pi_{k}}\geq\left(1-\frac{\tau_{0}}{\delta t}\right)^{+}P_{i}(T_{A}\leq u).$ (4.18)

Now

 $P_{i}(T_{A}>u)\leq\frac{E_{i}T_{A}}{u}\leq\frac{\tau_{1}^{(5)}}{u\pi(A)}.$

using Markov’s inequality and the definition of $\tau_{1}^{(5)}$. And $\tau_{1}^{(5)}\leq\tau_{1}^{(4)}\leq 2\tau_{0}$, the first inequality being Lemma 4.14 and the second being an easy consequence of the definitions. Combining (4.18) and the subsequent inequalities shows that, for $k\in A$ and arbitrary $i$

 $\frac{p_{ik}(u+t)}{\pi_{k}}\geq\left(1-\frac{\tau_{0}}{\delta t}\right)^{+}\left(1-\frac{2\tau_{0}}{u\pi(A)}\right)^{+}\equiv\eta,\mbox{ say}.$

Applying this to arbitrary $i$ and $j$ we get

 $\bar{d}(u+t)\leq 1-\eta\pi(A)\leq 1-\left(1-\frac{\tau_{0}}{\delta t}\right)^{+}\left(\pi(A)-\frac{2\tau_{0}}{u}\right)^{+}$
 $\leq 1-\left(1-\frac{\tau_{0}}{\delta t}\right)^{+}\left(1-\delta-\frac{2\tau_{0}}{u}\right)^{+}\ \mbox{ by }(4.17).$

Putting $t=49\tau_{0},u=17\tau_{0},\delta=1/7$ makes the bound $\frac{305}{833}<e^{-1}$, so that $\bar{d}(66\tau_{0})\leq e^{-1}$ and hence $\tau_{1}\leq 66\tau_{0}$. $\Box$
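The final arithmetic can be checked exactly with rational arithmetic (a small added verification, working in units of $\tau_{0}$):

```python
from fractions import Fraction
import math

# Times in units of tau_0, so tau_0 = 1 below (the bound is scale-free).
delta, t, u = Fraction(1, 7), 49, 17
term1 = 1 - 1 / (delta * t)          # (1 - tau_0/(delta t))^+ = 6/7
term2 = 1 - delta - Fraction(2, u)   # (1 - delta - 2 tau_0/u)^+ = 88/119
bound = 1 - term1 * term2

assert bound == Fraction(305, 833)
assert float(bound) < math.exp(-1)   # so dbar(66 tau_0) <= 1/e
```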

Remark. The ingredients of the proof above are complete monotonicity and conditioning on carefully chosen hitting times. The proof of (4.15) in [12] uses these ingredients, plus the minimal hitting time construction in the recurrent balayage theorem (Chapter 2 yyy).

Outline proof of Lemma 4.16. The observant reader will have noticed (Chapter 2 yyy) that we avoided writing down a careful definition of stopping times in the continuous setting. The definition involves measure-theoretic issues which I don’t intend to engage, and giving a rigorous proof of the lemma is a challenging exercise in the measure-theoretic formulation of continuous-time chains. However, the underlying idea is very simple. Regard the chain $Y_{t}$ as constructed from the chain $(X_{0},X_{1},X_{2},\ldots)$ and exponential($1$) holding times $(\xi_{i})$. Define $\hat{T}=N(T)$, where $N(t)$ is the Poisson counting process $N(t)=\max\{m:\xi_{1}+\ldots+\xi_{m}\leq t\}$. Then $X(\hat{T})=Y(T)$ by construction and $E\hat{T}=ET$ by the optional sampling theorem for the martingale $N(t)-t$. $\Box$

## 4.3.3 $\tau_{1}$ in discrete time, and algorithmic issues

Of course for period-$2$ chains we don’t have convergence to stationarity in discrete time, so we regard $\tau_{1}^{{\mbox{disc}}}=\tau_{1}^{1,{\mbox{disc}}}=\infty$. Such chains – random walks on bipartite weighted graphs – include several simple examples of unweighted graphs we will discuss in Chapter 5 (e.g. the $n$-path and $n$-cycle for even $n$, and the $d$-cube) and Chapter 7 (e.g. card-shuffling by random transpositions, if we insist on transposing distinct cards).

As mentioned in Chapter 1 xxx, a topic of much recent interest has been “Markov Chain Monte Carlo”, where one constructs a discrete-time reversible chain with specified stationary distribution $\pi$ and wishes to use the chain to sample from $\pi$. We defer systematic discussion to xxx, but a few comments are appropriate here. We have to start a simulation somewhere. In practice one might use as initial distribution some distribution which is feasible to simulate and which looks intuitively “close” to $\pi$, but this idea is hard to formalize and so in theoretical analysis we seek results which hold regardless of the initial distribution, i.e. “worst-case start” results. In this setting $\tau_{1}^{(2)}$ is, by definition, the minimum expected time to generate a sample with distribution $\pi$. But the definition of $\tau_{1}^{(2)}$ merely says a stopping time exists, and doesn’t tell us how to implement it algorithmically. For algorithmic purposes we want rules which don’t involve detailed structure of the chain. The most natural idea – stopping at a deterministic time – requires one to worry unnecessarily about near-periodicity. One way to avoid this worry is to introduce holds into the discrete-time chain, i.e. simulate $({\bf P}+I)/2$ instead of ${\bf P}$. As an alternative, the distribution of the continuized chain at time $t$ can be obtained by simply running the discrete-time chain for a Poisson($t$) number of steps. “In practice” there is little difference between these alternatives. But the continuization method, as well as being mathematically less artificial, allows us to avoid the occasional messiness of discrete-time theory (see e.g. Proposition 4.29 below). In this sense our use of $\tau_{1}$ for discrete-time chains as the value for continuous-time chains is indeed sensible: it measures the accuracy of a natural algorithmic procedure applied to a discrete-time chain.
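The continuization identity underlying this procedure, namely that the Poisson($t$) mixture $\sum_{m}e^{-t}t^{m}/m!\,{\bf P}^{m}$ equals $\exp(t({\bf P}-I))$, is easy to check numerically. A sketch (added illustration, assuming NumPy; the chain is an arbitrary choice, and the matrix exponential is computed by eigendecomposition):

```python
import numpy as np

# An arbitrary illustrative reversible 3-state chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
t = 2.0

# Poisson(t) mixture of discrete-time transition matrices, truncated
# far into the tail of the Poisson distribution.
H = np.zeros_like(P)
term = np.exp(-t)          # Poisson(t) pmf at m = 0
Pm = np.eye(3)
for m in range(60):
    H += term * Pm
    Pm = Pm @ P
    term *= t / (m + 1)    # pmf at m+1 from pmf at m

# exp(t(P - I)) via eigendecomposition of the generator Q = P - I
Q = P - np.eye(3)
w, V = np.linalg.eig(Q)
expQ = np.real(V @ np.diag(np.exp(t * w)) @ np.linalg.inv(V))

assert np.allclose(H, expQ, atol=1e-8)
```

So simulating the discrete chain for an independent Poisson($t$) number of steps does deliver exactly the time-$t$ distribution of the continuized chain.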

Returning to technical matters, the fact that a periodic (reversible, by our standing assumption) chain can only have period $2$ suggests that the discrete-time periodicity effect could be eliminated by averaging over times $t$ and $t+1$ only, as follows.

###### Open Problem 4.17

Show there exist $\psi(x)\downarrow 0$ as $x\downarrow 0$ and $\phi(t)\sim t$ as $t\rightarrow\infty$ such that, for any discrete-time chain,

 $\max_{i}\left|\left|\frac{P_{i}(X_{t}\in\cdot)+P_{i}(X_{t+1}\in\cdot)}{2}-\pi(\cdot)\right|\right|\leq\psi(d(\phi(t))),\ \ \ t=0,1,2,\ldots$

where $d(\cdot)$ refers to the continuized chain.

See the Notes for some comments on this problem.

If one does wish to study distributions of discrete-time chains at deterministic times, then in place of $\tau_{2}$ one needs to use

 $\beta\equiv\max(|\lambda_{m}|:2\leq m\leq n)=\max(\lambda_{2},-\lambda_{n}).$ (4.19)

The spectral representation then implies

 $|P_{i}(X_{t}=i)-\pi_{i}|\leq\beta^{t},\ t=0,1,2,\ldots.$ (4.20)
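For a reversible chain the eigenvalues can be obtained from the symmetrization ${\bf S}=D_{\pi}^{1/2}{\bf P}D_{\pi}^{-1/2}$, and (4.19)–(4.20) checked directly. A sketch (added illustration, assuming NumPy; the weighted graph is an arbitrary non-bipartite choice so that $\beta<1$):

```python
import numpy as np

# Random walk on an arbitrary connected non-bipartite weighted graph.
W = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0, 0.0]])
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()

# eigenvalues of P via the symmetric matrix S = D^{1/2} P D^{-1/2}
S = np.diag(np.sqrt(pi)) @ P @ np.diag(1 / np.sqrt(pi))
lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # lam[0] = 1
beta = max(lam[1], -lam[-1])                 # as in (4.19)

# check (4.20): |p_ii(t) - pi_i| <= beta^t
for t in range(1, 8):
    Pt = np.linalg.matrix_power(P, t)
    assert np.all(np.abs(np.diag(Pt) - pi) <= beta ** t + 1e-12)
```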

## 4.3.4 $\tau_{1}$ and mean hitting times

In general $\tau_{1}$ may be much smaller than $\tau^{*}$ or $\tau_{0}$. For instance, random walk on the complete graph has $\tau_{0}\sim n$ while $\tau_{1}\rightarrow 1$. So we cannot (without extra assumptions) hope to improve much on the following result.

###### Lemma 4.18

For an $n$-state chain, in discrete or continuous time,

 $\tau_{0}\leq n\tau^{(2)}_{1}$
 $\tau^{*}\leq\frac{2\tau^{(2)}_{1}}{\min_{j}\pi_{j}}.$

Lemmas 4.24 and 4.25 later are essentially stronger, giving corresponding upper bounds in terms of $\tau_{2}$ instead of $\tau_{1}$. But a proof of Lemma 4.18 is interesting for comparison with the cat-and-mouse game below.

Proof of Lemma 4.18. By definition of $\tau^{(2)}_{1}$, for the chain started at $i_{0}$ we can find stopping times $U_{1},U_{2},\ldots$ such that

 $E(U_{s+1}-U_{s}|X_{u},u\leq U_{s})\leq\tau^{(2)}_{1}$
 $(X(U_{s});s\geq 1)\mbox{ are independent with distribution }\pi.$

So $S_{j}\equiv\min\{s:X(U_{s})=j\}$ has $E_{i_{0}}S_{j}=1/\pi_{j}$, and so

 $E_{i_{0}}T_{j}\leq E_{i_{0}}U_{S_{j}}\leq\frac{\tau^{(2)}_{1}}{\pi_{j}}$

where the second inequality is justified below. The second assertion of the lemma is now clear, and the first holds by averaging over $j$.

The second inequality is justified by the following martingale result, which is a simple application of the optional sampling theorem. The “equality” assertion is sometimes called Wald’s equation for martingales.

###### Lemma 4.19

Let $0=Y_{0}\leq Y_{1}\leq Y_{2}\ldots$ be such that

 $E(Y_{i+1}-Y_{i}|Y_{j},j\leq i)\leq c,\ i\geq 0$

for a constant $c$. Then for any stopping time $T$,

 $EY_{T}\leq cET.$

If in the hypothesis we replace “$\leq c$” by “$=c$”, then $EY_{T}=cET$.

Cat-and-Mouse Game. Here is another variation on the type of game described in Chapter 3 section yyy. Fix a graph. The cat starts at some vertex $v_{c}$ and follows a continuous-time simple random walk. The mouse starts at some vertex $v_{m}$ and is allowed an arbitrary strategy. Recall the mouse can’t see the cat, so it must use a deterministic strategy, or a random strategy independent of the cat’s moves. The mouse seeks to maximize $EM$, the time until meeting. Write $m^{*}$ for the sup of $EM$ over all starting positions $v_{c},v_{m}$ and all strategies for the mouse. So $m^{*}$ just depends on the graph. Clearly $m^{*}\geq\max_{i,j}E_{i}T_{j}$, since the mouse can just stand still.

###### Open Problem 4.20

Does $m^{*}=\max_{i,j}E_{i}T_{j}$? In other words, is it never better to run than to hide?

Here’s a much weaker upper bound on $m^{*}$. Consider for simplicity a regular $n$-vertex graph. Then

 $m^{*}\leq\frac{en\tau^{(1)}_{1}}{e-1}.$ (4.21)

Because as remarked at (4.16), we can construct a strong stationary time $V$ such that $EV=\frac{e\tau_{1}^{(1)}}{e-1}=c$, say. So we can construct $0=V_{0}<V_{1}<V_{2}<\cdots$ such that

 $E(V_{i+1}-V_{i}|V_{j},j\leq i)\leq c,\ i\geq 0$
 $(X(V_{i}),i\geq 1)\mbox{ are independent with the uniform distribution }\pi$
 $(X(V_{i}),i\geq 1)\mbox{ are independent of }(V_{i},i\geq 1).$

So regardless of the mouse’s strategy, the cat has chance $1/n$ to meet the mouse at time $V_{i}$, independently as $i$ varies, so the meeting time $M$ satisfies $M\leq V_{T}$ where $T$ is a stopping time with mean $n$, and (4.21) follows from Lemma 4.19. This topic will be pursued in Chapter 6 yyy.

## 4.3.5 $\tau_{1}$ and flows

Since discrete-time chains can be identified with random walks on weighted graphs, relating properties of the chain to properties of “flows” on the graph is a recurring theme. Thompson’s principle (Chapter 3 yyy) identified mean commute times and mean hitting times from stationarity as infs over flows of certain quantities. Sinclair [307] noticed that $\tau_{1}$ could be related to “multicommodity flow” issues, and we give a streamlined version of his result (essentially Corollary 4.22) here. Recall from Chapter 3 section yyy the general notation of a unit flow from $a$ to $\pi$, and the special flow ${\bf f}^{a\rightarrow\pi}$ induced by the Markov chain.

###### Lemma 4.21

Consider a family ${\bf f}=({\bf f}^{(a)})$, where, for each state $a$, ${\bf f}^{(a)}$ is a unit flow from $a$ to the stationary distribution $\pi$. Define

 $\psi({\bf f})=\max_{\mbox{edges }(i,j)}\sum_{a}\pi_{a}\frac{|f_{ij}^{(a)}|}{\pi_{i}p_{ij}}$

in discrete time, and substitute $q_{ij}$ for $p_{ij}$ in continuous time. Let ${\bf f}^{a\rightarrow\pi}$ be the special flow induced by the chain. Then

 $\psi({\bf f}^{a\rightarrow\pi})\leq\tau_{1}^{(4)}\leq\Delta\psi({\bf f}^{a\rightarrow\pi})$

where $\Delta$ is the diameter of the transition graph.

Proof. We work in discrete time (the continuous case is similar). By Chapter 3 yyy

 $\frac{f_{ij}^{a\rightarrow\pi}}{\pi_{i}p_{ij}}=\frac{Z_{ia}-Z_{ja}}{\pi_{a}}$

and so

 $\sum_{a}\pi_{a}\frac{|f_{ij}^{a\rightarrow\pi}|}{\pi_{i}p_{ij}}=\sum_{a}|Z_{ia}-Z_{ja}|$

Thus

 $\psi({\bf f}^{a\rightarrow\pi})=\max_{\mbox{edges }(i,j)}\sum_{a}|Z_{ia}-Z_{ja}|.$

The result now follows because by (4.12)

 $\tau_{1}^{(4)}=\max_{i,k}\sum_{a}|Z_{ia}-Z_{ka}|$

where $i$ and $k$ are not required to be neighbors. $\Box$
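Both sides of Lemma 4.21 can be computed from $\bf Z$ on a small example. A sketch (added illustration, assuming NumPy; the weighted graph is an arbitrary choice with diameter $2$): $\psi$ maximizes over edges only, $\tau_{1}^{(4)}$ over all pairs, and the diameter factor bridges the two.

```python
import numpy as np
from collections import deque

# Random walk on a 4-cycle 0-1-2-3 with chord 0-2 (all weights 1).
W = np.array([[0.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 0.0]])
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()
n = len(P)

# fundamental matrix Z = (I - P + 1 pi^T)^{-1} - 1 pi^T
Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi)) \
    - np.outer(np.ones(n), pi)

edges = [(i, j) for i in range(n) for j in range(n) if i != j and P[i, j] > 0]
psi = max(np.abs(Z[i] - Z[j]).sum() for i, j in edges)          # edges only
tau14 = max(np.abs(Z[i] - Z[k]).sum()                           # all pairs
            for i in range(n) for k in range(n))

def diameter():
    # graph diameter by breadth-first search from every vertex
    best = 0
    for s0 in range(n):
        dist = {s0: 0}
        q = deque([s0])
        while q:
            v = q.popleft()
            for a, b in edges:
                if a == v and b not in dist:
                    dist[b] = dist[v] + 1
                    q.append(b)
        best = max(best, max(dist.values()))
    return best

Delta = diameter()
assert psi <= tau14 + 1e-12            # lower bound of Lemma 4.21
assert tau14 <= Delta * psi + 1e-12    # upper bound of Lemma 4.21
```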

Using Lemmas 4.11 - 4.13 to relate $\tau_{1}^{(4)}$ to $\tau_{1}$, we can deduce a lower bound on $\tau_{1}$ in terms of flows.

###### Corollary 4.22

$\tau_{1}\geq\frac{e-1}{16e}\ \inf_{{\bf f}}\psi({\bf f})$.

Unfortunately it seems hard to get analogous upper bounds. In particular, it is not true that

 $\tau_{1}=O\left(\Delta\inf_{{\bf f}}\psi({\bf f})\right).$

To see why, consider first random walk on the $n$-cycle (Chapter 5 Example yyy). Here $\tau_{1}=\Theta(n^{2})$ and $\psi({\bf f}^{a\rightarrow\pi})=\Theta(n)$, so the upper bound in Lemma 4.21 is the right order of magnitude, since $\Delta=\Theta(n)$. Now modify the chain by allowing transitions between arbitrary pairs $(i,j)$ with equal chance $o(n^{-3})$. The new chain will still have $\tau_{1}=\Theta(n^{2})$, and by considering the special flow in the original chain we have $\inf_{{\bf f}}\psi({\bf f})=O(n)$, but now the diameter $\Delta=1$.