9 A Second Look at General Markov Chains (April 21, 1995)

9.5 An example concerning eigenvalues and mixing times

Here is an example, adapted from Aldous [17]. Let $(\lambda_u : 1 \leq u \leq n)$ be the eigenvalues of ${\bf P}$ with $\lambda_1 = 1$, and let

\beta = \max\{|\lambda_u| : 2 \leq u \leq n\}.

A weak quantification of “mixing” is provided by

\alpha(t) \equiv \max_{A,B} |P_\pi(X_0 \in A,\ X_t \in B) - \pi(A)\pi(B)|.

By definition, $\alpha(t)$ is less than the maximal correlation $\rho(t)$ discussed in Chapter 4 yyy, and so by yyy

\alpha(t) \leq \beta^t \mbox{ for a reversible chain}. \qquad (9.22)
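To spell out the first comparison, which is just a restatement of the definitions: for any sets $A$ and $B$, take $f = 1_A$ and $g = 1_B$ in the definition of the maximal correlation $\rho(t)$. Under $P_\pi$ both $X_0$ and $X_t$ have distribution $\pi$, so

|P_\pi(X_0 \in A,\ X_t \in B) - \pi(A)\pi(B)| = |{\rm cov}(1_A(X_0), 1_B(X_t))| \leq \rho(t)\, \sqrt{\pi(A)(1-\pi(A))\,\pi(B)(1-\pi(B))} \leq \rho(t),

and maximizing over $A$ and $B$ gives $\alpha(t) \leq \rho(t)$.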

The convergence theorem (Chapter 2 yyy) says that $\alpha(t) \rightarrow 0$ as $t \rightarrow \infty$ provided $\beta < 1$. So one might expect some analog of (9.22) to hold in general. But this is dramatically false: Example 9.26 shows

Lemma 9.25

There exists a family of $n$-state chains, with uniform stationary distributions, such that $\sup_n \beta_n < 1$ while $\inf_n \alpha_n(n) > 0$.

Loosely, this implies there is no reasonable hypothesis on the spectrum of an $n$-state chain which implies an $o(n)$ mixing time. There is a time-asymptotic result

\alpha(t) \leq \rho(t) \leq C\beta^t \ \ \forall t,

for some $C$ depending on the chain. But implicit claims in the literature that bounding the spectrum of a general chain has some consequence for finite-time behavior should be treated with extreme skepticism!

Example 9.26

Let $(Y_t)$ be independent r.v.’s taking values in $\{0,1,\ldots,n-1\}$ with distribution specified by

P(Y \leq j) = \frac{j+1}{j+2}, \ \ 0 \leq j \leq n-2.

Define a Markov chain $(X_t)$ on $\{0,1,\ldots,n-1\}$ by

X_t = \max(X_{t-1} - 1,\ Y_t).

This chain has the property (cf. the “patterns in coin-tossing” chain) of attaining the stationary distribution in finite time. Precisely: for any initial distribution $\sigma$, the distribution of $X_{n-1}$ is uniform, and hence $X_t$ is uniform for all $t \geq n-1$. To prove this, we simply observe that for $0 \leq j \leq n-1$,

\begin{eqnarray*}
P_\sigma(X_{n-1} \leq j) & = & P(Y_{n-1} \leq j,\ Y_{n-2} \leq j+1,\ \ldots,\ Y_0 \leq j+n-1) \\
& = & \frac{j+1}{j+2} \times \frac{j+2}{j+3} \times \cdots \times \frac{n-1}{n} \times 1 \times \cdots \times 1 \\
& = & \frac{j+1}{n}.
\end{eqnarray*}
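For concreteness, here is the smallest nontrivial case $n = 3$, a quick check of the finite-time property: $P(Y=0) = 1/2$, $P(Y=1) = 1/6$, $P(Y=2) = 1/3$, and the rule $X_t = \max(X_{t-1}-1, Y_t)$ gives

{\bf P} = \left( \begin{array}{ccc} 1/2 & 1/6 & 1/3 \\ 1/2 & 1/6 & 1/3 \\ 0 & 2/3 & 1/3 \end{array} \right), \qquad {\bf P}^2 = \left( \begin{array}{ccc} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{array} \right),

so every row of ${\bf P}^{n-1} = {\bf P}^2$ is indeed the uniform distribution.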

If $X_0$ is either $0$ or $1$ then $X_1$ is distributed as $Y_1$, implying that the vector $v$ with $v_i = 1_{(i=0)} - 1_{(i=1)}$ is an eigenvector of ${\bf P}$ with eigenvalue $0$. By soft “duality” arguments it can be shown [17] that this is the largest eigenvalue, in the sense that

{\cal R}(\lambda_u) \leq 0 \mbox{ for all } 2 \leq u \leq n. \qquad (9.23)
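To see the eigenvalue-$0$ assertion concretely: from state $0$ we have $X_1 = \max(-1, Y_1) = Y_1$, and from state $1$ we have $X_1 = \max(0, Y_1) = Y_1$, so rows $0$ and $1$ of ${\bf P}$ coincide and hence

v {\bf P} = {\bf P}(0,\cdot) - {\bf P}(1,\cdot) = 0,

that is, $v$ is a (left) eigenvector with eigenvalue $0$.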

I believe it is true that

\beta_n = \max\{|\lambda_u| : 2 \leq u \leq n\}

is bounded away from $1$, but we can avoid proving this by considering the “lazy” chain $\hat{X}_t$ with transition matrix $\hat{{\bf P}} = ({\bf I} + {\bf P})/2$, for which by (9.23)

\hat{\beta}_n \leq \sup\{|(1+\lambda)/2| : |\lambda| \leq 1,\ {\cal R}(\lambda) \leq 0\} = \sqrt{1/2}.
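(To verify the value of the supremum: writing $\lambda = x + iy$ with $x \leq 0$ and $x^2 + y^2 \leq 1$,

\left| \frac{1+\lambda}{2} \right|^2 = \frac{(1+x)^2 + y^2}{4} = \frac{1 + 2x + (x^2 + y^2)}{4} \leq \frac{2 + 2x}{4} \leq \frac{1}{2},

with equality at $\lambda = \pm i$.)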

So the family of lazy chains has the eigenvalue property asserted in Lemma 9.25. But by construction $X_t \geq X_0 - t$, and so $P(X_0 > 3n/4,\ X_{n/2} < n/4) = 0$. Moreover, in $n$ steps the lazy chain makes only a Binomial$(n,1/2)$ number of genuine steps of the original chain, and each such step can decrease the state by at most $1$; so for the lazy chains we get

P_\pi(\hat{X}_0 > 3n/4,\ \hat{X}_n < n/4) \rightarrow 0 \mbox{ as } n \rightarrow \infty

establishing the (non)-mixing property asserted in the lemma.
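In more detail: write $N_n$ for the number of genuine ${\bf P}$-steps made by the lazy chain up to time $n$ (in the construction where each lazy step independently either holds or makes a ${\bf P}$-step, with probability $1/2$ each), so that $N_n$ has Binomial$(n,1/2)$ distribution and $\hat{X}_n \geq \hat{X}_0 - N_n$. On the event $\{\hat{X}_0 > 3n/4,\ \hat{X}_n < n/4\}$ we must have $N_n > \hat{X}_0 - n/4$, so for any fixed $\varepsilon > 0$

P_\pi(\hat{X}_0 > 3n/4,\ \hat{X}_n < n/4) \leq \pi\{x : 3n/4 < x \leq (3/4 + \varepsilon)n\} + P(N_n > (1/2 + \varepsilon)n) \leq \varepsilon + o(1)

by the weak law of large numbers. Taking $A = \{x : x > 3n/4\}$ and $B = \{x : x < n/4\}$ in the definition of $\alpha$, we have $\pi(A)\pi(B) \rightarrow 1/16$, and hence $\liminf_n \hat{\alpha}_n(n) \geq 1/16 > 0$, where $\hat{\alpha}_n$ denotes the $\alpha$-function of the $n$-state lazy chain.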