3 Reversible Markov Chains (September 10, 2002)

3.4 The spectral representation

Use the transition matrix 𝐏{\bf P} to define

sij=πi1/2pijπj-1/2.s_{ij}=\pi^{1/2}_{i}p_{ij}\pi^{-1/2}_{j}.

From definition (3.1), 𝐒{\bf S} is a symmetric matrix. So we can apply the elementary diagonalization theorem. The authors find it helpful to distinguish between the state space I={i,j,}I=\{i,j,\ldots\}, of size nn say, and the index set of integers [n]={1,2,,n}[n]=\{1,2,\ldots,n\}, the point being that the state space may have a lot of extra structure, whereas the index set has no obvious structure. The spectral theorem ([183] Theorem 4.1.5) gives a representation

𝐒=𝐔Λ𝐔T{\bf S}={\bf U}\Lambda{\bf U}^{T}

where 𝐔=(uim)iI,m[n]{\bf U}=(u_{im})_{i\in I,m\in[n]} is an orthonormal matrix, and Λ=(λm,m)m,m[n]\Lambda=(\lambda_{m,m^{\prime}})_{m,m^{\prime}\in[n]} is a diagonal real matrix. We can write the diagonal entries of Λ\Lambda as (λm)(\lambda_{m}), and arrange them in decreasing order. Then

1=λ1>λ2λn-1.1=\lambda_{1}>\lambda_{2}\geq\cdots\geq\lambda_{n}\geq-1. (3.30)

The classical fact that |λi|1|\lambda_{i}|\leq 1 follows easily from the fact that the entries of 𝐒(t){\bf S}^{(t)} are bounded as tt\rightarrow\infty by (3.31) below. These λ\lambda’s are the eigenvalues of 𝐏{\bf P}, as well as of 𝐒{\bf S}. That is, the solutions (λ;x)(\lambda;x) with xi0x_{i}\not\equiv 0 of

ixipij=λxj  for all j\sum_{i}x_{i}p_{ij}=\lambda x_{j}\mbox{\ \ for all\ }j

are exactly the pairs

(λ=λm;xi=cmπi1/2uim,i=1,,n)(\lambda=\lambda_{m};x_{i}=c_{m}\pi^{1/2}_{i}u_{im},i=1,\ldots,n)

for m=1,,nm=1,\ldots,n, where cm0c_{m}\neq 0 is arbitrary. And the solutions of

jpijyj=λyi  for all i\sum_{j}p_{ij}y_{j}=\lambda y_{i}\mbox{\ \ for all\ }i

are exactly the pairs

(λ=λm; yi=cmπi-1/2uim,  i=1,,n).(\lambda=\lambda_{m};\ \,y_{i}=c_{m}\pi^{-1/2}_{i}u_{im},\ i=1,\ldots,n).

Note that an eigenvector (ui1)(u_{i1}) of 𝐒{\bf S} corresponding to the eigenvalue λ1=1\lambda_{1}=1 is

ui1=πi1/2.u_{i1}=\pi^{1/2}_{i}.

Uniqueness of the stationary distribution now implies λ2<1\lambda_{2}<1.

Now consider matrix powers. We have

𝐒(t)=𝐔Λ(t)𝐔T{\bf S}^{(t)}={\bf U}\Lambda^{(t)}{\bf U}^{T}

and

pij(t)=πi-1/2sij(t)πj1/2,p^{(t)}_{ij}=\pi^{-1/2}_{i}s^{(t)}_{ij}\pi^{1/2}_{j}, (3.31)

so

Pi(Xt=j)=πi-1/2πj1/2m=1nλmtuimujm.P_{i}(X_{t}=j)=\pi^{-1/2}_{i}\pi^{1/2}_{j}\sum_{m=1}^{n}\lambda^{t}_{m}u_{im}u% _{jm}. (3.32)

This is the spectral representation formula. In continuous time, the analogous formula is

Pi(Xt=j)=πi-1/2πj1/2m=1nexp(-λmt)uimujm.P_{i}(X_{t}=j)=\pi^{-1/2}_{i}\pi^{1/2}_{j}\sum_{m=1}^{n}\exp(-\lambda_{m}t)u_{% im}u_{jm}. (3.33)

As before, 𝐔{\bf U} is an orthonormal matrix and ui1=πi1/2u_{i1}=\pi^{1/2}_{i}, and now the λ\lambda’s are the eigenvalues of -𝐐-{\bf Q}. In the continuous-time setting, the eigenvalues satisfy

0=λ1<λ2λn.0=\lambda_{1}<\lambda_{2}\leq\cdots\leq\lambda_{n}. (3.34)

Rather than give the general proof, let us consider the effect of continuizing the discrete-time chain (3.32). The continuized chain (Yt)(Y_{t}) can be represented as Yt=XN(t)Y_{t}=X_{N(t)} where N(t)N(t) has Poisson(t)\mbox{Poisson}(t) distribution, so by conditioning on N(t)=νN(t)=\nu,

Pi(Yt=j)\displaystyle P_{i}(Y_{t}=j) =\displaystyle= πi-1/2πj1/2m=1nuimujmν=0λmνe-ttνν!\displaystyle\pi^{-1/2}_{i}\pi^{1/2}_{j}\sum_{m=1}^{n}u_{im}u_{jm}\sum_{\nu=0}% ^{\infty}\lambda_{m}^{\nu}\frac{e^{-t}t^{\nu}}{\nu!}
=\displaystyle= πi-1/2πj1/2m=1nuimujmexp(-(1-λm)t).\displaystyle\pi^{-1/2}_{i}\pi^{1/2}_{j}\sum_{m=1}^{n}u_{im}u_{jm}\exp(-(1-% \lambda_{m})t).

So when we compare the spectral representations (3.32),(3.33) for a discrete-time chain and its continuization, the orthonormal matrices are identical, and the eigenvalues are related by

λm(c)=1-λm(d)\lambda^{\mbox{\rm\scriptsize(c)}}_{m}=1-\lambda^{\mbox{\rm\scriptsize(d)}}_{m} (3.35)

superscripts (c) and (d) indicating continuous or discrete time. In particular, this relation holds for the basic discrete and continuous time random walks on a graph.

Let us point out some interesting simple consequences of the spectral representation. For these purposes continuous time is simpler. First,

Pi(Xt=j)-πj=cije-λ2t+o(e-λ2t)  as tP_{i}(X_{t}=j)-\pi_{j}=c_{ij}e^{-\lambda_{2}t}+o(e^{-\lambda_{2}t})\mbox{\ \ % as\ }t\rightarrow\infty (3.36)

where cij=πi-1/2πj1/2m:λm=λ2uimujmc_{ij}=\pi^{-1/2}_{i}\pi^{1/2}_{j}\sum_{m:\lambda_{m}=\lambda_{2}}u_{im}u_{jm} and where “typically” cij0c_{ij}\neq 0. (A precise statement is this: there exists ii such that

Pi(Xt=i)-πiciie-λ2t,cii>0,P_{i}(X_{t}=i)-\pi_{i}\sim c_{ii}e^{-\lambda_{2}t},\ \ c_{ii}>0, (3.37)

by considering ii such that ui20u_{i2}\neq 0.) Thus λ2\lambda_{2} has the interpretation of “asymptotic rate of convergence to the stationary distribution”. The authors find it simpler to interpret parameters measuring “time” rather than “1/time1/\mbox{time}”, and so prefer to work with the relaxation time τ2\tau_{2} defined by

τ2\displaystyle\tau_{2} :=\displaystyle:= 1/λ2  for a continuous-time chain\displaystyle 1/\lambda_{2}\mbox{\ \ for a continuous-time chain} (3.38)
τ2\displaystyle\tau_{2} :=\displaystyle:= 1/(1-λ2)  for a discrete-time chain.\displaystyle 1/(1-\lambda_{2})\mbox{\ \ for a discrete-time chain.} (3.39)

Note that by (3.35) the value of τ2\tau_{2} is unchanged by continuizing a discrete-time chain.

Still in continuous time, the spectral representation gives

Pi(Xt=i)=πi+m2uim2exp(-λmt)P_{i}(X_{t}=i)=\pi_{i}+\sum_{m\geq 2}u^{2}_{im}\exp(-\lambda_{m}t) (3.40)

so the right side is decreasing with tt, and in fact is completely monotone, a subject pursued in Section 3.5. Thus ZiiZ_{ii} defined in margin: 9/10/99 versionChapter 2 Section 2.3 satisfies

Zii\displaystyle Z_{ii} =\displaystyle= 0(Pi(Xt=i)-πi)dt\displaystyle\int_{0}^{\infty}\!(P_{i}(X_{t}=i)-\pi_{i})\,dt (3.41)
=\displaystyle= m2uim2λm-1  by (3.40).\displaystyle\sum_{m\geq 2}u^{2}_{im}\lambda^{-1}_{m}\mbox{\ \ by~{}(\ref{pii}% )}.

Using the orthonormal property of 𝐔{\bf U},

iZii=m2λm-1.\sum_{i}Z_{ii}=\sum_{m\geq 2}\lambda^{-1}_{m}.

Applying Corollary 13 of margin: 9/10/99 versionChapter 2, we obtain a fundamental result relating average hitting times to eigenvalues.

Proposition 3.13 (The eigentime identity)

For each ii,

jπjEiTj\displaystyle\sum_{j}\pi_{j}E_{i}T_{j} =\displaystyle= m2λm-1          (continuous time)\displaystyle\sum_{m\geq 2}\lambda^{-1}_{m}\mbox{\ \ {\ \ \ \ \ \ \ \ \rm(}% continuous time\/{\rm)}}
jπjEiTj\displaystyle\sum_{j}\pi_{j}E_{i}T_{j} =\displaystyle= m2(1-λm)-1  (discrete time).\displaystyle\sum_{m\geq 2}(1-\lambda_{m})^{-1}\mbox{\ \ {\rm(}discrete time\/% {\rm)}}.

[The discrete-time version follows from (3.35).] Proposition 3.13 expands upon the random target lemma, which said that (even for non-reversible chains) jπjEiTj\sum_{j}\pi_{j}E_{i}T_{j} does not depend on ii.

3.4.1 Mean hitting times and reversible chains

In margin: 9/10/99 versionChapter 2 Section 2.2 we listed identities for general chains such as the mean hitting time formulas

EiTj=(Zjj-Zij)/πj;   EπTj=Zjj/πj.E_{i}T_{j}=(Z_{jj}-Z_{ij})/\pi_{j};\ \ \ E_{\pi}T_{j}=Z_{jj}/\pi_{j}.

There are a number of more complicated identities for general chains in which one side becomes zero for any reversible chain (by the symmetry property πiZij=πjZji\pi_{i}Z_{ij}=\pi_{j}Z_{ji}) and which therefore simplify to give identities for reversible chains. We have already seen one example, the cyclic tour lemma, and the following result may be considered an extension of that lemma. [Indeed, sum the following equation over successive pairs (i,j)(i,j) along a cycle to recapture the cyclic tour lemma.]

Corollary 3.14

EπTj-EπTi=EiTj-EjTiE_{\pi}T_{j}-E_{\pi}T_{i}=E_{i}T_{j}-E_{j}T_{i}.

This identity follows immediately from the mean hitting time formulas and the symmetry property. Note the following interpretation of the corollary. Define an ordering iji\preceq j on the states by

ij  iff  EπTiEπTj.i\preceq j\mbox{\ \ iff\ \ }E_{\pi}T_{i}\leq E_{\pi}T_{j}.

Then Corollary 3.14 implies

EiTjEjTi  iff  ij.E_{i}T_{j}\geq E_{j}T_{i}\mbox{\ \ iff\ \ }i\preceq j.

Warning. Corollary 3.14 does not imply

maxi,jEiTj is attained by some pair (i*,j*) such that\displaystyle\hskip{-119.2455pt}\max_{i,j}E_{i}T_{j}\mbox{\ is attained by some pair\ }(i_{*},j_{*})\mbox{\ such that}
i* attains miniEπTi and j* attains maxjEπTj.\displaystyle\hskip{-119.2455pt}\quad i_{*}\mbox{\ attains\ }\min_{i}E_{\pi}T_% {i}\mbox{\ and\ }j_{*}\mbox{\ attains\ }\max_{j}E_{\pi}T_{j}.

Here margin: I haven’t tried to find counterexamples with more than three states. is a counterexample. Choose 0<ε<1/20<\varepsilon<1/2 arbitrarily and let

𝐏:=[2ε1-2ε0ε1-2εε01-2ε2ε].{\bf P}:=\left[\begin{array}[]{ccc}2\varepsilon&1-2\varepsilon&0\\ \varepsilon&1-2\varepsilon&\varepsilon\\ 0&1-2\varepsilon&2\varepsilon\end{array}\right].

We invite the reader to perform the computations necessary to verify that 𝐏{\bf P} is reversible with π=[ε,1-2ε,ε]\pi=[\varepsilon,1-2\varepsilon,\varepsilon] and

(EiTj)=ε-1(1-2ε)-1[0ε11-ε01-ε1ε0],(E_{i}T_{j})=\varepsilon^{-1}(1-2\varepsilon)^{-1}\left[\begin{array}[]{ccc}0&% \varepsilon&1\\ 1-\varepsilon&0&1-\varepsilon\\ 1&\varepsilon&0\end{array}\right],

so that (EπTi)=ε-1(1-2ε)-1[1-2ε+2ε2,2ε2,1-2ε+2ε2](E_{\pi}T_{i})=\varepsilon^{-1}(1-2\varepsilon)^{-1}[1-2\varepsilon+2% \varepsilon^{2},2\varepsilon^{2},1-2\varepsilon+2\varepsilon^{2}]. Thus EπTiE_{\pi}T_{i} is minimized uniquely by i*=2i^{*}=2, while maxi,jEiTj\max_{i,j}E_{i}T_{j} is attained only by the pairs (1,3)(1,3) and (3,1)(3,1).

As a second instance of what reversibility implies, note, from (3.33) and the definition of ZijZ_{ij}, that

Zij=πi-1/2πj1/2m2λm-1uimujm.Z_{ij}=\pi^{-1/2}_{i}\pi^{1/2}_{j}\sum_{m\geq 2}\lambda^{-1}_{m}u_{im}u_{jm}.

This implies

the symmetrized matrix πi1/2Zijπj-1/2 is positive semidefinite.\mbox{the symmetrized matrix\ }\pi^{1/2}_{i}Z_{ij}\pi^{-1/2}_{j}\mbox{\ is % positive semidefinite.} (3.42)

Note that a symmetric positive semidefinite matrix (Mij)(M_{ij}) has the property Mij2MiiMjjM^{2}_{ij}\leq M_{ii}M_{jj}. This gives

Zij2ZiiZjjπj/πi,Z^{2}_{ij}\leq Z_{ii}Z_{jj}\pi_{j}/\pi_{i}, (3.43)

which enables us to upper-bound mean hitting times from arbitrary starts in terms of mean hitting times from stationary starts.

Lemma 3.15

maxijEiTj2maxkEπTk\max_{ij}E_{i}T_{j}\leq 2\max_{k}E_{\pi}T_{k}.

Proof. Using (3.43),

(Zij/πj)2(Zii/πi)(Zjj/πj)(Z_{ij}/\pi_{j})^{2}\leq(Z_{ii}/\pi_{i})\ (Z_{jj}/\pi_{j})

and so

-Zij/πjmaxkZkk/πk.-Z_{ij}/\pi_{j}\leq\max_{k}Z_{kk}/\pi_{k}.

So the mean hitting time formula gives the two equalities in

EiTj=Zjjπj-Zijπj2maxkZkkπk=2maxkEπTk.           E_{i}T_{j}=\frac{Z_{jj}}{\pi_{j}}-\frac{Z_{ij}}{\pi_{j}}\leq 2\max_{k}\frac{Z_% {kk}}{\pi_{k}}=2\max_{k}E_{\pi}T_{k}.\qquad\qquad~{}\ \ \rule{4.3pt}{4.3pt}