7 Symmetric Graphs and Chains (January 31, 1994)

7.1 Symmetric reversible chains

7.1.1 Definitions

Consider an irreducible transition matrix ${\bf P}=(p_{ij})$ on a finite state space $I$. A symmetry of ${\bf P}$ is a 1-1 map $\gamma:I\rightarrow I$ such that

p_{\gamma(i)\gamma(j)}=p_{ij}\mbox{ for all }i,j.

The set $\Gamma$ of symmetries forms a group under composition, and in our (non-standard) terminology a symmetric Markov transition matrix is one for which $\Gamma$ acts transitively, i.e.

\mbox{for all }i,j\in I\mbox{ there exists }\gamma\in\Gamma\mbox{ such that }\gamma(i)=j.

Such a chain need not be reversible; a symmetric reversible chain is just a chain which is both symmetric and reversible. A natural setting is where $I$ is itself a group under an operation $(i,j)\rightarrow ij$ which we write multiplicatively. If $\mu$ is a probability distribution on $I$ and $(Z_{t};t\geq 1)$ are i.i.d. $I$-valued with distribution $\mu$ then

X_{t}=x_{0}Z_{1}Z_{2}\ldots Z_{t} (7.1)

is the symmetric Markov chain with transition probabilities

p_{ij}=\mu(i^{-1}j)

started at $x_{0}$. This chain is reversible iff

\mu(i)=\mu(i^{-1})\mbox{ for all }i. (7.2)

We have rather painted ourselves into a corner over terminology. The usual terminology for the process (7.1) is ``random walk on the group $I$'', and if (7.2) holds then it is a

\mbox{``symmetric random walk on the group $I$''}. (7.3)

Unfortunately in this phrase both ``symmetric'' and ``walk'' conflict with our conventions, so we can't use the phrase. Instead we will use ``random flight on the group $I$'' for a process (7.1), and ``reversible random flight on the group $I$'' when (7.2) also holds. Note that we always assume chains are irreducible, which in the case of a random flight holds iff the support of $\mu$ generates the whole group $I$. Just keep in mind that the topic of this section, symmetric reversible chains, forms a minor generalization of the processes usually described by (7.3).
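As a quick illustration of (7.1)--(7.2) (a minimal numerical sketch; the group $Z_6$ and the two step distributions are our own arbitrary choices): for a flight on a finite abelian group, the stationary distribution is uniform, so reversibility is exactly symmetry of the transition matrix, which holds iff $\mu(i)=\mu(i^{-1})$.

```python
import numpy as np

# Random flight on the cyclic group Z_6: p[i][j] = mu[(j - i) mod 6].
# With uniform stationary distribution, reversibility reduces to symmetry
# of P, which holds iff mu(i) = mu(-i mod 6), as in (7.2).
def flight_matrix(mu):
    m = len(mu)
    return np.array([[mu[(j - i) % m] for j in range(m)] for i in range(m)])

mu_rev = [0.0, 0.3, 0.2, 0.0, 0.2, 0.3]    # mu(i) = mu(-i): reversible
mu_irrev = [0.0, 0.7, 0.0, 0.0, 0.0, 0.3]  # mu(1) != mu(5): not reversible

P_rev = flight_matrix(mu_rev)
P_irrev = flight_matrix(mu_irrev)
```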

On a graph $({\cal V},{\cal E})$, a graph automorphism is a 1-1 map $\gamma:{\cal V}\rightarrow{\cal V}$ such that

(\gamma(w),\gamma(v))\in{\cal E}\mbox{ iff }(w,v)\in{\cal E}.

The graph is called vertex-transitive if the automorphism group acts transitively on vertices. Clearly, random walk on an (unweighted) graph is a symmetric reversible chain iff the graph is vertex-transitive. We specialize to this case in section 7.1.8. A further specialization is to random walk on a Cayley graph. If ${\cal G}=(g_{i})$ is a set of generators of a group $I$, which we always assume to satisfy

g\in{\cal G}\mbox{ implies }g^{-1}\in{\cal G}

then the associated Cayley graph has vertex-set $I$ and edge-set

\{(v,vg):v\in I,g\in{\cal G}\}.

A Cayley graph is vertex-transitive.

Finally, recall from Chapter 3 yyy that we can identify a reversible chain with a random walk on a weighted graph. With this identification, a symmetric reversible chain is one where the weighted graph is vertex-transitive, in the natural sense.

7.1.2 This section goes into Chapter 3

Lemma 7.1

For an irreducible reversible chain, the following are equivalent.

(a) $P_{i}(X_{t}=i)=P_{j}(X_{t}=j)$, $i,j\in I$, $t\geq 1$

(b) $P_{i}(T_{j}=t)=P_{j}(T_{i}=t)$, $i,j\in I$, $t\geq 1$.

Proof. In either case the stationary distribution is uniform -- under (a), by letting $t\rightarrow\infty$, and under (b) by taking $t=1$, implying $p_{ij}\equiv p_{ji}$. So by reversibility $P_{i}(X_{t}=j)=P_{j}(X_{t}=i)$ for $i\neq j$ and $t\geq 1$. But recall from Chapter 2 Lemma yyy that the generating functions $G_{ij}(z)=\sum_{t}P_{i}(X_{t}=j)z^{t}$ and $F_{ij}(z)=\sum_{t}P_{i}(T_{j}=t)z^{t}$ satisfy

F_{ij}=G_{ij}/G_{jj}. (7.4)

For $i\neq j$ we have seen that $G_{ij}=G_{ji}$, and hence by (7.4)

F_{ij}=F_{ji}\mbox{ iff }G_{jj}=G_{ii},

which is the assertion of Lemma 7.1.
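Both properties are easy to check numerically on a concrete vertex-transitive example (random walk on the 5-cycle, our own choice; the first-passage probabilities are computed by zeroing out the target state after each step):

```python
import numpy as np

# Random walk on the 5-cycle: verify (a) P_i(X_t=i) is the same for all i,
# and (b) P_i(T_j=t) = P_j(T_i=t), via taboo (first-passage) probabilities.
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5

def first_passage(P, i, j, tmax):
    mass = np.zeros(P.shape[0]); mass[i] = 1.0
    out = []
    for _ in range(tmax):
        mass = mass @ P
        out.append(mass[j])   # P_i(T_j = t)
        mass[j] = 0.0         # forbid having already hit j
    return np.array(out)

P5 = np.linalg.matrix_power(P, 5)
f13 = first_passage(P, 1, 3, 30)
f31 = first_passage(P, 3, 1, 30)
```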

7.1.3 Elementary properties

Our standing assumption is that we have an irreducible symmetric reversible $n$-state chain. The symmetry property implies that the stationary distribution $\pi$ is uniform, and also implies

P_{i}(X_{t}=i)=P_{j}(X_{t}=j),\ i,j\in I,\ t\geq 1. (7.5)

But by Chapter 3 Lemma yyy, under reversibility (7.5) is equivalent to

P_{i}(T_{j}=t)=P_{j}(T_{i}=t),\ i,j\in I,\ t\geq 1. (7.6)

And clearly (7.6) implies

E_{i}T_{j}=E_{j}T_{i}\mbox{ for all }i,j. (7.7)

We make frequent use of these properties. Incidentally, (7.7) is in general strictly weaker than (7.6): van Slijpe [330] p. 288 gives an example with a 3-state reversible chain.

We also have, from the definition of symmetric, that $E_{\pi}T_{i}$ is constant in $i$, and hence

E_{\pi}T_{i}=\tau_{0}\mbox{ for all }i. (7.8)

So by Chapter 4 yyy

\tau^{*}\leq 4\tau_{0}. (7.9)

The formula for $E_{\pi}T_{i}$ in terms of the fundamental matrix (Chapter 2 yyy) can be written as

\tau_{0}/n=\sum_{t=0}^{\infty}(P_{i}(X_{t}=i)-1/n)=1-\frac{1}{n}+\sum_{t=1}^{\infty}(P_{i}(X_{t}=i)-1/n). (7.10)

Approximating τ0\tau_{0} by the first few terms is what we call the local transience heuristic. See Chapter xxx for rigorous discussion.
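As a numerical sanity check (a small example of our own choosing), for random walk on the 5-cycle the series $\sum_{t\geq 0}(P_i(X_t=i)-1/n)$ can be summed and compared with the exact value of $E_\pi T_i/n$ obtained from the hitting-time equations:

```python
import numpy as np

# tau0 = E_pi T_0 for random walk on the 5-cycle, computed two ways:
# (1) solving the hitting-time equations, (2) the return-probability series.
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5

# (1) h[k] = E_{k+1} T_0 solves (I - Q) h = 1 on the complement of {0}
h = np.linalg.solve(np.eye(n - 1) - P[1:, 1:], np.ones(n - 1))
tau0 = np.concatenate([[0.0], h]).mean()          # average over uniform pi

# (2) truncation of sum_{t >= 0} (P_i(X_t = i) - 1/n)
series, Pt = 0.0, np.eye(n)
for _ in range(4000):
    series += Pt[0, 0] - 1 / n
    Pt = Pt @ P
```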

Lemma 7.2

(i) $E_{i}T_{j}\geq\frac{n}{1+p(i,j)}$, $j\neq i$.

(ii) $\max_{i,j}E_{i}T_{j}\leq 2\tau_{0}$.

Proof. (i) This is a specialization of Chapter 6 xxx.

(ii) For any i,j,ki,j,k,

E_{i}T_{j}\leq E_{i}T_{k}+E_{k}T_{j}=E_{i}T_{k}+E_{j}T_{k}.

Averaging over kk, the right side becomes 2τ02\tau_{0}.
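Both parts of Lemma 7.2 can be spot-checked numerically (a sketch on the 3-cube, our own choice of example):

```python
import numpy as np

# Hitting times on the 3-cube: check E_i T_j >= n/(1 + p(i,j)) and
# max_{i,j} E_i T_j <= 2 tau0, where tau0 = E_pi T_j with uniform pi.
d, n = 3, 8
P = np.zeros((n, n))
for i in range(n):
    for b in range(d):
        P[i, i ^ (1 << b)] = 1 / d

E = np.zeros((n, n))                       # E[i, j] = E_i T_j
for j in range(n):
    keep = [i for i in range(n) if i != j]
    h = np.linalg.solve(np.eye(n - 1) - P[np.ix_(keep, keep)],
                        np.ones(n - 1))
    E[keep, j] = h
tau0 = E[:, 0].mean()
```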

Recall that a simple Cauchy--Schwarz argument (Chapter 3 yyy) shows that, for any reversible chain whose stationary distribution is uniform,

P_{i}(X_{2t}=j)\leq\left(P_{i}(X_{2t}=i)\,P_{j}(X_{2t}=j)\right)^{1/2}.

So by (7.5), for a symmetric reversible chain, the most likely place to be after $2t$ steps is where you started:

Corollary 7.3

P_{i}(X_{2t}=j)\leq P_{i}(X_{2t}=i)\mbox{ for all }i,j\in I,\ t\geq 1.

This type of result is nicer in continuous time, where the inequality holds for all times.
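The even-time claim is easy to check directly (walk on the 3-cube again, a hypothetical test case of our own):

```python
import numpy as np

# Corollary 7.3: after an even number of steps, random walk on the 3-cube
# is most likely to be back where it started.
d, n = 3, 8
P = np.zeros((n, n))
for i in range(n):
    for b in range(d):
        P[i, i ^ (1 << b)] = 1 / d

diag_is_max = all(
    np.linalg.matrix_power(P, 2 * t)[0, 0]
    >= np.linalg.matrix_power(P, 2 * t)[0].max() - 1e-12
    for t in range(1, 8)
)
```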

7.1.4 Hitting times

Here is our first non-trivial result, from Aldous [18].

Theorem 7.4

Suppose a sequence of symmetric reversible chains satisfies $\tau_{2}/\tau_{0}\rightarrow 0$. Then

(a) For the stationary chain, and for arbitrary $j$, we have $T_{j}/\tau_{0}\ \stackrel{d}{\rightarrow}\ \xi$ and ${\rm var}\,(T_{j}/\tau_{0})\rightarrow 1$, where $\xi$ has exponential(1) distribution.

(b) $\max_{i,j}E_{i}T_{j}/\tau_{0}\rightarrow 1$.

(c) If $(i_{n},j_{n})$ are such that $E_{i_{n}}T_{j_{n}}/\tau_{0}\rightarrow 1$ then $P_{i_{n}}(T_{j_{n}}/\tau_{0}\in\cdot)\ \stackrel{d}{\rightarrow}\ \xi$.

Note that, because $\tau_{2}\leq\tau_{1}+1$ and $\tau_{0}\geq(n-1)^{2}/n$, the hypothesis ``$\tau_{2}/\tau_{0}\rightarrow 0$'' is weaker than either ``$\tau_{2}/n\rightarrow 0$'' or ``$\tau_{1}/\tau_{0}\rightarrow 0$''.

Part (a) is a specialization of Chapter 3 Proposition yyy and its proof. Parts (b) and (c) use refinements of the same technique. Part (b) implies

\mbox{if }\tau_{2}/\tau_{0}\rightarrow 0\mbox{ then }\tau^{*}\sim 2\tau_{0}.

Because this applies in many settings in this Chapter, we shall rarely need to discuss $\tau^{*}$ further.

xxx give proof

In connection with (b), note that

E_{v}T_{w}\leq\tau_{1}^{(2)}+\tau_{0} (7.11)

by definition of $\tau_{1}^{(2)}$ and vertex-transitivity. So (b) is obvious under the slightly stronger hypothesis $\tau_{1}/\tau_{0}\rightarrow 0$.

Chapter 3 Proposition yyy actually gives information on hitting times $T_{A}$ to more general subsets $A$ of vertices. Because (Chapter 3 yyy) $E_{\pi}T_{A}\geq\frac{(1-\pi(A))^{2}}{\pi(A)}$, we get (in continuous time) a quantification of the fact that $T_{A}$ has approximately exponential distribution when $|A|\ll n/\tau_{2}$ and when the chain starts with the uniform distribution:

\sup_{t}|P_{\pi}(T_{A}>t)-\exp(-t/E_{\pi}T_{A})|\leq\frac{\tau_{2}|A|}{n}\left(1-\frac{|A|}{n}\right)^{-2}.

7.1.5 Cover times

Recall the cover time $C$ from Chapter 6. By symmetry, in our present setting $E_{i}C$ doesn't depend on the starting place $i$, so we can write $EC$. In this section we combine results on hitting times with various forms of Matthews method to obtain asymptotics for cover times in the setting of a sequence of symmetric reversible chains. Experience, and the informal argument preceding (7.15), suggest the principle

EC\sim\tau_{0}\log n,\mbox{ except for chains resembling random walk on the $n$-cycle.} (7.12)

The results in this chapter concerning cover times go some way towards formalizing this principle.

Corollary 7.5

For a sequence of symmetric reversible chains

(a) $\frac{1-o(1)}{1+p^{*}}\,n\log n\leq EC\leq(2+o(1))\tau_{0}\log n$, where $p^{*}\equiv\max_{j\neq i}p_{i,j}$.

(b) If $\tau_{2}/\tau_{0}\rightarrow 0$ then $EC\leq(1+o(1))\tau_{0}\log n$.

(c) If $\tau_{2}/\tau_{0}=O(n^{-\beta})$ for fixed $0<\beta<1$ then

EC\geq(\beta-o(1))\tau_{0}\log n.

Proof. Using the basic form of Matthews method (Chapter 2 yyy), (a) follows from Lemma 7.2 and (b) from Theorem 7.4. To prove (c), fix a state $j$ and $\varepsilon>0$. Using (7.11) and Markov's inequality,

\pi\{i:E_{i}T_{j}\leq(1-\varepsilon)\tau_{0}\}\leq\frac{\tau_{1}^{(2)}}{\varepsilon\tau_{0}}\equiv\alpha,\mbox{ say.}

So we can inductively choose $\lceil\alpha^{-1}\rceil$ vertices $i_{k}$ such that

E_{i_{k}}T_{i_{l}}>(1-\varepsilon)\tau_{0};\ 1\leq k<l\leq\lceil\alpha^{-1}\rceil.

By the extended form of Matthews method (Chapter 6 Corollary yyy)

EC\geq(1-\varepsilon)\tau_{0}\left(1+\frac{1}{2}+\ldots+\frac{1}{\lceil\alpha^{-1}\rceil-1}\right).

From Chapter 4 yyy, $\tau_{1}\leq\tau_{2}(1+\log n)$ and so the hypothesis implies $\tau_{1}/\tau_{0}=O(n^{\varepsilon-\beta})$. So the asymptotic lower bound for $EC$ becomes $(1-\varepsilon)\tau_{0}(\beta-\varepsilon)\log n$, and since $\varepsilon$ is arbitrary the result follows.

Since the only natural examples with $\tau_{1}/\tau_{0}\not\rightarrow 0$ are variations of random walk on the $n$-cycle, for which $EC=\Theta(\tau_{0})$ without the ``$\log n$'' term, we expect a positive answer to

Open Problem 7.6

In the setting of Corollary 7.5, is $EC\leq(1+o(1))\tau_{0}\log n$ without further hypotheses?

Here is an artificial example to illustrate the bound in (c).

Example 7.7

Two time scales.

Take $m_{1}=m_{1}(n)$, $m_{2}=m_{2}(n)$ such that $m_{1}\sim n^{1-\beta}$, $m_{1}m_{2}\sim n$. The underlying idea is to take two continuous-time random walks on the complete graphs $K_{m_{1}}$ and $K_{m_{2}}$, but with the walks run on different time scales. To set this up directly in discrete time, take state space $\{(x,y):1\leq x\leq m_{1},\ 1\leq y\leq m_{2}\}$ and transition probabilities

(x,y)\rightarrow(x^{\prime},y)\mbox{: chance }(m_{1}-1)^{-1}\left(1-\frac{1}{am_{1}\log m_{1}}\right),\ x^{\prime}\neq x

(x,y)\rightarrow(x,y^{\prime})\mbox{: chance }(m_{2}-1)^{-1}\,\frac{1}{am_{1}\log m_{1}},\ y^{\prime}\neq y

where $a=a(n)\uparrow\infty$ slowly. It is not hard to formalize the following analysis. Writing the chain as $(X_{t},Y_{t})$, the $Y$-component stays constant for time $\Theta(am_{1}\log m_{1})$, during which time every $x$-value is hit, because the cover time for $K_{m_{1}}$ is $\sim m_{1}\log m_{1}$. And $m_{2}\log m_{2}$ jumps of the $Y$-component are required to hit every $y$-value, so

EC\sim(m_{2}\log m_{2})\times(am_{1}\log m_{1})\sim an(\log m_{1})(\log m_{2}). (7.13)

Now $\tau_{2}\sim am_{1}\log m_{1}$, and because the mean number of returns to the starting point before the first $Y$-jump is $\sim a\log m_{1}$ we can use the local transience heuristic (7.10) to see $\tau_{0}\sim(a\log m_{1})\times n$. So $\tau_{2}/\tau_{0}\sim m_{1}/n\sim n^{-\beta}$, and the lower bound from (c) is

(\beta-o(1))(a\log m_{1})\,n\log n.

But this agrees with the exact limit (7.13), because $m_{2}\sim n^{\beta}$.

We now turn to sharper distributional limits for $C$. An (easy) background fact is that, for independent random variables $(Z_{i})$ with exponential, mean $\tau$, distribution,

\frac{\max(Z_{1},\ldots,Z_{n})-\tau\log n}{\tau}\ \stackrel{d}{\rightarrow}\ \eta

where $\eta$ has the extreme value distribution

P(\eta\leq x)=\exp(-e^{-x}),\ -\infty<x<\infty. (7.14)
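This background fact is easy to check by simulation (a minimal sketch; all the constants below are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of the Gumbel limit: for n iid exponentials with mean
# tau, (max - tau log n)/tau is approximately distributed as eta, where
# P(eta <= x) = exp(-e^{-x}).
rng = np.random.default_rng(0)
n, reps, tau = 1000, 20000, 2.0
Z = rng.exponential(tau, size=(reps, n))
W = (Z.max(axis=1) - tau * np.log(n)) / tau
x = 1.0
empirical = (W <= x).mean()
gumbel = np.exp(-np.exp(-x))
```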

Now the cover time $C=\max_{i}T_{i}$ is the max of the hitting times, and with the uniform initial distribution the $T_{i}$'s have mean $\tau_{0}$. So if the $T_{i}$'s have approximately exponential distribution and are roughly independent of each other then we anticipate the limit result

\frac{C-\tau_{0}\log n}{\tau_{0}}\ \stackrel{d}{\rightarrow}\ \eta. (7.15)

Theorem 7.4 has already given us a condition for limit exponential distributions, and we shall build on this result to give (Theorem 7.9) conditions for (7.15) to hold.

The extreme value distribution (7.14) has transform

E\exp(\theta\eta)=\Gamma(1-\theta),\ -\infty<\theta<1. (7.16)

Classical probability theory (see Notes) says that to prove (7.15) it is enough to show that transforms converge, i.e. to show

E\exp(\theta C/\tau_{0})\sim n^{\theta}\Gamma(1-\theta),\ -\infty<\theta<1. (7.17)

But Matthews method, which previously we have used on expectations, can just as well be applied to transforms. By essentially the same argument as in Chapter 2 Theorem yyy, Matthews [258] obtained

Proposition 7.8

The cover time $C$ in a not-necessarily-reversible Markov chain with arbitrary initial distribution satisfies

\frac{\Gamma(n+1)\Gamma(1/f_{*}(\beta))}{\Gamma(n+1/f_{*}(\beta))}\leq E\exp(\beta C)\leq\frac{\Gamma(n+1)\Gamma(1/f^{*}(\beta))}{\Gamma(n+1/f^{*}(\beta))}

where

f^{*}(\beta)\equiv\max_{j\neq i}E_{i}\exp(\beta T_{j}),\qquad f_{*}(\beta)\equiv\min_{j\neq i}E_{i}\exp(\beta T_{j}).

Substituting into (7.17), and using the fact

\frac{\Gamma(n+1)}{\Gamma(n+1-s_{n})}\sim n^{s}\mbox{ as }n\rightarrow\infty,\ s_{n}\rightarrow s

we see that to establish (7.15) it suffices to prove that for arbitrary $j_{n}\neq i_{n}$ and for each fixed $-\infty<\theta<1$,

E_{i_{n}}\exp(\theta T_{j_{n}}/\tau_{0})\rightarrow\frac{1}{1-\theta}. (7.18)
Theorem 7.9

For a sequence of symmetric reversible chains, if

(a) $\min_{j\neq i}E_{i}T_{j}=\tau_{0}(1-o(1))$

(b) $\tau_{2}/\tau_{0}=o\left(\frac{1}{\log n}\right)$

then
C-τ0lognτ0dη.\frac{C-\tau_{0}\log n}{\tau_{0}}\ \stackrel{d}{\rightarrow}\ \eta.

Proof. By hypothesis (a) and Theorem 7.4(b,c), for arbitrary $j_{n}\neq i_{n}$ we have $P_{i_{n}}(T_{j_{n}}/\tau_{0}\in\cdot)\ \stackrel{d}{\rightarrow}\ \xi$. This implies (7.18) for $\theta\leq 0$, and also by Fatou's lemma implies $\liminf_{n}E_{i_{n}}\exp(\theta T_{j_{n}}/\tau_{0})\geq\frac{1}{1-\theta}$ for $0<\theta<1$. Thus it is sufficient to prove

\max_{j\neq i}E_{i}\exp(\theta T_{j}/\tau_{0})\leq\frac{1+o(1)}{1-\theta},\ 0<\theta<1. (7.19)

The proof exploits some of our earlier general inequalities. Switch to continuous time. Fix $\beta>0$. By conditioning on the position at some fixed time $s$,

E_{i}\exp(\beta(T_{j}-s)^{+})\leq\max_{x}(nP_{i}(X_{s}=x))\times E_{\pi}\exp(\beta T_{j}).

By Corollary 7.3 the max is attained by $x=i$, and so

E_{i}\exp(\beta T_{j})\leq(nP_{i}(X_{s}=i)e^{\beta s})\times E_{\pi}\exp(\beta T_{j}).

We now apply some general inequalities. Chapter 4 yyy says $nP_{i}(X_{s}=i)\leq 1+n\exp(-s/\tau_{2})$. Writing $\alpha_{j}$ for the quasistationary distribution on $\{j\}^{c}$, Chapter 3 yyy implies $P_{\pi}(T_{j}>t)\leq\exp(-t/E_{\alpha_{j}}T_{j})$ and hence

E_{\pi}\exp(\beta T_{j})\leq\frac{1}{1-\beta E_{\alpha_{j}}T_{j}}.

But Chapter 3 Theorem yyy implies $E_{\alpha_{j}}T_{j}\leq\tau_{0}+\tau_{2}$. So setting $\beta=\theta/\tau_{0}$, these inequalities combine to give

E_{i}\exp(\theta T_{j}/\tau_{0})\leq(1+n\exp(-s/\tau_{2}))\times\exp(\theta s/\tau_{0})\times\frac{1}{1-\theta(1+\tau_{2}/\tau_{0})}.

But by hypothesis (b) we can choose $s$ with $s=o(\tau_{0})$ and $s/\tau_{2}-\log n\rightarrow\infty$, so that each of the first two terms in the bound tends to $1$, establishing (7.19). Finally, the effect of continuization is to change $C$ by at most $O(\sqrt{EC})$, so the asymptotics remain true in discrete time.

Remark. Presumably (cf. Open Problem 7.6) the Theorem remains true without hypothesis (b).

In view of Chapter 6 yyy it is surprising that there is no obvious example to disprove

Open Problem 7.10

Let $V$ denote the last state to be hit. In a sequence of vertex-transitive graphs with $n\rightarrow\infty$, is it always true that $V$ converges (in variation distance, say) to the uniform distribution?

7.1.6 Product chains

In our collection of examples in Chapter 5 of random walks on graphs, the examples with enough symmetry to fit into the present setting have in fact extra symmetry, enough to fit into the arc-transitive setting of section 7.2. So in a sense, working at the level of generality of symmetric reversible chains merely serves to illustrate what properties of chains depend only on this minimal level of symmetry. But let us point out a general construction. Suppose we have symmetric reversible chains $X^{(1)},\ldots,X^{(d)}$ on state spaces $I^{(1)},\ldots,I^{(d)}$. Fix constants $a_{1},\ldots,a_{d}$ with each $a_{i}>0$ and with $\sum_{i}a_{i}=1$. Then (cf. Chapter 4 section yyy) we can define a ``product chain'' with state-space $I^{(1)}\times\ldots\times I^{(d)}$ and transition probabilities

(x_{1},\ldots,x_{d})\rightarrow(x_{1},\ldots,x^{\prime}_{i},\ldots,x_{d})\mbox{: probability }a_{i}P(X^{(i)}_{1}=x^{\prime}_{i}|X^{(i)}_{0}=x_{i}).

This product chain is also symmetric reversible. But if the underlying chains have extra symmetry properties, these extra properties are typically lost when one passes to the product chain. Thus we have a general method of constructing symmetric reversible chains which lack extra structure. Example 7.14 below gives a case with distinct underlying components, and Example 7.11 gives a case with a non-uniform product. In general, writing $(\lambda^{(i)}_{u}:1\leq u\leq|I^{(i)}|)$ for the continuous-time eigenvalues of $X^{(i)}$, we have (Chapter 4 yyy) that the continuous-time eigenvalues of the product chain are

\lambda_{{\bf u}}=a_{1}\lambda^{(1)}_{u_{1}}+\ldots+a_{d}\lambda^{(d)}_{u_{d}}

indexed by ${\bf u}=(u_{1},\ldots,u_{d})\in\{1,\ldots,|I^{(1)}|\}\times\ldots\times\{1,\ldots,|I^{(d)}|\}$. So in particular

\tau_{0}=\sum_{{\bf u}\neq(1,\ldots,1)}\frac{1}{a_{1}\lambda^{(1)}_{u_{1}}+\ldots+a_{d}\lambda^{(d)}_{u_{d}}}

and of course these parameters take the same values in discrete time.

Example 7.11

Coordinate-biased random walk on the dd-cube.

Take $I=\{0,1\}^{d}$ and fix $0<a_{1}\leq a_{2}\leq\ldots\leq a_{d}$ with $\sum_{i}a_{i}=1$. Then the chain with transitions

(b_{1},\ldots,b_{d})\rightarrow(b_{1},\ldots,1-b_{i},\ldots,b_{d})\mbox{: probability }a_{i}

is the weighted product of two-state chains. Most of the calculations for simple symmetric random walk on the $d$-cube done in Chapter 5 Example yyy extend to this example, with some increase of complexity. In particular,

\tau_{0}=\frac{1}{2}\sum_{{\bf u}\in I,\,{\bf u}\neq{\bf 0}}\frac{1}{\sum_{i=1}^{d}u_{i}a_{i}}.
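This formula can be checked against a direct hitting-time computation (a minimal sketch with $d=3$ and weights chosen arbitrarily):

```python
import numpy as np
from itertools import product

# Coordinate-biased walk on the 3-cube: flip coordinate i with probability
# a_i.  Compare tau0 from the eigenvalue formula with E_pi T_0 from the
# hitting-time equations.
a = np.array([0.2, 0.3, 0.5])
d, n = 3, 8
P = np.zeros((n, n))
for i in range(n):
    for b in range(d):
        P[i, i ^ (1 << b)] = a[b]

h = np.linalg.solve(np.eye(n - 1) - P[1:, 1:], np.ones(n - 1))
tau0_direct = np.concatenate([[0.0], h]).mean()

tau0_formula = 0.5 * sum(
    1.0 / sum(u[i] * a[i] for i in range(d))
    for u in product([0, 1], repeat=d) if any(u)
)
```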

In continuous time we still get the product form for the distribution at time $t$:

P_{b}(X_{t}=b^{\prime})=2^{-d}\prod_{i}(1+\eta_{i}\exp(-2a_{i}t));\quad\eta_{i}=1\mbox{ if }b^{\prime}_{i}=b_{i},\ =-1\mbox{ if not.}

So in a sequence of continuous time chains with $d\rightarrow\infty$, the ``separation'' parameter $\tau^{(1)}_{1}$ of Chapter 3 section yyy is asymptotic to the solution $t$ of

\sum_{i=1}^{d}\exp(-2a_{i}t)=1.
More elaborate calculations can be done to study τ1\tau_{1} and the discrete-time version.

7.1.7 The cutoff phenomenon and the upper bound lemma

Chapter 2 yyy and Chapter 4 yyy discussed quantifications of notions of ``time to approach stationarity'' using variation distance. The emphasis in Chapter 4 yyy was on inequalities which hold up to universal constants. In the present context of symmetric reversible chains, one can seek to do sharper calculations. Thus for random walk on the $d$-cube (Chapter 5 Example yyy), with chance $1/(d+1)$ of making each possible step or staying still, writing $n=2^{d}$ and $c_{n}=\frac{1}{4}d\log d$, we have (as $n\rightarrow\infty$) not only the fact $\tau_{1}\sim c_{n}$ but also the stronger result

d((1+\varepsilon)c_{n})\rightarrow 0\mbox{ and }d((1-\varepsilon)c_{n})\rightarrow 1,\mbox{ for all }\varepsilon>0. (7.20)

We call this the cutoff phenomenon, and when a sequence of chains satisfies (7.20) we say the sequence has ``variation cutoff at $c_{n}$''. As mentioned at xxx, the general theory of Chapter 4 works smoothly using $\bar{d}(t)$, but in examples it is more natural to use $d(t)$, which we shall do in this chapter. Clearly, (7.20) implies the same result for $\bar{d}$ and implies $\tau_{1}\sim c_{n}$. Also, our convention in this chapter is to work in discrete time, whereas the Chapter 4 general theory worked more smoothly in continuous time. (Clearly (7.20) in discrete time implies the same result for the continuized chains, provided $c_{n}\rightarrow\infty$.) Note that, in the context of symmetric reversible chains,

d(t)=d_{i}(t)=||P_{i}(X_{t}\in\cdot)-\pi(\cdot)||\mbox{ for each }i.

We also can discuss separation distance (Chapter 4 yyy), which in this context is

s(t)=1-n\min_{j}P_{i}(X_{t}=j)\mbox{ for each }i,

and introduce the analogous notion of separation threshold.
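For concreteness, both distances can be computed exactly for the $d$-cube walk just mentioned (a minimal numerical sketch; $d=6$ and the time range are arbitrary choices):

```python
import numpy as np

# d(t) and s(t) from a fixed start, for the walk on the 6-cube that holds
# with probability 1/(d+1) and otherwise flips a uniform coordinate.
d = 6; n = 2 ** d
P = np.eye(n) / (d + 1)
for i in range(n):
    for b in range(d):
        P[i, i ^ (1 << b)] = 1 / (d + 1)

row = np.zeros(n); row[0] = 1.0
dist, sep = [], []
for t in range(1, 16):
    row = row @ P
    dist.append(0.5 * np.abs(row - 1 / n).sum())   # variation distance d(t)
    sep.append(1 - n * row.min())                  # separation s(t)
```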

It turns out that these cut-offs automatically appear in sequences of chains defined by repeated products. An argument similar to the analysis of the dd-cube (see [8] for a slightly different version) shows

Lemma 7.12

Fix an aperiodic symmetric reversible chain with $m$ states and with relaxation time $\tau_{2}=1/(1-\lambda_{2})$. Consider the $d$-fold product chain with $n=m^{d}$ states and transition probabilities

(x_{1},\ldots,x_{d})\rightarrow(x_{1},\ldots,y_{i},\ldots,x_{d})\mbox{: probability }\frac{1}{d}\,p_{x_{i},y_{i}}.

As $d\rightarrow\infty$, this sequence of chains has variation cutoff $\frac{1}{2}\tau_{2}d\log d$ and separation cutoff $\tau_{2}d\log d$.

xxx discuss upper bound lemma

xxx heuristics

xxx mention later examples

7.1.8 Vertex-transitive graphs and Cayley graphs

So far we have worked in the setting of symmetric reversible chains, and haven't used any graph theory. We now specialize to the case of random walk on a vertex-transitive or Cayley graph $({\cal V},{\cal E})$. As usual, we won't write out all specializations of the previous results, but instead emphasize what extra we get from graph-theoretic arguments. Let $d$ be the degree of the graph.

Lemma 7.13

For random walk on a vertex-transitive graph,

(i) $E_{v}T_{x}\geq n$ if $(v,x)\not\in{\cal E}$

(ii) $\frac{2dn}{d+1}-d\geq E_{v}T_{x}\geq\frac{dn}{d+1}$ if $(v,x)\in{\cal E}$

Proof. The lower bounds are specializations of Lemma 7.2(i), i.e. of Chapter 6 xxx. For the upper bound in (ii),

n-1=\frac{1}{d}\sum_{y\sim x}E_{y}T_{x}\geq\frac{1}{d}\left(E_{v}T_{x}+(d-1)\frac{dn}{d+1}\right)\mbox{ by the lower bound in (ii).} (7.21)

xxx mention general lower bound $\tau_{0}\geq(1-o(1))nd/(d-2)$ via tree-cover.

It is known (xxx ref) that a Cayley graph of degree $d$ is $d$-edge-connected, and so Chapter 6 Proposition yyy gives

\tau^{*}\leq n^{2}\psi(d)/d

where $\psi(d)/d\approx\sqrt{2/d}$.

Example 7.14

A Cayley graph where $E_{v}T_{w}$ is not the same for all edges $(v,w)$.

Consider $Z_{m}\times Z_{2}$ with generators $(1,0),(-1,0),(0,1)$. The figure illustrates the case $m=4$.


Let's calculate $E_{00}T_{01}$ using the resistance interpretation. Put unit voltage at $01$ and zero voltage at $00$, and let $a_{i}$ be the voltage at $i0$. By symmetry the voltage at $i1$ is $1-a_{i}$, so we get the equations

a_{i}=\frac{1}{3}(a_{i-1}+a_{i+1}+(1-a_{i})),\ 1\leq i\leq m-1

with $a_{0}=a_{m}=0$. But this is just a linear difference equation, and a brief calculation gives the solution

a_{i}=\frac{1}{2}-\frac{1}{2}\,\frac{\theta^{m/2-i}+\theta^{i-m/2}}{\theta^{m/2}+\theta^{-m/2}}

where $\theta=2-\sqrt{3}$. The current flow is $1+2a_{1}$, so the effective resistance is $r=(1+2a_{1})^{-1}$. The commute interpretation of resistance gives $2E_{00}T_{01}=3nr$, and so

E_{00}T_{01}=\frac{3n}{2(1+2a_{1})}
where $n=2m$ is the number of vertices. In particular, since $1+2a_{1}\rightarrow 2-\theta=\sqrt{3}$,

n^{-1}\,E_{00}T_{01}\rightarrow\gamma\equiv\frac{\sqrt{3}}{2}\mbox{ as }n\rightarrow\infty.

Using the averaging property (7.21),

n^{-1}\,E_{00}T_{10}\rightarrow\gamma^{\prime}\equiv\frac{3-\gamma}{2}=\frac{6-\sqrt{3}}{4}\mbox{ as }n\rightarrow\infty.
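At finite $m$ the resistance answer can be compared with a direct linear-algebra computation (a numerical sketch; $m=6$ is an arbitrary choice):

```python
import numpy as np

# Z_m x Z_2, generators (1,0), (-1,0), (0,1).  Direct E_00 T_01 vs the
# resistance answer 3nr/2, r = 1/(1 + 2 a_1), theta = 2 - sqrt(3).
m = 6; n = 2 * m
idx = lambda x, y: 2 * x + y               # vertex (x, y), x in Z_m, y in Z_2
P = np.zeros((n, n))
for x in range(m):
    for y in range(2):
        i = idx(x, y)
        P[i, idx((x + 1) % m, y)] += 1 / 3
        P[i, idx((x - 1) % m, y)] += 1 / 3
        P[i, idx(x, 1 - y)] += 1 / 3

target = idx(0, 1)
keep = [i for i in range(n) if i != target]
h = np.linalg.solve(np.eye(n - 1) - P[np.ix_(keep, keep)], np.ones(n - 1))
E00T01 = h[keep.index(idx(0, 0))]          # hitting time from 00 to 01

theta = 2 - np.sqrt(3)
a1 = 0.5 - 0.5 * (theta ** (m / 2 - 1) + theta ** (1 - m / 2)) \
               / (theta ** (m / 2) + theta ** (-m / 2))
resistance_answer = 1.5 * n / (1 + 2 * a1)
```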

Turning from hitting times to mixing times, recall the Cheeger constant

\tau_{c}\equiv\sup_{A}c(A)

where $A$ ranges over proper subsets of vertices and

c(A)\equiv\frac{\pi(A^{c})}{P_{\pi}(X_{1}\in A^{c}|X_{0}\in A)}.

For random walk on a Cayley graph one can use simple “averaging” ideas to bound c(A)c(A). This is Proposition 7.15 below. The result in fact extends to vertex-transitive graphs by a covering graph argument - see xxx.

Consider an $n$-vertex Cayley graph with degree $d$ and generators ${\cal G}=\{g_{1},\ldots,g_{d}\}$, where $g\in{\cal G}$ implies $g^{-1}\in{\cal G}$. Then

P_{\pi}(X_{1}\in A^{c}|X_{0}\in A)=\frac{1}{d}\sum_{g\in{\cal G}}\frac{|Ag\setminus A|}{|A|}

where $Ag=\{ag:a\in A\}$. Lower bounding the sum by its maximal term, we get

c(A)\leq\frac{d}{n}\,\frac{|A|\,|A^{c}|}{\max_{g\in{\cal G}}|Ag\setminus A|}. (7.22)
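Inequality (7.22) is easy to test numerically; here is a small sketch on the Cayley graph of $Z_{12}$ with generators $\{+1,-1\}$ (the subsets $A$ are our own arbitrary choices):

```python
# Check c(A) <= (d/n) |A||A^c| / max_g |Ag \ A| on the cycle Z_12.
nv, d = 12, 2
gens = [1, -1]

def cheeger_quantities(A):
    A = set(A)
    comp = nv - len(A)
    # Q(A, A^c) = (1/(n d)) * number of directed boundary crossings
    Q = sum(1 for v in A for g in gens if (v + g) % nv not in A) / (nv * d)
    c = (len(A) / nv) * (comp / nv) / Q           # pi(A) pi(A^c) / Q(A, A^c)
    move = max(sum(1 for v in A if (v + g) % nv not in A) for g in gens)
    bound = (d / nv) * len(A) * comp / move       # right side of (7.22)
    return c, bound

checks = [cheeger_quantities(A)
          for A in (range(6), [0, 2, 4, 6, 8, 10], [0, 1, 2, 7, 8])]
```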
Proposition 7.15

On a Cayley graph of degree $d$

(i) $\tau_{c}\leq d\Delta$, where $\Delta$ is the diameter of the graph.

(ii) $c(A)\leq 2d\rho(A)$ for all $A$ with $\rho(A)\geq 1$, where

\rho(A)\equiv\min_{v\in{\cal V}}\max_{w\in A}d(v,w)

is the radius of $A$.

Note that $\sup_{A}\rho(A)$ is bounded by $\Delta$ but not in general by $\Delta/2$ (consider the cycle), so that (ii) implies (i) with an extra factor of $2$. Part (i) is from Aldous [16] and (ii) is from Babai [36].

Proof. (i) Fix $A$. Because

\frac{1}{n}\sum_{v\in{\cal V}}|A\cap Av|=|A|^{2}/n

there exists some $v\in{\cal V}$ such that $|A\cap Av|\leq|A|^{2}/n$, implying

|Av\setminus A|\geq|A||A^{c}|/n. (7.23)

We can write $v=g_{1}g_{2}\ldots g_{\delta}$ for some sequence of generators $(g_{i})$ and some $\delta\leq\Delta$, and

|Av\setminus A|\leq\sum_{i=1}^{\delta}|Ag_{1}\ldots g_{i}\setminus Ag_{1}\ldots g_{i-1}|=\sum_{i=1}^{\delta}|Ag_{i}\setminus A|.

So there exists $g\in{\cal G}$ with $|Ag\setminus A|\geq\frac{1}{\Delta}\times|A||A^{c}|/n$, and so (i) follows from (7.22). For part (ii), fix $A$ with $|A|\leq n/2$, write $\rho=\rho(A)$ and suppose

\max_{g\in{\cal G}}|Ag\setminus A|<\frac{1}{4\rho}|A|. (7.24)

Fix $v$ with $\max_{w\in A}d(w,v)=\rho$. Since $|Ag\setminus A|<\frac{1}{4\rho}|A|$ and

A\setminus Axg\subseteq(A\setminus Ag)\cup((A\setminus Ax)g)

we have by induction

|A\setminus Ax|<\frac{1}{4\rho}|A|\,d(x,v). (7.25)

Write $B^{r}\equiv\{vg_{1}\ldots g_{i}:i\leq r,\ g_{i}\in{\cal G}\}$ for the ball of radius $r$ about $v$. Since $(2\rho+1)/(4\rho)<1$, inequality (7.25) shows that $A\cap Ax$ is non-empty for each $x\in B^{2\rho+1}$, and so $B^{2\rho+1}\subseteq A^{-1}A$. But by definition of $\rho$ we have $A\subseteq B^{\rho}$, implying $B^{2\rho+1}\subseteq B^{2\rho}$, which in turn implies $B^{2\rho}$ is the whole group. Now (7.25) implies that for every $x$

|Ax\setminus A|<\frac{1}{2}|A|\leq\frac{|A||A^{c}|}{n}.

But this contradicts (7.23). So (7.24) is false, i.e.

\max_{g\in{\cal G}}|Ag\setminus A|\geq\frac{1}{4\rho}|A|\geq\frac{1}{2\rho}\,\frac{|A||A^{c}|}{n}.

By complementation the final inequality remains true when $|A|>n/2$, and the result follows from (7.22).

7.1.9 Comparison arguments for eigenvalues

The “distinguished paths” method of bounding relaxation times (Chapter 4 yyy) can also be used to compare relaxation times of two random flights on the same group, and hence to bound one “unknown” relaxation time in terms of a second “known” relaxation time. This approach has been developed in great depth in

xxx ref Diaconis Saloff-Coste papers.

Here we give only the simplest of their results, from [115].

Consider generators ${\cal G}$ of a group $I$, and consider a reversible random flight with step-distribution $\mu$ supported on ${\cal G}$. Write $d(x,\mbox{id})$ for the distance from $x$ to the identity in the Cayley graph, i.e. the minimal length of a word

x=g_{1}g_{2}\ldots g_{d};\ g_{i}\in{\cal G}.

For each $x$ choose some minimal-length word as above and define $N(g,x)$ to be the number of occurrences of $g$ in the word. Now consider a different reversible random flight on $I$ with some step-distribution $\tilde{\mu}$, not necessarily supported on ${\cal G}$. If we know $\tilde{\tau_{2}}$, the next result allows us to bound $\tau_{2}$.

Theorem 7.16
\frac{\tau_{2}}{\tilde{\tau_{2}}}\leq K\equiv\max_{g\in{\cal G}}\frac{1}{\mu(g)}\sum_{x\in I}d(x,\mbox{id})\,N(g,x)\,\tilde{\mu}(x).

xxx give proof -- tie up with $L^{2}$ discussion

Perhaps surprisingly, Theorem 7.16 gives information even when the comparison walk is the ``trivial'' walk whose step-distribution $\tilde{\mu}$ is uniform on the group. In this case $\tilde{\tau_{2}}=1$, and both $d(x,\mbox{id})$ and $N(g,x)$ are bounded by the diameter $\Delta$, giving

Corollary 7.17

For a reversible random flight with step-distribution $\mu$ on a group $I$,

\tau_{2}\leq\frac{\Delta^{2}}{\min_{g\in{\cal G}}\mu(g)},

where ${\cal G}$ is the support of $\mu$ and $\Delta$ is the diameter of the Cayley graph associated with ${\cal G}$.

When $\mu$ is uniform on ${\cal G}$ and $|{\cal G}|=d$, the Corollary gives the bound $d\Delta^{2}$, which improves on the bound $8d^{2}\Delta^{2}$ which follows from Proposition 7.15 and Cheeger's inequality (Chapter 4 yyy). The examples of the tori $Z^{d}_{N}$ show that $\Delta^{2}$ enters naturally, but one could hope for the following variation.
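As a sanity check (our own small example), for random walk on odd cycles the Corollary's bound can be compared with the exact relaxation time:

```python
import numpy as np

# Walk on Z_nn with mu uniform on {+1, -1}: tau2 = 1/(1 - lambda_2)
# against the bound Delta^2 / min_g mu(g), with Delta = floor(nn/2).
results = []
for nn in (5, 9, 15):
    P = np.zeros((nn, nn))
    for i in range(nn):
        P[i, (i + 1) % nn] = P[i, (i - 1) % nn] = 0.5
    lam2 = np.sort(np.linalg.eigvalsh(P))[-2]   # second-largest eigenvalue
    tau2 = 1 / (1 - lam2)
    bound = (nn // 2) ** 2 / 0.5                # Delta^2 / min mu(g)
    results.append((tau2, bound))
```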

Open Problem 7.18

Write $\tau_{*}=\tau_{*}(I,{\cal G})$ for the minimum of $\tau_{2}$ over all symmetric random flights on $I$ with step-distribution supported on ${\cal G}$. Is it true that $\tau_{*}=O(\Delta^{2})$?