9 A Second Look at General Markov Chains (April 21, 1995)

9.2 Markov chains and spanning trees

9.2.1 General Chains and Directed Weighted Graphs

Let’s jump into the details and defer the discussion until later. Consider a finite irreducible discrete-time Markov chain $(X_n)$ with transition matrix ${\bf P}=(p_{vw})$; note that we are not assuming reversibility. We can identify ${\bf P}$ with a weighted directed graph, which has (for each pair $(v,w)$ with $p_{vw}>0$) a directed edge $(v,w)$ with weight $p_{vw}$. A directed spanning tree ${\bf t}$ is a spanning tree with one vertex distinguished as the root, and with each edge $e=(v,w)$ of ${\bf t}$ regarded as being directed towards the root. Write ${\cal T}$ for the set of directed spanning trees. For ${\bf t}\in{\cal T}$ define

$$\bar{\rho}({\bf t})\equiv\prod_{(v,w)\in{\bf t}}p_{vw}.$$

Normalizing gives a probability distribution $\rho$ on ${\cal T}$:

$$\rho({\bf t})\equiv\frac{\bar{\rho}({\bf t})}{\sum_{{\bf t}'\in{\cal T}}\bar{\rho}({\bf t}')}.$$
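To make the definitions concrete, here is a minimal brute-force sketch (our illustration, not part of the original development). It enumerates ${\cal T}$ for a small transition matrix and computes $\bar{\rho}$ and $\rho$; the $3$-state matrix `P` is an arbitrary example, and the approach is feasible only for very small chains, since $|{\cal T}|$ grows exponentially.

```python
# Brute-force sketch: enumerate the directed spanning trees of the
# weighted directed graph of a small transition matrix P, and compute
# rho-bar and rho.  The 3-state P below is an arbitrary example.
import itertools
import numpy as np

def directed_spanning_trees(P):
    """Yield (root, parent) pairs, where parent[v] = w means the tree
    contains the directed edge (v, w)."""
    n = len(P)
    for root in range(n):
        others = [v for v in range(n) if v != root]
        # Each non-root vertex chooses one out-edge of positive weight.
        choices = [[w for w in range(n) if w != v and P[v][w] > 0]
                   for v in others]
        for heads in itertools.product(*choices):
            parent = dict(zip(others, heads))
            if all(reaches_root(v, parent, root, n) for v in others):
                yield root, parent

def reaches_root(v, parent, root, n):
    """True if following parent pointers from v reaches root (no cycle)."""
    for _ in range(n):
        if v == root:
            return True
        v = parent[v]
    return v == root

P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]])
trees = list(directed_spanning_trees(P))
rho_bar = np.array([np.prod([P[v][w] for v, w in parent.items()])
                    for _, parent in trees])
rho = rho_bar / rho_bar.sum()
for (root, parent), r in zip(trees, rho):
    print(f"root={root}  edges={sorted(parent.items())}  rho={r:.4f}")
```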

Now fix $n$ and consider the stationary Markov chain $(X_m:-\infty<m\leq n)$ run from time minus infinity to time $n$. We now use the chain to construct a random directed spanning tree ${\bf T}_n$. The root of ${\bf T}_n$ is $X_n$. For each $v\neq X_n$ there was a final time, $L_v$ say, before $n$ that the chain visited $v$:

$$L_v\equiv\max\{m\leq n:X_m=v\}.$$

Define ${\bf T}_n$ to consist of the directed edges

$$(v=X_{L_v},\,X_{L_v+1}),\quad v\neq X_n.$$

So the edges of ${\bf T}_n$ are the last-exit edges from each vertex (other than the root $X_n$). It is easy to check that ${\bf T}_n$ is a directed spanning tree.
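In code, the last-exit construction reads off ${\bf T}_n$ from any finite trajectory that has visited every state. The sketch below (our illustration) runs the chain from time $0$ rather than $-\infty$; this is a harmless surrogate once every state has been visited well before time $n$.

```python
# Sketch of the last-exit construction: read off T_n from a finite
# trajectory X[0..n].  Requires every state to appear in X[0..n-1];
# the text's stationary chain from time -infinity guarantees this.
import numpy as np

def last_exit_tree(X):
    """X: list of states.  Returns (root, edges) with edges[v] = X[L_v + 1]."""
    n = len(X) - 1
    root = X[n]
    L = {}
    for m in range(n):              # record the last exit time of each state
        L[X[m]] = m
    edges = {v: X[m + 1] for v, m in L.items() if v != root}
    return root, edges

rng = np.random.default_rng(0)
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]])
X = [0]
for _ in range(1000):
    X.append(int(rng.choice(3, p=P[X[-1]])))
print(last_exit_tree(X))            # e.g. (2, {0: 1, 1: 2})
```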

Now consider what happens as $n$ changes. Clearly the process $({\bf T}_n:-\infty<n<\infty)$ is a stationary Markov chain on ${\cal T}$, with a certain transition matrix ${\bf Q}=(q({\bf t},{\bf t}'))$, say. The figure below indicates a typical transition ${\bf t}\rightarrow{\bf t}'$. Here ${\bf t}$ was constructed by the chain finishing at its root $v$, and ${\bf t}'$ is the new tree obtained when the chain makes a transition $v\rightarrow w$.

[Figure: a tree ${\bf t}$, rooted at $v$ and with vertices $w$ and $x$ labeled, alongside the tree ${\bf t}'$ obtained from ${\bf t}$ when the chain steps from $v$ to $w$.]
Theorem 9.10 (The Markov chain tree theorem)

The stationary distribution of $({\bf T}_n)$ is $\rho$.

Proof. Fix a directed spanning tree ${\bf t}'$. We have to verify

$$\sum_{{\bf t}}\bar{\rho}({\bf t})\,q({\bf t},{\bf t}')=\bar{\rho}({\bf t}').\qquad(9.13)$$

Write $w$ for the root of ${\bf t}'$. For each vertex $x\neq w$ there is a tree ${\bf t}_x$, constructed from ${\bf t}'$ by adding the edge $(w,x)$ and then deleting from the resulting cycle the edge $(v,w)$ (say, for some $v=v(x)$) leading into $w$. For $x=w$ set $v(x)=x$ and ${\bf t}_w={\bf t}'$, corresponding to the self-loop transition $w\rightarrow w$. It is easy to see that the only possible transitions into ${\bf t}'$ are from the trees ${\bf t}_x$, and that

$$\frac{\bar{\rho}({\bf t}_x)}{\bar{\rho}({\bf t}')}=\frac{p_{wx}}{p_{vw}};\qquad q({\bf t}_x,{\bf t}')=p_{vw}.$$

Thus the left side of (9.13) becomes

$$\sum_x\bar{\rho}({\bf t}_x)\,q({\bf t}_x,{\bf t}')=\bar{\rho}({\bf t}')\sum_xp_{wx}=\bar{\rho}({\bf t}').$$

$\Box$
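The balance equation (9.13) can also be checked numerically. The sketch below (our illustration, reusing `P` and `directed_spanning_trees` from the first sketch above) builds the tree-chain transition matrix ${\bf Q}$ explicitly, by applying the transition pictured above to every tree, and verifies $\rho{\bf Q}=\rho$.

```python
# Numerical check of (9.13), reusing P and directed_spanning_trees
# from the first sketch: build Q explicitly and verify rho Q = rho.
import numpy as np

trees = list(directed_spanning_trees(P))
key = lambda root, parent: (root, tuple(sorted(parent.items())))
index = {key(r, par): i for i, (r, par) in enumerate(trees)}
rho_bar = np.array([np.prod([P[v][w] for v, w in par.items()])
                    for _, par in trees])
rho = rho_bar / rho_bar.sum()

Q = np.zeros((len(trees), len(trees)))
for i, (v, parent) in enumerate(trees):
    for w in range(len(P)):
        if P[v][w] == 0:
            continue
        if w == v:                      # self-loop: the tree is unchanged
            Q[i, i] += P[v][w]
            continue
        new_parent = dict(parent)
        new_parent[v] = w               # new last-exit edge (v, w)
        del new_parent[w]               # w is the new root
        Q[i, index[key(w, new_parent)]] += P[v][w]

print(np.allclose(rho @ Q, rho))        # True: rho is stationary for Q
```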

The underlying chain $(X_n)$ can be recovered from the tree-valued chain $({\bf T}_n)$ via $X_n=\mbox{root}({\bf T}_n)$, so we can recover the stationary distribution of $X$ from the stationary distribution of ${\bf T}$, as follows.

Corollary 9.11 (The Markov chain tree formula)

For each vertex $v$ define

$$\bar{\pi}(v)\equiv\sum_{{\bf t}:\ v=\mbox{root}({\bf t})}\bar{\rho}({\bf t}),\qquad\pi(v)\equiv\frac{\bar{\pi}(v)}{\sum_w\bar{\pi}(w)}.$$

Then $\pi$ is the stationary distribution of the original chain $(X_n)$.

See the Notes for comments on this classical result.
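As a sanity check, the tree formula can be compared against the stationary distribution computed by linear algebra. A sketch, again reusing `P` and `directed_spanning_trees` from the first sketch above:

```python
# Check of the Markov chain tree formula on the 3-state example.
import numpy as np

pi_bar = np.zeros(len(P))
for root, parent in directed_spanning_trees(P):
    pi_bar[root] += np.prod([P[v][w] for v, w in parent.items()])
pi_tree = pi_bar / pi_bar.sum()

# Stationary distribution as the left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
v = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi_eig = v / v.sum()

print(np.allclose(pi_tree, pi_eig))     # True
```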

Theorem 9.10 and the definition of ${\bf T}_0$ come close to specifying an algorithm for constructing a random spanning tree with distribution $\rho$. Of course the notion of running the chain from time $-\infty$ until time $0$ doesn’t sound very algorithmic, but we can rephrase this notion using time-reversal. Regarding the stationary distribution $\pi$ as known, the time-reversed chain $X^*$ has transition matrix $p^*_{vw}\equiv\pi_wp_{wv}/\pi_v$. Here is the restatement of Theorem 9.10 in terms of the time-reversed chain.

Corollary 9.12

Let $(X^*_m:0\leq m\leq C)$ be the time-reversed chain, run until the cover time $C$. Define ${\bf T}$ to be the directed spanning tree with root $X^*_0$ and with edges $(v=X^*_{T_v},\,X^*_{T_v-1}),\ v\neq X^*_0$, where $T_v$ is the first hitting time of $v$. If $X^*_0$ has distribution $\pi$ then ${\bf T}$ has distribution $\rho$. If $X^*_0$ is deterministically $v_0$, say, then ${\bf T}$ has distribution $\rho$ conditioned on being rooted at $v_0$.

Thus ${\bf T}$ consists of the edges by which each vertex is first visited, directed backwards.
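Corollary 9.12 translates directly into a sampler, given $\pi$ (computable, e.g., as in the previous sketch). The following sketch (our illustration; the function name and interface are ours) runs the reversed chain until it has covered the state space and records the first-entrance edges.

```python
# Sketch of Corollary 9.12: run the time-reversed chain X* from x0
# until the cover time and record, for each vertex, the edge along
# which it was first entered (directed backwards).
import numpy as np

def rho_tree_via_reversal(P, pi, x0, rng=None):
    rng = rng or np.random.default_rng()
    n = len(P)
    Pstar = (pi[None, :] * P.T) / pi[:, None]   # p*_vw = pi_w p_wv / pi_v
    parent = {}
    unvisited = set(range(n)) - {x0}
    v = x0
    while unvisited:
        w = int(rng.choice(n, p=Pstar[v]))      # one step of X*
        if w in unvisited:                      # first visit to w:
            parent[w] = v                       #   tree edge (w, v)
            unvisited.discard(w)
        v = w
    return x0, parent                           # tree rooted at x0

# With x0 drawn from pi the output tree has distribution rho; with
# deterministic x0 = v0 it has distribution rho conditioned on root v0.
```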

For a reversible chain, we can of course use the chain itself in Corollary 9.12 above, in place of the time-reversed chain. If the chain is random walk on an unweighted graph $G$, then

$$\bar{\rho}({\bf t})=d(\mbox{root}({\bf t}))\,\prod_v\frac{1}{d(v)}$$

where $d(v)$ is the degree of $v$ in $G$. So $\bar{\rho}$, restricted to the set of spanning trees with specified root $v_0$, is uniform on that set. In this setting, Corollary 9.12 specializes as follows.

Corollary 9.13

Let $(X_m:0\leq m\leq C)$ be random walk on an unweighted graph $G$, started at $v_0$ and run until the cover time $C$. Define ${\bf T}$ to be the directed spanning tree with root $v_0$ and with edges $(v=X_{T_v},\,X_{T_v-1}),\ v\neq v_0$. Then ${\bf T}$ is uniform on the set of all directed spanning trees of $G$ rooted at $v_0$.

We can rephrase this. If we just want “plain” spanning trees without a root and directions, then the ${\bf T}$ above, regarded as a plain spanning tree, is uniform on the set of all plain spanning trees. On the other hand, if we want a rooted spanning tree which is uniform on all such trees without prespecified root, the simplest procedure is to construct ${\bf T}$ as in Corollary 9.13 with deterministic start $v_0$, and at the end re-root ${\bf T}$ at a uniform random vertex. (This is slightly subtle: we could alternatively start with $X_0$ uniform, which is typically not the stationary distribution $\pi$.)
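For the unweighted reversible case this is a particularly simple procedure (now commonly known as the Aldous-Broder algorithm). A sketch, with the graph given as an adjacency list; the example graph is arbitrary:

```python
# Sketch of Corollary 9.13: simple random walk from v0 until the cover
# time; each vertex other than v0 records the edge by which it was
# first entered.  The result is uniform on directed spanning trees
# rooted at v0; re-rooting at a uniform random vertex then gives a
# uniform rooted tree, per the discussion above.
import random

def uniform_spanning_tree(adj, v0, rng=random):
    """adj: dict mapping each vertex to its list of neighbours."""
    parent = {}
    unvisited = set(adj) - {v0}
    v = v0
    while unvisited:
        w = rng.choice(adj[v])      # one random-walk step
        if w in unvisited:          # first entrance into w
            parent[w] = v           # directed tree edge (w, v)
            unvisited.discard(w)
        v = w
    return parent                   # rooted at v0

# Example: the 4-cycle.  Forgetting root and directions, the edge set
# {v, parent[v]} is uniform over the graph's plain spanning trees.
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(uniform_spanning_tree(C4, 0))
```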

Using the bounds on cover time developed in Chapter 6, we now have an algorithm for generating a uniform spanning tree of an $n$-vertex graph in $O(n^3)$ steps (and $O(n^2)$ steps on a regular graph). No other known algorithm achieves these bounds.

9.2.2 Electrical network theory

The ideas in this subsection (and much more) are treated in a long but very readable survey paper by Pemantle [279], which we encourage the interested reader to consult. As observed above, in the reversible setting we have the obvious simplification that we can construct uniform spanning trees using the chain itself. Deeper results can be found using the electrical network analogy. Consider random walk on a weighted graph $G$. The random spanning tree ${\bf T}$ constructed by Corollary 9.12, interpreted as a “plain” spanning tree, has distribution

$$\rho({\bf t})=c\,\prod_{e\in{\bf t}}w_e$$

where $c$ is the normalizing constant. If an edge $e$ is essential, it must be in every spanning tree, so $P(e\in{\bf T})=1$. If the edge is inessential, the probability will be strictly between $0$ and $1$. Intuitively, $P(e\in{\bf T})$ should provide a measure of “how nearly essential $e$ is”. This should remind the reader of the inessential edge inequality (yyy). Interpreting the weighted graph as an electrical network where an edge $e=(v,x)$ has resistance $1/w_e$, the effective resistance $r_{vx}$ between $v$ and $x$ satisfies

$$r_{vx}\leq 1/w_{vx},\ \mbox{with equality iff $(v,x)$ is essential.}$$
Proposition 9.14

For each edge $(v,x)$,

$$P((v,x)\in{\bf T})=w_{vx}r_{vx}.$$

Note that in an $n$-vertex graph, ${\bf T}$ has exactly $n-1$ edges, so Proposition 9.14 implies Foster’s theorem (Chapter 3 yyy):

$$\sum_{\mbox{edges }(v,x)}w_{vx}r_{vx}=n-1.$$

Proof. Consider the random walk started at $v$ and run until the time $U$ of the first return to $v$ after the first visit to $x$. Let $p$ be the chance that $X_{U-1}=x$, i.e. that the return to $v$ is along the edge $(x,v)$. We can calculate $p$ in two ways. In terms of the random walk started at $x$, $p$ is the chance that the first visit to $v$ is from $x$, and so by Corollary 9.12 (applied to the walk started at $x$) $p=P((v,x)\in{\bf T})$. On the other hand, consider the walk started at $v$ and let $S$ be the first time that the walk traverses $(x,v)$ in that direction. Then

$$ES=EU/p,$$

since the successive $v$-to-$v$ cycles (each run until the first return to $v$ after a visit to $x$) are independent copies of the $U$-cycle, each ending with a traversal of $(x,v)$ with probability $p$, and Wald’s identity applies.

But by yyy and yyy

$$ES=w/w_{vx},\qquad EU=wr_{vx}$$

and hence $p=w_{vx}r_{vx}$ as required. $\Box$
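Foster’s theorem is easy to check numerically: effective resistances can be computed from the Moore-Penrose pseudoinverse of the weighted graph Laplacian via $r_{vx}=(e_v-e_x)^TL^+(e_v-e_x)$. A sketch on a small example graph of our own choosing:

```python
# Numerical check of Foster's theorem: the sum of w_vx * r_vx over
# edges equals n - 1.  W is an arbitrary small example (a 4-cycle
# plus one chord, unit weights).
import numpy as np

W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
n = len(W)
L = np.diag(W.sum(axis=1)) - W          # weighted graph Laplacian
Lp = np.linalg.pinv(L)

total = 0.0
for v in range(n):
    for x in range(v + 1, n):
        if W[v, x] > 0:
            r_vx = Lp[v, v] + Lp[x, x] - 2 * Lp[v, x]
            total += W[v, x] * r_vx
print(total)                            # 3.0 = n - 1, up to rounding
```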

The next result indicates the usefulness of the electrical network analogy.

Proposition 9.15

For any two edges $e_1\neq e_2$,

$$P(e_1\in{\bf T},\,e_2\in{\bf T})\leq P(e_1\in{\bf T})\,P(e_2\in{\bf T}).$$

Proof. Consider the “shorted” graph $G^{\rm short}$ in which the end-vertices $(x_1,x_2)$ of $e_1$ are shorted into a single vertex $x$, with edge-weights $w_{xv}=w_{x_1v}+w_{x_2v}$. The natural $1$-$1$ correspondence ${\bf t}\leftrightarrow{\bf t}\cup\{e_1\}$ between spanning trees of $G^{\rm short}$ and spanning trees of $G$ containing $e_1$ maps the distribution $\rho^{\rm short}$ to the conditional distribution $\rho(\cdot\,|\,e_1\in{\bf T})$. So, writing ${\bf T}^{\rm short}$ for the random spanning tree associated with $G^{\rm short}$,

$$P(e_2\in{\bf T}^{\rm short})=P(e_2\in{\bf T}\,|\,e_1\in{\bf T}).$$

But, setting $e_2=(z_1,z_2)$, Proposition 9.14 shows

$$P(e_2\in{\bf T}^{\rm short})=w_{z_1z_2}r^{\rm short}_{z_1z_2},\qquad P(e_2\in{\bf T})=w_{z_1z_2}r_{z_1z_2}.$$

By Rayleigh’s monotonicity principle, $r^{\rm short}_{z_1z_2}\leq r_{z_1z_2}$, and the result follows. $\Box$
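The negative correlation of Proposition 9.15 can also be confirmed by brute force on a small unweighted graph (where $\rho$ is uniform), enumerating spanning trees directly. A sketch, with an arbitrary example graph and edge pair:

```python
# Brute-force check of Proposition 9.15 on a small unweighted graph:
# enumerate spanning trees as acyclic (n-1)-edge subsets and compare
# the joint edge probability with the product of the marginals.
import itertools

n = 4
edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3)]   # 4-cycle plus a chord

def is_spanning_tree(subset):
    comp = list(range(n))                  # union-find over vertices
    def find(a):
        while comp[a] != a:
            comp[a] = comp[comp[a]]
            a = comp[a]
        return a
    for a, b in subset:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False                   # subset contains a cycle
        comp[ra] = rb
    return True                            # n-1 acyclic edges: a tree

trees = [t for t in itertools.combinations(edges, n - 1)
         if is_spanning_tree(t)]

def prob(*es):                             # P(all of es in T), T uniform
    return sum(all(e in t for e in es) for t in trees) / len(trees)

e1, e2 = (0, 1), (2, 3)
print(prob(e1, e2) <= prob(e1) * prob(e2))  # True
```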