9 A Second Look at General Markov Chains (April 21, 1995)

9.2 Markov chains and spanning trees

9.2.1 General Chains and Directed Weighted Graphs

Let’s jump into the details and defer the discussion until later. Consider a finite irreducible discrete-time Markov chain $(X_n)$ with transition matrix ${\bf P}=(p_{vw})$; note that we are not assuming reversibility. We can identify ${\bf P}$ with a weighted directed graph, which has (for each pair $(v,w)$ with $p_{vw}>0$) a directed edge $(v,w)$ with weight $p_{vw}$. A directed spanning tree ${\bf t}$ is a spanning tree with one vertex distinguished as the root, and with each edge $e=(v,w)$ of ${\bf t}$ regarded as being directed towards the root. Write ${\cal T}$ for the set of directed spanning trees. For ${\bf t}\in{\cal T}$ define

$$\bar{\rho}({\bf t})\equiv\prod_{(v,w)\in{\bf t}}p_{vw}.$$

Normalizing gives a probability distribution $\rho$ on ${\cal T}$:

$$\rho({\bf t})\equiv\frac{\bar{\rho}({\bf t})}{\sum_{{\bf t}'\in{\cal T}}\bar{\rho}({\bf t}')}.$$
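To make the definitions concrete, here is a minimal brute-force sketch (our illustration, not part of the original development). It enumerates ${\cal T}$ for a small transition matrix and computes $\bar{\rho}$ and $\rho$; the $3$-state matrix `P` is an arbitrary example, and the approach is feasible only for very small chains, since $|{\cal T}|$ grows exponentially.

```python
# Brute-force sketch: enumerate the directed spanning trees of the
# weighted directed graph of a small transition matrix P, and compute
# rho-bar and rho.  The 3-state P below is an arbitrary example.
import itertools
import numpy as np

def directed_spanning_trees(P):
    """Yield (root, parent) pairs, where parent[v] = w means the tree
    contains the directed edge (v, w)."""
    n = len(P)
    for root in range(n):
        others = [v for v in range(n) if v != root]
        # Each non-root vertex chooses one out-edge of positive weight.
        choices = [[w for w in range(n) if w != v and P[v][w] > 0]
                   for v in others]
        for heads in itertools.product(*choices):
            parent = dict(zip(others, heads))
            if all(reaches_root(v, parent, root, n) for v in others):
                yield root, parent

def reaches_root(v, parent, root, n):
    """True if following parent pointers from v reaches root (no cycle)."""
    for _ in range(n):
        if v == root:
            return True
        v = parent[v]
    return v == root

P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]])
trees = list(directed_spanning_trees(P))
rho_bar = np.array([np.prod([P[v][w] for v, w in parent.items()])
                    for _, parent in trees])
rho = rho_bar / rho_bar.sum()
for (root, parent), r in zip(trees, rho):
    print(f"root={root}  edges={sorted(parent.items())}  rho={r:.4f}")
```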

Now fix $n$ and consider the stationary Markov chain $(X_m:-\infty<m\leq n)$ run from time minus infinity to time $n$. We now use the chain to construct a random directed spanning tree ${\bf T}_n$. The root of ${\bf T}_n$ is $X_n$. For each $v\neq X_n$ there was a final time, $L_v$ say, before $n$ that the chain visited $v$:

$$L_v\equiv\max\{m\leq n:X_m=v\}.$$

Define ${\bf T}_n$ to consist of the directed edges

$$(v=X_{L_v},\,X_{L_v+1}),\quad v\neq X_n.$$

So the edges of ${\bf T}_n$ are the last-exit edges from each vertex (other than the root $X_n$). It is easy to check that ${\bf T}_n$ is a directed spanning tree.
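In code, the last-exit construction reads off ${\bf T}_n$ from any finite trajectory that has visited every state. The sketch below (our illustration) runs the chain from time $0$ rather than $-\infty$; this is a harmless surrogate once every state has been visited well before time $n$.

```python
# Sketch of the last-exit construction: read off T_n from a finite
# trajectory X[0..n].  Requires every state to appear in X[0..n-1];
# the text's stationary chain from time -infinity guarantees this.
import numpy as np

def last_exit_tree(X):
    """X: list of states.  Returns (root, edges) with edges[v] = X[L_v + 1]."""
    n = len(X) - 1
    root = X[n]
    L = {}
    for m in range(n):              # record the last exit time of each state
        L[X[m]] = m
    edges = {v: X[m + 1] for v, m in L.items() if v != root}
    return root, edges

rng = np.random.default_rng(0)
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]])
X = [0]
for _ in range(1000):
    X.append(int(rng.choice(3, p=P[X[-1]])))
print(last_exit_tree(X))            # e.g. (2, {0: 1, 1: 2})
```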

Now consider what happens as $n$ changes. Clearly the process $({\bf T}_n:-\infty<n<\infty)$ is a stationary Markov chain on ${\cal T}$, with a certain transition matrix ${\bf Q}=(q({\bf t},{\bf t}'))$, say. The figure below indicates a typical transition ${\bf t}\rightarrow{\bf t}'$. Here ${\bf t}$ was constructed by the chain finishing at its root $v$, and ${\bf t}'$ is the new tree obtained when the chain makes a transition $v\rightarrow w$.

[Figure: a tree ${\bf t}$, rooted at $v$ and with vertices $w$ and $x$ labeled, alongside the tree ${\bf t}'$ obtained from ${\bf t}$ when the chain steps from $v$ to $w$.]
Theorem 9.10 (The Markov chain tree theorem)

The stationary distribution of $({\bf T}_n)$ is $\rho$.

Proof. Fix a directed spanning tree ${\bf t}'$. We have to verify

$$\sum_{{\bf t}}\bar{\rho}({\bf t})\,q({\bf t},{\bf t}')=\bar{\rho}({\bf t}').\qquad(9.13)$$

Write $w$ for the root of ${\bf t}'$. For each vertex $x\neq w$ there is a tree ${\bf t}_x$, constructed from ${\bf t}'$ by adding the edge $(w,x)$ and then deleting from the resulting cycle the edge $(v,w)$ (say, for some $v=v(x)$) leading into $w$. For $x=w$ set $v(x)=x$ and ${\bf t}_w={\bf t}'$, corresponding to the self-loop transition $w\rightarrow w$. It is easy to see that the only possible transitions into ${\bf t}'$ are from the trees ${\bf t}_x$, and that

$$\frac{\bar{\rho}({\bf t}_x)}{\bar{\rho}({\bf t}')}=\frac{p_{wx}}{p_{vw}};\qquad q({\bf t}_x,{\bf t}')=p_{vw}.$$

Thus the left side of (9.13) becomes

$$\sum_x\bar{\rho}({\bf t}_x)\,q({\bf t}_x,{\bf t}')=\bar{\rho}({\bf t}')\sum_xp_{wx}=\bar{\rho}({\bf t}').$$

$\Box$
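The balance equation (9.13) can also be checked numerically. The sketch below (our illustration, reusing `P` and `directed_spanning_trees` from the first sketch above) builds the tree-chain transition matrix ${\bf Q}$ explicitly, by applying the transition pictured above to every tree, and verifies $\rho{\bf Q}=\rho$.

```python
# Numerical check of (9.13), reusing P and directed_spanning_trees
# from the first sketch: build Q explicitly and verify rho Q = rho.
import numpy as np

trees = list(directed_spanning_trees(P))
key = lambda root, parent: (root, tuple(sorted(parent.items())))
index = {key(r, par): i for i, (r, par) in enumerate(trees)}
rho_bar = np.array([np.prod([P[v][w] for v, w in par.items()])
                    for _, par in trees])
rho = rho_bar / rho_bar.sum()

Q = np.zeros((len(trees), len(trees)))
for i, (v, parent) in enumerate(trees):
    for w in range(len(P)):
        if P[v][w] == 0:
            continue
        if w == v:                      # self-loop: the tree is unchanged
            Q[i, i] += P[v][w]
            continue
        new_parent = dict(parent)
        new_parent[v] = w               # new last-exit edge (v, w)
        del new_parent[w]               # w is the new root
        Q[i, index[key(w, new_parent)]] += P[v][w]

print(np.allclose(rho @ Q, rho))        # True: rho is stationary for Q
```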

The underlying chain $(X_n)$ can be recovered from the tree-valued chain $({\bf T}_n)$ via $X_n=\mbox{root}({\bf T}_n)$, so we can recover the stationary distribution of $X$ from the stationary distribution of ${\bf T}$, as follows.

Corollary 9.11 (The Markov chain tree formula)

For each vertex $v$ define

$$\bar{\pi}(v)\equiv\sum_{{\bf t}:\ v=\mbox{root}({\bf t})}\bar{\rho}({\bf t}),\qquad\pi(v)\equiv\frac{\bar{\pi}(v)}{\sum_w\bar{\pi}(w)}.$$

Then $\pi$ is the stationary distribution of the original chain $(X_n)$.

See the Notes for comments on this classical result.
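As a sanity check, the tree formula can be compared against the stationary distribution computed by linear algebra. A sketch, again reusing `P` and `directed_spanning_trees` from the first sketch above:

```python
# Check of the Markov chain tree formula on the 3-state example.
import numpy as np

pi_bar = np.zeros(len(P))
for root, parent in directed_spanning_trees(P):
    pi_bar[root] += np.prod([P[v][w] for v, w in parent.items()])
pi_tree = pi_bar / pi_bar.sum()

# Stationary distribution as the left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
v = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi_eig = v / v.sum()

print(np.allclose(pi_tree, pi_eig))     # True
```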

Theorem 9.10 and the definition of ${\bf T}_0$ come close to specifying an algorithm for constructing a random spanning tree with distribution $\rho$. Of course the notion of running the chain from time $-\infty$ until time $0$ doesn’t sound very algorithmic, but we can rephrase this notion using time-reversal. Regarding the stationary distribution $\pi$ as known, the time-reversed chain $X^*$ has transition matrix $p^*_{vw}\equiv\pi_wp_{wv}/\pi_v$. Here is the restatement of Theorem 9.10 in terms of the time-reversed chain.

Corollary 9.12

Let $(X^*_m:0\leq m\leq C)$ be the time-reversed chain, run until the cover time $C$. Define ${\bf T}$ to be the directed spanning tree with root $X^*_0$ and with edges $(v=X^*_{T_v},\,X^*_{T_v-1}),\ v\neq X^*_0$, where $T_v$ is the first hitting time of $v$. If $X^*_0$ has distribution $\pi$ then ${\bf T}$ has distribution $\rho$. If $X^*_0$ is deterministically $v_0$, say, then ${\bf T}$ has distribution $\rho$ conditioned on being rooted at $v_0$.

Thus ${\bf T}$ consists of the edges by which each vertex is first visited, directed backwards.
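Corollary 9.12 translates directly into a sampler, given $\pi$ (computable, e.g., as in the previous sketch). The following sketch (our illustration; the function name and interface are ours) runs the reversed chain until it has covered the state space and records the first-entrance edges.

```python
# Sketch of Corollary 9.12: run the time-reversed chain X* from x0
# until the cover time and record, for each vertex, the edge along
# which it was first entered (directed backwards).
import numpy as np

def rho_tree_via_reversal(P, pi, x0, rng=None):
    rng = rng or np.random.default_rng()
    n = len(P)
    Pstar = (pi[None, :] * P.T) / pi[:, None]   # p*_vw = pi_w p_wv / pi_v
    parent = {}
    unvisited = set(range(n)) - {x0}
    v = x0
    while unvisited:
        w = int(rng.choice(n, p=Pstar[v]))      # one step of X*
        if w in unvisited:                      # first visit to w:
            parent[w] = v                       #   tree edge (w, v)
            unvisited.discard(w)
        v = w
    return x0, parent                           # tree rooted at x0

# With x0 drawn from pi the output tree has distribution rho; with
# deterministic x0 = v0 it has distribution rho conditioned on root v0.
```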

For a reversible chain, we can of course use the chain itself in Corollary 9.12 above, in place of the time-reversed chain. If the chain is random walk on an unweighted graph $G$, then

$$\bar{\rho}({\bf t})=d(\mbox{root}({\bf t}))\,\prod_v\frac{1}{d(v)}$$

where $d(v)$ is the degree of $v$ in $G$. So $\bar{\rho}$, restricted to the set of spanning trees with specified root $v_0$, is uniform on that set. In this setting, Corollary 9.12 specializes as follows.

Corollary 9.13

Let $(X_m:0\leq m\leq C)$ be random walk on an unweighted graph $G$, started at $v_0$ and run until the cover time $C$. Define ${\bf T}$ to be the directed spanning tree with root $v_0$ and with edges $(v=X_{T_v},\,X_{T_v-1}),\ v\neq v_0$. Then ${\bf T}$ is uniform on the set of all directed spanning trees of $G$ rooted at $v_0$.

We can rephrase this. If we just want “plain” spanning trees without a root and directions, then the ${\bf T}$ above, regarded as a plain spanning tree, is uniform on the set of all plain spanning trees. On the other hand, if we want a rooted spanning tree which is uniform on all such trees without prespecified root, the simplest procedure is to construct ${\bf T}$ as in Corollary 9.13 with deterministic start $v_0$, and at the end re-root ${\bf T}$ at a uniform random vertex. (This is slightly subtle: we could alternatively start with $X_0$ uniform, which is typically not the stationary distribution $\pi$.)
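For the unweighted reversible case this is a particularly simple procedure (now commonly known as the Aldous-Broder algorithm). A sketch, with the graph given as an adjacency list; the example graph is arbitrary:

```python
# Sketch of Corollary 9.13: simple random walk from v0 until the cover
# time; each vertex other than v0 records the edge by which it was
# first entered.  The result is uniform on directed spanning trees
# rooted at v0; re-rooting at a uniform random vertex then gives a
# uniform rooted tree, per the discussion above.
import random

def uniform_spanning_tree(adj, v0, rng=random):
    """adj: dict mapping each vertex to its list of neighbours."""
    parent = {}
    unvisited = set(adj) - {v0}
    v = v0
    while unvisited:
        w = rng.choice(adj[v])      # one random-walk step
        if w in unvisited:          # first entrance into w
            parent[w] = v           # directed tree edge (w, v)
            unvisited.discard(w)
        v = w
    return parent                   # rooted at v0

# Example: the 4-cycle.  Forgetting root and directions, the edge set
# {v, parent[v]} is uniform over the graph's plain spanning trees.
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(uniform_spanning_tree(C4, 0))
```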

Using the bounds on cover time developed in Chapter 6, we now have an algorithm for generating a uniform spanning tree of an $n$-vertex graph in $O(n^3)$ steps (and $O(n^2)$ steps on a regular graph). No other known algorithm achieves these bounds.

9.2.2 Electrical network theory

The ideas in this subsection (and much more) are treated in a long but very readable survey paper by Pemantle [279], which we encourage the interested reader to consult. As observed above, in the reversible setting we have the obvious simplification that we can construct uniform spanning trees using the chain itself. Deeper results can be found using the electrical network analogy. Consider random walk on a weighted graph $G$. The random spanning tree ${\bf T}$ constructed by Corollary 9.12, interpreted as a “plain” spanning tree, has distribution

$$\rho({\bf t})=c\,\prod_{e\in{\bf t}}w_e$$

where $c$ is the normalizing constant. If an edge $e$ is essential, it must be in every spanning tree, so $P(e\in{\bf T})=1$. If the edge is inessential, the probability will be strictly between $0$ and $1$. Intuitively, $P(e\in{\bf T})$ should provide a measure of “how nearly essential $e$ is”. This should remind the reader of the inessential edge inequality (yyy). Interpreting the weighted graph as an electrical network where an edge $e=(v,x)$ has resistance $1/w_e$, the effective resistance $r_{vx}$ between $v$ and $x$ satisfies

$$r_{vx}\leq 1/w_{vx},\ \mbox{with equality iff $(v,x)$ is essential.}$$
Proposition 9.14

For each edge $(v,x)$,

$$P((v,x)\in{\bf T})=w_{vx}r_{vx}.$$

Note that in an $n$-vertex graph, ${\bf T}$ has exactly $n-1$ edges, so Proposition 9.14 implies Foster’s theorem (Chapter 3 yyy):

$$\sum_{\mbox{edges }(v,x)}w_{vx}r_{vx}=n-1.$$

Proof. Consider the random walk started at $v$ and run until the time $U$ of the first return to $v$ after the first visit to $x$. Let $p$ be the chance that $X_{U-1}=x$, i.e. that the return to $v$ is along the edge $(x,v)$. We can calculate $p$ in two ways. In terms of the random walk started at $x$, $p$ is the chance that the first visit to $v$ is from $x$, and so by Corollary 9.12 (applied to the walk started at $x$) $p=P((v,x)\in{\bf T})$. On the other hand, consider the walk started at $v$ and let $S$ be the first time that the walk traverses $(x,v)$ in that direction. Then

$$ES=EU/p,$$

since the successive $v$-to-$v$ cycles (each run until the first return to $v$ after a visit to $x$) are independent copies of the $U$-cycle, each ending with a traversal of $(x,v)$ with probability $p$, and Wald’s identity applies.

But by yyy and yyy

$$ES=w/w_{vx},\qquad EU=wr_{vx}$$

and hence $p=w_{vx}r_{vx}$ as required. $\Box$
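Foster’s theorem is easy to check numerically: effective resistances can be computed from the Moore-Penrose pseudoinverse of the weighted graph Laplacian via $r_{vx}=(e_v-e_x)^TL^+(e_v-e_x)$. A sketch on a small example graph of our own choosing:

```python
# Numerical check of Foster's theorem: the sum of w_vx * r_vx over
# edges equals n - 1.  W is an arbitrary small example (a 4-cycle
# plus one chord, unit weights).
import numpy as np

W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
n = len(W)
L = np.diag(W.sum(axis=1)) - W          # weighted graph Laplacian
Lp = np.linalg.pinv(L)

total = 0.0
for v in range(n):
    for x in range(v + 1, n):
        if W[v, x] > 0:
            r_vx = Lp[v, v] + Lp[x, x] - 2 * Lp[v, x]
            total += W[v, x] * r_vx
print(total)                            # 3.0 = n - 1, up to rounding
```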

The next result indicates the usefulness of the electrical network analogy.

Proposition 9.15

For any two edges $e_1\neq e_2$,

$$P(e_1\in{\bf T},\,e_2\in{\bf T})\leq P(e_1\in{\bf T})\,P(e_2\in{\bf T}).$$

Proof. Consider the “shorted” graph $G^{\rm short}$ in which the end-vertices $(x_1,x_2)$ of $e_1$ are shorted into a single vertex $x$, with edge-weights $w_{xv}=w_{x_1v}+w_{x_2v}$. The natural $1$-$1$ correspondence ${\bf t}\leftrightarrow{\bf t}\cup\{e_1\}$ between spanning trees of $G^{\rm short}$ and spanning trees of $G$ containing $e_1$ maps the distribution $\rho^{\rm short}$ to the conditional distribution $\rho(\cdot\,|\,e_1\in{\bf T})$. So, writing ${\bf T}^{\rm short}$ for the random spanning tree associated with $G^{\rm short}$,

$$P(e_2\in{\bf T}^{\rm short})=P(e_2\in{\bf T}\,|\,e_1\in{\bf T}).$$

But, setting $e_2=(z_1,z_2)$, Proposition 9.14 shows

$$P(e_2\in{\bf T}^{\rm short})=w_{z_1z_2}r^{\rm short}_{z_1z_2},\qquad P(e_2\in{\bf T})=w_{z_1z_2}r_{z_1z_2}.$$

By Rayleigh’s monotonicity principle, $r^{\rm short}_{z_1z_2}\leq r_{z_1z_2}$, and the result follows. $\Box$
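The negative correlation of Proposition 9.15 can also be confirmed by brute force on a small unweighted graph (where $\rho$ is uniform), enumerating spanning trees directly. A sketch, with an arbitrary example graph and edge pair:

```python
# Brute-force check of Proposition 9.15 on a small unweighted graph:
# enumerate spanning trees as acyclic (n-1)-edge subsets and compare
# the joint edge probability with the product of the marginals.
import itertools

n = 4
edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3)]   # 4-cycle plus a chord

def is_spanning_tree(subset):
    comp = list(range(n))                  # union-find over vertices
    def find(a):
        while comp[a] != a:
            comp[a] = comp[comp[a]]
            a = comp[a]
        return a
    for a, b in subset:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False                   # subset contains a cycle
        comp[ra] = rb
    return True                            # n-1 acyclic edges: a tree

trees = [t for t in itertools.combinations(edges, n - 1)
         if is_spanning_tree(t)]

def prob(*es):                             # P(all of es in T), T uniform
    return sum(all(e in t for e in es) for t in trees) / len(trees)

e1, e2 = (0, 1), (2, 3)
print(prob(e1, e2) <= prob(e1) * prob(e2))  # True
```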