# 5.3 Trees

For random walk on a finite tree, we can develop explicit formulas for means and variances of first passage times, and for distributions of first hitting places. We shall only treat the unweighted case, but the formulas can be extended to the weighted case without difficulty.

xxx notation below —change $w$ to $x$ ? Used $i,j,v,w,x$ haphazardly for vertices.

In this section we’ll write $r_{v}$ for the degree of a vertex $v$, and $d(v,x)$ for the distance between $v$ and $x$. On a tree we may unambiguously write $[v,x]$ for the path from $v$ to $x$. Given vertices $j,v_{1},v_{2},\ldots$ in a tree, the intersection of the paths $[j,v_{1}],\ [j,v_{2}],\ldots$ is a (maybe trivial) path; write $d(j,v_{1}\wedge v_{2}\wedge\cdots)\geq 0$ for the length of this intersection path.

On an $n$-vertex tree, the random walk’s stationary distribution is

 $\pi_{v}=\frac{r_{v}}{2(n-1)}.$

Recall from the beginning of this chapter that an edge $(v,x)$ of a graph is essential if its removal would disconnect the graph into two components $A(v,x)$ and $A(x,v)$, say, containing $v$ and $x$ respectively. Obviously, in a tree every edge is essential, so we get a lot of mileage out of the essential edge lemma (Lemma 5.1).

###### Theorem 5.20

Consider discrete-time random walk on an $n$-vertex tree.

For each edge $(i,j)$,

 $E_{i}T_{j}=2|A(i,j)|-1$ (5.81)
 $E_{i}T_{j}+E_{j}T_{i}=2(n-1).$ (5.82)

For arbitrary $i,j$,

 $E_{i}T_{j}=-d(i,j)+2\sum_{v}d(j,i\wedge v)=\sum_{v}r_{v}d(j,i\wedge v)$ (5.83)
 $E_{i}T_{j}+E_{j}T_{i}=2(n-1)d(i,j).$ (5.84)

For each edge $(i,j)$,

 ${\rm var}_{i}T_{j}=-E_{i}T_{j}+\sum_{v\in A(i,j)}\sum_{w\in A(i,j)}r_{v}r_{w}(% 2d(j,v\wedge w)-1).$ (5.85)

For arbitrary $i,j$,

 ${\rm var}_{i}T_{j}=-E_{i}T_{j}+\sum_{v}\sum_{w}r_{v}r_{w}d(j,i\wedge v\wedge w% )\left[2d(j,v\wedge w)-d(j,i\wedge v\wedge w)\right].$ (5.86)

Remarks. 1. There are several equivalent expressions for the sums above: we chose the most symmetric-looking ones. We’ve written sums over vertices, but one could rephrase in terms of sums over edges.

2. In continuous time, the terms “$-E_{i}T_{j}$” disappear from the variance formulas—see xxx.

Proof of Theorem 5.20. Equations (5.81) and (5.82) are rephrasings of (5.3) and (5.4) from the essential edge lemma. Equation (5.84) and the first equality in (5.83) follow from (5.82) and (5.81) by summing over the edges in the path $[i,j]$. Note alternatively that (5.84) can be regarded as a consequence of the commute interpretation of resistance, since the effective resistance between $i$ and $j$ is $d(i,j)$. To get the second equality in (5.83),consider the following deterministic identity (whose proof is obvious), relating sums over vertices to sums over edges.

###### Lemma 5.21

Let $f$ be a function on the vertices of a tree, and let $j$ be a distinguished vertex. Then

 $\sum_{v}r_{v}f(v)=\sum_{v\neq j}(f(v)+f(v^{*}))$

where $v^{*}$ is the first vertex (other than $v$) in the path $[v,j]$.

To apply to (5.83), note

 $\displaystyle d(j,i\wedge v^{*})$ $\displaystyle=$ $\displaystyle d(j,i\wedge v)\mbox{ if }v\not\in[i,j]$ $\displaystyle=$ $\displaystyle d(j,i\wedge v)-1\mbox{ if }v\in[i,j],\ v\neq j.$

The equality in Lemma 5.21 now becomes the equality in (5.83).

We prove (5.85) below. To derive (5.86) from it, sum over the edges in the path $[i,j]=(i=i_{0},i_{1},\ldots,i_{m}=j)$ to obtain

 ${\rm var}_{i}T_{j}=-E_{i}T_{j}+\sum_{v}\sum_{w}\sum_{l}\left(2d(i_{l+1},v% \wedge w)-1\right)$ (5.87)

where $\sum_{l}$ denotes the sum over all $0\leq l\leq m-1$ for which $A(i_{l},i_{l+1})$ contains both $v$ and $w$. Given vertices $v$ and $w$, there exist unique smallest values of $p$ and $q$ so that $v\in A(i_{p},i_{p+1})$ and $w\in A(i_{q},i_{q+1})$. If $p\neq q$, then the sum $\sum_{l}$ in (5.87) equals

 $\displaystyle\sum_{l=p\vee q}^{m-1}\left(2d(i_{l+1},i_{p\vee q})-1\right)$ $\displaystyle=$ $\displaystyle\sum_{l=p\vee q}^{m-1}\left(2((l+1)-(p\vee q))-1\right)$ $\displaystyle=$ $\displaystyle(m-(p\vee q))^{2}=d^{2}(j,v\wedge w)$ $\displaystyle=$ $\displaystyle d(j,i\wedge v\wedge w)\left[2d(j,v\wedge w)-d(j,i\wedge v\wedge w% )\right],$

as required by (5.86). If $p=q$, then the sum $\sum_{l}$ in (5.87) equals

 $\sum_{l=p}^{m-1}\left(2d(i_{l+1},i_{p})+2d(i_{p},v\wedge w)-1\right)$

which again equals $d(j,i\wedge v\wedge w)\left[2d(j,v\wedge w)-d(j,i\wedge v\wedge w)\right]$ by a similar calculation.

So it remains to prove (5.85), for which we may suppose, as in the proof of Lemma 5.1, that $j$ is a leaf. By considering the first step from $j$ to $i$ we have

 ${\rm var}_{j}T^{+}_{j}={\rm var}_{i}T_{j}.$

Now yyy of Chapter 2 gives a general expression for ${\rm var}_{j}T^{+}_{j}$ in terms of $E_{\pi}T_{j}$, and in the present setting this becomes

 ${\rm var}_{j}T^{+}_{j}=2(n-1)-(2(n-1))^{2}+\sum_{v}2r_{v}E_{v}T_{j}.$

Using the second equality in (5.83), we may rewrite the sum as

 $\sum_{v\neq j}\sum_{w\neq j}r_{v}r_{w}2d(j,v\wedge w).$

Also,

 $\sum_{v\neq j}r_{v}=2(n-1)-1.$

Combining these expressions gives

 ${\rm var}_{i}T_{j}=-(2n-3)+\sum_{v\neq j}\sum_{w\neq j}r_{v}r_{w}(2d(j,v\wedge w% )-1).$

But by (5.81), $E_{i}T_{j}=2n-3$.

## 5.3.1 Parameters for trees

Here we discuss the five parameters of Chapter 4. Obviously by (5.84)

 $\tau^{*}=2(n-1)\Delta$ (5.88)

where $\Delta$ is the diameter of the tree. As for $\tau_{c}$, it is clear that the sup in its definition is attained by $A(v,w)$ for some edge $(v,w)$. Note that

 $\pi(A(v,w))=\frac{2|A(v,w)|-1}{2(n-1)}.$ (5.89)

 $\displaystyle\tau_{c}$ $\displaystyle=$ $\displaystyle\max_{(v,w)}\frac{\frac{2|A(v,w)|-1}{2(n-1)}\frac{2|A(w,v)|-1}{2(% n-1)}}{\frac{1}{2(n-1)}}$ (5.90) $\displaystyle=$ $\displaystyle\max_{(v,w)}\frac{4|A(v,w)||A(w,v)|-2n+1}{2(n-1)}.$

Obviously the max is attained by an edge for which $|A(v,w)|$ is as close as possible to $n/2$. This is one of several notions of “centrality” of vertices and edges which arise in our discussion—see Buckley and Harary [81] for a treatment of centrality in the general graph context, and for the standard graph-theoretic terminology.

###### Proposition 5.22

On an $n$-vertex tree,

 $\tau_{0}=\frac{1}{2}+\frac{2}{n}\sum_{(v,w)}\left[|A(v,w)||A(w,v)|-\frac{1}{2(% n-1)}\left(|A(v,w)|^{2}+|A(w,v)|^{2}\right)\right]$

where $\sum_{(v,w)}$ denotes the sum over all undirected edges $(v,w)$.

Proof. Using the formula for the stationary distribution, for each $i$

 $\tau_{0}=\frac{1}{2(n-1)}\sum_{j}r_{j}E_{i}T_{j}.$

Appealing to Lemma 5.21 (with $i$ as the distinguished vertex)

 $\tau_{0}=\frac{1}{2(n-1)}\sum_{j}(2E_{i}T_{j}-a(i,j))$

where $a(i,i)=0$ and $a(i,j)=E_{x}T_{j}$, where $(j,x)$ is the first edge of the path $[j,i]$. Taking the (unweighted) average over $i$,

 $\tau_{0}=\frac{1}{2n(n-1)}\sum_{i}\sum_{j}(2E_{i}T_{j}-a(i,j)).$

Each term $E_{i}T_{j}$ is the sum of terms $E_{v}T_{w}$ along the edges $(v,w)$ of the path $[i,j]$. Counting how many times a directed edge $(v,w)$ appears,

 $\tau_{0}=\frac{1}{2n(n-1)}\sum\left(2|A(v,w)||A(w,v)|-|A(v,w\right)|)E_{v}T_{w},$

where we sum over directed edges $(v,w)$. Changing to a sum over undirected edges, using $E_{v}T_{w}+E_{w}T_{v}=2(n-1)$ and $E_{v}T_{w}=2|A(v,w)|-1$, gives

 $\displaystyle 2n(n-1)\tau_{0}$ $\displaystyle=$ $\displaystyle\sum_{(v,w)}\left[2|A(v,w)||A(w,v)|2(n-1)\right.$ $\displaystyle\ \ -|A(v,w)|(2|A(v,w)|-1)$ $\displaystyle\left.\ \ -|A(w,v)|(2|A(w,v)|-1)\right].$

This simplifies to the assertion of the Proposition.

For $\tau_{1}$ we content ourselves with a result “up to equivalence”.

###### Proposition 5.23

There exist constants $K_{1},K_{2}<\infty$ such that

 $\frac{1}{K_{1}}\min_{i}\max_{j}E_{j}T_{i}\leq\tau_{1}\leq K_{2}\min_{i}\max_{j% }E_{j}T_{i}.$

Of course the expectations can be computed by (5.83).

Proof. We work with the parameter

 $\tau_{1}^{(3)}\equiv\max_{i,j}\sum_{k}\pi_{k}|E_{j}T_{k}-E_{i}T_{k}|$

which we know is equivalent to $\tau_{1}$. Write

 $\sigma=\min_{i}\max_{j}E_{j}T_{i}.$

Fix an $i$ attaining the minimum. For arbitrary $j$ we have (the first equality uses the random target lemma, cf. the proof of Chapter 4 Lemma yyy)

 $\displaystyle\sum_{k}\pi_{k}|E_{j}T_{k}-E_{i}T_{k}|$ $\displaystyle=$ $\displaystyle 2\sum_{k}\pi_{k}(E_{j}T_{k}-E_{i}T_{k})^{+}$ $\displaystyle\leq$ $\displaystyle 2\sum_{k}\pi_{k}E_{j}T_{i}\mbox{ because }E_{j}T_{k}\leq E_{j}T_% {i}+E_{i}T_{k}$ $\displaystyle\leq$ $\displaystyle 2\sigma$

and so $\tau_{1}^{(3)}\leq 4\sigma$.

For the converse, it is elementary that we can find a vertex $i$ such that the size ($n^{*}$, say) of the largest branch from $i$ satisfies $n^{*}\leq n/2$. (This is another notion of “centrality”. To be precise, we are excluding $i$ itself from the branch.) Fix this $i$, and consider the $j$ which maximizes $E_{j}T_{i}$, so that $E_{j}T_{i}\geq\sigma$ by definition. Let $B$ denote the set of vertices in the branch from $i$ which contains $j$. Then

 $E_{j}T_{k}=E_{j}T_{i}+E_{i}T_{k},\ k\in B^{c}$

and so

 $\tau_{1}^{(3)}\geq\sum_{k}\pi_{k}|E_{j}T_{k}-E_{i}T_{k}|\geq\pi(B^{c})E_{j}T_{% i}\geq\pi(B^{c})\sigma.$

But by (5.89) $\pi(B)=\frac{2n^{*}-1}{2(n-1)}\leq\frac{1}{2}$, so we have shown $\tau_{1}^{(3)}\geq\sigma/2$.

We do not know whether $\tau_{2}$ has a simple expression “up to equivalence” analogous to Proposition 5.23. It is natural to apply the “distinguished paths” bound (Chapter 4 yyy). This gives the inequality

 $\displaystyle\tau_{2}$ $\displaystyle\leq$ $\displaystyle 2(n-1)\max_{(v,w)}\sum_{x\in A(v,w)}\sum_{y\in A(w,v)}\pi_{x}\pi% _{y}d(x,y)$ $\displaystyle=$ $\displaystyle 2(n-1)\max_{(v,w)}\left(\pi(A(v,w))E\left[d(v,V)1_{(V\in A(w,v))% }\right]\right.$ $\displaystyle\left.\ \ \ \ \ +\pi(A(w,v))E\left[d(v,V)1_{(V\in A(v,w))}\right]\right)$

where $V$ has the stationary distribution $\pi$ and where we got the equality by writing $d(x,y)=d(v,y)+d(v,x)$. The edge attaining the max gives yet another notion of “centrality.”

xxx further remarks on $\tau_{2}$.

## 5.3.2 Extremal trees

It is natural to think of the $n$-path (Example 5.8) and the $n$-star (Example 5.10) as being “extremal” amongst all $n$-vertex trees. The proposition below confirms that the values of $\tau^{*}$, $\max_{i,j}E_{i}T_{j}$, $\tau_{0}$, $\tau_{2}$, and $\tau_{c}$ in those examples are the exact extremal values (minimal for the star, maximal for the path).

###### Proposition 5.24

For any $n$-vertex tree with $n\geq 3$,

(a) $4(n-1)\leq\tau^{*}\leq 2(n-1)^{2}$

(b) $2(n-1)\leq\max_{i,j}E_{i}T_{j}\leq(n-1)^{2}$

(c) $n-{\textstyle\frac{3}{2}}\leq\tau_{0}\leq(2n^{2}-4n+3)/6$.

(d) $1\leq\tau_{2}\leq(1-\cos(\pi/(n-1)))^{-1}$.

(e) $1-\frac{1}{2(n-1)}\leq\tau_{c}\leq\frac{4\lfloor n^{2}/4\rfloor-2n+1}{2(n-1)}$.

Proof. (a) is obvious from (5.88), because $\Delta$ varies between $2$ for the $n$-star and $(n-1)$ for the $n$-path. The lower bound in (b) follows from the lower bound in (a). For the upper bound in (b), consider some path $i=v_{0},v_{1},\ldots,v_{d}=j$ in the tree, where plainly $d\leq(n-1)$. Now $|A(v_{d-1},v_{d})|\leq n-1$ and so

 $|A(v_{d-i},v_{d-i+1})|\leq n-i\mbox{ for all }i$

because the left side decreases by at least $1$ as $i$ increases. So

 $\displaystyle E_{i}T_{j}$ $\displaystyle=$ $\displaystyle\sum_{m=0}^{d-1}E_{v_{m}}T_{v_{m+1}}$ $\displaystyle=$ $\displaystyle\sum_{m=0}^{d-1}(2|A(v_{m},v_{m+1})|-1)\mbox{ by (\ref{tt1})}$ $\displaystyle\leq$ $\displaystyle\sum_{m=0}^{d-1}(2(m+n-d)-1)$ $\displaystyle\leq$ $\displaystyle\sum_{l=1}^{n-1}(2l-1)$ $\displaystyle=$ $\displaystyle(n-1)^{2}.$

To prove (c), it is enough to show that the sum in Proposition 5.22 is minimized by the $n$-star and maximized by the $n$-path. For each undirected edge $(v,w)$, let

 $b(v,w)=\min(|A(v,w)|,|A(w,v)|)\leq n/2.$

Let ${\bf b}=(b_{1},b_{2},\ldots,b_{n-1})$ be the non-decreasing rearrangement of these values $b(v,w)$. The summands in Proposition 5.22 are of the form

 $a(n-a)-\frac{1}{2(n-1)}(a^{2}+(n-a)^{2})$

with $a$ ranging over the $b_{i}$.

One can check that this quantity is an increasing function of $a\leq n/2$. Thus it is enough to show that the vector b on an arbitrary $n$-tree dominates coordinatewise the vector b for the $n$-star and is dominated by the vector b for the $n$-path. The former is obvious, since on the $n$-star ${\bf b}=(1,1,\ldots,1)$. The latter needs a little work. On the $n$-path ${\bf b}=(1,1,2,2,3,3,\ldots)$. So we must prove that in any $n$-tree

 $b_{i}\leq\left\lfloor\frac{i+1}{2}\right\rfloor\mbox{ for all }i.$ (5.91)

Consider a rooted tree on $m$ vertices. Breaking an edge $e$ gives two components; let $a(e)$ be the size of the component not containing the root. Let $(a_{1},a_{2},\ldots)$ be the non-decreasing rearrangement of $(a(e))$. For an $m$-path rooted at one leaf, $(a_{1},a_{2},\ldots)=(1,2,3,\ldots)$. We assert this is extremal, in that for any rooted tree

 $a_{i}\leq i\mbox{ for all }i.$ (5.92)

This fact can be proved by an obvious induction on $m$, growing trees by adding leaves.

Now consider an unrooted tree, and let b be as above. There exists some vertex $v$, of degree $r\geq 2$, such that each of the $r$ branches from $v$ has size (excluding $v$) at most $n/2$. Consider these branches as trees rooted at $v$, apply (5.92), and it is easy to deduce (5.91).

For (d), the lower bound is easy. Fix a leaf $v$ and let $w$ be its neighbor. We want to apply the extremal characterization (Chapter 3 yyy) of $\tau_{2}$ to the function

 $g(v)=1-\pi_{v}-\pi_{w},g(w)=0,g(\cdot)=-\pi_{v}\mbox{ elsewhere}.$

For this function, $\sum\pi_{x}g(x)=0$,

 $[g,g]=\pi_{v}(1-\pi_{v}-\pi_{w})^{2}+(1-\pi_{v}-\pi_{w})\pi^{2}_{v},$

and by considering transitions out of $w$

 $\mbox{{\cal E}}(g,g)=\pi_{v}(1-\pi_{v}-\pi_{w})^{2}+(\pi_{w}-\pi_{v})\pi^{2}% _{v}.$

Since $\pi_{w}\leq 1/2$ we have $[g,g]\geq\mbox{{\cal E}}(g,g)$ and hence $\tau_{2}\geq[g,g]/\mbox{{\cal E}}(g,g)\geq 1$.

qqq Anyone have a short proof of upper bound in (d)?

Finally, (e) is clear from (5.90).

Other extremal questions. Several other extremal questions have been studied. Results on cover time are given in Chapter 6. Yaron [340] shows that for leaves $l$ the mean hitting time $E_{\pi}T_{l}$ is maximal on the $n$-path and minimal on the $n$-star. (He actually studies the variance of return times, but Chapter 2 yyy permits the rephrasing.) Finally, if we are interested in the mean hitting time $E_{x}T_{A}$ or the hitting place distribution, we can reduce to the case where $A$ is the set $L$ of leaves, and then set up recursively-solvable equations for $h(i)\equiv E_{i}T_{L}$ or for $f(i)=P_{i}(T_{A}=T_{l})$ for fixed $l\in L$. An elementary treatment of such ideas is in Pearce [277], who essentially proved that (on $n$-vertex trees) $\max_{x}E_{x}T_{L}$ is minimized by the $n$-star and maximized by the $n$-path.