L^{2}

Techniques for Bounding Mixing Times (May 19 1999)8 Advanced

L^{2}

Techniques for Bounding Mixing Times (May 19 1999)8.2 Improved bounds on

L^{2}

distance

8.1 The comparison method for eigenvalues

xxx Revise Chapter 7, Sections 1.9 and 4, in light of this section?

The comparison method, introduced by Diaconis and Saloff-Coste [115, 116], generalizes the distinguished path method of Chapter 4, Section yyy:3 for bounding the relaxation time of a reversible Markov chain. As before, we first (xxx: delete word?) work in the setting of random walks on weighted graphs. We will proceed for given state space (vertex set) $I$ by comparing a collection $(w_{ij})$ of weights of interest to another collection $({\tilde{w}}_{ij})$ ; the idea will be to use known results for the random walk with weights $({\tilde{w}}_{ij})$ to derive corresponding results for the walk of interest. We assume that the graph is connected under each set of weights. As in Chapter 4, Section yyy:4.3, we choose (“distinguish”) paths $\gamma_{xy}$ from $x$ to $y$ . Now, however, this need be done only for those $(x,y)$ with $x\neq y$ and ${\tilde{w}}_{xy}>0$ , but we impose the additional constraint $w_{e}>0$ for each edge $e$ in the path. (Here and below, $e$ denotes a directed edge in the graph of interest.) In other words, roughly put, we need to construct a $(w_{ij})$ -path to effect each given $({\tilde{w}}_{xy})$ -edge. Recall from Chapter 3 yyy:(71) the definition of Dirichlet form:

	$\displaystyle\mbox{${\cal E}$}(g,g)$	$\displaystyle=$	$\displaystyle\frac{1}{2}\sum_{i}\sum_{j\neq i}\frac{w_{ij}}{w}(g(j)-g(i))^{2},$		(8.14)
	$\displaystyle{\tilde{\cal E}}(g,g)$	$\displaystyle=$	$\displaystyle\frac{1}{2}\sum_{i}\sum_{j\neq i}\frac{{\tilde{w}}_{ij}}{{\tilde{% w}}}(g(j)-g(i))^{2}.$

Theorem 8.1 (comparison of Dirichlet forms)

For each ordered pair $(x,y)$ of distinct vertices with ${\tilde{w}}_{xy}>0$ , let $\gamma_{xy}$ be a path from $x$ to $y$ with $w_{e}>0$ for every $e\in\gamma_{xy}$ . Then the Dirichlet forms (8.14) satisfy

{\tilde{\cal E}}(g,g)\leq A\mbox{${\cal E}$}(g,g)=\mbox{${\cal E}$}(g,g)\frac{% w}{{\tilde{w}}}\max_{e}\frac{1}{w_{e}}\sum_{x}\sum_{y\neq x}{\tilde{w}}_{xy}|% \gamma_{xy}|1_{(e\in\gamma_{xy})}

for every $g$ .

Proof. For an edge $e=(i,j)$ write $\Delta g(e)=g(j)-g(i)$ . Then

$\displaystyle 2{\tilde{w}}{\tilde{\cal E}}(g,g)$	$\displaystyle=$	$\displaystyle\sum_{x}\sum_{y\neq x}{\tilde{w}}_{xy}(g(y)-g(x))^{2}$
	$\displaystyle=$	$\displaystyle\sum_{x}\sum_{y\neq x}{\tilde{w}}_{xy}\left(\sum_{e\in\gamma_{xy}% }\Delta g(e)\right)^{2}$
	$\displaystyle\leq$	$\displaystyle\sum_{x}\sum_{y\neq x}{\tilde{w}}_{xy}\|\gamma_{xy}\|\sum_{e\in% \gamma_{xy}}(\Delta g(e))^{2}\mbox{\ \ \ by Cauchy--Schwarz}$
	$\displaystyle\leq$	$\displaystyle A\sum_{e}w_{e}(\Delta g(e))^{2}=A\cdot 2w\mbox{${\cal E}$}(g,g).% ~{}\ \ \rule{4.3pt}{4.3pt}$

Remarks. (a) Suppose the comparison weights $({\tilde{w}}_{ij})$ are given by

i,j\in I

The corresponding discrete-time random walk is then the “trivial”walk with ${\tilde{w}}=w$ and

{\tilde{w}}_{i}=w_{i},\ \ {\tilde{p}}_{ij}=\pi_{j},\ \ {\tilde{\pi}}_{j}=\pi_{j}

for all $i,j$ , and

	$\displaystyle{\tilde{\cal E}}(g,g)$	$\displaystyle=$	$\displaystyle\frac{1}{2}\sum_{i}\sum_{j\neq i}\pi_{i}\pi_{j}(g(j)-g(i))^{2}={{% \rm var}}_{\pi}\,g$
		$\displaystyle=$	$\sum_{i}\pi_{i}g(i)=0$

In this case the conclusion of Theorem 1 reduces to

\|g\|^{2}_{2}\leq\mbox{${\cal E}$}(g,g)w\max_{e}{\textstyle\frac{1}{w_{e}}}% \sum_{x}\sum_{y\neq x}\pi_{x}\pi_{y}|\gamma_{xy}|1_{(e\in\gamma_{xy})}.

This inequality was established in the proof of the distinguished path theorem (Chapter 4 Theorem yyy:32), and that theorem was an immediate consequence of the inequality. Hence the comparison Theorem 1 may be regarded as a generalization of the distinguished path theorem.

[xxx For NOTES: We’ve used simple Sinclair weighting. What about other weighting in use of Cauchy–Schwarz? Hasn’t been considered, as far as I know.]

(b) When specialized to the setting of reversible random flights on Cayley graphs described in Chapter 7 Section yyy:1.9, Theorem 1 yields Theorem yyy:14 of Chapter 7. To see this, adopt the setup in Chapter 7 Section yyy:1.9, and observe that the word

g_{i}\in\mbox{${\cal G}$}

(8.15)

corresponds uniquely to a path

\gamma_{{\rm id},x}=({\rm id},g_{1},g_{1}g_{2},\ldots,g_{1}g_{2}\cdots g_{d}=x)

(8.16)

in the Cayley graph corresponding to the generating set ${\cal G}$ of interest. Having built paths $\gamma_{{\rm id},x}$ for each $x\in I$ , we then can build paths $\gamma_{yz}$ for $y,z\in I$ by exploiting vertex-transitivity, to wit, by setting

\gamma_{yz}=(y,yg_{1},yg_{1}g_{2},\ldots,yg_{1}g_{2}\cdots g_{d}=z)

where $y^{-1}z=x$ and the path $\gamma_{{\rm id},x}$ is given by (8.16). In Theorem 1 we then have both stationary distributions $\pi$ and ${\tilde{\pi}}$ uniform,

{\tilde{w}}_{xy}={\tilde{\mu}}(x^{-1}y)/n,\ \ \ {\tilde{w}}=1,\ \ \ |\gamma_{% xy}|=|\gamma_{{\rm id},x^{-1}y}|=d({\rm id},x^{-1}y),

and, if $e=(v,vg)$ with $v\in I$ and $g\in\mbox{${\cal G}$}$ ,

w_{e}=\mu(g)/n,\ \ \ w=1,\ \ \ 1_{(e\in\gamma_{xy})}=1_{((x^{-1}v,x^{-1}vg)\in% \gamma_{{\rm id},x^{-1}y})}.

Thus $A$ of Theorem 1 equals

\max_{v\in I,\ g\in\mbox{${\cal G}$}}\frac{1}{\mu(g)}\sum_{x}\sum_{y\neq x}{% \tilde{\mu}}(x^{-1}y)d({\rm id},x^{-1}y)1_{(x^{-1}v,x^{-1}vg\in\gamma_{{\rm id% },x^{-1}y})},

which reduces easily to $K$ of Theorem yyy:14 of Chapter 7. Since $\pi$ and ${\tilde{\pi}}$ are both uniform, the extremal characterization

\tau_{2}=\sup\{\|g\|^{2}_{2}/\mbox{${\cal E}$}(g,g):\ \sum_{i}\pi_{i}g(i)=0\}

(8.17)

gives Theorem yyy:14 of Chapter 7.

Theorem 8.1 compares Dirichlet forms. To compare relaxation times using the extremal characterization (8.17), we compare $L^{2}$ -norms using the same “direct” technique as for Chapter 3 Lemma yyy:26. For any $g$ ,

\|g\|^{2}_{2}\leq\|g\|^{\sim 2}_{2}\max_{i}(\pi_{i}/{\tilde{\pi}}_{i})

(8.18)

where, as usual, $\pi_{i}:=w_{i}/w$ and ${\tilde{\pi}}_{i}={\tilde{w}}_{i}/{\tilde{w}}$ . So if $g$ has $\pi$ -mean $0$ and ${\tilde{\pi}}$ -mean $b$ , then

\frac{\|g\|^{2}_{2}}{\mbox{${\cal E}$}(g,g)}\leq\frac{\|g-b\|^{2}_{2}}{\mbox{$% {\cal E}$}(g-b,g-b)}\leq\frac{A\|g-b\|^{\sim 2}_{2}}{{\tilde{\cal E}}(g-b,g-b)% }\max_{i}(\pi_{i}/{\tilde{\pi}}_{i}).

(8.19)

Thus

Corollary 8.2 (comparison of relaxation times)

In Theorem 1,

\tau_{2}\leq\frac{A}{a}{\tilde{\tau}}_{2}

where

	$\displaystyle A$	$\displaystyle:=$	$\displaystyle\frac{w}{{\tilde{w}}}\max_{e}\frac{1}{w_{e}}\sum_{x}\sum_{y\neq x% }{\tilde{w}}_{xy}\|\gamma_{xy}\|1_{(e\in\gamma_{xy})},$
	$\displaystyle a$	$\displaystyle:=$	$\displaystyle\min_{i}({\tilde{\pi}}_{i}/\pi_{i}).$

xxx Perhaps restate as

\tau_{2}\leq{\tilde{\tau}}_{2}

where

B:=\left(\max_{e}\frac{1}{w_{e}}\sum\sum\cdots\right)\left(\max_{i}\frac{w_{i}% }{{\tilde{w}}_{i}}\right)

(and similarly for Corollaries 4 and 7)?

xxx Remark about best if $\pi={\tilde{\pi}}$ ?

Here is a simple example, taken from [115], showing the improvement in Corollary 8.2 over Theorem yyy:32 of Chapter 4 provided by the freedom in choice of benchmark chain.

xxx NOTE: After the fact, I realized that the following example was already Example yyy:20 of Chapter 7; must reconcile.

Example 8.3

Consider a card shuffle which transposes the top two cards in the deck, moves the top card to the bottom, or moves the bottom card to the top, each with probability $1/3$ . This example fits the specialized group framework of Chapter 7 Section yyy:1.9 (see also Remark (b) following Theorem 8.1 above) with $I$ taken to be the symmetric group on $m$ letters and

\mbox{${\cal G}$}:=\{(1\ 2),(m\ m-1\ m-2\ \cdots\ 1),(1\ 2\ \cdots\ m)\}

in cycle notation. [If the order of the deck is represented by a permutation $\sigma$ in such a way that $\sigma(i)$ is the position of the card with label $i$ , and if permutations are composed left to right, then $\sigma\cdot(m\ m-1\ m-2\ \cdots\ 1)$ is the order resulting from $\sigma$ by moving the top card to the bottom.]

We obtain a representation (8.15) for any given permutation $x$ by writing

x=h_{m}h_{m-1}\cdots h_{2}

in such a way that

i\leq j\leq m

(8.20)

(i.e., $h_{m}\cdots h_{i}$ and $x$ agree in positions $i$ through $m$ ) and each $h_{i}$ is explicitly represented as a product of generators. To accomplish this, we proceed inductively. Suppose that (8.20) holds for given $i\in\{3,\ldots,m+1\}$ , and that $(h_{m}\cdots h_{i})(x^{-1}(i-1))=l_{i}=l$ , with $1\leq l\leq i-1$ . Then let

	$\displaystyle h_{i-1}$	$\displaystyle:=$	$\displaystyle(m\ m-1\ m-2\ \cdots\ 1)^{l-1}[(1\ 2)(m\ m-1\ m-2\ \cdots\ 1)]^{i% -l-1}$
			$\displaystyle\ \ \cdot(m\ m-1\ m-2\ \cdots\ 1)^{m-i+2}.$

In words, beginning with $h_{m}\cdots h_{i}$ , we repeatedly move the top card to the bottom until card $x^{-1}(i-1)$ has risen to the top; then we repeatedly transpose and shift until the top $m-i+2$ cards, in order, are $x^{-1}(i-1),\ldots,x^{-1}(m)$ ; and finally we cut these $m-i+2$ cards to the bottom.

xxx Either revise Section 1.9 of Chapter 7 to delete requirement of geodesic paths, or explain one can erase cycles.

It follows that the diameter $\Delta$ of the Cayley graph associated with ${\cal G}$ satisfies

\Delta\leq\sum_{i=2}^{m+1}[(l_{i}-1)+2(i-l_{i}-1)+(m-i+2)]\leq 3{m\choose 2}

and so by Chapter 7 Corollary yyy:15 that $\tau_{2}\leq 27{m\choose 2}^{2}<\frac{27}{4}m^{4}.$

To improve this bound on the relaxation time we compare the chain of interest to the random transposition chain of Chapter 7 Example yyy:18 and employ Corollary 8.2, or rather its specialization, (yyy:Theorem 14) of Chapter 7.

xxx Continue as in Chapter 7 Example yyy:20 to get

\frac{\tau_{2}}{{\tilde{\tau}}_{2}}\leq 3\beta^{2},\ \ \ {\tilde{\tau}}_{2}=% \frac{m}{2},\ \ \ \tau_{2}\leq\frac{27}{2}m^{3}.

xxx Test function on page 2139 of [115] shows this is right order.

Corollary 8.2 can be combined with the inequality

{\hat{\tau}}\leq\tau_{2}\left(\frac{1}{2}\log\frac{1}{\pi_{*}}+1\right)

(8.21)

from (8.7) to bound the $L^{2}$ threshold parameter ${\hat{\tau}}$ for the chain of interest, but Theorem 8.1 sometimes affords a sharper result. From the Courant–Fischer “min–max” theorem ([183], Theorem 4.2.11) it follows along the same lines as in Chapter 3 Section yyy:6.3 that

\lambda^{-1}=\inf\rho(h_{1},h_{2},\ldots,h_{m-1}),\ \ \ m=2,\ldots,n,

(8.22)

where $h_{1}\equiv 1$ and xxx Say the conditions better!

	$\displaystyle\rho(h_{1},h_{2},\ldots,h_{m-1})$
		$\displaystyle:=$	$j=1,\ldots,m-1$

and the $\inf$ in (8.22) is taken over all vectors $h_{1},\ldots,h_{m-1}$ that are orthogonal in $L^{2}(\pi)$ (or, equivalently, that are linearly independent). Using (8.19), Corollary 8.2 now generalizes to

Corollary 8.4 (comparison of eigenvalues)

In Theorem 8.1, the eigenvalues $\lambda_{m}$ and ${\tilde{\lambda}}_{m}$ in the respective spectral representations satisfy

\lambda^{-1}_{m}\leq\frac{A}{a}{\tilde{\lambda}}^{-1}_{m}

with $A$ and $a$ as defined in Corollary 8.2.

Here is a simple example [116] not possessing vertex-transitivity:

xxx NOTE: This is a DIRECT comparison!: see Chapter 3 Section 6.4.

Example 8.5

Random walk on a $d$ -dimensonal grid.

To keep the notation simple, we let $d=2$ and consider the grid $I:=\{0,\ldots,m_{1}-1\}\times\{0,\ldots,m_{2}-1\}$ as an (unweighted) subgraph of ${\bf Z}^{2}$ . The eigenvalues $\lambda_{l}$ are not known in closed form. However, if we add self-loops to make a benchmark graph where $I$ is regular with degree $4$ , then the eigenvalues ${\tilde{\lambda}}_{l}$ for the continuous-time walk are

1-\frac{1}{2}\left(\cos\frac{\pi r}{m_{1}}+\cos\frac{\pi s}{m_{2}}\right),\ \ % \ 0\leq r\leq m_{1}-1,\ \ \ 0\leq s\leq m_{2}-1.

xxx Product chain. Add discussion of all eigenvalues to Section yyy:6.2 of Chapter 4?

xxx P.S. See Chapter 5, (66).

In particular, assuming $m_{1}\geq m_{2}$ we have

{\tilde{\tau}}_{2}=2\left(1-\cos\frac{\pi}{m_{1}}\right)^{-1}.

(8.23)

Now we apply Corollary 8.4 to bound the eigenvalues $\lambda_{l}$ . In Theorem 8.1, the two graphs agree except for self-loops, so

A=w/{\tilde{w}};

furthermore,

a=\min_{i}\frac{{\tilde{\pi}}_{i}}{\pi_{i}}=\frac{w}{{\tilde{w}}}\min_{i}{{% \tilde{w}}_{i}}{w_{i}},

\frac{A}{a}=\max_{i}\frac{w_{i}}{{\tilde{w}}_{i}}\leq 1.

Thus $\lambda^{-1}_{l}\leq{\tilde{\lambda}}^{-1}_{l}$ for $1\leq l\leq n:=m_{1}m_{2}$ ; in particular,

\tau_{2}\leq{\tilde{\tau}}_{2}.

(8.24)

Comparing the other way around gives

{\tilde{\lambda}}^{-1}_{l}\leq\left(\max_{i}\frac{{\tilde{w}}_{i}}{w_{i}}% \right)\lambda^{-1}_{l}=2\lambda^{-1}_{l},\ \ \ 1\leq l\leq n

and in particular

\tau_{2}\geq{\textstyle\frac{1}{2}}{\tilde{\tau}}_{2}.

The result $\frac{1}{2}{\tilde{\lambda}}^{-1}_{l}\leq\lambda^{-1}_{l}\leq{\tilde{\lambda}}% ^{-1}_{l}$ extends to general $d$ , for which (for example)

{\tilde{\tau}}_{2}=d\left(1-\cos\frac{\pi}{m}\right)^{-1}

where $I=\{0,\ldots,m_{1}-1\}\times\cdots\times\{0,\ldots,m_{d}-1\}$ and $m:=\max_{i}m_{i}$ .

Example 8.6

Random walk on a thinned grid.

As a somewhat more interesting example, suppose we modify the grid in ${\bf Z}^{2}$ in Example 8.5 by deleting at most one edge from each unit square.

xxx Copy picture on page 700 in [116] as example?

Again we can apply Corollary 8.4, using the same benchmark graph as in Example 8.5. In Theorem 8.1, ${\tilde{w}}_{xy}>0$ for $x\neq y$ if and only if $x$ and $y$ are neighboring vertices in the (unthinned) grid $\{0,\ldots,m_{1}-1\}\times\{0,\ldots,m_{2}-1\}$ . We can choose $\gamma_{xy}$ to have length $1$ (if the edge joining $x$ and $y$ has not been deleted) or $3$ (if it has). For any directed edge $e$ in the grid, there are at most two paths of length $3$ and at most one path of length $1$ passing through $e$ . Thus $A\leq 7w/{\tilde{w}}$ , and so $A/a\leq 7\max_{i}(w_{i}/{\tilde{w}}_{i})\leq 7$ ; comparing the other way around is even easier (all paths have length $1$ ), and we find

{\textstyle\frac{1}{4}}{\tilde{\lambda}}^{-1}_{l}\leq\lambda^{-1}_{l}\leq 7{% \tilde{\lambda}}^{-1}_{l},\ \ \ 2\leq l\leq n.

xxx REMINDER: NOTES OR ELSEWHERE?: Mention exclusion process [149, 116].

Example 8.7

The $n$ -path with end self-loops.

The comparison technique does not always provide results as sharp as those in the preceding two examples, even when the two chains are “close.” For example, let the chain of interest be the $n$ -path, with self-loops added at each end added to make the graph regular with degree $2$ , and let the benchmark graph be the $n$ -cycle (Chapter 5, Example yyy:7). Use of Corollary 8.2 gives only $\tau_{2}\leq n{\tilde{\tau}}_{2}$ , whereas in fact $\tau_{2}=\left(1-\cos\frac{\pi}{n}\right)^{-1}\sim\frac{2}{\pi^{2}}n^{2}$ and ${\tilde{\tau}}_{2}=\left(1-\cos\frac{2\pi}{n}\right)^{-1}\sim\frac{1}{2\pi^{2}% }n^{2}.$

It is difficult in general to use Corollary 8.4 to improve upon (8.21). However, when both the chain of interest and the benchmark chain are symmetric reversible chains (as defined in Chapter 7 Section yyy:1.1), it follows from Chapter 4 yyy:(14) by averaging over $i$ that

{\hat{d}}(t)\leq{\tilde{{\hat{d}}}}\left(\frac{a}{A}t\right),\ \ \ t\geq 0,

and hence from (8.1) we obtain

Corollary 8.8 (comparison of $L^{2}$ mixing times)

In Theorem 8.1, if both the graph of interest and the benchmark graph are vertex-transitive, then the $L^{2}$ mixing time parameters ${\hat{\tau}}$ and ${\tilde{{\hat{\tau}}}}$ satisfy

{\hat{\tau}}\leq\frac{A}{a}{\tilde{{\hat{\tau}}}}.

Example 8.9

Returning to the slow card-shuffling scheme of Example 8.3 with random transpositions benchmark, it is known from group representation methods [123, 120] which make essential use of all the eigenvalues ${\tilde{\lambda}}_{r}$ , not just ${\tilde{\lambda}}_{2}$ , that

m\to\infty

Since $a=1$ and $A$ ( $=K$ of Chapter 7, Theorem yyy:14) $\leq 27m^{2}$ , it follows that

{\hat{\tau}}\leq(1+o(1)){\textstyle\frac{27}{2}}m^{3}\log m.

(8.25)

This improves upon Example 8.3, which combines with (8.21) to give only

{\hat{\tau}}\leq(1+o(1)){\textstyle\frac{27}{4}}m^{4}\log m.

xxx Show truth is ${\hat{\tau}}=\Theta(m^{3}\log m)$ ?

8 Advanced

L^{2}

Techniques for Bounding Mixing Times (May 19 1999)Bibliography 8.2 Improved bounds on

L^{2}