1 Introduction (July 20, 1999)1 Introduction (July 20, 1999)1.2 So what’s in the book?

1.1 Word problems

1.1.1 Random knight moves

Imagine a knight on a corner square of an otherwise empty chessboard. Move the knight by choosing at random from the legal knight-moves. What is the mean time until the knight first returns to the starting square?

At first sight this looks like a messy problem which will require numerical calculations. But the knight is moving as random walk on a finite graph (rather than just some more general Markov chain), and elementary theory reduces the problem to counting the numer of edges of the graph, giving the answer of 168 moves. See Chapter 3 yyy.

1.1.2 The white screen problem

Around 1980 I wrote a little Basic program that would display a random walk on the screen of my home computer. First, a pixel in the middle of the screen was lit up. Then one of the four directions N,E,W,S was selected uniformly at random and the walk proceeded one step in the chosen direction. That new pixel was lit up on the screen, and the process was repeated from the new point, etc. For a while, the walk is almost always quickly visiting pixels it hasn’t visited before, so one sees an irregular pattern that grows in the center of the screen. After quite a long while, when the screen is perhaps 95% illuminated, the growth process will have slowed down tremendously, and the viewer can safely go read War and Peace without missing any action. After a minor eternity every cell will have been visited. Any mathematician will want to know how long, on the average, it takes until each pixel has been visited. Edited from Wilf [336].

Taking the screen to be $m\times m$ pixels, we have a random walk on the discrete two-dimensional torus $Z^{2}_{d}$ , and the problem asks for the mean cover time, that is the time to visit every vertex of the graph. Such questions have been studied for general graphs (see Chapter 6), though ironically this particular case of the two-dimensional torus is the hardest special graph. It is known that mean cover time is asymptotically at most $4\pi^{-1}m^{2}\log^{2}m$ , and conjectured this is asymptotically correct (see Chapter 7 yyy). For $m=512$ this works out to be about 13 million. Of course, in accordance with Moore’s Law what took a minor eternity in 1980 takes just a few seconds today.

1.1.3 Universal traversal sequences

Let $S(n,d)$ be the set of all $d$ -regular graphs $G$ with n vertices and with the edges at each vertex labeled $(1,2,\ldots,d)$ . A universal traversal sequence $\;\;\;i_{1},i_{2},\ldots,i_{u}\in\{1,\ldots,d\}$ is a sequence that satisfies

for each $G\in S(n,d)$ and each initial vertex of $G$ the deterministic walk “at step $t$ choose edge $i_{t}$ ” visits every vertex.

What is the shortest length $u=u(n,d)$ of such a sequence?

To get a partial answer, instead of trying to be clever about picking the sequence, consider what happens if we just choose $i_{1},i_{2},\ldots$ uniformly at random. Then the walk on a graph $G$ is just simple random walk on $G$ . Using a result that the mean cover time on a regular graph is $O(n^{2})$ one can show (see Chapter 6 yyy) that most sequences of length $O(dn^{3}\log n)$ are universal traversal sequences.

Paradoxically, no explicit example of a universal traversal sequence this short is known. The argument above fits a general theme that probabilistic methods can be useful in combinatorics to establish the existence of objects which are hard to exhibit constructively: numerous examples are in the monograph by Alon and Spencer [28].

1.1.4 How long does it take to shuffle a deck of cards?

Repeated random shuffles of a $d$ -card deck may be modeled as a Markov chain on the space of all $d!$ possible configurations of the deck. Different physical methods of shuffling correspond to different chains. The model for the most common method, riffle shuffle, is described carefully in Chapter 9 (xxx section to be written). A mathematically simpler method is top-to-random, in which the top card is reinserted at one of the $d$ possible positions, chosen uniformly at random (Chapter 9 section yyy). Giving a precise mathematical interpretation to the question

how many steps of the chain (corresponding to a specified physical shuffle) are needed until the distribution of the deck is approximately uniform (over all $d!$ configurations)?

is quite subtle; we shall formalize different interpretations as different mixing times, and relations between mixing times are discussed in Chapter 4 for reversible chains and in Chapter 8 (xxx section to be written) for general chains. Our favorite formalization is via the variation threshold time $\tau_{1}$ , and it turns out that

	$\displaystyle\tau_{1}$	$\displaystyle\sim$	$\displaystyle{\textstyle\frac{3}{2}}\log_{2}d\quad\mbox{ (riffle shuffle) }$		(1.1)
	$\displaystyle\tau_{1}$	$\displaystyle\sim$	$\displaystyle d\log d\quad\mbox{ (top-to-random shuffle) }.$

For the usual deck with $d=52$ these suggest $8$ and $205$ shuffles respectively.

1.1.5 Sampling from high-dimensional distributions: Markov chain Monte Carlo

Suppose you have a function $f:R^{d}\to[0,\infty)$ with $\kappa:=\int_{R^{d}}f(x)\ dx<\infty$ , where $f$ is given by some explicit but maybe complicated formula. How can you devise a scheme to sample a random point in $R^{d}$ with the normalized probability density $f(x)/\kappa$ ?

For $d=1$ the elementary “inverse distribution function” trick is available, and for small $d$ simple acceptance/rejection methods are often practical. For large $d$ the most popular method is some form of Markov chain Monte Carlo (MCMC) method, and this specific $d$ -dimensional sampling problem is a prototype problem for MCMC methods. The scheme is to design a chain to have stationary distribution $f(x)/\kappa$ . A simple such chain is as follows. From a point $x$ , the next point $X_{1}$ is chosen by a two-step procedure. First choose $Y$ from some reference distribution (e.g. multivariate Normal with specified variance, or uniform of a sphere of specified radius) on $R^{d}$ ; then set $X_{1}=x+Y$ with probability $\min(1,f(x+Y)/f(x))$ and set $X_{1}=x$ with the remaining probability.

Routine theory says that the stationary density is indeed $f(x)/\kappa$ and that as $t\to\infty$ the distribution of the chain after $t$ steps converges to this stationary distribution. So a heuristic algorithm for the sampling problem is

Choose a starting point, a reference distribution and a number $t$ of steps, simulate the chain for $t$ steps, and output the state of the chain after $t$ steps.

To make a rigorous algorithm one needs to know how many steps are needed to guarantee closeness to stationarity; this is a mixing time question. The conceptual issues here are discussed in Chapter 11. Despite a huge literature on methodology and applications of MCMC in many different settings, rigorous results are rather scarce. A notable exception is in the sampling setting above where $\log f$ is a concave function, where there exist complicated results (outlined in Chapter 11 xxx to be written) proving that a polynomial (in $d$ ) number of steps suffice.

1.1.6 Approximate counting of self-avoiding walks

A self-avoiding walk (SAW) of length $l$ in the lattice $Z^{d}$ is a walk $0=v_{0},v_{1},v_{2},\ldots,v_{l}$ for which the $(v_{i})$ are distinct and successive pairs $(v_{i},v_{i+1})$ are adjacent. Understanding the $l\to\infty$ asymptotics of the cardinality $|S_{l}|$ of the set $S_{l}$ of SAWs of length $l$ (dimension $d=3$ is the most interesting case) is a famous open problem. A conceptual insight is that, for large $l$ , the problem

find an algorithm which counts $|S_{l}|$ approximately

can be reduced to the problem

find an algorithm which gives an approximately uniform random sample from $S_{l}$ .

To explain, note that each walk in $S_{l+1}$ is a one-step extension of some walk in $S_{l}$ . So the ratio $|S_{l+1}|/|S_{l}|$ equals the mean number of extensions of a uniform random SAW from $S_{l}$ , which of course can be estimated from the empirical average of the number of extensions of a large sample of SAWs from $S_{l}$ .

Similar schemes work for various other families $(S_{l})$ of combinatorial sets of increasing size, provided one has some explicit connection between $S_{l}$ and $S_{l+1}$ . As in the previous word problem, one can get an approximately uniform random sample by MCMC, i.e. by designing a chain whose stationary distribution is uniform, and simulating a sufficiently large number of steps of the chain: in making a rigorous algorithm, the issue again reduces to bounding the mixing time of the chain. The case of SAWs is outlined in Chapter 11 section yyy.

1.1.7 Simulating a uniform random spanning tree

The last two word problems hinted at large classes of algorithmic problems; here is a different, more specific problem. A finite connected graph $G$ has a finite number of spanning trees, and so it makes sense to consider a uniform random spanning tree of $G$ . How can one simulate this random tree?

It turns out there is an exact method, which involves running random walk on $G$ until every vertex $v$ has been visited; then for each $v$ (other than the initial vertex) let the tree include the edge by which the walk first visited $v$ . This gives some kind of random spanning tree; it seems non-obvious that the distribution is uniform, but that is indeed true. See Chapter 8 section yyy.

1.1.8 Voter model on a finite graph

Consider a graph where each vertex is colored, initially with different colors. Each vertex from time to time (precisely, at times of independent Poisson processes of rate $1$ ) picks an adjacent vertex at random and changes its color to the color of the picked neighbor. Eventually, on a finite graph, all vertices will have the same color: how long does this take?

This question turns out to be related (via a certain notion of duality) to the following question. Imagine particles, initially one at each vertex, which perform continuous-time random walk on the graph, but which coalesce when they meet. Eventually they will all coalesce into one particle: how long does this take? On the complete graph on $n$ vertices, the mean time in each question is $\sim n$ . See Chapter 10 section yyy.

1.1.9 Are you related to your ancestors?

You have two parents, four grandparents and eight great-grandparents. In other words, for small $g\geq 1$

you have exactly $2^{g}$ g’th-generation ancestors, and you are related to each of them.

But what about larger $g$ ? Clearly you didn’t have $2^{120}\approx 10^{12}$ distinct 120’th-generation ancestors! Even taking $g=10$ , one can argue it’s unlikely you had $1,024$ different 10th-generation ancestors, though the number is likely only a bit smaller – say 1,000, in round numbers. Whether you are actually related to these people is a subtle question. At the level of grade-school genetics, you have 46 chromosomes, each a copy of one parental chromosome, and hence each a copy of some 10th-generation ancestor’s chromosome. So you’re genetically related to at most 46 of your 10th-generation ancestors. Taking account of crossover during chromosome duplication leads to a more interesting model, in which the issue is to estimate hitting probabilities in a certain continuous-time reversible Markov chain. It turns out (Chapter 13 yyy) that the number of 10th-generation ancestors who are genetically related to you is about 340. So you’re unlikely to be related to a particular 10th-generation ancestor, a fact which presents a curious sidebar to the principle of hereditary monarchy.

1 Introduction (July 20, 1999)Bibliography 1.2 So what’s in the book?

Generated on Mon Jun 2 14:23:48 2014 by LaTeXML [LOGO]