1 Introduction (July 20, 1999)

1.1 Word problems

1.1.1 Random knight moves

Imagine a knight on a corner square of an otherwise empty chessboard. Move the knight by choosing at random from the legal knight-moves. What is the mean time until the knight first returns to the starting square?

At first sight this looks like a messy problem which will require numerical calculations. But the knight is moving as a random walk on a finite graph (rather than as some more general Markov chain), and elementary theory reduces the problem to counting the number of edges of the graph, giving the answer of 168 moves. See Chapter 3 yyy.
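The elementary theory in question is the identity $E_v T_v^+ = 2|E|/\mbox{deg}(v)$ for the mean return time of random walk on a graph with edge-set $E$. A minimal computational sketch of the counting argument:

```python
# Mean return time for random walk on a graph: E_v[T_v^+] = 2|E| / deg(v).
# Count the edges of the knight-move graph on the 8x8 board and apply
# the identity at a corner square.

MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def degree(square):
    """Number of legal knight moves from a given square."""
    r, c = square
    return sum(0 <= r + dr < 8 and 0 <= c + dc < 8 for dr, dc in MOVES)

# Each undirected edge is counted twice in the degree sum.
num_edges = sum(degree((r, c)) for r in range(8) for c in range(8)) // 2
corner_deg = degree((0, 0))

mean_return = 2 * num_edges / corner_deg
print(num_edges, corner_deg, mean_return)  # 168 edges, degree 2, 168.0 moves
```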

1.1.2 The white screen problem

Around 1980 I wrote a little Basic program that would display a random walk on the screen of my home computer. First, a pixel in the middle of the screen was lit up. Then one of the four directions N,E,W,S was selected uniformly at random and the walk proceeded one step in the chosen direction. That new pixel was lit up on the screen, and the process was repeated from the new point, etc. For a while, the walk is almost always quickly visiting pixels it hasn’t visited before, so one sees an irregular pattern that grows in the center of the screen. After quite a long while, when the screen is perhaps 95% illuminated, the growth process will have slowed down tremendously, and the viewer can safely go read War and Peace without missing any action. After a minor eternity every cell will have been visited. Any mathematician will want to know how long, on the average, it takes until each pixel has been visited. Edited from Wilf [336].

Taking the screen to be $m\times m$ pixels, we have a random walk on the discrete two-dimensional torus $Z^{2}_{m}$, and the problem asks for the mean cover time, that is the time taken to visit every vertex of the graph. Such questions have been studied for general graphs (see Chapter 6), though ironically this particular case of the two-dimensional torus is the hardest special graph. It is known that the mean cover time is asymptotically at most $4\pi^{-1}m^{2}\log^{2}m$, and conjectured that this is asymptotically correct (see Chapter 7 yyy). For $m=512$ this works out to be about 13 million. Of course, in accordance with Moore's Law, what took a minor eternity in 1980 takes just a few seconds today.
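The conjectured asymptotic bound is easy to evaluate numerically at the screen size mentioned above; a quick check:

```python
# Evaluate the conjectured asymptotic cover time 4 * pi^-1 * m^2 * (log m)^2
# for the m x m torus at m = 512.
import math

def cover_time_estimate(m):
    return 4 / math.pi * m ** 2 * math.log(m) ** 2

print(cover_time_estimate(512))  # roughly 1.3e7, i.e. about 13 million steps
```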

1.1.3 Universal traversal sequences

Let $S(n,d)$ be the set of all $d$-regular graphs $G$ with $n$ vertices and with the edges at each vertex labeled $(1,2,\ldots,d)$. A universal traversal sequence
$$i_{1},i_{2},\ldots,i_{u}\in\{1,\ldots,d\}$$
is a sequence that satisfies

for each $G\in S(n,d)$ and each initial vertex of $G$, the deterministic walk “at step $t$ choose edge $i_{t}$” visits every vertex.

What is the shortest length $u=u(n,d)$ of such a sequence?

To get a partial answer, instead of trying to be clever about picking the sequence, consider what happens if we just choose $i_{1},i_{2},\ldots$ uniformly at random. Then the walk on a graph $G$ is just simple random walk on $G$. Using a result that the mean cover time on a regular graph is $O(n^{2})$, one can show (see Chapter 6 yyy) that most sequences of length $O(dn^{3}\log n)$ are universal traversal sequences.
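A toy illustration of the random-sequence idea: the sketch below fixes one particular edge-labeled 3-regular graph on 4 vertices ($K_4$ with an arbitrary hypothetical labeling) and draws random label sequences until one covers the graph from every starting vertex. A genuine universal traversal sequence would of course have to work for every graph in $S(4,3)$ simultaneously, which this sketch does not check.

```python
import random

# One edge-labeled 3-regular graph on 4 vertices (K_4): succ[v][i] is the
# vertex reached from v by the edge labeled i at v. The labeling is an
# arbitrary illustrative choice; labels need not agree at the two endpoints.
succ = {
    0: [1, 2, 3],
    1: [0, 3, 2],
    2: [3, 0, 1],
    3: [2, 1, 0],
}

def covers(seq, start):
    """Does the deterministic walk driven by seq visit every vertex?"""
    v, visited = start, {start}
    for i in seq:
        v = succ[v][i]
        visited.add(v)
    return len(visited) == len(succ)

# Keep drawing random label sequences until one works from every start.
rng = random.Random(0)
while True:
    seq = [rng.randrange(3) for _ in range(20)]
    if all(covers(seq, v) for v in succ):
        break
print(seq)  # a sequence that traverses this particular labeled graph
```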

Paradoxically, no explicit example of a universal traversal sequence this short is known. The argument above fits a general theme that probabilistic methods can be useful in combinatorics to establish the existence of objects which are hard to exhibit constructively: numerous examples are in the monograph by Alon and Spencer [28].

1.1.4 How long does it take to shuffle a deck of cards?

Repeated random shuffles of a $d$-card deck may be modeled as a Markov chain on the space of all $d!$ possible configurations of the deck. Different physical methods of shuffling correspond to different chains. The model for the most common method, the riffle shuffle, is described carefully in Chapter 9 (xxx section to be written). A mathematically simpler method is top-to-random, in which the top card is reinserted at one of the $d$ possible positions, chosen uniformly at random (Chapter 9 section yyy). Giving a precise mathematical interpretation to the question

how many steps of the chain (corresponding to a specified physical shuffle) are needed until the distribution of the deck is approximately uniform (over all $d!$ configurations)?

is quite subtle; we shall formalize different interpretations as different mixing times, and relations between mixing times are discussed in Chapter 4 for reversible chains and in Chapter 8 (xxx section to be written) for general chains. Our favorite formalization is via the variation threshold time $\tau_{1}$, and it turns out that

$$\tau_{1} \sim {\textstyle\frac{3}{2}}\log_{2}d \quad \mbox{(riffle shuffle)} \qquad\qquad (1.1)$$
$$\tau_{1} \sim d\log d \quad \mbox{(top-to-random shuffle)}.$$

For the usual deck with $d=52$ these suggest 8 and 205 shuffles respectively.
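These numbers come from plugging $d=52$ into (1.1) and the top-to-random asymptotic; a quick check:

```python
# Evaluate the two mixing-time asymptotics at d = 52 to recover the
# shuffle counts quoted above.
import math

d = 52
riffle = 1.5 * math.log2(d)        # (3/2) log2 d, about 8.6
top_to_random = d * math.log(d)    # d log d, about 205.5
print(riffle, top_to_random)
```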

1.1.5 Sampling from high-dimensional distributions: Markov chain Monte Carlo

Suppose you have a function $f:R^{d}\to[0,\infty)$ with $\kappa:=\int_{R^{d}}f(x)\,dx<\infty$, where $f$ is given by some explicit but maybe complicated formula. How can you devise a scheme to sample a random point in $R^{d}$ with the normalized probability density $f(x)/\kappa$?

For $d=1$ the elementary “inverse distribution function” trick is available, and for small $d$ simple acceptance/rejection methods are often practical. For large $d$ the most popular method is some form of Markov chain Monte Carlo (MCMC), and this specific $d$-dimensional sampling problem is a prototype problem for MCMC methods. The scheme is to design a chain to have stationary distribution $f(x)/\kappa$. A simple such chain is as follows. From a point $x$, the next point $X_{1}$ is chosen by a two-step procedure. First choose $Y$ from some reference distribution (e.g. multivariate Normal with specified variance, or uniform on a sphere of specified radius) on $R^{d}$; then set $X_{1}=x+Y$ with probability $\min(1,f(x+Y)/f(x))$ and set $X_{1}=x$ with the remaining probability.

Routine theory says that the stationary density is indeed $f(x)/\kappa$ and that as $t\to\infty$ the distribution of the chain after $t$ steps converges to this stationary distribution. So a heuristic algorithm for the sampling problem is

Choose a starting point, a reference distribution and a number $t$ of steps; simulate the chain for $t$ steps, and output the state of the chain after $t$ steps.
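The two-step procedure above is easy to sketch in code. In the sketch below the target $f$ (an unnormalized standard Normal density on $R^1$), the step size and the run length are illustrative choices, not recommendations:

```python
import math
import random

def f(x):
    """Unnormalized target density: exp(-x^2/2), i.e. standard Normal."""
    return math.exp(-x * x / 2.0)

def metropolis(steps, start=0.0, scale=1.0, rng=None):
    """Random-walk chain with stationary density f(x)/kappa."""
    rng = rng or random.Random()
    x, chain = start, []
    for _ in range(steps):
        y = rng.gauss(0.0, scale)  # increment Y from the reference distribution
        if rng.random() < min(1.0, f(x + y) / f(x)):
            x = x + y              # accept the move; otherwise stay at x
        chain.append(x)
    return chain

chain = metropolis(20000, rng=random.Random(1))
print(sum(chain) / len(chain))  # close to the target mean 0
```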

To make a rigorous algorithm one needs to know how many steps are needed to guarantee closeness to stationarity; this is a mixing time question. The conceptual issues here are discussed in Chapter 11. Despite a huge literature on the methodology and applications of MCMC in many different settings, rigorous results are rather scarce. A notable exception is the sampling setting above in the case where $\log f$ is a concave function, for which there exist complicated results (outlined in Chapter 11 xxx to be written) proving that a polynomial (in $d$) number of steps suffices.

1.1.6 Approximate counting of self-avoiding walks

A self-avoiding walk (SAW) of length $l$ in the lattice $Z^{d}$ is a walk $0=v_{0},v_{1},v_{2},\ldots,v_{l}$ for which the $(v_{i})$ are distinct and successive pairs $(v_{i},v_{i+1})$ are adjacent. Understanding the $l\to\infty$ asymptotics of the cardinality $|S_{l}|$ of the set $S_{l}$ of SAWs of length $l$ (dimension $d=3$ is the most interesting case) is a famous open problem. A conceptual insight is that, for large $l$, the problem

find an algorithm which counts $|S_{l}|$ approximately

can be reduced to the problem

find an algorithm which gives an approximately uniform random sample from $S_{l}$.

To explain, note that each walk in $S_{l+1}$ is a one-step extension of some walk in $S_{l}$. So the ratio $|S_{l+1}|/|S_{l}|$ equals the mean number of extensions of a uniform random SAW from $S_{l}$, which of course can be estimated from the empirical average of the number of extensions over a large sample of SAWs from $S_{l}$.
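For the two-dimensional lattice this identity can be checked exactly at small lengths by brute-force enumeration; a sketch:

```python
# Exact check, for small l in Z^2, of the identity used above:
# |S_{l+1}| equals the total number of one-step self-avoiding
# extensions of walks in S_l.

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def saws(l, walk=((0, 0),)):
    """Yield all self-avoiding extensions of `walk` by l further steps."""
    if l == 0:
        yield walk
        return
    x, y = walk[-1]
    for dx, dy in STEPS:
        v = (x + dx, y + dy)
        if v not in walk:
            yield from saws(l - 1, walk + (v,))

counts = [sum(1 for _ in saws(l)) for l in range(1, 6)]
print(counts)  # [4, 12, 36, 100, 284]

# |S_5| / |S_4| equals the mean number of one-step extensions over S_4.
mean_ext = sum(sum(1 for _ in saws(1, w)) for w in saws(4)) / counts[3]
print(mean_ext)  # 2.84
```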

Similar schemes work for various other families $(S_{l})$ of combinatorial sets of increasing size, provided one has some explicit connection between $S_{l}$ and $S_{l+1}$. As in the previous word problem, one can get an approximately uniform random sample by MCMC, i.e. by designing a chain whose stationary distribution is uniform, and simulating a sufficiently large number of steps of the chain: in making a rigorous algorithm, the issue again reduces to bounding the mixing time of the chain. The case of SAWs is outlined in Chapter 11 section yyy.

1.1.7 Simulating a uniform random spanning tree

The last two word problems hinted at large classes of algorithmic problems; here is a different, more specific problem. A finite connected graph $G$ has a finite number of spanning trees, and so it makes sense to consider a uniform random spanning tree of $G$. How can one simulate this random tree?

It turns out there is an exact method, which involves running random walk on $G$ until every vertex $v$ has been visited; then for each $v$ (other than the initial vertex) let the tree include the edge by which the walk first visited $v$. This gives some kind of random spanning tree; it seems non-obvious that its distribution is uniform, but that is indeed true. See Chapter 8 section yyy.
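This method (often called the Aldous-Broder algorithm) is short to sketch; the example graph below, a 4-cycle with one diagonal, is an arbitrary illustrative choice:

```python
import random

# Sketch of the exact method described above: run random walk until every
# vertex is visited, keeping the edge by which each vertex was first entered.

def random_spanning_tree(adj, start, rng=None):
    """adj: adjacency lists of a connected graph. Returns a set of edges."""
    rng = rng or random.Random()
    tree, v = set(), start
    visited = {start}
    while len(visited) < len(adj):
        w = rng.choice(adj[v])
        if w not in visited:             # first visit to w:
            tree.add(frozenset((v, w)))  # record the entry edge
            visited.add(w)
        v = w
    return tree

adj = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
tree = random_spanning_tree(adj, 0, random.Random(2))
print(tree)  # n - 1 = 3 edges forming a spanning tree
```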

1.1.8 Voter model on a finite graph

Consider a graph where each vertex is colored, initially with different colors. Each vertex from time to time (precisely, at the times of independent Poisson processes of rate $1$) picks an adjacent vertex at random and changes its color to the color of the picked neighbor. Eventually, on a finite graph, all vertices will have the same color: how long does this take?

This question turns out to be related (via a certain notion of duality) to the following question. Imagine particles, initially one at each vertex, which perform continuous-time random walk on the graph, but which coalesce when they meet. Eventually they will all coalesce into one particle: how long does this take? On the complete graph on $n$ vertices, the mean time in each question is $\sim n$. See Chapter 10 section yyy.
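The voter model dynamics are easy to simulate. The sketch below uses a discrete-time simplification on the complete graph $K_n$ (one uniformly chosen vertex updates per tick, with time rescaled by $n$), an illustrative approximation to the continuous-time dynamics described above:

```python
import random

# Voter model on the complete graph K_n, discrete-time simplification:
# at each tick, one uniformly chosen vertex copies the color of a
# uniformly chosen other vertex. n ticks correspond roughly to one
# unit of continuous time.

def consensus_time(n, rng):
    color = list(range(n))          # initially all colors distinct
    ticks = 0
    while len(set(color)) > 1:
        v = rng.randrange(n)
        w = rng.randrange(n - 1)    # a uniform neighbor of v in K_n
        if w >= v:
            w += 1
        color[v] = color[w]
        ticks += 1
    return ticks / n                # rescale: n ticks ~ 1 time unit

rng = random.Random(3)
times = [consensus_time(20, rng) for _ in range(200)]
print(sum(times) / len(times))  # mean consensus time, in rescaled units
```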

1.1.9 Are you related to your ancestors?

You have two parents, four grandparents and eight great-grandparents. In other words, for small $g\geq 1$

you have exactly $2^{g}$ $g$'th-generation ancestors, and you are related to each of them.

But what about larger $g$? Clearly you didn't have $2^{120}\approx 10^{36}$ distinct 120'th-generation ancestors! Even taking $g=10$, one can argue it's unlikely you had $1,024$ different 10th-generation ancestors, though the number is likely only a bit smaller – say 1,000, in round numbers. Whether you are actually related to these people is a subtle question. At the level of grade-school genetics, you have 46 chromosomes, each a copy of one parental chromosome, and hence each a copy of some 10th-generation ancestor's chromosome. So you're genetically related to at most 46 of your 10th-generation ancestors. Taking account of crossover during chromosome duplication leads to a more interesting model, in which the issue is to estimate hitting probabilities in a certain continuous-time reversible Markov chain. It turns out (Chapter 13 yyy) that the number of 10th-generation ancestors who are genetically related to you is about 340. So you're unlikely to be related to a particular 10th-generation ancestor, a fact which presents a curious sidebar to the principle of hereditary monarchy.