1 Introduction (July 20, 1999)

1.2 So what’s in the book?

1.2.1 Conceptual themes

Classical mathematical probability focuses on time-asymptotics, describing what happens if some random process runs forever. In contrast, the word problems each ask “how long until a chain does something?”, and the focus of this book is on finite-time behavior. More precisely, the word problems ask about hitting times, the time until a state or a set of states is first visited, or until each state in a set is visited; or about mixing times, the number of steps until the distribution is approximately the stationary distribution. The card-shuffling problems (section 1.1.4) provide a very intuitive setting for such questions: how many shuffles are needed, as a function of the size of the deck, until the deck is well shuffled? Such size-asymptotic results, of which (1.1) is perhaps the best-known, are one of the themes of this book. Thus in one sense our work is in the spirit of the birthday and coupon-collector’s problems in undergraduate probability; in another sense our goals are reminiscent of those of computational complexity (P $\stackrel{?}{=}$ NP and all that), which seeks to relate the time required to solve an algorithmic problem to the size of the problem.
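To make the two kinds of quantities concrete, here is a small simulation sketch (ours, in Python; the cycle size, the lazy-walk convention, and the function names are illustrative choices rather than anything from later chapters). It estimates a mean hitting time and the variation distance from the uniform stationary distribution after t steps, for walk on the n-cycle.

```python
import random

random.seed(0)

def hitting_time(n, start, target):
    """Steps for simple random walk on the n-cycle to first reach `target` from `start`."""
    state, steps = start, 0
    while state != target:
        state = (state + random.choice((-1, 1))) % n
        steps += 1
    return steps

def variation_distance(n, t, trials=20000):
    """Monte Carlo estimate of variation distance from uniform after t steps of a lazy
    walk on the n-cycle (hold / step left / step right, each with probability 1/3)."""
    counts = [0] * n
    for _ in range(trials):
        state = 0
        for _ in range(t):
            state = (state + random.choice((-1, 0, 1))) % n
        counts[state] += 1
    return 0.5 * sum(abs(c / trials - 1.0 / n) for c in counts)

n = 12
mean_hit = sum(hitting_time(n, 0, n // 2) for _ in range(2000)) / 2000
print("estimated mean hitting time of the opposite vertex:", mean_hit)  # exact value i(n-i) = 36
for t in (10, 50, 200):
    print("t =", t, " variation distance ~", round(variation_distance(n, t), 3))
```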

1.2.2 Prerequisites

The reader who has taken a first-year graduate course in mathematical probability will have no difficulty with the mathematical content of this book. However, if the phrase “randomized algorithm” means nothing to you, then it would be helpful to look at Motwani and Raghavan [265] to get some feeling for the algorithmic viewpoint.

We have tried to keep much of the book accessible to readers whose mathematical background emphasizes discrete math and algorithms rather than analysis and probability. The minimal background required is an undergraduate course in probability including classical limit theory for finite Markov chains. Graduate-level mathematical probability is usually presented within the framework of measure theory, which (with some justification) is often regarded as irrelevant “general abstract nonsense” by those interested in concrete mathematics. We will point out as we go the pieces of graduate-level probability that we use (e.g. martingale techniques, Wald’s identity, weak convergence). Advice: if your research involves probability then you should at some time see what’s taught in a good first-year-graduate course, and we strongly recommend Durrett [133] for this purpose.

1.2.3 Contents and alternate reading

Amongst the numerous introductory accounts of Markov chains, Norris [270] is closest to our style. That book, like the more concise treatment in Durrett [133] Chapter 5, emphasizes probabilistic methods designed to work in the countable-state setting. Matrix-based methods designed for the finite-state setting are emphasized by Kemeny and Snell [215] and by Hunter [186]. We start in Chapter 2 by briskly reviewing standard asymptotic theory of finite-state chains, and go on to a range of small topics less often emphasized: obtaining general identities from the renewal-reward theorem, and useful metrics on distributions, for instance. Chapter 3 starts our systematic treatment of reversible chains: their identification as random walks on weighted graphs, the analogy with electrical networks, the spectral representation and its consequences for the structure of hitting time distributions, the Dirichlet formalism, and extremal characterizations of eigenvalues and various mean hitting times. This material has not been brought together before. Chen [88] gives a somewhat more advanced treatment of some of the analytic techniques and their applications to infinite particle systems (also overlapping partly with our Chapters 10 and 11), but without our finite-time emphasis. Kelly [213] emphasizes stationary distributions of reversible stochastic networks, Keilson [212] emphasizes structural properties such as complete monotonicity, and Doyle and Snell [131] give a delightful elementary treatment of the electrical network connection. Chapter 4 is the centerpiece of our attempt to create coherent intermediate-level theory. We give a detailed analysis of different mixing times: the relaxation time (1/spectral gap), the variation threshold (where variation distance becomes small, uniformly in the initial state) and the Cheeger time constant (related to weighted connectivity). We discuss relations between these times and their surprising connection with mean hitting times; the distinguished paths method for bounding relaxation time; Cheeger-type inequalities; and how these parameters behave under operations on chains (watching the chain only on a subset, taking product chains). Little of this exists in textbooks, though Chung [95] gives a more graph-theoretic treatment of Cheeger inequalities and of the advanced analytic techniques in Chapter 12.
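As a rough illustration of two of the parameters just named, here is a sketch of ours (in Python with numpy; the weighted graph is arbitrary, and the 1/(2e) cutoff is just one common convention, not the precise definition given in Chapter 4) computing the relaxation time and a variation-distance mixing time directly from the transition matrix of random walk on a small weighted graph.

```python
import numpy as np

def walk_matrix(W):
    """Transition matrix of random walk on a weighted graph with symmetric weight matrix W."""
    return W / W.sum(axis=1, keepdims=True)

def stationary(W):
    """For random walk on a weighted graph, pi(v) is proportional to the total weight at v."""
    w = W.sum(axis=1)
    return w / w.sum()

# A small weighted graph on 4 vertices; the weights are arbitrary illustrative choices.
W = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 1.],
              [1., 1., 0., 3.],
              [0., 1., 3., 0.]])
P, pi = walk_matrix(W), stationary(W)

# Relaxation time = 1 / spectral gap.  Reversibility lets us symmetrize P as D P D^{-1}
# with D = diag(sqrt(pi)), so the eigenvalues are real.
D = np.diag(np.sqrt(pi))
S = D @ P @ np.linalg.inv(D)
eig = np.linalg.eigvalsh(S)                  # real eigenvalues, ascending
gap = 1.0 - eig[-2]                          # 1 minus the second-largest eigenvalue
print("relaxation time:", 1.0 / gap)

# Variation threshold (one convention): smallest t with max_v ||P^t(v,.) - pi|| <= 1/(2e).
t, Pt = 0, np.eye(len(pi))
while 0.5 * np.abs(Pt - pi).sum(axis=1).max() > 1 / (2 * np.e):
    Pt, t = Pt @ P, t + 1
print("variation threshold (approx):", t)
```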

The rather technical Chapter 4 may seem tough going, but the payoff is that subsequent chapters tend to “branch out” without developing further theoretical edifices. Chapter 5 gives bare-hands treatments of numerous examples of random walks on special graphs, and of two classes of chains with special structure: birth-and-death chains, and random walks on trees. Chapter 6 treats cover times (times to visit every vertex), which feature in several of our word problems, and for which a fairly complete theory exists; a small simulation sketch follows below. Chapter 7 discusses a hierarchy of symmetry conditions for random walks on graphs and groups, emphasizing structural properties. A conspicuous gap is that we do not discuss how analytic techniques (e.g. group representation theory, orthogonal polynomials) can be systematically used to derive exact formulas for $t$-step transition probabilities or hitting time distributions in the presence of enough symmetry. Diaconis [123] has material on this topic, but an updated account would be valuable. Chapter 8 returns to not-necessarily-reversible chains, treating topics such as certain optimal stopping times, the Markov chain tree theorem, and coupling from the past. Chapter 9 xxx. Chapter 10 describes the coupling method of bounding the variation threshold mixing time, and then discusses several interacting particle systems on finite graphs related to random walks. As background, Liggett [232] is the standard reference for interacting particle systems on infinite lattices. Chapter 11 xxx. Chapter 12 recounts work of Diaconis and Saloff-Coste, who bring the techniques of Nash inequalities, log-Sobolev inequalities and local Poincaré inequalities to bear to obtain sharper estimates for reversible Markov chains. These techniques were originally developed by analysts in the study of heat kernels; cf. the sophisticated treatment in Varopoulos et al. [332]. Chapter 13 xxx and mentions topics not treated in detail because of mathematical depth, requirements for extraneous mathematical techniques, or the authors’ exhaustion.
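Since cover times recur in several of the word problems, here is a minimal simulation sketch of the quantity Chapter 6 studies (ours, in Python; the choice of the 8-cycle and the function name are arbitrary illustrative choices): the number of steps of simple random walk until every vertex has been visited.

```python
import random

def cover_time(adj, start=0, rng=random.Random(0)):
    """One sample of the cover time: steps of simple random walk on the graph given by
    adjacency lists `adj` until every vertex has been visited at least once."""
    visited, state, steps = {start}, start, 0
    while len(visited) < len(adj):
        state = rng.choice(adj[state])
        visited.add(state)
        steps += 1
    return steps

# Example: the 8-cycle, where the answer is easy to check by hand.
n = 8
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
samples = [cover_time(cycle) for _ in range(5000)]
print("estimated mean cover time:", sum(samples) / len(samples))  # exact value n(n-1)/2 = 28
```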

As previously mentioned, our purpose is to provide a systematic intermediate-level discussion of reversible Markov chains and random walks on graphs, built around the central theme of mixing times and hitting times developed in Chapter 4. Various topics could be tackled in a more bare-hands way; an opposite approach, taken by Lovász [242] (N.B. second edition), is to lead the reader through half a chapter of problems concerning random walks on graphs. Our approach is to treat random walk on an unweighted graph as a specialization of a reversible chain, which makes it clear where non-trivial graph theory is being used (basically, not until Chapter 6).
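To illustrate that specialization in the simplest possible terms (a sketch of ours, in Python; the small graph is arbitrary), random walk on an unweighted graph steps to a uniformly chosen neighbour, and the distribution with pi(v) proportional to the degree of v satisfies the detailed balance equations exactly, which is all that reversibility requires.

```python
from fractions import Fraction

# Random walk on an unweighted graph as a reversible chain: P(v, w) = 1/deg(v) for
# neighbours w, and pi(v) = deg(v) / (2|E|) satisfies detailed balance exactly.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}   # an arbitrary small graph
two_E = sum(len(nbrs) for nbrs in adj.values())            # sum of degrees = 2|E|

def P(v, w):
    return Fraction(1, len(adj[v])) if w in adj[v] else Fraction(0)

pi = {v: Fraction(len(adj[v]), two_E) for v in adj}

# Detailed balance: pi(v) P(v, w) == pi(w) P(w, v); on edges both sides equal 1/(2|E|).
assert all(pi[v] * P(v, w) == pi[w] * P(w, v) for v in adj for w in adj)
print("detailed balance holds; pi =", pi)
```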

We have not included exercises, though filling in omitted details will provide ample exercise for a conscientious reader. Of the open problems, some seem genuinely difficult while others have just not been thought about before.