10 Some Graph Theory and Randomized Algorithms (September 1 1999)

10.3 Randomized algorithms

10.3.1 Background

Here we give some background for the mathematician with no knowledge of the theory of algorithms. Typically there are many different possible algorithms for a particular problem; the theory of algorithms seeks “optimal” algorithms according to some notion of “cost”. Cost is usually “time”, i.e. number of computational steps, but sometimes involves criteria such as (storage) space or simplicity of coding. The phrase randomized algorithm refers to settings where the problem itself does not involve randomness but where randomness is introduced into the running of the algorithm. Why this is useful is best seen by example; the textbook of Motwani and Raghavan [265] provides a comprehensive range of examples and classifications of types of problems where randomized algorithms have proved useful. We give three standard examples below (not using Markov chains) and then proceed to talk about algorithms using random walks on graphs.

Example 10.6

Statistical sampling.

Consider a population of nn individuals. Suppose we wish to know the proportion qq of the population with some attribute, i.e. who answer “Yes” to some Yes/No question. To calculate qq exactly we need to question all nn individuals. But if we can sample uniformly at random from the population, then we can estimate qq approximately and can bound the size of error in probability. To do this, we sample independently kk random individuals, question them, and calculate the empirical proportion q¯\bar{q} of Yes answers. Use q¯\bar{q} as our estimate of qq, and theory gives error probabilities

P(|q¯-q|>2(q¯(1-q¯))1/2k-1/2)5%.P(|\bar{q}-q|>2(\bar{q}(1-\bar{q}))^{1/2}k^{-1/2})\approx 5\%.

Such 95%95\% confidence intervals are discussed in every freshman statistics course. Classical statististical sampling is conceptually a bit different from algorithms, in that the “cost” kk here refers to real-world costs of interviewing human individuals (or experimenting on individual rats or whatever) rather than to computational cost. However, the key insight in the formula above is that, for prescribed allowable error ε\varepsilon, the cost of this simple random sampling is O(ε-2)O(\varepsilon^{-2}) and this cost does not depend on the “problem size” (i.e. population size) nn. The next example is a slightly more subtle use of sampling in a slightly more algorithmic context.

Example 10.7

Size of a union of sets.

It’s fun to say this as a word problem in the spirit of Chapter 1. Suppose your new cyberpunk novel has been rejected by all publishers, so you have published it privately, and seek to sell copies by mailing advertizements to individuals. So you need to buy mailing lists (from e.g. magazines and specialist bookshops catering to science fiction). Your problem is that such mailing lists might have much overlap. So before buying LL lists A1,,ALA_{1},\ldots,A_{L} (where AiA_{i} is a set of |Ai||A_{i}| names and addresses) you would like to know roughly the size |iAi||\cup_{i}A_{i}| of their union. How can you do this without knowing what the sets AiA_{i} are (the vendors won’t give them to you for free)? Statistical sampling can be used here. Suppose the vendors will allow you to randomly sample a few names (so you can check accuracy) and will allow you to “probe” whether a few specified names are on their list. Then you can sample kk times from each list, and for each sampled name XijX_{ij} probe the other lists to count the number m(Xij)1m(X_{ij})\geq 1 of lists containing that name. Consider the identity

|iAi|\displaystyle|\cup_{i}A_{i}| =\displaystyle= i|Ai|×|Ai|-1aAi1/m(a)\displaystyle\sum_{i}|A_{i}|\times|A_{i}|^{-1}\sum_{a\in A_{i}}1/m(a)
=\displaystyle= i|Ai|E(1/Mi)\displaystyle\sum_{i}|A_{i}|\ E(1/M_{i})

where MiM_{i} is the number of lists containing a uniform random name from AiA_{i}. You can estimate E(1/Mi)E(1/M_{i}) by k-1j=1k1/m(Xij)k^{-1}\sum_{j=1}^{k}1/m(X_{ij}), and the error has standard deviation k-1/2\leq k^{-1/2}, and the resulting estimate of |iAi||\cup_{i}A_{i}| has error

±O((i|Ai|2/k)1/2)=±O(k-1/2Lmaxi|Ai|).\pm O((\sum_{i}|A_{i}|^{2}/k)^{1/2})=\pm O(k^{-1/2}L\max_{i}|A_{i}|).

As in the previous example, the key point is that the cost of “approximately counting” iAi\cup_{i}A_{i} to within a small relative error does not depend on the size of the sets.

Example 10.8

Solovay-Strassen test of primality [312].

We can’t improve on the concise description given by Babai [37].

Let n>1n>1 be an odd integer. Call an integer ww a Solovay-Strassen witness (of compositeness of nn) if 1wn-11\leq w\leq n-1 and either g.c.d.(w,n)>1(w,n)>1 or w(n-1)/2(wn)modnw^{(n-1)/2}\not\equiv({\textstyle\frac{w}{n}})\bmod n, where (wn)({\textstyle\frac{w}{n}}) is the Jacobi symbol (computed via quadratic reciprocity as easily as g.c.d.’s are computed via Euclid’s algorithm). Note that no S-S witness exists if nn is prime. On the other hand (this is the theorem) if nn is composite, then at least half of the integers 1,2,,n-11,2,\ldots,n-1 are S-S witnesses.

Suppose now that we want to decide whether or not a given odd 200-digit integer nn is prime. Pick kk integers w1,,wkw_{1},\ldots,w_{k} independently at random from {1,2,,n-1}\{1,2,\ldots,n-1\}. If any one of the wiw_{i} turns out to be a witness, we know that nn is composite. If none of them are, let us conclude that nn is prime. Here we may err, but for any nn, the probability that we draw the wrong conclusion is at most ε=2-k\varepsilon=2^{-k}. Setting k=500k=500 is perfectly realistic, so we shall have proven the mathematical statement “nn is prime” beyond the shade of doubt.

10.3.2 Overview of randomized algorithms using random walks or Markov chains

Our focus is of course on randomized algorithms using random walks or Markov chains. We will loosely divide these into three categories. Markov chain Monte Carlo seeks to simulate a random sample from a (usually non-uniform) given probability distribution on a given set. This is the central topic of Chapter 11. In section 10.4 below we give a selection of miscellaneous graph algorithms. Into this category also falls the idea (Chapter 6 section 8.2) (yyy 10/31/94 version; details to be written) of using random walk as a “undirected graph connectivity” algorithm, and the idea (end of section 10.2.2) of using random walk paths as an ingredient in constructing edge-disjoint paths in an expander graph. A third, intermediate category is the specific topic of approximate counting via Markov chains, to be discussed in section 10.5.