10 Some Graph Theory and Randomized Algorithms (September 1 1999)

10.5 Approximate counting via Markov chains

For a finite set $S$, there is a close connection between

(a) having an explicit formula for the size $|S|$

(b) having a bounded-time algorithm for generating a uniform random element of $S$.

As an elementary illustration, we all know that there are $n!$ permutations of $n$ objects. From a proof of this fact, we could write down an explicit $1$-$1$ mapping $f$ between the set of permutations and the set $A=\{(a_1,a_2,\ldots,a_n):1\leq a_i\leq i\}$. Then we could simulate a uniform random permutation by first simulating a uniform random element $a$ of $A$ and then computing $f(a)$. Conversely, given an algorithm which was guaranteed to produce a uniform random permutation after $k(n)$ calls to a random number generator, we could (in principle) analyze the working of the algorithm in order to calculate the chance $p$ of getting the identity permutation. Then we can say that the number of permutations equals $1/p$.
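
To make the first direction concrete, here is a minimal Python sketch (function names are ours) of one standard choice of the mapping $f$: decoding a uniform random element of $A$ by successive swaps, which is the Fisher-Yates shuffle in disguise.

    import random

    def decode(a):
        # Map a code (a_1, ..., a_n) with 1 <= a_i <= i to a permutation
        # of {1, ..., n}.  Distinct codes give distinct permutations, and
        # there are n! codes, so this is a 1-1 correspondence.
        n = len(a)
        perm = list(range(1, n + 1))
        for i in range(n, 1, -1):
            j = a[i - 1]                      # j lies in {1, ..., i}
            perm[i - 1], perm[j - 1] = perm[j - 1], perm[i - 1]
        return perm

    def uniform_permutation(n):
        # A uniform random code gives a uniform random permutation.
        a = [random.randint(1, i) for i in range(1, n + 1)]
        return decode(a)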

A more subtle observation is that, in certain settings, having an algorithm for generating an approximately uniform random element of $S$ can be used to estimate approximately the size $|S|$. The idea is to estimate successive ratios by sampling. Suppose we can relate $S$ to smaller sets

$$S=S_L\supset S_{L-1}\supset\cdots\supset S_2\supset S_1 \qquad (10.20)$$

where $|S_1|$ is known, the ratios $p_i:=|S_{i-1}|/|S_i|$ are bounded away from $0$, and where we can sample uniformly from each $S_i$. Then take $k$ uniform random samples from each $S_i$ and find the sample proportion $W_i$ which fall into $S_{i-1}$. Because $|S|=|S_1|\ \prod_{i=2}^L |S_i|/|S_{i-1}|$, we use

$$\hat{N}:=|S_1|\ \prod_{i=2}^L W_i^{-1}$$

as an estimate of $|S|$. To study its accuracy, it is simpler to consider $|S|/\hat{N}=\prod_{i=2}^L W_i/p_i$. Clearly $E(|S|/\hat{N})=1$, and we can calculate the variance by

$${\rm var}\left(\frac{|S|}{\hat{N}}\right)={\rm var}\left(\prod_{i=2}^L\frac{W_i}{p_i}\right)=\prod_{i=2}^L\left(1+{\rm var}\ (W_i/p_i)\right)-1=\prod_{i=2}^L\left(1+\frac{1-p_i}{p_i k}\right)-1.$$
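
To spell out the two equalities: $kW_i$ is a Binomial$(k,p_i)$ count, so

$${\rm var}\ (W_i/p_i)=\frac{p_i(1-p_i)/k}{p_i^2}=\frac{1-p_i}{p_i k},$$

while for independent variables $X_i$ with $EX_i=1$,

$${\rm var}\left(\prod_i X_i\right)=\prod_i EX_i^2-\left(\prod_i EX_i\right)^2=\prod_i\left(1+{\rm var}\ X_i\right)-1.$$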

The simplest case is where we know a theoretical lower bound $p_*$ for the $p_i$. Then by taking $k=O(\varepsilon^{-2}L/p_*)$ we get

$${\rm var}\left(\frac{|S|}{\hat{N}}\right)\leq\exp\left(\frac{L}{p_* k}\right)-1=O(\varepsilon^2).$$

In other words, with a total number

$$Lk=O(\varepsilon^{-2}L^2/p_*) \qquad (10.21)$$

of random samples, we can statistically estimate $|S|$ to within a factor $1\pm O(\varepsilon)$.
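
Here is a toy Python sketch of the scheme (names are ours, and exact uniform sampling stands in for the Markov chain samplers discussed below), on a nested family where the answer $|S_L|=2^L$ is known, so the output can be checked:

    import random

    def estimate_size(sets, k):
        # Product estimator: sets = [S_1, S_2, ..., S_L], nested, as
        # Python sets; returns N-hat = |S_1| * prod_i W_i^{-1}.
        est = float(len(sets[0]))            # |S_1| is known exactly
        for smaller, larger in zip(sets, sets[1:]):
            pool = list(larger)
            hits = sum(random.choice(pool) in smaller for _ in range(k))
            est *= k / max(hits, 1)          # divide by W_i; crude guard for W_i = 0
        return est

    # S_i = {1, ..., 2^i}: each p_i = 1/2 and |S_10| = 1024.
    sets = [set(range(1, 2 ** i + 1)) for i in range(1, 11)]
    print(estimate_size(sets, k=1000))       # typically within about 10% of 1024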

The conceptual point of invoking intermediate sets $S_i$ is that the overall ratio $|S_1|/|S|$ may be exponentially small in some size parameter, so that trying to estimate this ratio directly by sampling from $S$ would involve order $|S|/|S_1|$, i.e. exponentially many, samples. If we can specify the intermediate sets with ratios $p_i$ bounded away from $0$ and $1$ then $L=O(\log(|S|/|S_1|))$ and so the number of samples required in (10.21) depends on $\log(|S|/|S_1|)$ instead of $|S|/|S_1|$.

The discussion so far has not involved Markov chains. From our viewpoint, the interesting setting is where we cannot directly get uniform random samples from a typical $S_i$, but instead need to use Markov chain Monte Carlo. That is, on each $S_i$ we set up a reversible Markov chain with uniform stationary distribution (i.e. a chain whose transition matrix is symmetric in the sense $p_{vw}\equiv p_{wv}$). Assume we have a common bound $\tau_1$ on the $\tau_1$-values of all these chains. Then as a small modification of Corollary 10.3, one can get $m$ samples from the combined chains whose joint distribution is close (in variation distance) to the distribution of independent samples from the uniform distributions in $O(\tau_1 m\log m)$ steps. As above, if we can specify the intermediate sets with ratios $p_i$ bounded away from $0$ and $1$ then we need $m=O(\varepsilon^{-2}\log^2(|S|/|S_1|))$ samples, and so (ignoring dependence on $\varepsilon$) the total number of steps in all the chains is $O(\tau_1\log^{2+o(1)}(|S|/|S_1|))$.

In summary, to implement this method of approximate counting via Markov chains, one needs

  • a way to specify the intermediate sets (10.20)

  • a way to specify Markov chains on the $S_i$ whose mixing times can be rigorously bounded.
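
Schematically (a sketch only; the sampling schedule is our illustrative reading of the variation-distance bound quoted above, and the constants are not tuned), the sampler behind the method records the state of a symmetric-kernel chain every order $\tau_1\log m$ steps:

    import math

    def near_uniform_samples(step, start, tau1_bound, m):
        # Run a chain with symmetric transition kernel (hence uniform
        # stationary distribution), recording its state every
        # O(tau1 * log m) steps, for a total of O(tau1 * m * log m)
        # steps.  `step(x)` performs one transition of the chain.
        gap = int(tau1_bound * (math.log(m) + 1)) + 1
        x, out = start, []
        for _ in range(m):
            for _ in range(gap):
                x = step(x)
            out.append(x)
        return out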

Two particular examples have been studied in detail, and historically these examples provided major impetus for the development of technical tools to estimate mixing times. Though the details are too technical for this book, we outline these examples in the next two sections, and then consider in detail the setting of self-avoiding walks.

10.5.1 Volume of a convex set

Given a closed convex set $K$ in $R^d$, for large $d$, how can we algorithmically calculate the volume of $K$? Regard $K$ as being described by an oracle, that is, for any $x\in R^d$ we can determine in one step whether or not $x\in K$. Perhaps surprisingly, there is no known deterministic algorithm which finds ${\rm vol}(K)$ approximately (i.e. to within a factor $1\pm\varepsilon$) in a polynomial in $d$ number of steps. But this problem is amenable to the “approximate counting via Markov chains” technique. This line of research was initiated by Dyer et al [134, 135], who produced an algorithm requiring $O(d^{23+o(1)})$ steps. A long sequence of papers (see the Notes) studied variants of both the Markov chains and the analytic techniques in order to reduce the polynomial degree. Currently the best bound is $O(d^{5+o(1)})$, due to Kannan et al [206].

To outline the procedure in this example, suppose we know $B(1)\subset K\subset B(r)$, where $B(r)$ is the ball of radius $r=r(d)$. (It turns out that one can transform any convex set into one satisfying these constraints with $r=O(d^{3/2})$.) We specify an increasing sequence of convex subsets

$$B(1)=K_0\subset K_1\subset\cdots\subset K_L=K$$

by setting $K_i:=B(2^{i/d})\cap K$. This makes the ratios of successive volumes bounded by $2$ and requires $L=O(d\log d)$ intermediate sets. So the issue is to design and analyze a chain on a typical convex set $K_i$ whose stationary distribution is uniform. Various Markov chains have been used: simple random walk on a fine discrete lattice restricted to $K_i$, or spherically symmetric walks. The analysis of the chains has used Cheeger inequalities for chains and the refinement of classical isoperimetric inequalities for convex sets. Recent work of Bubley et al [78] has successfully introduced coupling methods, and it is a challenging problem to refine these coupling methods. There is a suggestive analogy with theoretical study of Brownian motion in a convex set – see Chapter 13 section 1.3 (yyy 7/29/99 version).
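
For illustration only (this is the naive lattice walk, not the optimized algorithms from the papers cited above; names and the step size $\delta$ are ours), one step of the walk on $K_i$, given the membership oracle, might read:

    import random

    def lattice_walk_step(x, in_body, delta):
        # Simple random walk on the grid delta * Z^d restricted to the
        # body: move +/- delta in a uniformly chosen coordinate, and hold
        # (stay at x) if the proposal leaves the body.  Holding keeps the
        # transition kernel symmetric, so the stationary distribution is
        # uniform on the grid points of the body.
        d = len(x)
        i = random.randrange(d)
        y = list(x)
        y[i] += delta if random.random() < 0.5 else -delta
        return y if in_body(y) else x

    def oracle_for_K_i(in_K, i, d):
        # Membership oracle for K_i = B(2^(i/d)) intersect K, built from
        # the oracle in_K for K itself.
        r2 = 2.0 ** (2.0 * i / d)
        return lambda x: sum(t * t for t in x) <= r2 and in_K(x)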

10.5.2 Matchings in a graph

For a finite, not necessarily regular, graph $G_0$ let ${\cal M}(G_0)$ be the set of all matchings in $G_0$, where a matching $M$ is a subset of edges such that no vertex is in more than one edge of $M$. Suppose we want to count $|{\cal M}(G_0)|$ (for the harder setting of counting perfect matchings see the Notes). Enumerate the edges of $G_0$ as $e_1,e_2,\ldots,e_L$, where $L$ is the number of edges of $G_0$. Write $G_i$ for the graph $G_0$ with edges $e_1,\ldots,e_i$ deleted. A matching of $G_i$ can be identified with a matching of $G_{i-1}$ which does not contain $e_i$, so we can write

$${\cal M}(G_{L-1})\subset{\cal M}(G_{L-2})\subset\cdots\subset{\cal M}(G_1)\subset{\cal M}(G_0).$$

Since $G_{L-1}$ has one edge, we know $|{\cal M}(G_{L-1})|=2$. The ratio $|{\cal M}(G_{i+1})|/|{\cal M}(G_i)|$ is the probability that a uniform random matching of $G_i$ does not contain the edge $e_{i+1}$. So the issue is to design and analyze a chain on a typical set ${\cal M}(G_i)$ of matchings whose stationary distribution is uniform. Here is a natural such chain. From a matching $M_0$, pick a uniform random edge $e$ of $G_0$, and construct a new matching $M_1$ from $M_0$ and $e$ as follows.

If $e\in M_0$ then set $M_1=M_0\setminus\{e\}$.

If neither end-vertex of $e$ is in an edge of $M_0$ then set $M_1=M_0\cup\{e\}$.

If exactly one end-vertex of $e$ is in an edge ($e'$ say) of $M_0$ then set $M_1=\{e\}\cup M_0\setminus\{e'\}$.

(In the remaining case, where both end-vertices of $e$ are covered by distinct edges of $M_0$, set $M_1=M_0$.)

This construction (the idea goes back to Broder [69]) yields a chain with symmetric transition matrix, because each possible transition has chance $1/L$. An elegant analysis by Jerrum and Sinclair [197], outlined in Jerrum [199] section 5.1, used the distinguished paths technique to prove that on an $n$-vertex, $L$-edge graph

$$\tau_2=O(Ln).$$

Since the number of matchings can be bounded crudely by $n!$, the stationary probabilities are at least $1/n!$, and so

$$\tau_1=O(\tau_2\log n!)=O(Ln^2\log n). \qquad (10.22)$$
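
In code, one step of this chain is a direct transcription of the three cases plus the hold (representation choices are ours; edges are two-element frozensets):

    import random

    def matching_step(M, edges):
        # One step of the chain on matchings of G_0.  M is the current
        # matching, a set of edges; `edges` lists all L edges of G_0.
        # Each possible move has probability 1/L, so the transition
        # matrix is symmetric and the stationary distribution is uniform.
        e = random.choice(edges)               # uniform random edge of G_0
        if e in M:
            return M - {e}                     # case 1: delete e
        touching = [f for f in M if f & e]     # edges of M meeting e
        if not touching:
            return M | {e}                     # case 2: add e
        if len(touching) == 1:
            return (M - {touching[0]}) | {e}   # case 3: swap e for e'
        return M                               # both ends covered: hold

    # Toy usage on the path graph 1-2-3-4:
    edges = [frozenset(p) for p in [(1, 2), (2, 3), (3, 4)]]
    M = set()
    for _ in range(10 ** 4):
        M = matching_step(M, edges)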

10.5.3 Simulating self-avoiding walks

xxx to be written