# 10.5 Approximate counting via Markov chains

For a finite set $S$, there is a close connection between

(a) having an explicit formula for the size $|S|$

(b) having a bounded-time algorithm for generating a uniform random element of $S$.

As an elementary illustration, we all know that there are $n!$ permutations of $n$ objects. From a proof of this fact, we could write down an explicit $1-1$ mapping $f$ between the set of permutations and the set $A=\{(a_{1},a_{2},\ldots,a_{n}):1\leq a_{i}\leq i\}$. Then we could simulate a uniform random permutation by first simulating a uniform random element $a$ of $A$ and then computing $f(a)$. Conversely, given an algorithm which was guaranteed to produce a uniform random permutation after $k(n)$ calls to a random number generator, we could (in principle) analyze the working of the algorithm in order to calculate the chance $p$ of getting the identity permutation. Then we can say that the number of permutations equals $1/p$.

A more subtle observation is that, in certain settings, having an algorithm for generating an approximately uniform random element of $S$ can be used to estimate approximately the size $|S|$. The idea is to estimate successive ratios by sampling. Suppose we can relate $S$ to smaller sets

 $S=S_{L}\supset S_{L-1}\supset\ldots\supset S_{2}\supset S_{1}$ (10.20)

where $|S_{1}|$ is known, the ratios $p_{i}:=|S_{i-1}|/|S_{i}|$ are bounded away from $0$, and where we can sample uniformly from each $S_{i}$. Then take $k$ uniform random samples from each $S_{i}$ and let $W_{i}$ be the sample proportion which falls into $S_{i-1}$. Because $|S|=|S_{1}|\ \prod_{i=2}^{L}|S_{i}|/|S_{i-1}|$, we use

 $\hat{N}:=|S_{1}|\ \prod_{i=2}^{L}W_{i}^{-1}$

as an estimate of $|S|$. To study its accuracy, it is simpler to consider $|S|/\hat{N}=\prod_{i=2}^{L}W_{i}/p_{i}$. Since the $W_{i}$ are independent with $E(W_{i})=p_{i}$, clearly $E(|S|/\hat{N})=1$, and we can calculate the variance:

 ${\rm var}\ \left(\frac{|S|}{\hat{N}}\right)={\rm var}\ \left(\prod_{i=2}^{L}\frac{W_{i}}{p_{i}}\right)=\prod_{i=2}^{L}\left(1+{\rm var}\ (W_{i}/p_{i})\right)-1=\prod_{i=2}^{L}\left(1+{\textstyle\frac{1-p_{i}}{p_{i}k}}\right)-1.$

The simplest case is where we know a theoretical lower bound $p_{*}$ for the $p_{i}$. Then, using $1+x\leq e^{x}$, by taking $k=O(\varepsilon^{-2}L/p_{*})$ we get

 ${\rm var}\ \left(\frac{|S|}{\hat{N}}\right)\leq\exp({\textstyle\frac{L}{p_{*}k}})-1=O(\varepsilon^{2}).$

In other words, with a total number

 $Lk=O(\varepsilon^{-2}L^{2}/p_{*})$ (10.21)

of random samples, we can statistically estimate $|S|$ to within a factor $1\pm O(\varepsilon)$.

The conceptual point of invoking intermediate sets $S_{i}$ is that the overall ratio $|S_{1}|/|S|$ may be exponentially small in some size parameter, so that trying to estimate this ratio directly by sampling from $S$ would involve order $|S|/|S_{1}|$, i.e. exponentially many, samples. If we can specify the intermediate sets with ratios $p_{i}$ bounded away from $0$ and $1$ then $L=O(\log(|S|/|S_{1}|))$ and so the number of samples required in (10.21) depends on $\log(|S|/|S_{1}|)$ instead of $|S|/|S_{1}|$.
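As a toy illustration of the ratio estimator, here is a Python sketch in which exact uniform samplers stand in for the Markov chain samplers of the next paragraph; the nested sets $S_{i}=\{1,\ldots,2^{i}\}$ are an artificial example chosen so that each $p_{i}=1/2$ exactly:

```python
import random

def estimate_size(samplers, membership, size_S1, k):
    """Telescoping-ratio estimator N-hat = |S_1| * prod_i W_i^{-1}.

    samplers[j] draws a uniform element of S_i (for i = 2, ..., L);
    membership[j](x) tests whether x lies in S_{i-1}."""
    estimate = float(size_S1)
    for sampler, in_previous in zip(samplers, membership):
        hits = sum(in_previous(sampler()) for _ in range(k))
        if hits == 0:
            raise ValueError("no sample hit S_{i-1}; increase k")
        estimate *= k / hits  # multiply by W_i^{-1}
    return estimate

# Toy nested sets S_i = {1, ..., 2^i}, so each ratio p_i = 1/2
# and |S_1| = 2.
L = 10
rng = random.Random(0)
samplers = [lambda i=i: rng.randint(1, 2 ** i) for i in range(2, L + 1)]
membership = [lambda x, i=i: x <= 2 ** (i - 1) for i in range(2, L + 1)]
n_hat = estimate_size(samplers, membership, size_S1=2, k=2000)
print(n_hat)  # should be close to |S_L| = 2^10 = 1024
```

With $k=2000$ samples per ratio the relative standard deviation of the product is a few percent, matching the variance calculation above.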

The discussion so far has not involved Markov chains. From our viewpoint, the interesting setting is where we cannot directly get uniform random samples from a typical $S_{i}$, but instead need to use Markov chain Monte Carlo. That is, on each $S_{i}$ we set up a reversible Markov chain with uniform stationary distribution (i.e. a chain whose transition matrix is symmetric in the sense $p_{vw}\equiv p_{wv}$). Assume we have a bound $\tau_{1}$ on the $\tau_{1}$-values of all these chains. Then, as a small modification of Corollary 10.3, one can get $m$ samples from the combined chains whose joint distribution is close (in variation distance) to the distribution of independent samples from the uniform distributions, in $O(\tau_{1}m\log m)$ steps. As above, if we can specify the intermediate sets with ratios $p_{i}$ bounded away from $0$ and $1$ then we need $m=O(\varepsilon^{-2}\log^{2}(|S|/|S_{1}|))$ samples, and so (ignoring the dependence on $\varepsilon$) the total number of steps in all the chains is $O(\tau_{1}\log^{2+o(1)}(|S|/|S_{1}|))$.

In summary, to implement this method of approximate counting via Markov chains, one needs

• a way to specify the intermediate sets (10.20)

• a way to specify Markov chains on the $S_{i}$ whose mixing times can be rigorously bounded.

Two particular examples have been studied in detail, and historically these examples provided major impetus for the development of technical tools to estimate mixing times. Though the details are too technical for this book, we outline these examples in the next two sections, and then consider in detail the setting of self-avoiding walks.

## 10.5.1 Volume of a convex set

Given a closed convex set $K$ in $R^{d}$, for large $d$, how can we algorithmically calculate the volume of $K$? Regard $K$ as being described by an oracle; that is, for any $x\in R^{d}$ we can determine in one step whether or not $x\in K$. Perhaps surprisingly, there is no known deterministic algorithm which finds ${\rm vol}(K)$ approximately (i.e. to within a factor $1\pm\varepsilon$) in a number of steps polynomial in $d$. But this problem is amenable to the “approximate counting via Markov chains” technique. This line of research was initiated by Dyer et al [134, 135], who produced an algorithm requiring $O(d^{23+o(1)})$ steps. A long sequence of papers (see the Notes) studied variants of both the Markov chains and the analytic techniques in order to reduce the polynomial degree. Currently the best bound is $O(d^{5+o(1)})$, due to Kannan et al [206].

To outline the procedure in this example, suppose we know $B(1)\subset K\subset B(r)$, where $B(r)$ is the ball of radius $r=r(d)$. (It turns out that one can transform any convex set into one satisfying these constraints with $r=O(d^{3/2})$.) We specify an increasing sequence of convex subsets

 $B(1)=K_{0}\subset K_{1}\subset\ldots\subset K_{L}=K$

by setting $K_{i}:=B(2^{i/d})\cap K$. This makes the ratios of successive volumes bounded by $2$ and requires $L=O(d\log d)$ intermediate sets. So the issue is to design and analyze a chain on a typical convex set $K_{i}$ whose stationary distribution is uniform. Various Markov chains have been used: simple random walk on a fine discrete lattice restricted to $K_{i}$, or spherically symmetric walks. The analysis of the chains has used Cheeger inequalities for chains and the refinement of classical isoperimetric inequalities for convex sets. Recent work of Bubley et al [78] has successfully introduced coupling methods, and it is a challenging problem to refine these coupling methods. There is a suggestive analogy with theoretical study of Brownian motion in a convex set – see Chapter 13 section 1.3 (yyy 7/29/99 version).
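A minimal sketch of the first chain mentioned: simple random walk on a fine lattice restricted to a convex body given by a membership oracle. The step size `delta` and the example body (a Euclidean ball) are illustrative assumptions:

```python
import random

def lattice_walk_step(x, in_K, delta, rng=random):
    """One step of simple random walk on the lattice (delta * Z)^d
    restricted to K: propose a +/- delta move in a uniformly chosen
    coordinate, and stay put if the proposal leaves K.  Holding in
    place keeps the transition matrix symmetric, so the stationary
    distribution is uniform on the lattice points of K."""
    i = rng.randrange(len(x))
    y = list(x)
    y[i] += delta if rng.random() < 0.5 else -delta
    y = tuple(y)
    return y if in_K(y) else x

# Illustrative membership oracle: the ball of radius 2 in R^3.
in_K = lambda x: sum(t * t for t in x) <= 4.0
rng = random.Random(1)
x = (0.0, 0.0, 0.0)
for _ in range(10000):
    x = lattice_walk_step(x, in_K, delta=0.1, rng=rng)
```

In the actual volume algorithms the lattice spacing must shrink with $d$, and bounding the mixing time of this walk is exactly where the Cheeger and isoperimetric machinery enters.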

## 10.5.2 Matchings in a graph

For a finite, not necessarily regular, graph $G_{0}$ let ${\cal M}(G_{0})$ be the set of all matchings in $G_{0}$, where a matching $M$ is a subset of edges such that no vertex is in more than one edge of $M$. Suppose we want to count $|{\cal M}(G_{0})|$ (for the harder setting of counting perfect matchings, see the Notes). Enumerate the edges of $G_{0}$ as $e_{1},e_{2},\ldots,e_{L}$, where $L$ is the number of edges of $G_{0}$. Write $G_{i}$ for the graph $G_{0}$ with edges $e_{1},\ldots,e_{i}$ deleted. A matching of $G_{i}$ can be identified with a matching of $G_{i-1}$ which does not contain $e_{i}$, so we can write

 ${\cal M}(G_{L-1})\subset{\cal M}(G_{L-2})\subset\ldots\subset{\cal M}(G_{1})% \subset{\cal M}(G_{0}).$

Since $G_{L-1}$ has one edge, we know $|{\cal M}(G_{L-1})|=2$. The ratio $|{\cal M}(G_{i+1})|/|{\cal M}(G_{i})|$ is the probability that a uniform random matching of $G_{i}$ does not contain the edge $e_{i+1}$. So the issue is to design and analyze a chain on a typical set ${\cal M}(G_{i})$ of matchings whose stationary distribution is uniform. Here is a natural such chain. From a matching $M_{0}$, pick a uniform random edge $e$ of $G_{0}$, and construct a new matching $M_{1}$ from $M_{0}$ and $e$ as follows.

If $e\in M_{0}$ then set $M_{1}=M_{0}\setminus\{e\}$.

If neither end-vertex of $e$ is in an edge of $M_{0}$ then set $M_{1}=M_{0}\cup\{e\}$.

If exactly one end-vertex of $e$ is in an edge ($e^{\prime}$ say) of $M_{0}$ then set $M_{1}=\{e\}\cup M_{0}\setminus\{e^{\prime}\}$.
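A direct Python sketch of this update rule (representing the graph as an edge list and a matching as a frozenset of edges is our choice; if both end-vertices of $e$ are covered by two different edges of $M_{0}$, the chain holds in place):

```python
import random

def matching_chain_step(M, edges, rng=random):
    """One step of the matching chain: pick a uniform edge e of G_0,
    then delete it, add it, or swap it with the unique blocking edge."""
    e = rng.choice(edges)
    u, v = e
    if e in M:                        # e is in the matching: delete it
        return M - {e}
    blocking = {f for f in M if u in f or v in f}
    if not blocking:                  # both end-vertices free: add e
        return M | {e}
    if len(blocking) == 1:            # exactly one end-vertex covered: swap
        return (M - blocking) | {e}
    return M                          # both end-vertices covered: hold

# Usage on a triangle, which has 4 matchings (empty set + 3 single edges).
edges = [(1, 2), (2, 3), (1, 3)]
rng = random.Random(0)
M = frozenset()
for _ in range(100):
    M = matching_chain_step(M, edges, rng)
```

Each proposed transition (and its reverse) has chance $1/L$, which is what makes the transition matrix symmetric and the stationary distribution uniform.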

This construction (the idea goes back to Broder [69]) yields a chain with symmetric transition matrix, because each possible transition has chance $1/L$. An elegant analysis by Jerrum and Sinclair [197], outlined in Jerrum [199] section 5.1, used the distinguished paths technique to prove that on an $n$-vertex, $L$-edge graph

 $\tau_{2}=O(Ln).$

Since the number of matchings can be bounded crudely by $n!$,

 $\tau_{1}=O(\tau_{2}\log n!)=O(Ln^{2}\log n).$ (10.22)

## 10.5.3 Simulating self-avoiding walks

xxx to be written