To start with an analogy, we can in principle compute a mean hitting time $E_i T_j$ from the transition matrix $\mathbf{P}$, but we could alternatively estimate $E_i T_j$ by "pure simulation": simulate $m$ times the chain started at $i$ and run until hitting $j$, and then (roughly speaking) the empirical average of these $m$ hitting times will be $(1 \pm O(m^{-1/2}))\,E_i T_j$. In particular, for fixed $\varepsilon$ we can (roughly speaking) estimate $E_i T_j$ to within a factor $1 \pm \varepsilon$ in $O(E_i T_j)$ steps. Analogously, consider some notion of mixing time $\tau$ (say $\tau_1$, or $\tau_2$ in the reversible setting). The focus in this book has been on theoretical methods for bounding $\tau$ in terms of other quantities associated with the chain, and on theoretical consequences of such bounds. But can we bound $\tau$ by pure simulation? More importantly, in the practical context of Markov chain Monte Carlo, can we devise a "self-verifying" algorithm which produces an approximately-stationary sample from a chain in $O(\tau)$ steps without having prior knowledge of $\tau$?
xxx tie up with MCMC discussion.
To say things a little more carefully, a "pure simulation" algorithm is one in which the transition matrix $\mathbf{P}$ is unknown to the algorithm. Instead, there is a list $\{1, 2, \ldots, n\}$ of the states, and at each step the algorithm may choose any state $i$ and obtain a sample from the jump distribution $p(i, \cdot)$, independent of all previous samples.
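In code, the sampling oracle available to a pure simulation algorithm might be modeled as follows (a hypothetical sketch; the class name, `sample` method, and step counter are our own illustrative choices):

```python
import random

class JumpOracle:
    """Black-box access to an unknown transition matrix P on states 0..n-1.

    A pure simulation algorithm may only call sample(i), which returns one
    draw from the jump distribution p(i, .), independent of all previous
    draws; the algorithm never sees the matrix itself.
    """
    def __init__(self, P):
        self._P = P          # hidden from the algorithm
        self.steps = 0       # number of oracle calls, i.e. simulation steps
        self.n = len(P)

    def sample(self, i):
        self.steps += 1
        u, acc = random.random(), 0.0
        for j, pij in enumerate(self._P[i]):
            acc += pij
            if u < acc:
                return j
        return self.n - 1    # guard against floating-point round-off
```

The `steps` counter is what the cost bounds below are counting.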
In the MCMC context we typically have an exponentially large state space and seek polynomial-time estimates. The next lemma (which we leave to the reader to state and prove more precisely) shows that no pure simulation algorithm can guarantee to do this.
Consider a pure simulation algorithm which, given any irreducible $n$-state chain, eventually outputs a random state whose distribution is guaranteed to be within $\varepsilon$ of the stationary distribution $\pi$ in variation distance. Then the algorithm must use at least $n$ steps, for every chain.
Outline of proof. If there is a state $i$ for which the exit probability $1 - p(i,i)$ is extremely small, then the stationary distribution $\pi$ will be almost concentrated on $i$; an algorithm which has some chance of terminating without sampling a step from every state cannot possibly guarantee that no unvisited state has this property.
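A small numerical illustration of the obstruction (a sketch using a hypothetical 4-state chain of our own choosing): giving a single state a holding probability close to 1 concentrates $\pi$ on that state, while the other rows of the matrix reveal nothing unusual.

```python
# States 0-2 jump uniformly at random; state 3 holds with probability
# 1 - eps.  As eps -> 0 the stationary distribution concentrates on
# state 3, yet no algorithm can detect this without sampling at least
# one step from state 3 itself.
eps = 1e-3
P = [[0.25] * 4 for _ in range(3)]
P.append([eps / 3, eps / 3, eps / 3, 1 - eps])

def stationary(P, iters=500):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

print(stationary(P)[3])   # close to 1: pi is almost concentrated on state 3
```

Here $\pi_3 = 0.25/(0.25+\varepsilon) \approx 0.996$, even though each of rows 0–2 is the uniform distribution.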
Lovász and Winkler [241] observed that the Markov chain tree theorem (Theorem 9.10) could be used to give a "pure simulation" algorithm for generating exactly from the stationary distribution $\pi$ of an arbitrary $n$-state chain. The algorithm takes a mean number of

$O(n^2\, \tau \log n)$   (9.14)

steps, where $\tau$ is the mixing time parameter defined as the smallest $t$ such that

$P_i(X_{U_t} = j) \ge \tfrac{1}{2}\, \pi_j$ for all $i, j$,   (9.15)

where $U_t$ denotes a random time uniform on $\{0, 1, \ldots, t-1\}$, independent of the chain.
xxx tie up with Chapter 4 discussion and [240].
The following two facts are the mathematical ingredients of the algorithm. We quote as Lemma 9.17(a) a result of Ross [299] (see also [53] Theorem XIV.37); part (b) is an immediate consequence.
(a) Let $q = (q_i)$ be a probability distribution on $\{1, \ldots, n\}$ and let $(Y_i, 1 \le i \le n)$ be independent with distribution $q$. Fix $i_0$, and consider the digraph with edges $\{(i, Y_i) : i \ne i_0\}$. Then with probability (exactly) $q_{i_0}$, the digraph is a tree with edges directed toward the root $i_0$.
(b) So if $i_0$ is first chosen uniformly at random from $\{1, \ldots, n\}$, then the probability that the digraph is a tree directed toward the root $i_0$ is exactly $1/n$.
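Lemma 9.17(a) is easy to check by Monte Carlo simulation (a sketch; the function names are ours):

```python
import random

def is_tree_rooted_at(Y, i0):
    """Check whether the digraph with edges {(i, Y[i]) : i != i0} is a
    tree with all edges directed toward the root i0: true iff following
    the arrows from every vertex leads to i0 (no cycle avoiding i0)."""
    n = len(Y)
    for i in range(n):
        j, hops = i, 0
        while j != i0:
            j, hops = Y[j], hops + 1
            if hops > n:          # walked into a cycle avoiding i0
                return False
    return True

def tree_probability(q, i0, trials=20000):
    """Monte Carlo estimate of the probability in Lemma 9.17(a): each
    Y[i] is an independent draw from q (the draw Y[i0] is irrelevant)."""
    n = len(q)
    hits = 0
    for _ in range(trials):
        Y = random.choices(range(n), weights=q, k=n)
        if is_tree_rooted_at(Y, i0):
            hits += 1
    return hits / trials
```

For example, with $q = (0.5, 0.3, 0.2)$ the estimate for root $i_0 = 0$ comes out near $q_0 = 0.5$, as the lemma predicts.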
As the second ingredient, observe that the Markov chain tree formula (Corollary 9.11) can be rephrased as follows.
Let $\pi$ be the stationary distribution for a transition matrix $\mathbf{P}$ on $\{1, \ldots, n\}$. Let $V$ be random, uniform on $\{1, \ldots, n\}$. Let $(Y_i, 1 \le i \le n)$ be independent, with $Y_i$ distributed as $p(i, \cdot)$. Consider the digraph with edges $\{(i, Y_i) : i \ne V\}$. Then, conditional on the digraph being a tree with edges directed toward the root $V$, the probability that $V = v$ equals $\pi_v$.
So consider the special case of a chain with the property
$p(i,j) \ge \left(1 - \tfrac{1}{n}\right) \pi_j$ for all $i, j$.   (9.16)
The probability of getting any particular digraph under the procedure of Corollary 9.18 is at least $(1 - \tfrac{1}{n})^{n-1} \ge e^{-1}$ times the probability of getting that digraph under the procedure of Lemma 9.17 with $q = \pi$, and so the probability of getting some tree is at least $\tfrac{1}{en}$, by Lemma 9.17(b). So if the procedure of Corollary 9.18 is repeated $2en$ times, the chance that some repetition produces a tree is at least $1 - e^{-2} > 1/2$, and then the root of the tree has distribution exactly $\pi$.
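The resulting rejection loop can be sketched directly. Note that by Corollary 9.18 the output is exact for any chain; condition (9.16) only controls how long we expect to wait for a tree. (A sketch; the helper names are ours, with states relabeled $0, \ldots, n-1$.)

```python
import random

def exact_sample(P):
    """Repeat the procedure of Corollary 9.18 until the digraph is a
    tree; the root of that tree is then an exact draw from the
    stationary distribution of P."""
    n = len(P)
    while True:
        V = random.randrange(n)   # uniform random candidate root
        # one jump-distribution sample Y[i] ~ p(i, .) for each state
        # (the draw for i = V is simply ignored by the tree check)
        Y = [random.choices(range(n), weights=P[i], k=1)[0] for i in range(n)]
        if is_tree(Y, V):
            return V

def is_tree(Y, root):
    """True iff every vertex's arrow path i -> Y[i] -> ... reaches root."""
    n = len(Y)
    for i in range(n):
        j, hops = i, 0
        while j != root:
            j, hops = Y[j], hops + 1
            if hops > n:
                return False
    return True
```

For the 2-state chain with $p(0,1) = 0.2$, $p(1,0) = 0.4$, the output frequencies match $\pi = (2/3, 1/3)$.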
Now for any chain, fix $t$. The submultiplicativity (yyy) property of separation, applied to the chain with transition probabilities $\tilde{p}(i,j) = P_i(X_{U_t} = j)$, shows that if $S$ denotes the sum of $k$ independent copies of $U_t$, and $X_S$ is the state reached after $S$ steps of the chain started at $i$, then for $t \ge \tau$

$P_i(X_S = j) \ge (1 - 2^{-k})\, \pi_j$ for all $i, j$.

So putting $k = \lceil \log_2 n \rceil$, the set of probabilities $\hat{p}(i,j) = P_i(X_S = j)$ satisfies (9.16).
Combining these procedures, we have (for fixed $t \ge \tau$) an algorithm which, in a mean number of $O(n^2\, t \log n)$ steps, has chance at least $1/2$ to produce an output, and (if so) the output has distribution exactly $\pi$. Of course we initially don't know the right $t$ to use, but we simply try $t = 1, 2, 4, 8, \ldots$ in turn until some output appears, and the mean total number of steps will satisfy the asserted bound (9.14).
A second approach involves the parameter $\tau_0 = \sum_j \pi_j E_i T_j$ arising in the random target lemma (Chapter 2 yyy). Aldous [24] gives an algorithm which, given $\varepsilon$ and $\delta$, outputs a random state $V$ for which $\|\mathcal{L}(V) - \pi\| \le \varepsilon + \delta$, and such that the mean number of steps is at most

$c(\delta)\, \varepsilon^{-1}\, \tau_0$.   (9.17)
The details are messy, so let us just outline the (simple) underlying idea. Suppose we can define a procedure which terminates in some random number $T$ of steps, where $T$ is an estimate of $\tau_0$: precisely, suppose that for any chain

$P(T \ge \tau_0) \ge 1 - \delta$ and $ET \le K \tau_0$,   (9.18)

where $K$ is an absolute constant. We can then define an algorithm as follows.
Simulate $T$; then run the chain for $U_{\lceil T/\varepsilon \rceil}$ steps and output the final state,

where $U_t$ as above denotes a random time uniform on $\{0, 1, \ldots, t-1\}$, independent of the chain. This works because, arguing as at xxx,

$\|\mathcal{L}_i(X_{U_t}) - \pi\| \le \tau_0 / t$ for every initial state $i$,

and so

$\|\mathcal{L}(X_{U_{\lceil T/\varepsilon \rceil}}) - \pi\| \le E \min\left(\tfrac{\varepsilon \tau_0}{T}, 1\right) \le \varepsilon + P(T < \tau_0) \le \varepsilon + \delta.$

And the mean number of steps is at most $ET + \tfrac{1}{2} E \lceil T/\varepsilon \rceil = O(\varepsilon^{-1} \tau_0)$.
So the issue is to define a procedure terminating in $T$ steps, where $T$ satisfies (9.18). Label the states $\{1, \ldots, n\}$ and consider the following coalescing paths routine.

(i) Pick a uniform random state $V$.

(ii) Start the chain at state $1$, run until hitting state $V$, and write $C_1$ for the set of states visited along the path.

(iii) Restart the chain at state $2$, run until hitting some state in $C_1$, and write $C_2$ for the union of $C_1$ and the set of states visited by this second path.

(iv) Restart the chain at state $3$, and continue this procedure until every state has been visited. Let $T$ be the total number of steps.
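The routine might be coded as follows (a sketch; the restart order and bookkeeping are our reading of the informal description above, with states relabeled $0, \ldots, n-1$):

```python
import random

def coalescing_paths_T(P):
    """Coalescing paths routine: pick a uniform random target state,
    then restart the chain from each state in turn, running each path
    until it hits the set of states already visited.  Return T, the
    total number of simulation steps used."""
    n = len(P)
    visited = {random.randrange(n)}      # step (i): the random target
    T = 0
    for start in range(n):               # steps (ii)-(iv): restart in turn
        j = start
        while j not in visited:
            visited.add(j)               # j joins the current path
            j = random.choices(range(n), weights=P[j], k=1)[0]
            T += 1                       # one simulation step
    return T
```

The first path, run until it hits a uniform random target, has mean length $\tau_0$ by the random target lemma; the later paths stop as soon as they coalesce with earlier ones.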
Write $U$ for a r.v. uniform on $[0,1]$, and $(U_1, U_2, \ldots)$ for an independent sequence of copies of $U$. Given a probability distribution $q$ on $\{1, \ldots, n\}$, we can find a (far from unique!) function $g: [0,1] \to \{1, \ldots, n\}$ such that $g(U)$ has the prescribed distribution $q$. So given a transition matrix $\mathbf{P}$ we can find a function $f: \{1, \ldots, n\} \times [0,1] \to \{1, \ldots, n\}$ such that $P(f(i,U) = j) = p(i,j)$ for all $i, j$. Fix such a function $f$. Simultaneously for each state $i$, define

$X_0^{(i)} = i; \qquad X_t^{(i)} = f(X_{t-1}^{(i)}, U_t), \quad t \ge 1.$
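One concrete choice of such a function $f$ is the inverse-CDF construction, which also lets us advance all $n$ coupled chains with the same uniform variable (a sketch; the names are ours):

```python
import random

def make_f(P):
    """One (far from unique) choice of update function: f(i, u) inverts
    the CDF of row i of P, so that f(i, U) ~ p(i, .) when U is uniform
    on [0, 1)."""
    def f(i, u):
        acc = 0.0
        for j, pij in enumerate(P[i]):
            acc += pij
            if u < acc:
                return j
        return len(P) - 1        # guard against round-off
    return f

def grand_coupling_step(f, states, u):
    """Advance every coordinate of the coupled vector (X_t^(1),...,X_t^(n))
    with the same uniform variable u."""
    return [f(x, u) for x in states]
```

Once two coordinates agree they agree forever, since both are updated through the same $f(\cdot, U_t)$.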
xxx tie up with coupling treatment
Consider the (forwards) coupling time

$C^{\mathrm{fwd}} = \min\{t \ge 0 : X_t^{(1)} = X_t^{(2)} = \cdots = X_t^{(n)}\}.$

By considering an initial state chosen according to the stationary distribution $\pi$,

$\|\mathcal{L}(X_t^{(i)}) - \pi\| \le P(C^{\mathrm{fwd}} > t)$ for each $i$.
This can be used as the basis for an approximate sampling algorithm. As a simple implementation, repeat $m$ times the procedure defining $C^{\mathrm{fwd}}$; suppose we get finite values $c_1, \ldots, c_m$ each time. Then run the chain from an arbitrary initial state for $c^* = \max(c_1, \ldots, c_m)$ steps and output the final state $X_{c^*}$. Then the error $\|\mathcal{L}(X_{c^*}) - \pi\|$ is bounded by a function $\psi(m)$ such that $\psi(m) \to 0$ as $m \to \infty$.
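A sketch of this approximate sampler, using the inverse-CDF choice of $f$ (all names are ours; `max_t` is a pragmatic guard against non-coalescence):

```python
import random

def inverse_cdf(P, i, u):
    """f(i, u): invert the CDF of row i of P."""
    acc = 0.0
    for j, pij in enumerate(P[i]):
        acc += pij
        if u < acc:
            return j
    return len(P) - 1

def forward_coupling_time(P, max_t=10_000):
    """Run chains from every state with shared uniforms; return the first
    time all coordinates coincide (None if not coupled by max_t)."""
    n = len(P)
    states = list(range(n))
    for t in range(1, max_t + 1):
        u = random.random()
        states = [inverse_cdf(P, x, u) for x in states]
        if len(set(states)) == 1:
            return t
    return None

def approx_sample(P, m=50):
    """Approximate sampler from the text: observe m coupling times, then
    run the chain from an arbitrary start for the max of them."""
    c_star = max(forward_coupling_time(P) for _ in range(m))
    x = 0
    for _ in range(c_star):
        x = inverse_cdf(P, x, random.random())
    return x
```

The output is only approximately stationary: roughly, the error behaves like $E\, P(C^{\mathrm{fwd}} > c^*)$, which is of order $1/m$.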
Propp and Wilson [286] observed that by using instead a backwards coupling method (which has been exploited in other contexts -- see Notes) one could make an exact sampling algorithm. Regard our i.i.d. sequence $(U_t)$ as defined for all $-\infty < t < \infty$. For each state $i$ and each time $s \ge 0$ define the chain started at $i$ at time $-s$:

$X_{-s}^{(i,-s)} = i; \qquad X_t^{(i,-s)} = f(X_{t-1}^{(i,-s)}, U_t), \quad -s < t \le 0.$

Consider the backwards coupling time

$C^{\mathrm{back}} = \min\{s \ge 0 : X_0^{(1,-s)} = X_0^{(2,-s)} = \cdots = X_0^{(n,-s)}\}.$

If $S$ is a random time such that $S \ge C^{\mathrm{back}}$ a.s., then the random variable $X_0^{(i,-S)}$ does not depend on $i$ and has the stationary distribution $\pi$.
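A minimal sketch of the resulting exact sampler (coupling from the past), again with the inverse-CDF choice of $f$ and the standard doubling trick for choosing $S$. Crucially, the same uniforms $U_{-1}, U_{-2}, \ldots$ are reused as the starting time moves further into the past; all names are ours.

```python
import random

def inverse_cdf(P, i, u):
    """f(i, u): invert the CDF of row i of P."""
    acc = 0.0
    for j, pij in enumerate(P[i]):
        acc += pij
        if u < acc:
            return j
    return len(P) - 1

def cftp(P):
    """Coupling from the past: double the starting time -s until the
    maps composed from time -s up to time 0 send every state to the same
    place; that common value at time 0 is an exact stationary draw.
    (Assumes the grand coupling coalesces a.s., as it does for the
    aperiodic test chains used here.)"""
    n = len(P)
    U = []                        # U[k] plays the role of U_{-(k+1)}
    s = 1
    while True:
        while len(U) < s:         # extend the past, never resample old U's
            U.append(random.random())
        states = list(range(n))   # all chains started at time -s
        for k in range(s - 1, -1, -1):   # steps at times -s, ..., -1
            states = [inverse_cdf(P, x, U[k]) for x in states]
        if len(set(states)) == 1:
            return states[0]
        s *= 2
```

For the 2-state chain with $p(0,1) = 0.3$, $p(1,0) = 0.2$, the output frequencies match $\pi = (0.4, 0.6)$ exactly up to Monte Carlo error.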
xxx describe algorithm
xxx poset story
xxx analysis in general setting and in poset setting.
xxx compare the 3 methods