We recommend the textbook of Norris [270] for a clear treatment of the basic theory and a wide selection of applications.
Write $I$ for a finite state space. Write $\mathbf{P} = (p_{ij})$ for the transition matrix of a discrete-time Markov chain $(X_0, X_1, X_2, \ldots)$. To avoid trivialities let’s exclude the one-state chain (two-state chains are useful, because surprisingly often general inequalities are sharp for two-state chains). The $t$-step transition probabilities are $P_i(X_t = j) = p^{(t)}_{ij}$, where $\mathbf{P}^{(t)} = \mathbf{P}^t$ is the $t$-fold matrix product. Write $P_i(\cdot)$ and $E_i(\cdot)$ for probabilities and expectations for the chain started at state $i$ and time $0$. More generally, write $P_\mu(\cdot)$ and $E_\mu(\cdot)$ for probabilities and expectations for the chain started at time $0$ with distribution $\mu$. Write
$$T_j := \min\{t \ge 0 : X_t = j\}$$
for the first hitting time on state $j$, and write
$$T_j^+ := \min\{t \ge 1 : X_t = j\}.$$
Of course $T_j^+ = T_j$ unless $X_0 = j$, in which case we call $T_j^+$ the first return time to state $j$. More generally, a subset $A$ of states has first hitting time
$$T_A := \min\{t \ge 0 : X_t \in A\}.$$
We shall frequently use without comment “obvious” facts like the following.
Start a chain at state $i$, wait until it first hits $j$, then wait until the time ($S$, say) at which it next hits $k$. Then $E_i S = E_i T_j + E_j T_k$.
The elementary proof sums over the possible values of $T_j$. The sophisticated proof appeals to the strong Markov property ([270] section 1.4) of the stopping time $T_j$, which implies
$$E_i(S - T_j \mid X_s, 0 \le s \le T_j) = E_j T_k.$$
Recall that the symbol $\mid$ is the probabilist’s shorthand for “conditional on”.
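The identity $E_i S = E_i T_j + E_j T_k$ is easy to check numerically. Below is a minimal Monte Carlo sketch; the three-state transition matrix is a hypothetical example chosen for illustration, not one from the text.

```python
import random

# Hypothetical 3-state transition matrix (states 0, 1, 2), for illustration only.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

def step(state, rng):
    """One move of the chain from `state`."""
    return rng.choices(range(3), weights=P[state])[0]

def mean_hitting_time(i, j, rng, n_runs=20000):
    """Estimate E_i T_j by direct simulation."""
    total = 0
    for _ in range(n_runs):
        s, t = i, 0
        while s != j:
            s = step(s, rng)
            t += 1
        total += t
    return total / n_runs

def mean_time_i_to_j_then_k(i, j, k, rng, n_runs=20000):
    """Estimate E_i S: wait until the chain first hits j, then until it next hits k."""
    total = 0
    for _ in range(n_runs):
        s, t = i, 0
        while s != j:          # wait for the first hit on j ...
            s = step(s, rng); t += 1
        while s != k:          # ... then for the next hit on k
            s = step(s, rng); t += 1
        total += t
    return total / n_runs

rng = random.Random(0)
lhs = mean_time_i_to_j_then_k(0, 1, 2, rng)
rhs = mean_hitting_time(0, 1, rng) + mean_hitting_time(1, 2, rng)
print(lhs, rhs)
```

With the seed fixed, the two estimates agree to within Monte Carlo error, as the identity predicts.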
Now assume the chain is irreducible. A fundamental result ([270] Theorems 1.7.7 and 1.5.6) is that there exists a unique stationary distribution $\pi = (\pi_j)$, i.e. a unique probability distribution satisfying the balance equations
$$\pi_j = \sum_i \pi_i p_{ij}, \qquad j \in I. \tag{2.1}$$
One way to prove this existence (liked by probabilists because it extends easily to the countable state setting) is to turn Lemma 2.6 below into a definition. That is, fix an arbitrary state $k$, define
$$\rho_j := E_k(\mbox{number of visits to } j \mbox{ before time } T_k^+),$$
and define
$$\pi_j := \rho_j / E_k T_k^+.$$
It can then be checked that $\pi$ is a stationary distribution. The point of stationarity is that, if the initial position $X_0$ of the chain is random with the stationary distribution $\pi$, then the position $X_t$ at any subsequent non-random time $t$ has the same distribution $\pi$, and the process $(X_t)$ is then called the stationary chain.
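As a concrete illustration, the balance equations (2.1) can be solved numerically by replacing one (redundant) balance equation with the normalization $\sum_j \pi_j = 1$. The transition matrix below is a hypothetical example; this is a sketch, not code from the text.

```python
import numpy as np

# Hypothetical 3-state transition matrix, for illustration only.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

n = P.shape[0]
# Balance equations (2.1) say pi (P - I) = 0; drop the last (redundant) equation
# and replace it with the normalization sum(pi) = 1.
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)

print(pi)        # the stationary distribution
print(pi @ P)    # equals pi, confirming stationarity
```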
A highlight of elementary theory is that the stationary distribution plays the main role in asymptotic results, as follows.
Theorem 2.1 (the ergodic theorem) Let $N_j(t)$ be the number of visits to state $j$ during times $0, 1, \ldots, t-1$. Then for any initial distribution,
$$t^{-1} N_j(t) \to \pi_j \mbox{ a.s. as } t \to \infty.$$
Theorem 2.2 (the convergence theorem) For any initial distribution,
$$P(X_t = j) \to \pi_j \mbox{ as } t \to \infty,$$
provided the chain is aperiodic.
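Here is a quick numerical sanity check of the ergodic theorem, using a two-state chain whose stationary distribution is known in closed form: with transition probabilities $p = P(0 \to 1)$ and $q = P(1 \to 0)$, one has $\pi_0 = q/(p+q)$. The parameter values are hypothetical, chosen for illustration.

```python
import random

# Two-state chain: p = P(0 -> 1), q = P(1 -> 0); then pi_0 = q / (p + q).
p, q = 0.3, 0.1
pi0 = q / (p + q)   # = 0.25

rng = random.Random(1)
t_max = 200_000
state, visits0 = 0, 0
for _ in range(t_max):
    visits0 += (state == 0)          # N_0(t) counts visits during times 0..t-1
    if state == 0:
        state = 1 if rng.random() < p else 0
    else:
        state = 0 if rng.random() < q else 1

freq0 = visits0 / t_max
print(freq0)   # close to pi0 = 0.25, as Theorem 2.1 predicts
```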
Theorem 2.1 is the simplest illustration of the ergodic principle “time averages equal space averages”. Many general identities for Markov chains can be regarded as aspects of the ergodic principle – in particular, in section 2.2.1 we use it to derive expressions for mean hitting times. Such identities are important and useful.
The most classical topic in mathematical probability is time-asymptotics for i.i.d. (independent, identically distributed) random sequences. A vast number of results are known, and (broadly speaking) have simple analogs for Markov chains. Thus the analog of the strong law of large numbers is Theorem 2.1, and the analog of the central limit theorem is Theorem 2.17 below. As mentioned in Chapter 1 section 2.1 (yyy 7/20/99 version) this book has a different focus, on results which say something about the behavior of the chain over some specific finite time, rather than what happens in the indefinite future.
The theory of continuous-time Markov chains closely parallels that of the discrete-time chains discussed above. To the reader with background in algorithms or discrete mathematics, the introduction of continuous time may at first seem artificial and unnecessary, but it turns out that certain results are simpler in continuous time. See Norris [270] Chapters 2 and 3 for details on what follows.
A continuous-time chain is specified by transition rates $(q_{ij} = q(i,j),\ j \ne i)$, which are required to be non-negative but have no constraint on the sums. Given the transition rates, define
$$q_i := \sum_{j \ne i} q_{ij} \tag{2.2}$$
and extend $(q_{ij})$ to a matrix $\mathbf{Q}$ by putting $q_{ii} := -q_i$. The chain has two equivalent descriptions.
1. Infinitesimal description. Given that $X_t = i$, the chance that $X_{t+dt} = j$ is $q_{ij}\,dt + o(dt)$, for each $j \ne i$.
2. Jump-and-hold description. Define a transition matrix $\hat{\mathbf{P}} = (\hat{p}_{ij})$ by $\hat{p}_{ii} := 0$ and
$$\hat{p}_{ij} := q_{ij}/q_i, \quad j \ne i. \tag{2.3}$$
Then the continuous-time chain $(X_t)$ may be constructed by the two-step procedure:
(i) Run a discrete-time chain $(\hat{X}_n)$ with transition matrix $\hat{\mathbf{P}}$.
(ii) Given the sequence of states $i_0, i_1, i_2, \ldots$ visited by $(\hat{X}_n)$, the durations spent in these states are independent exponential random variables with rates $q_{i_0}, q_{i_1}, q_{i_2}, \ldots$.
The discrete-time chain $(\hat{X}_n)$ is called the jump chain associated with $(X_t)$.
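The jump-and-hold description translates directly into a simulation algorithm. The sketch below uses a hypothetical 3-state set of rates $q_{ij}$; step (ii) draws an Exponential($q_i$) hold, and step (i) draws the next state from the jump chain of (2.3).

```python
import random

# Hypothetical transition rates q_ij for a 3-state chain, for illustration only.
q = {(0, 1): 2.0, (0, 2): 1.0,
     (1, 0): 1.0, (1, 2): 1.0,
     (2, 0): 3.0, (2, 1): 1.0}
states = [0, 1, 2]
q_i = {i: sum(q[(i, j)] for j in states if j != i) for i in states}

def simulate(x0, t_end, rng):
    """Jump-and-hold construction of the continuous-time chain on [0, t_end]."""
    path, t, x = [(0.0, x0)], 0.0, x0
    while True:
        hold = rng.expovariate(q_i[x])        # step (ii): Exponential(q_i) hold
        t += hold
        if t >= t_end:
            return path                       # list of (jump time, new state)
        others = [j for j in states if j != x]
        weights = [q[(x, j)] / q_i[x] for j in others]
        x = rng.choices(others, weights=weights)[0]   # step (i): jump chain move
        path.append((t, x))

rng = random.Random(2)
path = simulate(0, 10.0, rng)
print(path[:4])
```

Note that the jump chain never self-loops ($\hat{p}_{ii} = 0$), so consecutive states in the path always differ.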
The results in the previous section go over to continuous-time chains with the following modifications.
(a) $P_i(X_t = j) = p_{ij}(t)$, where $\mathbf{P}_t := \exp(\mathbf{Q}t)$.
(b) The definition of $T_j^+$ becomes
$$T_j^+ := \inf\{t : X_t = j \mbox{ and } X_s \ne j \mbox{ for some } 0 < s \le t\}.$$
(c) If the chain is irreducible then there exists a unique stationary distribution $\pi$, characterized now by
$$\sum_i \pi_i q_{ij} = 0 \mbox{ for all } j, \quad \mbox{i.e. } \pi \mathbf{Q} = 0.$$
(d) In the ergodic theorem we interpret $N_j(t)$ as the total duration of time spent in state $j$ during $[0, t]$:
$$N_j(t) := \int_0^t 1_{(X_s = j)}\, ds.$$
(e) In the convergence theorem the assumption of aperiodicity is unnecessary. [This fact is one of the technical advantages of continuous time.]
(f) The evolution of $P_i(X_t = j)$ as a function of $t$ is given by the forwards equations
$$\frac{d}{dt}\, p_{ij}(t) = \sum_k p_{ik}(t)\, q_{kj}, \quad \mbox{i.e. } \mathbf{P}_t' = \mathbf{P}_t \mathbf{Q}. \tag{2.4}$$
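Modification (a) and the forwards equations (2.4) can be checked numerically. The sketch below uses a hypothetical $\mathbf{Q}$-matrix and a small home-made matrix exponential (scaled Taylor series plus repeated squaring) to stay self-contained; in practice one would call a library routine such as scipy.linalg.expm.

```python
import numpy as np

def expm(M, terms=20, squarings=10):
    """Matrix exponential via a scaled Taylor series plus repeated squaring.
    A simple stand-in for a library routine; adequate for small matrices."""
    A = M / (2.0 ** squarings)
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

# Hypothetical Q-matrix: non-negative off-diagonal rates, rows summing to zero.
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -2.0,  1.0],
              [ 3.0,  1.0, -4.0]])

t, h = 0.7, 1e-6
Pt = expm(Q * t)
print(Pt.sum(axis=1))   # each row of P_t sums to 1

# Forwards equations (2.4): d/dt P_t = P_t Q, checked by a finite difference.
lhs = (expm(Q * (t + h)) - Pt) / h
print(np.max(np.abs(lhs - Pt @ Q)))   # small
```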
Given a discrete-time chain $(X_n)$ with some transition matrix $\mathbf{P}$, one can define the continuized chain to have transition rates $q_{ij} := p_{ij}$, $j \ne i$. In other words, we replace the deterministic time-$1$ holds between jumps by holds with exponential($1$) distribution. Many quantities are unchanged by the passage from the discrete-time chain to the continuized chain; in particular, the stationary distribution and mean hitting times are unchanged. Therefore results stated in continuous time can often be applied immediately in discrete time, and vice versa.
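To see why the stationary distribution is unchanged: with $q_{ij} = p_{ij}$ ($j \ne i$) the rate matrix is $\mathbf{Q} = \mathbf{P} - \mathbf{I}$, so $\pi \mathbf{Q} = \pi \mathbf{P} - \pi = 0$ whenever $\pi \mathbf{P} = \pi$. A numerical sketch with a hypothetical transition matrix:

```python
import numpy as np

# Hypothetical discrete-time transition matrix, for illustration only.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Continuized chain: rates q_ij = p_ij for j != i, so Q = P - I.
Q = P - np.eye(3)

# Stationary distribution of the discrete chain: left eigenvector for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

print(pi @ P - pi)   # ~0: pi is stationary for the discrete chain
print(pi @ Q)        # ~0: pi is also stationary for the continuized chain
```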
In different parts of the book we shall work with discrete or continuous time as the current convention, mentioning where appropriate how results change in the alternate setting. Chapter 4 (yyy section to be written) will give a survey of the differences between these two settings.