# 2.1 Notation and reminders of fundamental results

We recommend the textbook of Norris [270] for a clear treatment of the basic theory and a wide selection of applications.

Write $I=\{i,j,k,\ldots\}$ for a finite state space. Write ${\bf P}=p_{i,j}$ for the transition matrix of a discrete-time Markov chain $(X_{t}:t=0,1,2,\ldots)$. To avoid trivialities let’s exclude the one-state chain (two-state chains are useful, because surprisingly often general inequalities are sharp for two-state chains). The $t$-step transition probabilities are $P(X_{t}=j|X_{0}=i)=p^{(t)}_{ij}$, where ${\bf P}^{(t)}={\bf P}{\bf P}\ldots{\bf P}$ is the $t$-fold matrix product. Write $P_{i}(\cdot)$ and $E_{i}(\cdot)$ for probabilities and expectations for the chain started at state $i$ and time $0$. More generally, write $P_{\rho}(\cdot)$ and $E_{\rho}(\cdot)$ for probabilities and expectations for the chain started at time $0$ with distribution $\rho$. Write

 $T_{i}=\min\{t\geq 0:X_{t}=i\}$

for the first hitting time on state $i$, and write

 $T_{i}^{+}=\min\{t\geq 1:X_{t}=i\}.$

Of course $T^{+}_{i}=T_{i}$ unless $X_{0}=i$, in which case we call $T^{+}_{i}$ the first return time to state $i$. More generally, a subset $A$ of states has first hitting time

 $T_{A}=\min\{t\geq 0:X_{t}\in A\}.$

We shall frequently use without comment “obvious” facts like the following.

Start a chain at state $i$, wait until it first hits $j$, then wait until the time ($S$, say) at which it next hits $k$. Then $E_{i}S=E_{i}T_{j}+E_{j}T_{k}$.

The elementary proof sums over the possible values $t$ of $T_{j}$. The sophisticated proof appeals to the strong Markov property ([270] section 1.4) of the stopping time $T_{j}$, which implies

 $E_{i}(S|X_{t},t\leq T_{j})=T_{j}+E_{j}T_{k}.$

Recall that the symbol $\ |\$ is the probabilist’s shorthand for “conditional on”.

## 2.1.1 Stationary distribution and asymptotics

Now assume the chain is irreducible. A fundamental result ([270] Theorems 1.7.7 and 1.5.6) is that there exists a unique stationary distribution $\pi=(\pi_{i}:i\in I)$, i.e. a unique probability distribution satisfying the balance equations

 $\pi_{j}=\sum_{i}\pi_{i}p_{ij}\mbox{ for all }j.$ (2.1)

One way to prove this existence (liked by probabilists because it extends easily to the countable state setting) is to turn Lemma 2.6 below into a definition. That is, fix arbitrary $i_{0}$, define $\tilde{\pi}(i_{0})=1$, and define

 $j$

It can then be checked that $\pi_{i}:=\tilde{\pi}(i)/\sum_{j}\tilde{\pi}(j)$ is a stationary distribution. The point of stationarity is that, if the initial position $X_{0}$ of the chain is random with the stationary distribution $\pi$, then the position $X_{t}$ at any subsequent non-random time $t$ has the same distribution $\pi$, and the process $(X_{t},t=0,1,2,\ldots)$ is then called the stationary chain.

A highlight of elementary theory is that the stationary distribution plays the main role in asymptotic results, as follows.

###### Theorem 2.1 (The ergodic theorem: [270] Theorem 1.10.2)

Let $N_{i}(t)$ be the number of visits to state $i$ during times $0,1,\ldots,t-1$. Then for any initial distribution,

 $t^{-1}N_{i}(t)\rightarrow\pi_{i}\mbox{ a.s., as }t\rightarrow\infty.$
###### Theorem 2.2 (The convergence theorem: [270] Theorem 1.8.3)

For any initial distribution,

 $P(X_{t}=j)\rightarrow\pi_{j}\mbox{ as }t\rightarrow\infty,\mbox{ for all }j$

provided the chain is aperiodic.

Theorem 2.1 is the simplest illustration of the ergodic principle “time averages equal space averages”. Many general identities for Markov chains can be regarded as aspects of the ergodic principle – in particular, in section 2.2.1 we use it to derive expressions for mean hitting times. Such identities are important and useful.

The most classical topic in mathematical probability is time-asymptotics for i.i.d. (independent, identically distributed) random sequences. A vast number of results are known, and (broadly speaking) have simple analogs for Markov chains. Thus the analog of the strong law of large numbers is Theorem 2.1, and the analog of the central limit theorem is Theorem 2.17 below. As mentioned in Chapter 1 section 2.1 (yyy 7/20/99 version) this book has a different focus, on results which say something about the behavior of the chain over some specific finite time, rather than what happens in the indefinite future.

## 2.1.2 Continuous-time chains

The theory of continuous-time Markov chains closely parallels that of the discrete-time chains discussed above. To the reader with background in algorithms or discrete mathematics, the introduction of continuous time may at first seem artificial and unnecessary, but it turns out that certain results are simpler in continuous time. See Norris [270] Chapters 2 and 3 for details on what follows.

A continuous-time chain is specified by transition rates $(q(i,j)=q_{ij},j\neq i)$ which are required to be non-negative but have no constraint on the sums. Given the transition rates, define

 $q_{i}:=\sum_{j:j\neq i}q_{ij}$ (2.2)

and extend $(q_{ij})$ to a matrix ${\bf Q}$ by putting $q_{ii}=-q_{i}.$ The chain $(X_{t}:0\leq t<\infty)$ has two equivalent descriptions.

1. Infinitesimal description. Given that $X_{t}=i$, the chance that $X_{t+dt}=j$ is $q_{ij}dt$ for each $j\neq i$.

2. Jump-and-hold description. Define a transition matrix ${\bf J}$ by $J_{ii}=0$ and

 $J_{ij}:=q_{ij}/q_{i},\ j\neq i.$ (2.3)

Then the continuous-time chain may be constructed by the two-step procedure

(i) Run a discrete-time chain $X^{J}$ with transition matrix ${\bf J}$.

(ii) Given the sequence of states $i_{0},i_{1},i_{2},\ldots$ visited by $X^{J}$, the durations spent in states $i_{m}$ are independent exponential random variables with rates $q_{i_{m}}$.

The discrete-time chain $X^{J}$ is called the jump chain associated with $X_{t}$.

The results in the previous section go over to continuous-time chains with the following modifications.

(a) $P_{i}(X_{t}=j)=Q^{(t)}_{ij}$, where ${\bf Q}^{(t)}:=\exp({\bf Q}t)$.

(b) The definition of $T^{+}_{i}$ becomes

 $T^{+}_{i}=\min\{t\geq T_{I\setminus i}:X_{t}=i\}.$

(c) If the chain is irreducible then there exists a unique stationary distribution $\pi$ characterized by

 $\sum_{i}\pi_{i}q_{ij}=0\mbox{ for all }j.$

(d) In the ergodic theorem we interpret $N_{i}(t)$ as the total duration of time spent in state $i$ during $[0,t]$:

 $N_{i}(t):=\int_{0}^{t}1_{(X_{s}=i)}\ ds.$

(e) In the convergence theorem the assumption of aperiodicity is unnecessary. [This fact is the one of the technical advantages of continuous time.]

(f) The evolution of $P(X_{t}=j)$ as a function of time is given by the forwards equations

 $\frac{d}{dt}P(X_{t}=j)=\sum_{i}P(X_{t}=i)q_{ij}.$ (2.4)

Given a discrete-time chain $X$ with some transition matrix ${\bf P}$, one can define the continuized chain $\widetilde{X}$ to have transition rates $q_{ij}=p_{ij}$, $j\neq i$. In other words, we replace the deterministic time-$1$ holds between jumps by holds with exponential$(1)$ distribution. Many quantities are unchanged by the passage from the discrete time chain to the continuized chain. In particular the stationary distribution $\pi$ and mean hitting times $E_{i}T_{A}$ are unchanged. Therefore results stated in continuous time can often be immediately applied in discrete time, and vice versa.

In different parts of the book we shall be working with discrete or continuous time as a current convention, mentioning where appropriate how results change in the alternate setting. Chapter 4 (yyy section to be written) will give a survey of the differences between these two settings.