2 General Markov Chains (September 10, 1999)

2.5 Distributional identities

It is much harder to get useful information about distributions (rather than mere expectations). Here are a few general results.

2.5.1 Stationarity consequences

A few useful facts about stationary Markov chains are, to experts, just specializations of facts about arbitrary (i.e. not-necessarily-Markov) stationary processes. Here we give a bare-hands proof of one such fact, the relation between the distribution of the return time to a subset $A$ and the distribution of the first hitting time on $A$ from a stationary start. We start in discrete time.

Lemma 2.23

For $t=1,2,\ldots$,

P_{\pi}(T_{A}=t-1)=P_{\pi}(T^{+}_{A}=t)=\pi(A)P_{\pi_{A}}(T^{+}_{A}\geq t)

where $\pi_{A}(i):=\pi_{i}/\pi(A),\ i\in A$.

Proof. The first equality is obvious. Now let $(X_{t})$ be the chain started with its stationary distribution $\pi$. Then

\begin{eqnarray*}
P_{\pi}(T^{+}_{A}=t) &=& P(X_{1}\not\in A,\ldots,X_{t-1}\not\in A,X_{t}\in A)\\
&=& P(X_{1}\not\in A,\ldots,X_{t-1}\not\in A)-P(X_{1}\not\in A,\ldots,X_{t}\not\in A)\\
&=& P(X_{1}\not\in A,\ldots,X_{t-1}\not\in A)-P(X_{0}\not\in A,\ldots,X_{t-1}\not\in A)\\
&=& P(X_{0}\in A,X_{1}\not\in A,\ldots,X_{t-1}\not\in A)\\
&=& \pi(A)P_{\pi_{A}}(T^{+}_{A}\geq t),
\end{eqnarray*}

establishing the Lemma.
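For readers who want to see the identity concretely, here is a minimal NumPy sketch (not from the text) that evaluates both sides of Lemma 2.23 exactly for a small chain; the 4-state matrix P, the subset A = {0,1}, and the helper-function names are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative check of Lemma 2.23 (not from the text): the 4-state matrix P
# and the subset A = {0, 1} are arbitrary choices made up for this sketch.
P = np.array([[0.10, 0.40, 0.30, 0.20],
              [0.30, 0.20, 0.10, 0.40],
              [0.25, 0.25, 0.25, 0.25],
              [0.40, 0.10, 0.20, 0.30]])

# Stationary distribution: left eigenvector of P with eigenvalue 1, normalized.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

A, C = [0, 1], [2, 3]            # the subset A and its complement A^c
piA = pi[A] / pi[A].sum()        # pi conditioned on A

def P_pi_Tplus_eq(t):
    """P_pi(T_A^+ = t): avoid A at steps 1,...,t-1, enter A at step t."""
    if t == 1:
        return pi @ P[:, A] @ np.ones(len(A))
    vec = pi @ P[:, C]                                           # step 1, stay in A^c
    vec = vec @ np.linalg.matrix_power(P[np.ix_(C, C)], t - 2)   # steps 2..t-1 in A^c
    return vec @ P[np.ix_(C, A)] @ np.ones(len(A))               # step t into A

def P_piA_Tplus_ge(t):
    """P_{pi_A}(T_A^+ >= t): start on A with law pi_A, avoid A at steps 1,...,t-1."""
    if t == 1:
        return 1.0
    vec = piA @ P[np.ix_(A, C)]
    vec = vec @ np.linalg.matrix_power(P[np.ix_(C, C)], t - 2)
    return vec.sum()

for t in range(1, 8):
    print(t, P_pi_Tplus_eq(t), pi[A].sum() * P_piA_Tplus_ge(t))  # columns should agree
```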

We’ll give two consequences of Lemma 2.23. Summing over $t$ gives

Corollary 2.24 (Kac’s formula)

\pi(A)E_{\pi_{A}}T^{+}_{A}=1

which extends the familiar fact $E_{i}T^{+}_{i}=1/\pi_{i}$. Multiplying the identity of Lemma 2.23 by $t$ and summing gives

\begin{eqnarray*}
E_{\pi}T_{A}+1 &=& \sum_{t\geq 1}tP_{\pi}(T_{A}=t-1)\\
&=& \pi(A)\sum_{t\geq 1}tP_{\pi_{A}}(T^{+}_{A}\geq t)\\
&=& \pi(A)\sum_{m\geq 1}{\textstyle\frac{1}{2}}m(m+1)P_{\pi_{A}}(T^{+}_{A}=m)\\
&=& \frac{\pi(A)}{2}\left(E_{\pi_{A}}T^{+}_{A}+E_{\pi_{A}}(T^{+}_{A})^{2}\right).
\end{eqnarray*}

Appealing to Kac’s formula and rearranging,

E_{\pi_{A}}(T^{+}_{A})^{2} = \frac{2E_{\pi}T_{A}+1}{\pi(A)}, \qquad (2.21)

\mbox{var}_{\pi_{A}}(T^{+}_{A}) = \frac{2E_{\pi}T_{A}+1}{\pi(A)}-\frac{1}{\pi^{2}(A)}. \qquad (2.22)
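Continuing with the same arbitrary illustrative chain, the following sketch (again not from the text) checks Kac's formula and (2.21)–(2.22) by computing the first two moments of the relevant hitting times from standard first-step-analysis linear systems.

```python
import numpy as np

# Illustrative check of Kac's formula and (2.21)-(2.22); same arbitrary chain as above.
P = np.array([[0.10, 0.40, 0.30, 0.20],
              [0.30, 0.20, 0.10, 0.40],
              [0.25, 0.25, 0.25, 0.25],
              [0.40, 0.10, 0.20, 0.30]])
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))]); pi = pi / pi.sum()
A, C = [0, 1], [2, 3]
piA = pi[A] / pi[A].sum()

# First-step analysis: first and second moments of T_A from states in A^c.
PCC = P[np.ix_(C, C)]
I = np.eye(len(C))
mC = np.linalg.solve(I - PCC, np.ones(len(C)))        # E_i T_A,   i in A^c
sC = np.linalg.solve(I - PCC, 1 + 2 * PCC @ mC)       # E_i T_A^2, i in A^c
m = np.zeros(len(P)); m[C] = mC                       # E_i T_A = 0 on A
s = np.zeros(len(P)); s[C] = sC

E_pi_TA = pi @ m                                      # E_pi T_A
# Moments of T_A^+ from a pi_A start, by conditioning on the first step.
E1 = 1 + piA @ P[A, :] @ m                            # E_{pi_A} T_A^+
E2 = 1 + 2 * (piA @ P[A, :] @ m) + piA @ P[A, :] @ s  # E_{pi_A} (T_A^+)^2

a = pi[A].sum()
print("Kac:   ", a * E1, "  (should be 1)")
print("(2.21):", E2, " vs ", (2 * E_pi_TA + 1) / a)
print("(2.22):", E2 - E1 ** 2, " vs ", (2 * E_pi_TA + 1) / a - 1 / a ** 2)
```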

More generally, there is a relation between $E_{\pi_{A}}(T^{+}_{A})^{p}$ and $E_{\pi}(T^{+}_{A})^{p-1}$.

In continuous time, the analog of Lemma 2.23 is

P_{\pi}(T_{A}\in(t,t+dt))=Q(A,A^{c})P_{\rho_{A}}(T_{A}>t)\,dt,\ t>0 \qquad (2.23)

where

Q(A,A^{c}):=\sum_{i\in A}\sum_{j\in A^{c}}q_{ij},\ \ \ \rho_{A}(j):=\sum_{i\in A}q_{ij}/Q(A,A^{c}),\ j\in A^{c}.

Integrating over $t>0$ gives the analog of Kac’s formula

Q(A,A^{c})E_{\rho_{A}}T_{A}=\pi(A^{c}). \qquad (2.24)
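Here is a corresponding check of (2.24), under an arbitrary made-up generator Q (not an example from the text); the mean hitting times are computed from the continuous-time first-step-analysis system.

```python
import numpy as np

# Illustrative check of (2.24); the generator Q (rows sum to 0) is an arbitrary
# made-up example, not one from the text.
Q = np.array([[-1.0,  0.4,  0.3,  0.3],
              [ 0.5, -1.2,  0.3,  0.4],
              [ 0.2,  0.3, -0.7,  0.2],
              [ 0.6,  0.2,  0.4, -1.2]])

# Stationary distribution: solve pi Q = 0 together with sum(pi) = 1.
M = np.vstack([Q.T, np.ones(len(Q))])
b = np.zeros(len(Q) + 1); b[-1] = 1.0
pi = np.linalg.lstsq(M, b, rcond=None)[0]

A, C = [0, 1], [2, 3]
QAAc = Q[np.ix_(A, C)].sum()                     # Q(A, A^c)
rhoA = Q[np.ix_(A, C)].sum(axis=0) / QAAc        # rho_A(j), j in A^c

# E_j T_A for j in A^c: solve Q_{CC} h = -1 (continuous-time first-step analysis).
h = np.linalg.solve(Q[np.ix_(C, C)], -np.ones(len(C)))

print("Q(A,A^c) E_{rho_A} T_A =", QAAc * (rhoA @ h))
print("pi(A^c)                =", pi[C].sum())   # the two should agree, as in (2.24)
```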

2.5.2 A generating function identity

Transform methods are useful in analyzing special examples, though that is not the main focus of this book. We record below just the simplest “transform fact”. We work in discrete time and use generating functions – the corresponding result in continuous time can be stated using Laplace transforms.

Lemma 2.25

Define

G_{ij}(z)=\sum_{t\geq 0}P_{i}(X_{t}=j)z^{t},\qquad F_{ij}(z)=\sum_{t\geq 0}P_{i}(T_{j}=t)z^{t}.

Then $F_{ij}=G_{ij}/G_{jj}$.

Analysis proof. Conditioning on $T_{j}$ gives

p^{(t)}_{ij}=\sum_{l=0}^{t}P_{i}(T_{j}=l)\,p^{(t-l)}_{jj}

and so

\sum_{t\geq 0}p^{(t)}_{ij}z^{t}=\sum_{l\geq 0}\sum_{t-l\geq 0}P_{i}(T_{j}=l)z^{l}\,p^{(t-l)}_{jj}z^{t-l}.

Thus $G_{ij}(z)=F_{ij}(z)G_{jj}(z)$, and the lemma follows. \Box

Probability proof. Let $\zeta$ have geometric($z$) law $P(\zeta>t)=z^{t}$, independent of the chain. Then

\begin{eqnarray*}
G_{ij}(z) &=& E_{i}(\mbox{number of visits to $j$ before time $\zeta$})\\
&=& P_{i}(T_{j}<\zeta)\ E_{j}(\mbox{number of visits to $j$ before time $\zeta$})\\
&=& F_{ij}(z)G_{jj}(z).
\end{eqnarray*}

\Box
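As a sanity check of Lemma 2.25 (not part of the text), the sketch below evaluates $G_{ij}(z)$ via $(I-zP)^{-1}$ and $F_{ij}(z)$ by propagating the law of the chain killed at $j$, for the same arbitrary illustrative chain and a fixed value of $z$; the indices i, j and the value z = 0.7 are arbitrary.

```python
import numpy as np

# Illustrative check of Lemma 2.25; same arbitrary 4-state chain, fixed i, j, z.
P = np.array([[0.10, 0.40, 0.30, 0.20],
              [0.30, 0.20, 0.10, 0.40],
              [0.25, 0.25, 0.25, 0.25],
              [0.40, 0.10, 0.20, 0.30]])
n = len(P)
i, j, z = 0, 2, 0.7

# G_{ij}(z) = sum_t z^t (P^t)_{ij} = [(I - zP)^{-1}]_{ij} for |z| < 1.
G = np.linalg.inv(np.eye(n) - z * P)

# F_{ij}(z) = sum_t z^t P_i(T_j = t): propagate the law of the chain killed at j.
F, alive, t = 0.0, np.zeros(n), 0
alive[i] = 1.0                          # mass that has not yet visited j
while alive.sum() > 1e-15 and t < 10_000:
    F += (z ** t) * alive[j]            # mass hitting j for the first time at t
    alive[j] = 0.0                      # remove it
    alive = alive @ P                   # one more step for the survivors
    t += 1

print("F_ij(z)         =", F)
print("G_ij(z)/G_jj(z) =", G[i, j] / G[j, j])   # should agree
```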

Note that, differentiating term by term,

E_{i}T_{j}=\left.{\textstyle\frac{d}{dz}}F_{ij}(z)\right|_{z=1}.

This and Lemma 2.25 can be used to give an alternative derivation of the mean hitting time formula, Lemma 2.12.
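The following sketch illustrates that remark in a purely numerical way (it is not Lemma 2.12 itself): it recovers $E_{i}T_{j}$ by numerically differentiating $F_{ij}(z)=G_{ij}(z)/G_{jj}(z)$ at $z=1$ and compares the result with first-step analysis, for the same arbitrary illustrative chain.

```python
import numpy as np

# Numerically recover E_i T_j from F_ij(z) = G_ij(z)/G_jj(z) (Lemma 2.25) by a
# one-sided difference at z = 1, and compare with first-step analysis.
# Same arbitrary illustrative chain; this is not Lemma 2.12 itself.
P = np.array([[0.10, 0.40, 0.30, 0.20],
              [0.30, 0.20, 0.10, 0.40],
              [0.25, 0.25, 0.25, 0.25],
              [0.40, 0.10, 0.20, 0.30]])
n, i, j = len(P), 0, 2

def F(z):
    G = np.linalg.inv(np.eye(n) - z * P)
    return G[i, j] / G[j, j]

h = 1e-4
# F_ij(1) = P_i(T_j < infinity) = 1 for an irreducible finite chain.
print("dF/dz at z=1 (numerical):", (1.0 - F(1.0 - h)) / h)

# E_i T_j by solving (I - P restricted off j) m = 1.
others = [k for k in range(n) if k != j]
m = np.linalg.solve(np.eye(n - 1) - P[np.ix_(others, others)], np.ones(n - 1))
print("E_i T_j (linear system):  ", m[others.index(i)])
```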

2.5.3 Distributions and continuization

The distribution at time $t$ of the continuization $\hat{X}$ of a discrete-time chain $X$ is most simply viewed as a Poisson mixture of the distributions $(X_{s})$. That is, $\hat{X}_{t}\ \stackrel{d}{=}\ X_{N_{t}}$ where $N_{t}$ has Poisson($t$) distribution independent of $X$. At greater length,

P_{i}(\hat{X}_{t}=j)=\sum_{s=0}^{\infty}\frac{e^{-t}t^{s}}{s!}P_{i}(X_{s}=j).

This holds because we can construct $\hat{X}$ from $X$ by replacing the deterministic "time $1$" holds by random, exponential($1$), holds $(\xi_{j})$ between jumps, and then the number $N_{t}$ of jumps before time $t$ has Poisson($t$) distribution. Now write $S_{n}=\sum_{j=1}^{n}\xi_{j}$ for the time of the $n$'th jump. Then the hitting time $\hat{T}_{A}$ for the continuized chain is related to the hitting time $T_{A}$ of the discrete-time chain by $\hat{T}_{A}=S_{T_{A}}$. Though these two hitting time distributions are different, their expectations are the same, and their variances are related in a simple way. To see this, note that the conditional distribution of $\hat{T}_{A}$ given $T_{A}$ is the distribution of the sum of $T_{A}$ independent $\xi$'s, so (using the notion of conditional expectation given a random variable)

E(\hat{T}_{A}|T_{A})=T_{A},\qquad {\rm var}\ (\hat{T}_{A}|T_{A})=T_{A}.

Thus (for any initial distribution)

E\hat{T}_{A}=E\,E(\hat{T}_{A}|T_{A})=ET_{A}.

And the conditional variance formula ([133] p. 198)

{\rm var}\ Z=E\,{\rm var}\ (Z|Y)+{\rm var}\ E(Z|Y)

tells us that

\begin{eqnarray*}
{\rm var}\ \hat{T}_{A} &=& E\,{\rm var}\ (\hat{T}_{A}|T_{A})+{\rm var}\ E(\hat{T}_{A}|T_{A}) \qquad (2.25)\\
&=& ET_{A}+{\rm var}\ T_{A}.
\end{eqnarray*}
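To make the last two identities concrete, here is a Monte Carlo sketch (not from the text) built directly from the exponential(1)-holds construction of the continuization; the chain, the set A, the starting state, and the sample size are arbitrary illustrative choices, so the printed moments should agree only up to sampling error.

```python
import numpy as np

# Monte Carlo illustration of E T_hat_A = E T_A and var T_hat_A = E T_A + var T_A,
# built from the exponential(1)-holds construction of the continuization.
# The chain, the set A and the starting state are arbitrary illustrative choices.
rng = np.random.default_rng(0)
P = np.array([[0.10, 0.40, 0.30, 0.20],
              [0.30, 0.20, 0.10, 0.40],
              [0.25, 0.25, 0.25, 0.25],
              [0.40, 0.10, 0.20, 0.30]])
A = {0, 1}
start = 2                                   # start outside A, so T_A > 0

def one_run():
    """Return (T_A, T_hat_A) for one path of the discrete chain and its continuization."""
    state, t_disc, t_cont = start, 0, 0.0
    while state not in A:
        t_cont += rng.exponential(1.0)      # exponential(1) hold before the next jump
        state = rng.choice(4, p=P[state])
        t_disc += 1
    return t_disc, t_cont

runs = np.array([one_run() for _ in range(100_000)])
TA, TAhat = runs[:, 0], runs[:, 1]
print("E T_A           =", TA.mean(),            "   E T_hat_A   =", TAhat.mean())
print("E T_A + var T_A =", TA.mean() + TA.var(), "   var T_hat_A =", TAhat.var())
```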