4 Hitting and Convergence Time, and Flow Rate, Parameters for Reversible Markov Chains (October 11, 1994)

4.6 Induced and product chains

Here we record the behavior of our parameters under two natural operations on chains.

4.6.1 Induced chains

Given a Markov chain XtX_{t} on state space II and a function f:II^f:I\rightarrow\hat{I}, the process f(Xt)f(X_{t}) is typically not a Markov chain. But we can invent a chain which substitutes. In discrete time (the continuous case is similar) define the induced chain X^t\hat{X}_{t} to have transition matrix

p^i^,j^=Pπ(f(X1)=j^|f(X0)=i^)=i,j:f(i)=i^,f(j)=j^πipi,ji:f(i)=i^πi.\hat{p}_{\hat{i},\hat{j}}=P_{\pi}(f(X_{1})=\hat{j}|f(X_{0})=\hat{i})=\frac{% \sum\sum_{i,j:f(i)=\hat{i},f(j)=\hat{j}}\pi_{i}p_{i,j}}{\sum_{i:f(i)=\hat{i}}% \pi_{i}}. (4.39)

More informatively, we are matching the stationary flow rates:

Pπ^(X^0=i^,X^1=j^)=Pπ(f(X0)=i^,f(X1)=j^).P_{\hat{\pi}}(\hat{X}_{0}=\hat{i},\hat{X}_{1}=\hat{j})=P_{\pi}(f(X_{0})=\hat{i% },f(X_{1})=\hat{j}). (4.40)

The reader may check that (4.39) and (4.40) are equivalent. Under our standing assumption that XtX_{t} is reversible, the induced chain is also reversible (though the construction works for general chains as well). In the electrical network interpretation, we are shorting together vertices with the same ff-values. It seems intuitively plausible that this “shorting” can only decrease our parameters describing convergence and mean hitting time behavior.

Proposition 4.44 (The contraction principle)

The values of τ*,τ0,τ2\tau^{*},\tau_{0},\tau_{2}

and τc\tau_{c} in an induced chain are less than or equal to the corresponding values in the original chain.

Proof. A function g^:I^R\hat{g}:\hat{I}\rightarrow R pulls back to a function gg^(f()):IRg\equiv\hat{g}(f(\cdot)):I\rightarrow R. So the Dirichlet principle (Chapter 3 yyy) shows that mean commute times can only decrease when passing to an induced chain:

Ef(i)T^f(j)+Ef(j)T^f(i)EiTj+EjTi.E_{f(i)}\hat{T}_{f(j)}+E_{f(j)}\hat{T}_{f(i)}\leq E_{i}T_{j}+E_{j}T_{i}.

This establishes the assertion for τ*\tau^{*} and τ0\tau_{0}, and the extremal characterization of relaxation time works similarly for τ2\tau_{2}. The assertion about τc\tau_{c} is immediate from the definition, since a partition of I^\hat{I} pulls back to a partition of II. \Box

On the other hand, it is easy to see that shorting may increase a one-sided mean hitting time. For example, random walk on the unweighted graph on the left has EaTb=1E_{a}T_{b}=1, but when we short {a,d}\{a,d\} together to form vertex a^\hat{a} in the graph on the right, Ea^T^b=2E_{\hat{a}}\hat{T}_{b}=2.

abcdbca^\hat{a}

Finally, the behavior of the τ1\tau_{1}-family under shorting is unclear.

Open Problem 4.45

Is the value of τ1(2)\tau_{1}^{(2)} in an induced chain bounded by KK times the value of τ1(2)\tau_{1}^{(2)} in the original chain, for some absolute constant KK? For K=1K=1?

4.6.2 Product chains

Given Markov chains on state spaces I(1)I^{(1)} and I(2)I^{(2)}, there is a natural concept of a “product chain” on state space I(1)×I(2)I^{(1)}\times I^{(2)}. It is worth writing this concept out in detail for two reasons. First, to prevent confusion between several different possible definitions in discrete time. Second, because the behavior of relaxation times of product chains is relevant to simple examples and has a surprising application (section 4.6.3).

As usual, things are simplest in continuous time. Define the product chain to be

𝐗t=(Xt(1),Xt(2)){\bf X}_{t}=(X^{(1)}_{t},X^{(2)}_{t})

where the components Xt(1)X^{(1)}_{t} and Xt(2)X^{(2)}_{t} are independent versions of the given chains. So

Pi1,i2(𝐗t=(j1,j2))=Pi1(1)(Xt(1)=j1)Pi2(2)(Xt(2)=j2).P_{i_{1},i_{2}}({\bf X}_{t}=(j_{1},j_{2}))=P^{(1)}_{i_{1}}(X^{(1)}_{t}=j_{1})P% ^{(2)}_{i_{2}}(X^{(2)}_{t}=j_{2}). (4.41)

Using the interpretation of relaxation time as asymptotic rate of convergence of transition probabilities, (Chapter 3 yyy) it is immediate that 𝐗{\bf X} has relaxation time

τ2=max(τ2(1),τ2(2)).\tau_{2}=\max(\tau_{2}^{(1)},\tau_{2}^{(2)}). (4.42)

In discrete time there are two different general notions of “product chain”. One could consider the chain (Xt(1),Xt(2))(X^{(1)}_{t},X^{(2)}_{t}) whose coordinates are the independent chains. This is the chain with transition probabilities

(i1,i2)(j1,j2): probability P(1)(i1,j1)P(2)(i2,j2)(i_{1},i_{2})\rightarrow(j_{1},j_{2}):\ \mbox{ probability }P^{(1)}(i_{1},j_{1% })P^{(2)}(i_{2},j_{2})

and has relaxation time

τ2=max(τ2(1),τ2(2)).\tau_{2}=\max(\tau_{2}^{(1)},\tau_{2}^{(2)}).

But it is more natural to define the product chain 𝐗t{\bf X}_{t} to be the chain with transition probabilities

(i1,i2)(j1,i2): probability 12P(1)(i1,j1)(i_{1},i_{2})\rightarrow(j_{1},i_{2}):\ \mbox{ probability }\frac{1}{2}P^{(1)}% (i_{1},j_{1})
(i1,i2)(i1,j2): probability 12P(2)(i2,j2).(i_{1},i_{2})\rightarrow(i_{1},j_{2}):\ \mbox{ probability }\frac{1}{2}P^{(2)}% (i_{2},j_{2}).

This is the jump chain derived from the product of the continuized chains, and has relaxation time

τ2=2max(τ2(1),τ2(2)).\tau_{2}=2\max(\tau_{2}^{(1)},\tau_{2}^{(2)}). (4.43)

Again, this can be seen without need for calculation: the continuized chain is just the continuous-time product chain run at half speed.

This definition and (4.43) extend to dd-fold products in the obvious way. Random walk on ZdZ^{d} is the product of dd copies of random walk on Z1Z^{1}, and random walk on the dd-cube (Chapter 5 yyy) is the product of dd copies of random walk on {0,1}\{0,1\}.

Just to make things more confusing, given graphs G(1)G^{(1)} and G(2)G^{(2)} the Cartesian product graph is defined to have edges

(v1,w1)(v2,w1) for v1v2(v_{1},w_{1})\leftrightarrow(v_{2},w_{1})\mbox{ for }v_{1}\leftrightarrow v_{2}
(v1,w1)(v1,w2) for w1w2.(v_{1},w_{1})\leftrightarrow(v_{1},w_{2})\mbox{ for }w_{1}\leftrightarrow w_{2}.

If both G(1)G^{(1)} and G(2)G^{(2)} are rr-regular then random walk on the product graph is the product of the random walks on the individual graphs. But in general, discrete-time random walk on the product graph is the jump chain of the product of the fluid model (Chapter 3 yyy) continuous-time random walks. So if the graphs are r1r_{1}- and r2r_{2}-regular then the discrete-time random walk on the product graph has the product distribution as its stationary distribution and has relaxation time

τ2=(r1+r2)max(τ2(1)/r1,τ2(2)/r2).\tau_{2}=(r_{1}+r_{2})\max(\tau^{(1)}_{2}/r_{1},\tau^{(2)}_{2}/r_{2}).

But for non-regular graphs, neither assertion is true.

Let us briefly discuss the behavior of some other parameters under products. For the continuous-time product (4.41), the total variation distance d¯\bar{d} of section 4.3 satisfies

d¯(t)=1-(1-d¯(1)(t))(1-d¯(2)(t))\bar{d}(t)=1-(1-\bar{d}^{(1)}(t))(1-\bar{d}^{(2)}(t))

and we deduce the crude bound

τ12max(τ1(1),τ1(2))\tau_{1}\leq 2\max(\tau_{1}^{(1)},\tau_{1}^{(2)})

where superscripts refer to the graphs G(1),G(2)G^{(1)},G^{(2)} and not to the parameters in section 4.3.1. For the discrete-time chain, there is an extra factor of 2 from “slowing down” (cf. (4.42,4.43)), leading to

τ14max(τ1(1),τ1(2)).\tau_{1}\leq 4\max(\tau_{1}^{(1)},\tau_{1}^{(2)}).

Here our conventions are a bit confusing: this inequality refers to the discrete-time product chain, but as in section 4.3 we define τ1\tau_{1} via the continuized chain – we leave the reader to figure out the analogous result for τ1disc\tau_{1}^{{\mbox{disc}}} discussed in section 4.3.3.

To state a result for τ0\tau_{0}, consider the continuous-time product (Xt(1),Xt(2))(X^{(1)}_{t},X^{(2)}_{t}) of independent copies of the same nn-state chain. If the underlying chain has eigenvalues (λi;1in)(\lambda_{i};1\leq i\leq n) then the product chain has eigenvalues (λi+λj;1i,jn)(\lambda_{i}+\lambda_{j};1\leq i,j\leq n) and so by the eigentime identity

τ0product\displaystyle\tau_{0}^{\mbox{product}} =\displaystyle= i,j1;(i,j)(1,1)1λi+λj\displaystyle\sum_{i,j\geq 1;(i,j)\neq(1,1)}\frac{1}{\lambda_{i}+\lambda_{j}}
=\displaystyle= 2τ0+i,j21λi+λj\displaystyle 2\tau_{0}+\sum_{i,j\geq 2}\frac{1}{\lambda_{i}+\lambda_{j}}
=\displaystyle= 2τ0+2i=2nj=in1λi+λj\displaystyle 2\tau_{0}+2\sum_{i=2}^{n}\sum_{j=i}^{n}\frac{1}{\lambda_{i}+% \lambda_{j}}
\displaystyle\leq 2τ0+i=2n(n-i+1)2λi\displaystyle 2\tau_{0}+\sum_{i=2}^{n}(n-i+1)\frac{2}{\lambda_{i}}
\displaystyle\leq 2τ0+(n-1)2τ0=2nτ0.\displaystyle 2\tau_{0}+(n-1)2\tau_{0}=2n\tau_{0}.

Thus in discrete time

τ0product4nτ0.\tau_{0}^{\mbox{product}}\leq 4n\tau_{0}. (4.44)

4.6.3 Efron-Stein inequalities

The results above concerning relaxation times of product chains are essentially obvious using the interpretation of relaxation time as asymptotic rate of convergence of transition probabilities, but they are much less obvious using the extremal interpretation. Indeed, consider the nn-fold product of a single chain XX with itself. Write (X0,X1)(X_{0},X_{1}) for the distribution at times 00 and 11 of XX, and τ2\tau_{2} for the relaxation time of XX. Combining (4.43) with the extremal characterization (4.22) of the relaxation time for the product chain, a brief calculation gives the following result.

Corollary 4.46

Let f:InRf:I^{n}\rightarrow R be arbitrary. Let (X(i),Y(i))(X^{(i)},Y^{(i)}), i=1,,ni=1,\ldots,n be independent copies of (X0,X1)(X_{0},X_{1}). Let Z=f(X(1),,X(n))Z=f(X^{(1)},\ldots,X^{(n)}) and let Z(i)=f(X(1),,X(i-1),Y(i),X(i+1),,X(n))Z^{(i)}=f(X^{(1)},\ldots,X^{(i-1)},Y^{(i)},X^{(i+1)},\ldots,X^{(n)}). Then

var(Z)12ni=1nE(Z-Z(i))2nτ2.\frac{{\rm var}\ (Z)}{\frac{1}{2n}\sum_{i=1}^{n}E(Z-Z^{(i)})^{2}}\leq n\tau_{2}.

To appreciate this, consider the “trivial” case where the underlying Markov chain is just an i.i.d. sequence with distribution π\pi on II. Then τ2=1\tau_{2}=1 and the 2n2n random variables (X(i),Y(i);1in)(X^{(i)},Y^{(i)};1\leq i\leq n) are i.i.d. with distribution π\pi. And this special case of Corollary 4.46 becomes (4.45) below, because for each ii the distribution of Z-Z(i)Z-Z^{(i)} is unchanged by substituting X0X_{0} for Y(i)Y^{(i)}.

Corollary 4.47

Let f:InRf:I^{n}\rightarrow R be arbitrary. Let (X0,X1,,Xn)(X_{0},X_{1},\ldots,X_{n}) be i.i.d. with distribution π\pi. Let Z(i)=f(X1,,Xi-1,X0,Xi+1,,Xn)Z^{(i)}=f(X_{1},\ldots,X_{i-1},X_{0},X_{i+1},\ldots,X_{n}) and let Z=f(X1,,Xn)Z=f(X_{1},\ldots,X_{n}). Then

var(Z)12i=1nE(Z-Z(i))2{\rm var}\ (Z)\leq\frac{1}{2}\sum_{i=1}^{n}E(Z-Z^{(i)})^{2} (4.45)

If ff is symmetric then

var(Z)i=0nE(Z(i)-Z¯)2{\rm var}\ (Z)\leq\sum_{i=0}^{n}E(Z^{(i)}-\bar{Z})^{2} (4.46)

where Z(0)=ZZ^{(0)}=Z and Z¯=1n+1i=0nZ(i)\bar{Z}=\frac{1}{n+1}\sum_{i=0}^{n}Z^{(i)}.

Note that in the symmetric case we may rewrite

Z(i)=f(X0,X1,,Xi-1,Xi+1,,Xn).Z^{(i)}=f(X_{0},X_{1},\ldots,X_{i-1},X_{i+1},\ldots,X_{n}).

This reveals (4.46) to be the celebrated Efron-Stein inequality in statistics, and in fact (4.45) is a known variant (see Notes).

Proof. As observed above, (4.45) is a special case of Corollary 4.46. So it is enough to show that for symmetric ff the right sides of (4.45) and (4.46) are equal. Note that by symmetry a=E(Z(i))2a=E(Z^{(i)})^{2} does not depend on ii, and b=EZ(i)Z(j)b=EZ^{(i)}Z^{(j)} does not depend on (i,j)(i,j), for jij\neq i. So the right side of (4.45) is

12n(a-2b+a)=n(a-b).\frac{1}{2}n(a-2b+a)=n(a-b).

But it is easy to calculate

EZ(i)Z¯=EZ¯2=an+1+nbn+1EZ^{(i)}\bar{Z}=E\bar{Z}^{2}=\frac{a}{n+1}+\frac{nb}{n+1}

and then the right side of (4.46) equals

(n+1)(a-2EZ(i)Z¯+EZ¯2)=na-nb.(n+1)(a-2EZ^{(i)}\bar{Z}+E\bar{Z}^{2})=na-nb.

4.6.4 Why these parameters?

The choice of parameters studied in this chapter is partly arbitrary, but our choice has been guided by two criteria, one philosophical and one technical. The philosophical criterion is

when formalizing a vague idea, choose a definition which has several equivalent formulations.

This is why we used the maximal mean hitting time parameter maxi,j(EiTj+EjTi)\max_{i,j}(E_{i}T_{j}+E_{j}T_{i}) instead of maxi,jEiTj\max_{i,j}E_{i}T_{j}, because the former permits the equivalent “resistance” interpretation.

Here is the technical criterion. Given a continuous-time chain XtX_{t} and a state ii, create a new chain Xt*X^{*}_{t} by splitting ii into two states i1,i2i_{1},i_{2} and setting

qi1j*=qi2j*=qij;  jiq^{*}_{i_{1}j}=q^{*}_{i_{2}j}=q_{ij};\ j\neq i
qji1*=qji2*=12qjijiq^{*}_{ji_{1}}=q^{*}_{ji_{2}}=\frac{1}{2}q_{ji}\ j\neq i
qi1i2*=qi2i1*=ρq^{*}_{i_{1}i_{2}}=q^{*}_{i_{2}i_{1}}=\rho

with q*=qq^{*}=q elsewhere. Then π*(i1)=π*(i2)=12π(i)\pi^{*}(i_{1})=\pi^{*}(i_{2})=\frac{1}{2}\pi(i), with π*=π\pi^{*}=\pi elsewhere. As ρ\rho\rightarrow\infty, we may regard the new chain as converging to the old chain in a certain sense. So our technical criterion for parameters τ\tau is that the value of τ\tau for X*X^{*} should converge, as ρ\rho\rightarrow\infty, to the value for XX. It is easy to check this holds for the τ\tau’s we have studied, but it does not hold for, say,

τmaxijπjEiTj\tau\equiv\max_{ij}\pi_{j}E_{i}T_{j}

which at first sight might seem a natural parameter.