# 4.6 Induced and product chains

Here we record the behavior of our parameters under two natural operations on chains.

## 4.6.1 Induced chains

Given a Markov chain $X_{t}$ on state space $I$ and a function $f:I\rightarrow\hat{I}$, the process $f(X_{t})$ is typically not a Markov chain. But we can invent a chain which substitutes. In discrete time (the continuous case is similar) define the induced chain $\hat{X}_{t}$ to have transition matrix

 $\hat{p}_{\hat{i},\hat{j}}=P_{\pi}(f(X_{1})=\hat{j}|f(X_{0})=\hat{i})=\frac{% \sum\sum_{i,j:f(i)=\hat{i},f(j)=\hat{j}}\pi_{i}p_{i,j}}{\sum_{i:f(i)=\hat{i}}% \pi_{i}}.$ (4.39)

 $P_{\hat{\pi}}(\hat{X}_{0}=\hat{i},\hat{X}_{1}=\hat{j})=P_{\pi}(f(X_{0})=\hat{i% },f(X_{1})=\hat{j}).$ (4.40)

The reader may check that (4.39) and (4.40) are equivalent. Under our standing assumption that $X_{t}$ is reversible, the induced chain is also reversible (though the construction works for general chains as well). In the electrical network interpretation, we are shorting together vertices with the same $f$-values. It seems intuitively plausible that this “shorting” can only decrease our parameters describing convergence and mean hitting time behavior.

###### Proposition 4.44 (The contraction principle)

The values of $\tau^{*},\tau_{0},\tau_{2}$

and $\tau_{c}$ in an induced chain are less than or equal to the corresponding values in the original chain.

Proof. A function $\hat{g}:\hat{I}\rightarrow R$ pulls back to a function $g\equiv\hat{g}(f(\cdot)):I\rightarrow R$. So the Dirichlet principle (Chapter 3 yyy) shows that mean commute times can only decrease when passing to an induced chain:

 $E_{f(i)}\hat{T}_{f(j)}+E_{f(j)}\hat{T}_{f(i)}\leq E_{i}T_{j}+E_{j}T_{i}.$

This establishes the assertion for $\tau^{*}$ and $\tau_{0}$, and the extremal characterization of relaxation time works similarly for $\tau_{2}$. The assertion about $\tau_{c}$ is immediate from the definition, since a partition of $\hat{I}$ pulls back to a partition of $I$. $\Box$

On the other hand, it is easy to see that shorting may increase a one-sided mean hitting time. For example, random walk on the unweighted graph on the left has $E_{a}T_{b}=1$, but when we short $\{a,d\}$ together to form vertex $\hat{a}$ in the graph on the right, $E_{\hat{a}}\hat{T}_{b}=2$.

Finally, the behavior of the $\tau_{1}$-family under shorting is unclear.

###### Open Problem 4.45

Is the value of $\tau_{1}^{(2)}$ in an induced chain bounded by $K$ times the value of $\tau_{1}^{(2)}$ in the original chain, for some absolute constant $K$? For $K=1$?

## 4.6.2 Product chains

Given Markov chains on state spaces $I^{(1)}$ and $I^{(2)}$, there is a natural concept of a “product chain” on state space $I^{(1)}\times I^{(2)}$. It is worth writing this concept out in detail for two reasons. First, to prevent confusion between several different possible definitions in discrete time. Second, because the behavior of relaxation times of product chains is relevant to simple examples and has a surprising application (section 4.6.3).

As usual, things are simplest in continuous time. Define the product chain to be

 ${\bf X}_{t}=(X^{(1)}_{t},X^{(2)}_{t})$

where the components $X^{(1)}_{t}$ and $X^{(2)}_{t}$ are independent versions of the given chains. So

 $P_{i_{1},i_{2}}({\bf X}_{t}=(j_{1},j_{2}))=P^{(1)}_{i_{1}}(X^{(1)}_{t}=j_{1})P% ^{(2)}_{i_{2}}(X^{(2)}_{t}=j_{2}).$ (4.41)

Using the interpretation of relaxation time as asymptotic rate of convergence of transition probabilities, (Chapter 3 yyy) it is immediate that ${\bf X}$ has relaxation time

 $\tau_{2}=\max(\tau_{2}^{(1)},\tau_{2}^{(2)}).$ (4.42)

In discrete time there are two different general notions of “product chain”. One could consider the chain $(X^{(1)}_{t},X^{(2)}_{t})$ whose coordinates are the independent chains. This is the chain with transition probabilities

 $(i_{1},i_{2})\rightarrow(j_{1},j_{2}):\ \mbox{ probability }P^{(1)}(i_{1},j_{1% })P^{(2)}(i_{2},j_{2})$

and has relaxation time

 $\tau_{2}=\max(\tau_{2}^{(1)},\tau_{2}^{(2)}).$

But it is more natural to define the product chain ${\bf X}_{t}$ to be the chain with transition probabilities

 $(i_{1},i_{2})\rightarrow(j_{1},i_{2}):\ \mbox{ probability }\frac{1}{2}P^{(1)}% (i_{1},j_{1})$
 $(i_{1},i_{2})\rightarrow(i_{1},j_{2}):\ \mbox{ probability }\frac{1}{2}P^{(2)}% (i_{2},j_{2}).$

This is the jump chain derived from the product of the continuized chains, and has relaxation time

 $\tau_{2}=2\max(\tau_{2}^{(1)},\tau_{2}^{(2)}).$ (4.43)

Again, this can be seen without need for calculation: the continuized chain is just the continuous-time product chain run at half speed.

This definition and (4.43) extend to $d$-fold products in the obvious way. Random walk on $Z^{d}$ is the product of $d$ copies of random walk on $Z^{1}$, and random walk on the $d$-cube (Chapter 5 yyy) is the product of $d$ copies of random walk on $\{0,1\}$.

Just to make things more confusing, given graphs $G^{(1)}$ and $G^{(2)}$ the Cartesian product graph is defined to have edges

 $(v_{1},w_{1})\leftrightarrow(v_{2},w_{1})\mbox{ for }v_{1}\leftrightarrow v_{2}$
 $(v_{1},w_{1})\leftrightarrow(v_{1},w_{2})\mbox{ for }w_{1}\leftrightarrow w_{2}.$

If both $G^{(1)}$ and $G^{(2)}$ are $r$-regular then random walk on the product graph is the product of the random walks on the individual graphs. But in general, discrete-time random walk on the product graph is the jump chain of the product of the fluid model (Chapter 3 yyy) continuous-time random walks. So if the graphs are $r_{1}$- and $r_{2}$-regular then the discrete-time random walk on the product graph has the product distribution as its stationary distribution and has relaxation time

 $\tau_{2}=(r_{1}+r_{2})\max(\tau^{(1)}_{2}/r_{1},\tau^{(2)}_{2}/r_{2}).$

But for non-regular graphs, neither assertion is true.

Let us briefly discuss the behavior of some other parameters under products. For the continuous-time product (4.41), the total variation distance $\bar{d}$ of section 4.3 satisfies

 $\bar{d}(t)=1-(1-\bar{d}^{(1)}(t))(1-\bar{d}^{(2)}(t))$

and we deduce the crude bound

 $\tau_{1}\leq 2\max(\tau_{1}^{(1)},\tau_{1}^{(2)})$

where superscripts refer to the graphs $G^{(1)},G^{(2)}$ and not to the parameters in section 4.3.1. For the discrete-time chain, there is an extra factor of 2 from “slowing down” (cf. (4.42,4.43)), leading to

 $\tau_{1}\leq 4\max(\tau_{1}^{(1)},\tau_{1}^{(2)}).$

Here our conventions are a bit confusing: this inequality refers to the discrete-time product chain, but as in section 4.3 we define $\tau_{1}$ via the continuized chain – we leave the reader to figure out the analogous result for $\tau_{1}^{{\mbox{disc}}}$ discussed in section 4.3.3.

To state a result for $\tau_{0}$, consider the continuous-time product $(X^{(1)}_{t},X^{(2)}_{t})$ of independent copies of the same $n$-state chain. If the underlying chain has eigenvalues $(\lambda_{i};1\leq i\leq n)$ then the product chain has eigenvalues $(\lambda_{i}+\lambda_{j};1\leq i,j\leq n)$ and so by the eigentime identity

 $\displaystyle\tau_{0}^{\mbox{product}}$ $\displaystyle=$ $\displaystyle\sum_{i,j\geq 1;(i,j)\neq(1,1)}\frac{1}{\lambda_{i}+\lambda_{j}}$ $\displaystyle=$ $\displaystyle 2\tau_{0}+\sum_{i,j\geq 2}\frac{1}{\lambda_{i}+\lambda_{j}}$ $\displaystyle=$ $\displaystyle 2\tau_{0}+2\sum_{i=2}^{n}\sum_{j=i}^{n}\frac{1}{\lambda_{i}+% \lambda_{j}}$ $\displaystyle\leq$ $\displaystyle 2\tau_{0}+\sum_{i=2}^{n}(n-i+1)\frac{2}{\lambda_{i}}$ $\displaystyle\leq$ $\displaystyle 2\tau_{0}+(n-1)2\tau_{0}=2n\tau_{0}.$

Thus in discrete time

 $\tau_{0}^{\mbox{product}}\leq 4n\tau_{0}.$ (4.44)

## 4.6.3 Efron-Stein inequalities

The results above concerning relaxation times of product chains are essentially obvious using the interpretation of relaxation time as asymptotic rate of convergence of transition probabilities, but they are much less obvious using the extremal interpretation. Indeed, consider the $n$-fold product of a single chain $X$ with itself. Write $(X_{0},X_{1})$ for the distribution at times $0$ and $1$ of $X$, and $\tau_{2}$ for the relaxation time of $X$. Combining (4.43) with the extremal characterization (4.22) of the relaxation time for the product chain, a brief calculation gives the following result.

###### Corollary 4.46

Let $f:I^{n}\rightarrow R$ be arbitrary. Let $(X^{(i)},Y^{(i)})$, $i=1,\ldots,n$ be independent copies of $(X_{0},X_{1})$. Let $Z=f(X^{(1)},\ldots,X^{(n)})$ and let $Z^{(i)}=f(X^{(1)},\ldots,X^{(i-1)},Y^{(i)},X^{(i+1)},\ldots,X^{(n)})$. Then

 $\frac{{\rm var}\ (Z)}{\frac{1}{2n}\sum_{i=1}^{n}E(Z-Z^{(i)})^{2}}\leq n\tau_{2}.$

To appreciate this, consider the “trivial” case where the underlying Markov chain is just an i.i.d. sequence with distribution $\pi$ on $I$. Then $\tau_{2}=1$ and the $2n$ random variables $(X^{(i)},Y^{(i)};1\leq i\leq n)$ are i.i.d. with distribution $\pi$. And this special case of Corollary 4.46 becomes (4.45) below, because for each $i$ the distribution of $Z-Z^{(i)}$ is unchanged by substituting $X_{0}$ for $Y^{(i)}$.

###### Corollary 4.47

Let $f:I^{n}\rightarrow R$ be arbitrary. Let $(X_{0},X_{1},\ldots,X_{n})$ be i.i.d. with distribution $\pi$. Let $Z^{(i)}=f(X_{1},\ldots,X_{i-1},X_{0},X_{i+1},\ldots,X_{n})$ and let $Z=f(X_{1},\ldots,X_{n})$. Then

 ${\rm var}\ (Z)\leq\frac{1}{2}\sum_{i=1}^{n}E(Z-Z^{(i)})^{2}$ (4.45)

If $f$ is symmetric then

 ${\rm var}\ (Z)\leq\sum_{i=0}^{n}E(Z^{(i)}-\bar{Z})^{2}$ (4.46)

where $Z^{(0)}=Z$ and $\bar{Z}=\frac{1}{n+1}\sum_{i=0}^{n}Z^{(i)}$.

Note that in the symmetric case we may rewrite

 $Z^{(i)}=f(X_{0},X_{1},\ldots,X_{i-1},X_{i+1},\ldots,X_{n}).$

This reveals (4.46) to be the celebrated Efron-Stein inequality in statistics, and in fact (4.45) is a known variant (see Notes).

Proof. As observed above, (4.45) is a special case of Corollary 4.46. So it is enough to show that for symmetric $f$ the right sides of (4.45) and (4.46) are equal. Note that by symmetry $a=E(Z^{(i)})^{2}$ does not depend on $i$, and $b=EZ^{(i)}Z^{(j)}$ does not depend on $(i,j)$, for $j\neq i$. So the right side of (4.45) is

 $\frac{1}{2}n(a-2b+a)=n(a-b).$

But it is easy to calculate

 $EZ^{(i)}\bar{Z}=E\bar{Z}^{2}=\frac{a}{n+1}+\frac{nb}{n+1}$

and then the right side of (4.46) equals

 $(n+1)(a-2EZ^{(i)}\bar{Z}+E\bar{Z}^{2})=na-nb.$

## 4.6.4 Why these parameters?

The choice of parameters studied in this chapter is partly arbitrary, but our choice has been guided by two criteria, one philosophical and one technical. The philosophical criterion is

when formalizing a vague idea, choose a definition which has several equivalent formulations.

This is why we used the maximal mean hitting time parameter $\max_{i,j}(E_{i}T_{j}+E_{j}T_{i})$ instead of $\max_{i,j}E_{i}T_{j}$, because the former permits the equivalent “resistance” interpretation.

Here is the technical criterion. Given a continuous-time chain $X_{t}$ and a state $i$, create a new chain $X^{*}_{t}$ by splitting $i$ into two states $i_{1},i_{2}$ and setting

 $q^{*}_{i_{1}j}=q^{*}_{i_{2}j}=q_{ij};\ j\neq i$
 $q^{*}_{ji_{1}}=q^{*}_{ji_{2}}=\frac{1}{2}q_{ji}\ j\neq i$
 $q^{*}_{i_{1}i_{2}}=q^{*}_{i_{2}i_{1}}=\rho$

with $q^{*}=q$ elsewhere. Then $\pi^{*}(i_{1})=\pi^{*}(i_{2})=\frac{1}{2}\pi(i)$, with $\pi^{*}=\pi$ elsewhere. As $\rho\rightarrow\infty$, we may regard the new chain as converging to the old chain in a certain sense. So our technical criterion for parameters $\tau$ is that the value of $\tau$ for $X^{*}$ should converge, as $\rho\rightarrow\infty$, to the value for $X$. It is easy to check this holds for the $\tau$’s we have studied, but it does not hold for, say,

 $\tau\equiv\max_{ij}\pi_{j}E_{i}T_{j}$

which at first sight might seem a natural parameter.