Evaluating modified treatment policy effects under network interference

Nima Hejazi

Harvard Biostatistics

Salvador Balkus

Harvard Biostatistics

April 10, 2025

Scientific motivation: Environmental hazards

Randomization infeasible, unethical

  • Pollution
  • Wildfires
  • Extreme weather

Public policies are expensive…

  • New restrictions or possible economic incentives.
  • Need to evaluate usefulness before implementing!

Common challenges

  1. Relying on spatial geographies

    • Counties
    • ZIP codes
    • Census tracts
    • Satellite images
  2. Continuous exposures
  3. Many potential confounders (e.g., socioeconomic, demographic)

Example: What is the impact of zero-emissions vehicles on \(\text{NO}_2\) air pollution?

Research questions

  1. How to identify policy-relevant causal estimands of continuous exposures in the spatial/network data setting?
    • Must be interpretable and policy-relevant
    • Must capture the effect on the population
  2. Can we build on advances in semi-parametric theory for estimation?
    • Flexibly estimate nuisances using machine learning
    • Develop asymptotically efficient estimators (or get close)

Data structure

Observed data: A tuple of \(n\)-vectors, \(O_1, \ldots, O_n\), where \[O = (L, A, Y) \sim \P_0 \in \M\]

  • \(L\): measured baseline covariates
  • \(A\): continuous exposure
  • \(Y\): outcome of interest

Network \(\bf{F}\): An adjacency matrix of each unit’s neighbors (known).

Interference complicates identification

Per Hudgens and Halloran (2008), interference occurs when the potential outcome of one unit is affected by the exposure of another unit: \[Y_i(a_i, a_j) \neq Y_i(a_i, a_j') \text{ if } a_j \neq a_j'\]

This violates consistency and SUTVA.

Network interference: Potential outcomes depend on neighboring units in the adjacency matrix \(\bf{F}\) (van der Laan 2014).

Repairing identification under interference

Under interference, consider the following structural equation: \[Y_i = f(s_A(\{A_j : j \in \mathbf{F}_i\}), s_L(\{L_j : j \in \mathbf{F}_i\}))\]

  • \(s\) : “summary” of neighbors’ exposures or exposure mapping
    • Symmetric (arguments permutation-invariant)
    • For short, denote \(s_A(\cdot)\) as \(s(A)\)
  • Taking \(s(A)\) as exposure restores SUTVA (Aronow and Samii 2017)

Emergent problem: Network consistency

Measuring the effect of a population-level intervention requires that the intervention be possible within the network.

  • Summary \(s(A)\) yields a new potential outcome \(Y_i(s(a))\)

  • …but it may be impossible to set \(s(A) = s(a)\) for all units

  • Example: Setting \(A = 1\) with

    \[s(A) = \sum_{j \in \bf{F}_i} A_j\]

  • Enforcing \(s(A) = a\) is unnatural
  • Not a population intervention effect (Hubbard and van der Laan 2008)
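To make the clash between a static intervention and the network concrete, here is a minimal sketch. The 4-unit path network `F` and the helper `s_sum` are hypothetical and chosen only for illustration. Setting every \(A_i = 1\) makes the sum summary equal each unit's degree, so no single value \(s(a)\) can be assigned to all units at once.

```python
import numpy as np

# Hypothetical path network 1-2-3-4 (illustrative only, not from the talk).
F = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

def s_sum(A, F):
    """Sum-of-neighbors summary: s(A)_i = sum of A_j over j in F_i."""
    return F @ A

A_static = np.ones(4)            # static intervention: set A_i = 1 for every unit
summary = s_sum(A_static, F)     # induced summary for each unit
degrees = F.sum(axis=1)          # number of neighbors of each unit

# The summary equals the degree (1, 2, 2, 1 here), so it differs across units
# unless the network happens to be regular.
print(summary, degrees)
```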

How to avoid network consistency violations?

  • Estimands based on static interventions (i.e., setting \(A = a\)) are difficult to adapt…and, worse, are incompatible with the network
    • counterfactual mean \(\E[Y(a)]\) for \(A \in \{0, 1\}\)
    • causal dose-response curve \(\E[Y(a)]\) for \(A\) continuous
  • What if we alter the conditional density \(f_{A \mid L}(A, L)\), as in a stochastic intervention (Díaz and van der Laan 2012; Ogburn et al. 2022)?
    • These are not particularly interpretable
    • …and are hard to reason about in the real world, that is, what policy engineers this change?

Modified treatment policies

A modified treatment policy (MTP) is a user-specified function \(d(A, L; \delta)\) that maps the observed exposure \(A\) to a post-intervention value \(A^+\).

  • Additive: \(d(A, L; \delta) = A + \delta\)
  • Multiplicative: \(d(A, L; \delta) = \delta \cdot A\)
  • Piecewise Additive:

\[d(A, L;\delta) = \begin{cases}A + \delta \cdot L & A \in \mathcal{A}(L) \\ A & \text{otherwise}\end{cases}\]
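The three MTPs above can be sketched as plain functions. The function names, the toy values of `A` and `L`, and the region \(\mathcal{A}(L)\) (here, exposures at or above 1.0) are assumptions made only for illustration.

```python
import numpy as np

def mtp_additive(A, L, delta):
    """d(A, L; delta) = A + delta."""
    return A + delta

def mtp_multiplicative(A, L, delta):
    """d(A, L; delta) = delta * A."""
    return delta * A

def mtp_piecewise_additive(A, L, delta, in_region):
    """Shift by delta * L where in_region(A, L) holds; leave A unchanged otherwise."""
    return np.where(in_region(A, L), A + delta * L, A)

A = np.array([0.5, 1.0, 2.0])
L = np.array([1.0, 2.0, 3.0])

# Hypothetical region A(L): only exposures at or above 1.0 are shifted.
region = lambda A, L: A >= 1.0

A_plus_add = mtp_additive(A, L, 0.5)
A_plus_mult = mtp_multiplicative(A, L, 0.5)
A_plus_pw = mtp_piecewise_additive(A, L, 0.5, region)
```

In each case the post-intervention value depends on the observed exposure, which is what separates an MTP from a static intervention.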

Identifying the causal effects of MTPs under interference

The induced MTP

  • Denote by \(A^{+}\) the post-intervention value from the MTP \(d(A, L; \delta): A \to A^+\) and by \(A_s = s(A)\) the summary
  • What happens if we apply the MTP and then summarize?

Diagram: \(A \xrightarrow{d} A^{+} \xrightarrow{s} A_s^{+}\) and \(A \xrightarrow{s} A_s \xrightarrow{h} A_s^{+}\)

The induced MTP

Induced MTP: Function \(h\) satisfying \(h \circ s = s \circ d\)

The population intervention effect of an induced MTP is

\[\begin{align*} \theta_n =& \E\Big(\frac{1}{n}\sum_{i=1}^n Y_i(s(d(A_i, L_i; \delta))_i)\Big) \\ =& \E\Big(\frac{1}{n}\sum_{i=1}^n Y_i(h(A_{s,i}))\Big) \end{align*}\]

This is equivalent to identifying the MTP effect under intervention \(h\)

  • \(\theta_n\) is a data-adaptive sample mean, since we only observe one network.
  • The distribution of each \(Y_i\) depends on its number of neighbors.

Induced MTP example

If \(d(A)_i = A_i + \delta\), and \(s(A)_i = \sum_{j \in \mathbf{F}_i} A_j\), then

\[h(s(A))_i = s(A)_i + \delta\cdot |\mathbf{F}_i|\]

And the causal estimand \(\theta_n\) is

\[\E\Big(\frac{1}{n}\sum_{i=1}^n Y_i(s(A)_i + \delta\cdot|\mathbf{F}_i|)\Big)\]
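The identity \(h \circ s = s \circ d\) for this example can be checked numerically. The random symmetric network below is a hypothetical stand-in; only the additive shift and the sum summary come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 6, 0.3

# Hypothetical symmetric network without self-loops.
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
F = upper + upper.T
deg = F.sum(axis=1)                   # |F_i| for each unit

A = rng.normal(size=n)

s = lambda a: F @ a                   # sum summary: s(A)_i = sum_{j in F_i} A_j
d = lambda a: a + delta               # additive MTP
h = lambda a_s: a_s + delta * deg     # induced MTP from the slide

lhs = h(s(A))                         # summarize, then apply the induced MTP
rhs = s(d(A))                         # apply the MTP, then summarize
print(np.allclose(lhs, rhs))
```

The two orders agree because each of the \(|\mathbf{F}_i|\) neighbors contributes one extra \(\delta\) to the sum.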

Learning from data: Statistical estimand

\[\psi_n = \E\Big(\frac{1}{n}\sum_{i=1}^n \E(Y_i \mid A_{s,i} = h(A_{s,i}), L_{s,i})\Big)\]

Two sets of assumptions needed to compute from data:

  1. Identification: when does causal \(\theta_n\) equal a statistical functional \(\psi_n\)?

  2. Estimation: when is \(\psi_n\) efficiently estimable via semiparametric theory?

Assumptions for both identification and estimation

A0 (SCM). Data are generated from a structural causal model:

\[\begin{align*} L_i &= f_L(\varepsilon_{L_i}) \\ A_i &= f_A(L_{s,i}, \varepsilon_{A_i}) \\ Y_i &= f_Y(A_{s,i}, L_{s,i}, \varepsilon_{Y_i}) \ , \end{align*}\] with all error vectors \(\varepsilon_{X_i}\) independent if units do not share a neighbor.

A1 (Positivity). \((h(a_s), l_s) \in \text{supp}(A_s, L_s)\) if \((a_s, l_s) \in \text{supp}(A_s, L_s)\).

A2 (No unmeasured confounding). \(Y(a_s) \indep A_s \mid L_s\)

Assumptions for only estimation

A3 (Piecewise smooth invertibility).

\[ h(a_s, l_s) = \sum_{k=1}^K h_k(a_s, l_s) \cdot \I(a_s \in \mathcal{A}_k(l_s)) \] such that \(h^{-1}_k\) as a function of \(a_s\) is (piecewise) differentiable for all \(k\).

  • Semi-parametric efficient estimators are, most often, constructed based on the efficient influence function (EIF)
  • A3 is needed for the MTP effect \(\psi_n\) to be pathwise differentiable, i.e., admit an EIF (Díaz et al. 2021; Haneuse and Rotnitzky 2013)

When is efficiency theory available?

Theorem 1: If \(h\) is piecewise differentiable, then \(s\) must be piecewise linear for A3 to hold for any \(\mathbf{F}\).

Theorem 2: If A3 holds and \(s\) is piecewise linear, then \[ d(A_i, L_i; \delta) = \alpha(\delta) A_i + \beta_i(\delta, L_i) \I(A_i \in \mathcal{A}) \]

Consequences:

  • If \(s\) and \(\mathbf{F}\) are unconstrained, then only “linear” MTPs work
  • If \(s\), \(d\), or \(\mathbf{F}\) is constrained, then more flexibility is possible; e.g., if \(s(A)_i = \max\{A_j: j \in \mathbf{F}_i\}\), then any \(d\) satisfying A3 works
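
As a worked instance of the “linear” case, consider a sketch that assumes a constant shift \(\beta\) (no dependence on \(L\), no indicator), a slope \(\alpha \neq 0\), and the sum summary \(s(A)_i = \sum_{j \in \mathbf{F}_i} A_j\). These simplifications are assumptions made for illustration only.

\[\begin{align*} s(d(A))_i &= \sum_{j \in \mathbf{F}_i} (\alpha A_j + \beta) = \alpha \, s(A)_i + \beta \cdot |\mathbf{F}_i| \\ h(a_s) &= \alpha a_s + \beta \cdot |\mathbf{F}_i| \\ h^{-1}(a_s) &= \frac{a_s - \beta \cdot |\mathbf{F}_i|}{\alpha}, \qquad \frac{\partial}{\partial a_s} h^{-1}(a_s) = \frac{1}{\alpha} \end{align*}\]

Here \(h^{-1}\) is differentiable in \(a_s\) for every unit regardless of its degree, so A3 holds for any \(\mathbf{F}\). Taking \(\alpha = 1\) and \(\beta = \delta\) recovers the additive example.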

Estimating the causal effect of an induced MTP

Desiderata for estimators

  • semiparametric efficiency
    • achieve the best possible variance among the class of regular asymptotically linear (RAL) estimators
  • rate double-robustness
    • structure of second-order bias allows nuisance estimation with flexible regression or machine learning algorithms, which may converge slower than the parametric rate \(n^{-1/2}\)

Construct an efficient estimator based on the efficient influence function

Nonparametric estimation framework

The efficient influence function of \(\psi_n\), a special case of the EIF for the counterfactual mean of a stochastic intervention (Ogburn et al. 2022), is

\[\begin{align*} \bar{\phi}(O) =& \frac{1}{n}\sum_{i=1}^n \Big[ w(A_{s,i}, L_{s,i}) (Y_i - m(A_{s,i}, L_{s,i}))\\ &+ \E(m(h(A_{s,i}, L_{s,i}; \delta), L_{s,i}) \mid L = l) - \psi_n \Big] \ , \end{align*}\] where \(m\) is the outcome regression and \(w\) is the product of a ratio of conditional densities and the derivative of \(h^{-1}\):

  • \(w(A_{s,i}, L_{s,i}) = \frac{f_{A \mid L}(h^{-1}(A_{s,i}), L_{s,i})}{f_{A \mid L}(A_{s,i}, L_{s,i})} \cdot (h^{-1})'(A_{s,i})\)
  • \(m(A_{s,i}, L_{s,i}) = \E[Y_i \mid A_{s,i}, L_{s,i}]\)

Nonparametric estimation framework

Ogburn et al. (2022)’s CLT: If \(\hat{\psi}_n\) is constructed to solve \(\bar{\phi} \approx 0\) and \(K_{\text{max}}^2 / n \rightarrow 0\), then, under mild regularity conditions, \[\sqrt{C_n}(\hat{\psi}_n - \psi_n) \rightarrow \text{N}(0, \sigma^2) \ ,\] where \(K_{\text{max}}\) is the network’s maximum degree.

The estimator \(\hat{\psi}_n\) is asymptotically normal, but the appropriate scaling depends on a factor \(n/K_{\text{max}}^2 < C_n < n\).

Estimation implementation

  1. Fit estimators \(\hat{w}\) and \(\hat{m}\) of nuisance parameters \(w\) and \(m\) via cross-fitting (Bong et al. 2024) or super (ensemble machine) learning (Davies and van der Laan 2016; van der Laan et al. 2007).
  2. Construct one-step or “network-TMLE” estimators (Zivich et al. 2022) from an estimated EIF based on \(\hat{w}\) and \(\hat{m}\).
  3. Compute standard errors and construct Wald-style confidence intervals based on the empirical variance of the estimated EIF (see Appendix A).
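
Steps 1 and 2 can be sketched end to end on toy data. Everything specific here is an assumption for illustration: the ring network, the toy data-generating process, and the parametric working models (ordinary least squares for \(m\), a Gaussian density for \(w\)) that stand in for cross-fitting or super learning. This is a plain one-step estimator, not the network-TMLE of Zivich et al. (2022).

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 2000, 0.3

# Hypothetical ring network: unit i neighbors i-1 and i+1 (degree 2).
F = np.zeros((n, n))
idx = np.arange(n)
F[idx, (idx + 1) % n] = 1
F[idx, (idx - 1) % n] = 1
deg = F.sum(axis=1)

# Toy data (an assumed DGP, not the one used in the talk).
L = rng.normal(size=n)
A = 0.5 * L + rng.normal(size=n)
A_s, L_s = F @ A, F @ L                       # sum summaries
Y = A_s + 0.5 * L_s + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Step 1a: outcome regression m(a_s, l_s), linear working model.
b = ols(np.column_stack([A_s, L_s]), Y)
m_hat = lambda a_s, l_s: b[0] + b[1] * a_s + b[2] * l_s

# Step 1b: Gaussian working model for the density of A_s given L_s.
g = ols(L_s, A_s)
mu = g[0] + g[1] * L_s
sigma2 = np.mean((A_s - mu) ** 2)
log_f = lambda a_s: -0.5 * (a_s - mu) ** 2 / sigma2   # constant cancels in the ratio

# Induced MTP for an additive shift under the sum summary, and its inverse.
h = lambda a_s: a_s + delta * deg
h_inv = lambda a_s: a_s - delta * deg                 # derivative of h_inv is 1

w_hat = np.exp(log_f(h_inv(A_s)) - log_f(A_s))

# Step 2: one-step estimator = plug-in term + weighted residual correction.
psi_hat = np.mean(w_hat * (Y - m_hat(A_s, L_s)) + m_hat(h(A_s), L_s))
print(psi_hat)
```

Under this toy DGP the true value is \(\delta \cdot |\mathbf{F}_i| = 0.6\), because the outcome has a unit slope in \(A_s\) and all other terms have mean zero, so the printed estimate should land near that value.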

Empirical results

Simulation study results

Performance results from numerical experiments

Semi-synthetic simulation study results

  • Simulate \(A\) and \(Y\) as linear models from 16 socioeconomic and land-use ZIP-code level covariates from the ZEV-NO2 California dataset.
  • How poor would our estimates be if all we did wrong was ignore the network interference structure?

| Method | Learner | % Bias | Coverage | MSE |
|---|---|---|---|---|
| Network-TMLE | Correct GLM | 0.45 | 95.2% | 0.013 |
| Network-TMLE | Super Learner | -6.58 | 95.0% | 0.013 |
| IID-TMLE | Correct GLM | -103.39 | 26.0% | 0.049 |
| Linear Regression | | -103.52 | 54.2% | 0.057 |

Effect of electric vehicles on \(\text{NO}_2\)

  • GLM (ignores interference): ZEVs reduce \(\text{NO}_2\) by 0.015 ppb, totaling ~2.5% of average change in \(\text{NO}_2\)
  • Induced MTP: ZEVs reduce \(\text{NO}_2\) by 0.042 ppb, totaling ~7% of average change in \(\text{NO}_2\)

Further work

Serious challenges remain:

  • Difficult to estimate the conditional density ratio nuisance \(w\)
  • New theoretical work was only needed to apply existing efficiency theory, not for identification
    • Can we avoid the piecewise smooth invertibility assumption?
  • If summaries are unknown, can we learn them, e.g., \(L_{\hat{s}}\)? At what cost?

Future work may benefit from moving away from standard efficiency theory in the network interference setting

  • Undersmoothing, automatic debiasing, “Riesz learning”
  • Alternative conditions for pathwise differentiability

Simulations powered by CausalTables.jl

Thank you! Questions?

Funded by NIEHS T32 ES007142
and NSF DGE 2140743

References

Aronow, P. M., and Samii, C. (2017), “Estimating average causal effects under general interference, with application to a social network experiment,” The Annals of Applied Statistics, Institute of Mathematical Statistics, 11. https://doi.org/10.1214/16-aoas1005.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993), Efficient and adaptive estimation for semiparametric models, Springer.
Bong, H., Fogarty, C. B., Levina, E., and Zhu, J. (2024), “Heterogeneous treatment effects under network interference: A nonparametric approach based on node connectivity,” arXiv. https://doi.org/10.48550/ARXIV.2410.11797.
Davies, M. M., and van der Laan, M. J. (2016), “Optimal spatial prediction using ensemble machine learning,” The International Journal of Biostatistics, Walter de Gruyter GmbH, 12, 179–201. https://doi.org/10.1515/ijb-2014-0060.
Díaz, I., Williams, N., Hoffman, K. L., and Schenck, E. J. (2021), “Nonparametric causal effects based on longitudinal modified treatment policies,” Journal of the American Statistical Association, Informa UK Limited, 118, 846–857. https://doi.org/10.1080/01621459.2021.1955691.
Díaz, I., and van der Laan, M. J. (2012), “Population intervention causal effects based on stochastic interventions,” Biometrics, Wiley Online Library, 68, 541–549. https://doi.org/10.1111/j.1541-0420.2011.01685.x.
Haneuse, S., and Rotnitzky, A. (2013), “Estimation of the effect of interventions that modify the received treatment,” Statistics in Medicine, Wiley, 32, 5260–5277. https://doi.org/10.1002/sim.5907.
Hubbard, A. E., and van der Laan, M. J. (2008), “Population intervention models in causal inference,” Biometrika, Oxford University Press (OUP), 95, 35–47. https://doi.org/10.1093/biomet/asm097.
Hudgens, M. G., and Halloran, M. E. (2008), “Toward causal inference with interference,” Journal of the American Statistical Association, Informa UK Limited, 103, 832–842. https://doi.org/10.1198/016214508000000292.
Ogburn, E. L., Sofrygin, O., Díaz, I., and van der Laan, M. J. (2022), “Causal inference for social network data,” Journal of the American Statistical Association, Informa UK Limited, 119, 597–611. https://doi.org/10.1080/01621459.2022.2131557.
Pfanzagl, J., and Wefelmeyer, W. (1985), “Contributions to a general asymptotic statistical theory,” Statistics & Risk Modeling, 3, 379–388.
van der Laan, M. J. (2014), “Causal inference for a population of causally connected units,” Journal of Causal Inference, Walter de Gruyter GmbH, 2, 13–74. https://doi.org/10.1515/jci-2013-0002.
van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007), “Super learner,” Statistical Applications in Genetics and Molecular Biology, De Gruyter, 6. https://doi.org/10.2202/1544-6115.1309.
van der Laan, M. J., and Rose, S. (2011), Targeted learning: Causal inference for observational and experimental data, Springer. https://doi.org/10.1007/978-1-4419-9782-1.
van der Laan, M. J., and Rubin, D. (2006), “Targeted maximum likelihood learning,” The International Journal of Biostatistics, De Gruyter, 2. https://doi.org/10.2202/1557-4679.1043.
Zivich, P. N., Hudgens, M. G., Brookhart, M. A., Moody, J., Weber, D. J., and Aiello, A. E. (2022), “Targeted maximum likelihood estimation of causal effects with interference: A simulation study,” Statistics in Medicine, Wiley, 41, 4554–4577. https://doi.org/10.1002/sim.9525.

Appendix A: Variance estimation

The EIF was given in the form \(\frac{1}{n}\sum_{i=1}^n \phi_P(O_i)\), but each term must be centered at the mean over the units with the same number of neighbors, \(N(|\mathbf{F}_i|)\):

\[\varphi_i = \phi_{\hat{P}_n}(O_i) - \frac{1}{|N(|\mathbf{F}_i|)|} \sum_{j \in N(|\mathbf{F}_i|)} \phi_{\hat{P}_n}(O_j)\]

Then, \(\hat{\sigma}^2 = \frac{1}{n^2}\sum_{i,j} \mathbf{F}_{ij} \varphi_i\varphi_j \overset{P}{\rightarrow} \sigma^2\)
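
The centering and the variance formula above can be sketched as follows. The function name, the toy inputs, and the convention that \(\mathbf{F}\) has ones on its diagonal (so each unit's own squared term enters the sum) are assumptions of this sketch rather than statements about the authors' implementation.

```python
import numpy as np

def network_variance(phi, F):
    """(1/n^2) * sum_{i,j} F_ij * varphi_i * varphi_j, where varphi is phi
    centered within groups of units that share the same number of neighbors."""
    phi = np.asarray(phi, dtype=float)
    F = np.asarray(F, dtype=float)
    n = len(phi)
    deg = F.sum(axis=1)
    varphi = np.empty(n)
    for k in np.unique(deg):
        group = deg == k
        varphi[group] = phi[group] - phi[group].mean()
    return varphi @ F @ varphi / n**2

# Hypothetical path network 1-2-3-4 with self-connections on the diagonal.
F = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
])
phi = np.array([1.0, 2.0, 4.0, 3.0])   # made-up EIF values

sigma2_hat = network_variance(phi, F)
print(sigma2_hat)
```

Adding the diagonal shifts every degree by one, so the grouping by number of neighbors is unchanged.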

Appendix B: DGP for simulation study

Draw 200 iterations, estimate effect of MTP based on

\[\begin{align*} L_1 &\sim \text{Beta}(3,2); L_2 \sim \text{Poisson}(100);\\ L_3 &\sim \text{Gamma}(2,4); L_4 \sim \text{Bernoulli}(0.6) \\ A &\sim \text{Normal}(0.1 m_L, 1.0) \,\, \text{and} \,\, A_s = \Big[\sum_{j \in \mathbf{F}_i} A_j\Big]_{i = 1}^n \\ Y &\sim \text{Normal}(0.2A + A_s + 0.2 m_L, 0.1) \end{align*}\]

\[\begin{align*} m_L = & (L_2 > 50) + (L_2 > 100) + (L_2 > 200) + (L_3 > 0.1) \\ & + (L_3 > 0.5) + (L_3 > 4)+ (L_3 > 10) + L_4 + \\ & L_4 \cdot \Big((L_1 > 0.4) + (L_1 > 0.6) + (L_1 > 0.8)\Big) \end{align*}\]
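
A single draw from this data-generating process can be sketched in NumPy. Several details are assumptions because the appendix does not state them: the network (an Erdos-Renyi graph with expected degree about 4), the shape/scale reading of \(\text{Gamma}(2,4)\), the reading of the second Normal argument as a standard deviation, and the reading of the final indicator as \(L_1 > 0.8\).

```python
import numpy as np

rng = np.random.default_rng(2025)
n = 500

# Hypothetical network: symmetric Erdos-Renyi graph, expected degree ~4.
upper = np.triu(rng.random((n, n)) < 4 / n, k=1).astype(float)
F = upper + upper.T

# Baseline covariates.
L1 = rng.beta(3, 2, size=n)
L2 = rng.poisson(100, size=n)
L3 = rng.gamma(shape=2, scale=4, size=n)    # assumes shape/scale parameterization
L4 = rng.binomial(1, 0.6, size=n)

def ind(condition):
    """Indicator as a float array, so that sums count rather than OR."""
    return condition.astype(float)

m_L = (
    ind(L2 > 50) + ind(L2 > 100) + ind(L2 > 200)
    + ind(L3 > 0.1) + ind(L3 > 0.5) + ind(L3 > 4) + ind(L3 > 10)
    + L4
    + L4 * (ind(L1 > 0.4) + ind(L1 > 0.6) + ind(L1 > 0.8))
)

# Exposure, its sum-of-neighbors summary, and outcome.
A = rng.normal(0.1 * m_L, 1.0)              # second argument read as std. dev.
A_s = F @ A
Y = rng.normal(0.2 * A + A_s + 0.2 * m_L, 0.1)
```

Repeating this draw 200 times and applying an estimator to each replicate would mirror the structure of the simulation study.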

Appendix C: Effect of ZEV on \(\text{NO}_2\)