Evaluating the causal effects of modified treatment policies under network interference

Nima Hejazi

Harvard Biostatistics

Salvador Balkus

Harvard Biostatistics

October 22, 2025

Scientific motivation: Environmental health

Environmental health is a major area of concern. How can we quantify the health effects of…

  • air pollution?
  • wildfires?
  • extreme heat?

Common issue: continuous exposures

Air pollution from factory smoke

Canadian wildfire blankets NYC skyline

Anatomy of the data: Standard-data set-up

Observed data: \(n\) units \(O_1, \ldots, O_n\), sampled iid and collected as a tuple of \(n\)-vectors \[\Ob = (\Lb, \Ab, \Yb) \sim \Pf \in \Pm\]

  • \(\Lb\): measured baseline covariates
  • \(\Ab\): continuous exposure
  • \(\Yb\): outcome of interest

Question: How much would \(Y\) have changed if we had intervened upon \(A\)?

Causal inference with a continuous exposure

  • Let \(Y(a)\) denote the potential outcome, i.e., the value of \(Y\) had \(A = a\) been set.

  • Typically, interest lies in counterfactual mean \(\E[Y(a)]\), the average value of \(Y\) had \(A\) been set according to \(A = a\)

  • What goes wrong when \(A\) is continuous…

    • cannot observe all possible \(A \in \mathcal{A}\); thus, challenging to identify and estimate dose-response non-parametrically
    • “Setting” \(A = a\), a static intervention, often is impractical or does not “make sense”
  • Solution: Consider modifying observed exposure…

Modified treatment policies

A user-specified function \(d(A, L; \delta)\) that maps the observed exposure \(A\) to a post-intervention value \(A^d\) (Haneuse and Rotnitzky 2013). Examples:

  • Additive: \(d(A, L; \delta) = A + \delta\)
  • Multiplicative: \(d(A, L; \delta) = A \cdot \delta\)
  • Piecewise additive: \[ d(A, L; \delta) = \begin{cases} A + \delta \cdot L & A \in \mathcal{A}(L) \\ A & \text{otherwise} \end{cases} \]
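As a concrete illustration (not from the talk), each policy above is just a function of the observed exposure and covariates; the names below are hypothetical:

```python
import numpy as np

def d_additive(a, l, delta):
    """Additive MTP: shift every observed exposure by delta."""
    return a + delta

def d_multiplicative(a, l, delta):
    """Multiplicative MTP: scale every observed exposure by delta."""
    return a * delta

def d_piecewise(a, l, delta, in_region):
    """Piecewise additive MTP: shift by delta * l only where the
    exposure falls in a covariate-dependent region A(L)."""
    return np.where(in_region(a, l), a + delta * l, a)

a = np.array([1.0, 2.0, 3.0])
l = np.array([0.0, 1.0, 1.0])
shifted = d_additive(a, l, 0.5)                           # every unit shifted
partial = d_piecewise(a, l, 0.5, lambda a, l: a < 3)      # last unit unchanged
```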

Causal effect of a modified treatment policy

The counterfactual mean is \[\E_{\Pf}\Big[Y(d(A, L; \delta))\Big] = \E_{\Pf}\Big[Y(A^d)\Big]\]

and the population intervention effect (PIE) is \(\E[Y(A^d)] - \E[Y]\)

  • “Average change in \(Y\) caused by modifying each \(A_i\) via \(d\)”
  • Causal, non-parametric analog of a linear regression coefficient

Dependence and spatial data

Motivating example: Electric vehicles

Question: What is the impact of zero-emissions vehicles (ZEV) on NO2 air pollution in California?

  • Continuous exposure: proportion of ZEVs in a county
  • No known intervention can “set” units’ proportion of ZEVs to \(A = a\)
  • But can consider MTP effects: \(\E[Y(A + 1)]\) or \(\E[Y(1.01 \cdot A)]\)

Desiderata for estimands and estimators

How to identify and estimate causal effects of MTPs in spatial data?

Must be…

  • Policy-relevant (intervention defines a population estimand)
  • Flexibly estimable (no reliance on restrictive models)
  • Efficient (attain the lowest possible variance)

Interference

Hudgens and Halloran (2008): Interference occurs when potential outcome of unit \(i\) depends on exposures of other units

\[Y_i(a_i, a_j) \neq Y_i(a_i, a_j') \text{ if } a_j \neq a_j'\]

  • Common in spatial data (dependence)
  • Causal identification fails: SUTVA violated
  • Correlated data: Challenges for estimation

Interference


Network interference: Potential outcomes only depend on neighbors in a known adjacency matrix \(\Fb\) (van der Laan 2014).

Effects of induced MTPs

Anatomy of the data: Dependent-data set-up

  1. Observed data: \(n\) units \(O_1, \ldots, O_n\), collected as a tuple of \(n\)-vectors \[\Ob = (\Lb, \Ab, \Yb)\]

  2. Network \(\Fb\): An adjacency matrix of each unit’s neighbors (known).

Repairing identification under interference

Under interference, consider the following structural equation: \[Y_i = f\Big(s_A(A_j : j \in \Fb_i), s_L(L_j : j \in \Fb_i), \varepsilon_{Y_i}\Big)\]

  • \(s\) : “summary” of neighbors’ exposures or exposure mapping
  • As shorthand, denote vector of \(s_A(A_j : j \in \Fb_i)\) as \(s(A)\)
    • Example: \(s(A)_i = \sum_{j \in \Fb_i} A_j\)

Treating \(s(A)\) as the exposure instead of \(A\) restores SUTVA (Aronow and Samii 2017); just use \(Y(s(a))\) instead of \(Y(a)\)
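With a known adjacency matrix, the neighborhood-sum summary above is a single matrix-vector product. A minimal sketch on a toy network (not from the talk; assumes a 0/1 adjacency matrix with no self-loops):

```python
import numpy as np

# Toy path network on 4 units: 0-1, 1-2, 2-3 are neighbor pairs.
F = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
A = np.array([1.0, 2.0, 3.0, 4.0])

# s(A)_i = sum of A_j over neighbors j in F_i
s_A = F @ A  # neighborhood sums: 2, 4, 6, 3
```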

The induced MTP

But what happens if we apply the MTP and then summarize?

\[ A \overset{d}{\longrightarrow} A^d \overset{s}{\longrightarrow} A^{s \circ d} \]

  • We term the function \(s \circ d\) the induced MTP.

  • Population intervention effect (PIE) of an induced MTP: \[ \Psi_n(\Pf) = \E_{\Pf} \Big[\frac{1}{n}\sum_{i=1}^n Y_i(s(d(\Ab, \Lb; \delta))_i)\Big] - \E_{\Pf}\Big[Y\Big] \]

  • Data-adaptive parameter, since we only observe a single network
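Concretely, the induced MTP is just function composition: apply \(d\) unit-wise, then re-summarize over the network. A sketch with the additive shift and neighborhood-sum summary (toy network and values, not from the talk); note that in this special case each unit’s summary exposure shifts by \(\delta\) times its degree:

```python
import numpy as np

F = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
A = np.array([1.0, 2.0, 3.0])
delta = 0.5

def d(a, delta):   # additive MTP
    return a + delta

def s(a, F):       # neighborhood-sum summary
    return F @ a

A_s = s(A, F)              # observed summary exposure A^s
A_sd = s(d(A, delta), F)   # induced MTP exposure A^{s o d}
# Here A_sd - A_s = delta * degree: [2, 4, 2] -> [2.5, 5.0, 2.5]
```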

Identification

Network analogs of standard assumptions (weaker)

  • A0 (SCM). Data are generated from a structural causal model: \[ L_i = f_L(\varepsilon_{L_i}); A_i = f_A(L_i^s, \varepsilon_{A_i}); Y_i = f_Y(A_i^s, L_i^s, \varepsilon_{Y_i}) \ , \] with error vectors independent of each other, with identically distributed entries, and with \(\varepsilon_{i} \indep \varepsilon_{j}\) provided \(i, j\) are not neighbors in \(\Fb\)

  • A1 (Summary positivity). If \(s(a), s(l) \in \text{supp}(A^s, L^s)\) then \(s(a^d), s(l) \in \text{supp}(A^s, L^s)\)

  • A2 (No unmeasured confounding). \(Y(a^s) \indep A^s \mid L^s\)

Necessary technical conditions on \(d\) and \(s\)

  • A3 (Piecewise smooth invertibility). The MTP \(d\) has a differentiable inverse on a countable partition of \(\text{supp}(A)\).

  • A4 (Summary coarea). \(s\) has Jacobian \(Js\) satisfying \[ \sqrt{\det J s(a) J s(a)^\top} > 0 \] (adapted from measure-theoretic calculus to use \(A^s\) instead of \(\Ab\))
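For the linear neighborhood-sum summary, the Jacobian of \(s\) is the adjacency matrix itself, so A4 can be checked numerically. A sketch (not from the talk) on a triangle network, where the condition holds:

```python
import numpy as np

# Triangle network: every unit neighbors the other two.
F = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
Js = F  # for the linear summary s(a) = F a, Js(a) = F for all a
val = np.sqrt(np.linalg.det(Js @ Js.T))  # positive here, so A4 holds
```

Whether A4 holds depends on the network: for a 3-unit path, for example, two adjacency rows coincide, the determinant is zero, and the condition fails.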

Statistical estimand for induced MTP effects

Statistical estimand factorizes in terms of \(A^s\): \[ \psi_n = \frac{1}{n}\sum_{i=1}^n \E_{\Pf}[\textcolor{teal}{m(A_i^s, L_i^s)} \cdot \textcolor{crimson}{r(A_i^s, A_i^{s\circ d}, L_i^s)} \cdot \textcolor{maroon}{w(\Ab, \Lb)_i}] \] with nuisance parameters \(m\) and \(r\), and deterministic weights \(w\): \[\begin{align*} & \textcolor{teal}{m(a^s, l^s) = \E_{\Pf}[Y \mid A_i^s = a^s, L_i^s = l^s]}\\ & \textcolor{crimson}{r(a^s, a^{s \circ d^{-1}}, l^s) = \frac{p(a^{s \circ d^{-1}} \mid l^s)} {p(a^s \mid l^s)}}\\ & \textcolor{maroon}{w(\ab, \lb) = \sqrt{\frac{\det J (s \circ d^{-1})(\ab)J (s \circ d^{-1})(\ab)^\top}{\det J s(\ab)J s(\ab)^\top}}} \end{align*}\]
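To make the factorization concrete, here is a hedged numerical sketch of the simplest case (additive \(d\), neighborhood-sum \(s\), where the Jacobian weight reduces to 1), using the outcome-regression (g-computation) form of the PIE: average \(m\) at the induced exposure, minus the mean outcome. The linear fit for \(m\) is a stand-in, not the estimator proposed in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Ring network: each unit's neighbors are i-1 and i+1.
F = np.zeros((n, n))
idx = np.arange(n)
F[idx, (idx - 1) % n] = 1
F[idx, (idx + 1) % n] = 1

L = rng.normal(size=n)
A = L + rng.normal(size=n)
As, Ls = F @ A, F @ L                       # summary exposure and covariate
Y = 2.0 * As + Ls + rng.normal(size=n)      # true summary-exposure effect = 2

# Stand-in outcome regression m(a^s, l^s): ordinary least squares.
X = np.column_stack([np.ones(n), As, Ls])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Induced exposure under the additive MTP d(A) = A + delta.
delta = 1.0
As_d = F @ (A + delta)                      # = As + delta * degree (= 2)

# Plug-in estimate of the PIE; the Jacobian weight w = 1 here.
psi_hat = np.mean(np.column_stack([np.ones(n), As_d, Ls]) @ beta) - Y.mean()
# psi_hat is close to 4 = 2 (coefficient) x 2 (degree) x delta, up to noise
```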

Advantages of MTP effects in the network setting

  • Population-level estimand, so the intervention is always compatible with the network
  • Ameliorates positivity violations that would occur if enforcing static interventions directly on summaries \(s(A)\)
  • Components of estimand depend on the data only through \(A^s\) and \(L^s\)

Estimation

Desiderata for estimators

  • Semi-parametric efficiency
    • Best possible variance among the class of regular asymptotically linear (RAL) estimators
  • Rate double-robustness
    • Structure allows flexible regression or machine learning for nuisance estimation (nuisance estimators may converge slower than the parametric \(n^{-1/2}\) rate)
  • Asymptotic linearity \[ \hat{\psi} - \psi_0 = \frac{1}{n} \sum_{i=1}^n \phi(\Pf)(O_i) + o_{\Pf}(n^{-1/2}) \]
    • implies that \(\hat{\psi}\) is consistent with \(\sqrt{n}(\hat{\psi} - \psi_0) \dto N(0, \sigma_0^2)\)
    • optimal influence function derived from semi-parametric theory

Efficient estimation

Construct an asymptotically linear, efficient estimator based on the efficient influence function \(\phi(\Pf)\)

\[\frac{1}{n}\sum_{i=1}^n \phi(\Pf_{\hat{\eta}})(O_i) \ ,\]

where \(\hat{\eta}\) is a set of nuisance estimators whose error product converges at \(o_{\Pf}(n^{-1/2})\) (i.e., each nuisance need only converge at \(o_{\Pf}(n^{-1/4})\), a rate typical in statistical learning)

Efficient estimation

The efficient influence function of \(\psi_n\), a special case of the EIF for the counterfactual mean of a stochastic intervention (Ogburn et al. 2022), is

\[\begin{align*} \bar{\phi}(\Pf)(\Ob) =& \frac{1}{n}\sum_{i=1}^n \Big[ w(\Ab, \Lb)_i \cdot r(A_i^s, A_i^{s\circ d}, L_i^s) \big(Y_i - m(A_i^s, L_i^s)\big)\\ &+ \E\big[m(A_i^{s\circ d}, L_i^s) \mid \Lb = \lb\big] \Big] - \psi_n \end{align*}\]

Efficient estimation

Ogburn et al. (2022)’s CLT: If \(\hat{\psi}_n\) is constructed to approximately solve \(\bar{\phi} \approx 0\) and \(K_{\text{max}}^2 / n \rightarrow 0\), then, under mild regularity conditions, \[\sqrt{C_n}(\hat{\psi}_n - \psi_n) \rightarrow \text{N}(0, \sigma^2) \ ,\] where \(K_{\text{max}}\) is the network’s maximum degree.

The estimator \(\hat{\psi}_n\) is asymptotically normal, but the rate depends on a factor \(n/K_{\text{max}}^2 < C_n < n\) (“automatically” contained within \(\hat{\sigma}^2\))

Estimation framework

  1. Fit estimators \(\hat{m}\) and \(\hat{r}\) of nuisance parameters \(m\) and \(r\) via cross-fitting and super (ensemble machine) learning (Davies and van der Laan 2016; van der Laan et al. 2007).
  2. Construct one-step or “network-TMLE” estimators (Zivich et al. 2022) from an estimated EIF based on \(\hat{m}\) and \(w\cdot\hat{r}\) (weighted density ratio)
  3. Compute standard errors and construct Wald-style confidence intervals based on the empirical variance of the estimated EIF.
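Steps 2-3 can be sketched generically: given fitted nuisances \(\hat{m}\) and the weighted density ratio \(w \cdot \hat{r}\), a one-step estimator adds the empirical mean of the estimated EIF to the plug-in, and the Wald interval uses the EIF's empirical variance. This is a schematic with placeholder inputs, not the network-TMLE implementation, and it uses the iid variance for brevity rather than the dependency-adjusted version from the appendix:

```python
import numpy as np

def one_step(Y, m_obs, m_shift, wr):
    """One-step estimator of the PIE E[Y(A^{s o d})] - E[Y], with a Wald CI.

    Y       : observed outcomes
    m_obs   : m(A_i^s, L_i^s), outcome regression at the observed exposure
    m_shift : m(A_i^{s o d}, L_i^s), at the induced exposure
    wr      : weighted density ratio w_i * r_i
    """
    n = len(Y)
    plug_in = m_shift.mean() - Y.mean()
    # Estimated (centered) EIF evaluated at the plug-in estimate
    eif = wr * (Y - m_obs) + (m_shift - m_shift.mean()) - (Y - Y.mean())
    psi = plug_in + eif.mean()          # one-step correction
    se = eif.std(ddof=1) / np.sqrt(n)   # iid variance, for brevity
    return psi, (psi - 1.96 * se, psi + 1.96 * se)

# Toy inputs (hypothetical; real nuisances come from step 1).
rng = np.random.default_rng(1)
Y = rng.normal(size=200)
m_obs = np.zeros(200)
m_shift = np.full(200, 0.3)
wr = np.ones(200)
psi, ci = one_step(Y, m_obs, m_shift, wr)
```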

Empirical results

Asymptotic properties of Network-TMLE

Performance results from numerical experiments

Versus competing methods on semisynthetic data

  • Simulate \(A\) and \(Y\) as linear models from 16 socioeconomic and land-use ZIP-code-level covariates from the ZEV-\(\text{NO}_2\) California dataset.
  • How poor would estimates be if the only mistake were ignoring interference?

  Method             Learner        % Bias   Variance   Coverage
  Network-TMLE       Correct GLM     0.11     1.56       96.2%
  Network-TMLE       Super Learner   1.03     1.56       94.0%
  IID-TMLE           Correct GLM    20.42     2.11       54.8%
  Linear Regression                 20.62     2.12       55.0%

Data analysis

Effect of electric vehicles on \(\text{NO}_2\) in California

  • GLM (ignores interference): ZEVs reduce \(\text{NO}_2\) by 0.015 ppb, totaling ~2.5% of the average change in \(\text{NO}_2\)
  • Induced MTP: ZEVs reduce \(\text{NO}_2\) by 0.042 ppb, totaling ~7% of the average change in \(\text{NO}_2\)

Future work

Further work

Challenges remain:

  • Difficult to estimate conditional density ratio nuisance \(r\)

    • May be amenable to undersmoothing or “Riesz learning”
  • If summaries \(s\) unknown, can we learn them automatically?

  • The same theory should extend to the longitudinal MTP (Díaz et al. 2021), useful for time-varying settings

Simulations powered by CausalTables.jl

Thank you! Questions?

Funded by NIEHS T32 ES007142 and NSF DGE 2140743

References

Aronow, P. M., and Samii, C. (2017), “Estimating average causal effects under general interference, with application to a social network experiment,” The Annals of Applied Statistics, Institute of Mathematical Statistics, 11. https://doi.org/10.1214/16-aoas1005.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993), Efficient and adaptive estimation for semiparametric models, Springer.
Davies, M. M., and van der Laan, M. J. (2016), “Optimal spatial prediction using ensemble machine learning,” The International Journal of Biostatistics, Walter de Gruyter GmbH, 12, 179–201. https://doi.org/10.1515/ijb-2014-0060.
Díaz, I., Williams, N., Hoffman, K. L., and Schenck, E. J. (2021), “Nonparametric causal effects based on longitudinal modified treatment policies,” Journal of the American Statistical Association, Informa UK Limited, 118, 846–857. https://doi.org/10.1080/01621459.2021.1955691.
Haneuse, S., and Rotnitzky, A. (2013), “Estimation of the effect of interventions that modify the received treatment,” Statistics in Medicine, Wiley, 32, 5260–5277. https://doi.org/10.1002/sim.5907.
Hudgens, M. G., and Halloran, M. E. (2008), “Toward causal inference with interference,” Journal of the American Statistical Association, Informa UK Limited, 103, 832–842. https://doi.org/10.1198/016214508000000292.
Ogburn, E. L., Sofrygin, O., Díaz, I., and Laan, M. J. van der (2022), “Causal inference for social network data,” Journal of the American Statistical Association, Informa UK Limited, 119, 597–611. https://doi.org/10.1080/01621459.2022.2131557.
Pfanzagl, J., and Wefelmeyer, W. (1985), “Contributions to a general asymptotic statistical theory,” Statistics & Risk Modeling, 3, 379–388.
van der Laan, M. J. (2014), “Causal inference for a population of causally connected units,” Journal of Causal Inference, Walter de Gruyter GmbH, 2, 13–74. https://doi.org/10.1515/jci-2013-0002.
van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007), “Super learner,” Statistical Applications in Genetics and Molecular Biology, De Gruyter, 6. https://doi.org/10.2202/1544-6115.1309.
van der Laan, M. J., and Rose, S. (2011), Targeted learning: Causal inference for observational and experimental data, Springer. https://doi.org/10.1007/978-1-4419-9782-1.
van der Laan, M. J., and Rubin, D. (2006), “Targeted maximum likelihood learning,” The International Journal of Biostatistics, De Gruyter, 2. https://doi.org/10.2202/1557-4679.1043.
Zivich, P. N., Hudgens, M. G., Brookhart, M. A., Moody, J., Weber, D. J., and Aiello, A. E. (2022), “Targeted maximum likelihood estimation of causal effects with interference: A simulation study,” Statistics in Medicine, Wiley, 41, 4554–4577. https://doi.org/10.1002/sim.9525.

Appendix

Variance estimation

The EIF was given in the form \(\frac{1}{n}\sum_{i=1}^n \phi_P(O_i)\), but each contribution must be centered at the mean over units with the same number of neighbors, \(N(|\Fb_i|)\):

\[\varphi_i = \phi_{\hat{P}_n}(O_i) - \frac{1}{|N(|\Fb_i|)|} \sum_{j \in N(|\Fb_i|)} \phi_{\hat{P}_n}(O_j)\]

Then, \(\hat{\sigma}^2 = \frac{1}{n^2}\sum_{i,j} \Fb_{ij} \varphi_i\varphi_j \overset{P}{\rightarrow} \sigma^2\)
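A minimal numpy sketch of this dependency-adjusted variance (not from the talk): center \(\varphi\) within degree classes, then take the network-weighted quadratic form. One labeled assumption: the identity is added to the adjacency matrix so that each \(\varphi_i^2\) term enters the double sum.

```python
import numpy as np

def network_variance(phi, F):
    """Dependency-adjusted variance estimate for an EIF vector phi.

    Centers phi within groups of units sharing the same degree, then
    sums phi_i * phi_j over dependent pairs; including i = j via F + I
    is an assumption on the dependency structure.
    """
    n = len(phi)
    degrees = F.sum(axis=1).astype(int)
    centered = phi.astype(float).copy()
    for k in np.unique(degrees):
        mask = degrees == k
        centered[mask] -= centered[mask].mean()  # center within degree class
    G = F + np.eye(n)  # count each unit as dependent on itself
    return (centered @ G @ centered) / n**2

# Toy example: 3-unit path network.
F = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
phi = np.array([1.0, 2.0, 3.0])
sigma2_hat = network_variance(phi, F)
```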

Cross-fitting in dependent data

Main idea: cross-fitting eliminates the “empirical process term”

\[ \Pf_n \phi_{\hat{\eta}} = \underbrace{\Pf_n \phi_{\eta_0}}_{\text{CLT}} + \underbrace{\Pf(\phi_{\hat{\eta}} - \phi_{\eta_0})}_{\text{Nuisance product}} + \underbrace{(\Pf_n - \Pf)(\phi_{\hat{\eta}} - \phi_{\eta_0})}_{\text{Empirical process}} \]

  • Empirical mean unbiased under cross-fitting, even in correlated units
  • \(\text{Var}(\phi_{\hat{\eta}} - \phi_{\eta_0}) = o(1/C_n)\) by Bienaymé’s identity
    • The network CLT assumes \(n/K_{\max}^2 \leq C_n\)
    • There are at most \(K_{\max}^2\) correlated units
  • Therefore, \((\Pf_n - \Pf)(\phi_{\hat{\eta}} - \phi_{\eta_0}) = o_{\Pf}(C_n^{-1/2})\)

Simulation study: Data-generating process

Draw 400 iterations, estimating the MTP effect of ZEV on \(\text{NO}_2\)