Variance moderation for estimators of asymptotically linear parameters by empirical Bayes shrinkage


We focus on variable importance analysis in high-dimensional biological data sets with modest sample sizes, using semiparametric statistical models. In the context of studies of gene expression and environmental exposures, we present a method that is robust in small samples but does not rely on arbitrary parametric assumptions. Such analyses face not only issues of multiple testing but also the problem of teasing out the associations of biological expression measures with exposure in the presence of confounders such as age, race, and smoking. Specifically, we propose the use of targeted minimum loss-based estimation, along with a generalization of the moderated empirical Bayes statistics, to obtain estimates of variable importance measures. The result is a data-adaptive approach that can estimate individual associations in high-dimensional data, even when sample sizes are relatively small.
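To make the variance moderation idea concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that each parameter estimate comes with per-observation influence curve values, and that the prior degrees of freedom `d0` and prior variance `s0_sq` have already been estimated from the ensemble of sample variances; that estimation step is omitted here. The function names `moderated_variance` and `moderated_t` and all argument names are hypothetical. The shrinkage formula is the standard posterior variance from the moderated t-statistic framework, applied to influence-curve-based variances.

```python
import numpy as np


def moderated_variance(s2, d, d0, s0_sq):
    """Shrink sample variances s2 (each with d degrees of freedom)
    toward a prior variance s0_sq that carries d0 prior degrees of freedom."""
    s2 = np.asarray(s2, dtype=float)
    return (d0 * s0_sq + d * s2) / (d0 + d)


def moderated_t(estimates, ic_values, d0, s0_sq, null_value=0.0):
    """Moderated test statistics for asymptotically linear estimators.

    estimates : array of shape (p,), one estimate per parameter (e.g. per gene).
    ic_values : array of shape (n, p), estimated influence curve values
                (assumed input; how they are obtained is not shown here).
    d0, s0_sq : prior degrees of freedom and prior variance (assumed given).
    """
    estimates = np.asarray(estimates, dtype=float)
    ic_values = np.asarray(ic_values, dtype=float)
    n = ic_values.shape[0]
    # Sample variance of the influence curve for each parameter.
    s2 = ic_values.var(axis=0, ddof=1)
    # Shrink each variance toward the prior value.
    s2_tilde = moderated_variance(s2, n - 1, d0, s0_sq)
    # The standard error of each estimate is sqrt(variance / n).
    t = (estimates - null_value) / np.sqrt(s2_tilde / n)
    return t, s2_tilde
```

With `d0 = 0` the function returns the ordinary unmoderated statistic, and larger values of `d0` pull small or unstable variance estimates more strongly toward `s0_sq`, which is what stabilizes inference when the sample size is small.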