candidateReduction {DSA} | R Documentation |
Ranks explanatory variables by their univariate association with a response variable.
candidateReduction(formula, data, id = 1:nrow(data), family = gaussian, weights=NULL, ...)
formula |
a symbolic description of the response variable on which the
dimension reduction is based. formula is typically set to
Y ~ . or cbind(S,F) ~ . (the dot stands for all other variables in data).
Currently supported outcomes are continuous or binomial (binary
with 0s and 1s, or a two-column matrix with successes in the first
column and failures in the second).
|
data |
a required data frame containing both the response variable(s) and the
candidate covariates. All variables in data other than the outcome
variable(s) are considered candidate variables.
|
id |
a vector identifying each independent experimental unit
in the data with a unique value, i.e., repeated measurements are
allowed in data. The length of id must equal the number of
observations (nrow(data)). The default value for id is 1:nrow(data), which
indicates that all observations are independent.
|
family |
currently either binomial or gaussian (default). Used to determine whether
logistic (logit link only) or general linear models should be considered.
|
weights |
a numeric vector with one element per observation,
containing the weights to be applied to each observation in data.
No weights are applied if weights is NULL (default).
|
... |
currently used internally to call candidateReduction within
a DSA call.
|
The candidateReduction routine implements a dimension reduction
step for the purpose of estimator selection. The candidate (explanatory)
variables are ranked by their univariate association with the
response variable: a univariate regression of the response
on each candidate variable is fit, and the p-values
corresponding to the test of the null hypothesis (no association) are
used to rank the candidate variables. Note that each model considered
is fit based on semiparametric estimators. With repeated measurements,
the working model for the variance-covariance matrix is the identity.

This routine is called within the DSA routine for dimension
reduction based on the association of each candidate variable with
the outcome of interest. It can also be called directly by
the user to further reduce the dimensionality of an estimator
selection problem based on, for example, the association of each
candidate variable with an exposure or treatment variable forced into
the base model supplied to the DSA routine.
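The ranking idea described above can be sketched with base R alone. This is only an illustration and not the package's implementation: it uses ordinary glm() fits rather than the semiparametric estimators mentioned above, ignores id and weights, and assumes every candidate is numeric so that each fit has a single slope coefficient. The helper name rank_univariate and the simulated data are hypothetical.

```r
# Illustrative sketch only: rank candidates by univariate p-value with glm().
# NOT the DSA implementation; rank_univariate is a hypothetical helper.
rank_univariate <- function(outcome, candidates, data, family = gaussian) {
  stats <- t(sapply(candidates, function(v) {
    # One regression per candidate: outcome ~ v
    fit <- glm(reformulate(v, response = outcome), data = data, family = family)
    est <- coef(summary(fit))
    # Row 2 is the candidate's coefficient; columns 3 and 4 hold the
    # test statistic and its p-value.
    c(statistic = unname(est[2, 3]), p.value = unname(est[2, 4]))
  }))
  # Sort rows by increasing p-value
  stats[order(stats[, "p.value"]), , drop = FALSE]
}

# Simulated data (assumption): only X1 is related to Y
set.seed(1)
n <- 200
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
d$Y <- 2 * d$X1 + rnorm(n)

ranked <- rank_univariate("Y", c("X1", "X2", "X3"), d)
ranked
```

With repeated measurements, plain glm() standard errors would not account for within-unit correlation, so this sketch is only meaningful for independent observations.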
A two-column matrix is returned. Each row contains the z-statistic and corresponding p-value describing the association of the response variable with the candidate variable identified by the row name. Rows are sorted by increasing p-value.
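As a rough illustration of how such a ranked matrix might be used to shrink the candidate set, the sketch below builds a stand-in matrix by hand. The values, the row names, and the column names "z" and "p" are made up for the example; the actual column names returned by candidateReduction are not stated here, so the p-value column is selected by position (second column, an assumption based on the description above).

```r
# Stand-in for a matrix returned by candidateReduction(); all values and
# column names below are hypothetical.
res <- matrix(c(5.2, 2.1, 0.3,        # z-statistics
                2e-7, 0.036, 0.76),   # p-values
              ncol = 2,
              dimnames = list(c("W4", "W3", "W9"), c("z", "p")))

# Keep candidates whose p-value (assumed to be column 2) is below a threshold
keep <- rownames(res)[res[, 2] < 0.05]

# Or keep the k top-ranked candidates, since rows are sorted by p-value
k <- 2
top_k <- rownames(res)[seq_len(min(k, nrow(res)))]

keep
top_k
```

The threshold of 0.05 and k = 2 are arbitrary choices for the example; the retained names could then be used to subset the columns of data before a subsequent model-selection step.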
Romain Neugebauer
library(DSA)

# Simulate 10 candidate covariates
n <- 1000
W <- cbind(rnorm(n), rnorm(n) < 1, rnorm(n) < 2, rnorm(n, 2, 4), runif(n),
           rcauchy(n), rlogis(n) < .1, rnorm(n) < .1, rnorm(n, 120, 10),
           rnorm(n, 66, 2))

# The outcome depends on the first four covariates only
Y <- 10 + .5*W[,1] + .02*W[,1]^2 + .01*W[,1]*W[,2] + 2*W[,3] + .7*W[,4]^2
Y <- as.matrix(as.integer(Y - mean(Y))/sd(Y))

# Binomial outcome: successes and failures out of a random number of trials
trials <- rpois(n, lambda = 20)
successes <- sapply(1:n, function(i) {
  rbinom(1, size = trials[i], prob = pnorm(Y[i]))
})
failures <- trials - successes

colnames(W) <- paste("W", 1:ncol(W), sep = "")
data <- as.data.frame(cbind(W, "successes" = successes, "failures" = failures))

# Rank the candidates by their association with the binomial outcome
res <- candidateReduction(cbind(successes, failures) ~ ., data, family = binomial)
res

# Rank the remaining covariates by their association with W1 (gaussian default)
res <- candidateReduction(W1 ~ ., as.data.frame(W))
res