It is at least plausible that different mechanisms characterize the
disease in different subgroups of patients. Practically, we want to
allow for markers that characterize only subgroups of AD patients, and
not necessarily the whole target population. For example, one might
expect that, for some genes, expression differs between normal and
diseased subjects only for a subset of the diseased subjects. Thus,
although a test based on mean expression can be useful (and will work
if a substantial portion of the subjects have anomalous expression), it
can also be insensitive to anomalous expression in small subgroups. To
address this, we propose a simple statistical approach that uses
bootstrapping to control the various error rates of interest (e.g., the
family-wise error rate), based on the previous work of Dudoit et al.
(2004).
The modest innovation comes down to the choice of the test statistic:
in our case, quantiles. For example, if we wish to find those genes for
which at least 25% of the subjects are “significantly” differentially
over-expressed, we can use a test on the 0.75 quantile of expression
versus some (arbitrarily) chosen null value. Given the more complicated
nature of the proteomics data, the solution there is itself more
complicated, but it is based on the same basic procedure. Using this
method of finding differentially expressed genes and proteins, we also
examine relationships among the selected genes and proteins.
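
As a rough illustration of the quantile idea in this abstract, the following Python sketch tests a single gene's 0.75 quantile against a null value with a shifted bootstrap. It is a minimal sketch under stated assumptions, not the procedure of Dudoit et al. (2004): it handles one gene at a time and omits the multiple-testing adjustment, and the function names, the null value of 1.0, and the toy data are all hypothetical.

```python
import random


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def bootstrap_quantile_pvalue(values, q=0.75, null_value=0.0, n_boot=2000, seed=0):
    """One-sided bootstrap p-value for H0: the q-quantile is <= null_value."""
    rng = random.Random(seed)
    observed = quantile(values, q)
    n = len(values)
    exceed = 0
    for _ in range(n_boot):
        # Resample subjects with replacement.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        # Shift the bootstrap statistic so its distribution is centred at
        # the null value (assumed shift-type null distribution).
        null_stat = quantile(resample, q) - observed + null_value
        if null_stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_boot + 1)


# Hypothetical log-expression differences for 40 diseased subjects.
# "differential": 12 of 40 subjects (30%) are clearly over-expressed.
differential = [-0.3 + 0.6 * i / 27 for i in range(28)] + \
               [2.7 + 0.6 * i / 11 for i in range(12)]
# "flat": no subject is over-expressed.
flat = [-0.3 + 0.6 * i / 39 for i in range(40)]

for name, gene in [("differential", differential), ("flat", flat)]:
    p = bootstrap_quantile_pvalue(gene, q=0.75, null_value=1.0)
    print(name, "0.75 quantile:", round(quantile(gene, 0.75), 3), "p-value:", p)
```

In the toy data the over-expressed subgroup makes up 30% of the subjects, so the 0.75 quantile falls inside that subgroup even though most subjects look normal.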
Division of Biostatistics (School of
Medicine), Department of Applied Science (College of Engineering), and
Institute for Data Analysis and Visualization
University of California, Davis
Slides (PDF)
Biologists now have the capacity to measure thousands of compounds
simultaneously from a single biological sample using gene expression
arrays, mass spectrometry, NMR spectroscopy or other methods. These
methods can be used to measure mRNA transcripts, proteins, short
peptides, lipids, and other biologically active compounds. In this
talk, I will describe an important statistical challenge in the use of
such data. Whether one uses raw data, logarithms, or ratios, the
variability of the measurements depends strongly on the level of
expression, which violates the assumptions of most standard methods of
statistical analysis. We present a solution to this problem via a
specially tuned data transformation and show how it promotes the
effectiveness of simple and sophisticated analyses of the data.
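
The abstract does not name the transformation, so the following Python sketch only illustrates the general idea with one assumed choice: a generalized logarithm applied to data simulated from an assumed error model with both multiplicative and additive noise. The function names, the noise parameters, and the tuning constant `c` are hypothetical and are not taken from the talk. Note that under this simulated model an ordinary logarithm is undefined for the non-positive readings that occur at low expression levels.

```python
import math
import random
import statistics


def glog(y, c):
    """Generalized logarithm: close to log(2y) for y >> c, roughly linear near 0."""
    return math.log(y + math.sqrt(y * y + c * c))


def simulate(mu, n, rng, s_eta, s_eps):
    """Assumed error model: y = mu * exp(eta) + eps, with Gaussian eta and eps."""
    return [mu * math.exp(rng.gauss(0, s_eta)) + rng.gauss(0, s_eps)
            for _ in range(n)]


S_ETA = 0.1   # assumed multiplicative (proportional) noise level
S_EPS = 10.0  # assumed additive (background) noise level
c = S_EPS / S_ETA  # tuning constant matched to the assumed noise model

rng = random.Random(1)
levels = [0, 100, 1000, 10000]
raw_sd = {}
glog_sd = {}
for mu in levels:
    ys = simulate(mu, 4000, rng, S_ETA, S_EPS)
    raw_sd[mu] = statistics.stdev(ys)
    glog_sd[mu] = statistics.stdev([glog(y, c) for y in ys])

# Standard deviation by expression level, before and after transforming.
for mu in levels:
    print(mu, round(raw_sd[mu], 2), round(glog_sd[mu], 4))
```

Under these assumptions the raw standard deviation grows with the expression level, while the transformed standard deviation stays roughly constant across levels.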
Logic Regression in SNP association studies
Ingo Ruczinski
Department of Biostatistics, Johns Hopkins University
Slides (PDF)
Logic Regression was recently introduced as a novel classification and
regression method, particularly useful in SNP association studies. This
adaptive methodology generates new predictors as Boolean combinations
of binary covariates, so that models with high-order interactions can
be explored. We present the methodology,
show some case studies, and discuss statistical issues such as model
selection, missing data, variable importance, and study design.
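
To make the idea of Boolean-combination predictors concrete, here is a small Python sketch that builds candidate predictors from pairs of binary covariates and keeps the one with the lowest misclassification rate. It is a toy under stated assumptions: the abstract does not describe the search algorithm, so the exhaustive search over two-variable rules is a stand-in chosen for brevity, and the function names, the covariate coding, and the data are all hypothetical.

```python
import itertools

# Boolean operators used to combine two binary covariates.
OPERATORS = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
}


def candidate_rules(n_covariates):
    """Yield (label, function) for every two-variable rule, with optional negation."""
    for i, j in itertools.combinations(range(n_covariates), 2):
        for neg_i, neg_j in itertools.product([False, True], repeat=2):
            for name, op in OPERATORS.items():
                # Default arguments freeze the loop variables for this rule.
                def rule(x, i=i, j=j, neg_i=neg_i, neg_j=neg_j, op=op):
                    a = (not x[i]) if neg_i else x[i]
                    b = (not x[j]) if neg_j else x[j]
                    return bool(op(a, b))
                label = (f"({'not ' if neg_i else ''}X{i} {name} "
                         f"{'not ' if neg_j else ''}X{j})")
                yield label, rule


def misclassification(rule, X, y):
    """Fraction of subjects whose outcome differs from the rule's prediction."""
    return sum(rule(x) != label for x, label in zip(X, y)) / len(y)


def best_rule(X, y):
    """Exhaustively pick the candidate rule with the lowest error."""
    return min(candidate_rules(len(X[0])),
               key=lambda pair: misclassification(pair[1], X, y))


# Hypothetical data: five binary covariates (for example, whether each SNP
# carries at least one variant allele), one row per possible combination.
X = list(itertools.product([0, 1], repeat=5))
# Assumed true model: an interaction between covariates 1 and 4.
y = [bool(x[1] and not x[4]) for x in X]

label, rule = best_rule(X, y)
print(label, misclassification(rule, X, y))
```

Searching larger Boolean expressions over many covariates quickly becomes infeasible by enumeration, which is one reason model selection is among the statistical issues the abstract lists.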