\documentclass[12pt]{article}
\usepackage{amsmath,pstricks}
%%% \usepackage[authoryear,round]{natbib}
\usepackage{hyperref}

\textwidth=6.2in
\textheight=8.5in
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

%%%%%%%%%%%%%%%%%%%%%%%%%
% For colors
 
\usepackage{color}
\definecolor{red}{rgb}{1, 0, 0}
\definecolor{green}{rgb}{0, 1, 0}
\definecolor{blue}{rgb}{0, 0, 1}
\definecolor{myblue}{rgb}{0.25, 0, 0.75}
\definecolor{myred}{rgb}{0.75, 0, 0}
\definecolor{mygreen}{rgb}{0, 0.5, 0}
\definecolor{gray}{rgb}{0.5, 0.5, 0.5}
\definecolor{purple}{rgb}{0.65, 0, 0.75}
\definecolor{orange}{rgb}{1, 0.65, 0}
\definecolor{Dorange}{rgb}{0.75, 0.45, 0}

%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}
 
\newcommand{\myem}[1]{\textcolor{myblue}{#1}}
\newcommand{\myurl}[1]{\textcolor{gray}{\url{#1}}}
\newcommand{\RPack}[1]{\textcolor{Dorange}{\textit{#1}}}
\newcommand{\RClass}[1]{\textcolor{gray}{\textit{#1}}}
\newcommand{\RObj}[1]{\textcolor{gray}{\texttt{#1}}}
\newcommand{\RFunc}[1]{\textcolor{gray}{\texttt{#1}}}

\def\RR{\mbox{\it I\hskip -0.177em R}}
\def\NN{\mbox{\it I\hskip -0.177em N}}

\newtheorem{theorem}{Theorem}
\newtheorem{procedure}{Procedure}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{plain}

\title{MBI Lab4: \\
Identification of Differentially Expressed Genes: \\
Bioconductor R Packages \RPack{genefilter} and \RPack{multtest}\\
\textcolor{red}{DRAFT: Please do not distribute}}

\author{Sandrine Dudoit, Katherine S. Pollard, and Mark J. van der Laan}

\usepackage{Sweave}
\begin{document}

\maketitle

\begin{center}
\begin{tabular}{ll}
\hline
&\\
Short course & Statistical Methods and Software for the Analysis of Microarray Experiments\\
Location & Mathematical Biosciences Institute\\
& Ohio State University, Columbus, OH\\
Instructors & Sandrine Dudoit, Division of Biostatistics, UC Berkeley\\
& Nicholas P. Jewell, Division of Biostatistics, UC Berkeley\\
Date & September 20--24, 2004\\
Website & \myurl{www.stat.berkeley.edu/~sandrine/Docs/Talks/MBI04/mbi.html} \\
&\\
\hline
\end{tabular}
\end{center}

\textcolor{red}{{\bf N.B.} 
This lab is using a development version of the Bioconductor R package \RPack{multtest}. A more complete and stable version of the package will be included in Bioconductor release 1.5, on October 24th, 2004. 
Please do not distribute the package (code and documentation) or this lab. 
This is work in progress.
We recommend that you replace the development version by the released version 1.5.1 as soon as it becomes available on the Bioconductor Project website (\myurl{www.bioconductor.org}).
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\section{Getting started}

This lab concerns software for the identification of differentially expressed genes in microarray experiments, i.e., genes whose expression measures are associated with possibly censored responses or covariates interest. 
The covariates could be either polychotomous (e.g., sex, treatment, cell type) or continuous (e.g., age, dose). Outcomes of interest could include clinical variables, such as response to treatment, censored survival time, and tumor grade.

We focus on two Bioconductor R packages: \RPack{genefilter} and \RPack{multtest}; although we note that many more packages are available on the CRAN and Bioconductor Project websites, including the \RPack{limma} package.

The functions we will consider are very flexible and general; we will only have the time to consider a limited number of options and applications.
For more detail, please take the time to consult the documentation in the help files, vignettes, and manuals. 
Recall that package vignettes can be listed and accessed in PDF by commands such as \texttt{openVignette("Biobase")}. The \RObj{vExplorer} function, from the \RPack{tkWidgets} package, may be used to step through the code chunks interactively. 
New packages and functions are added regularly on CRAN and Bioconductor; please visit these websites for the latest developments. 

To load the packages,
\begin{Schunk}
\begin{Sinput}
> library(Biobase)
> library(annotate)
> library(tkWidgets)
> library(genefilter)
> library(multtest)
\end{Sinput}
\end{Schunk}


A number of sample datasets are available on the Bioconductor website (under "Experimental Data" link). 
We will use the Acute Lymphoblastic Leukemia (ALL) dataset of Chiaretti et al. (2004), available in the R package \RPack{ALL}, to identify genes whose expression measures are associated with (possibly censored) biological and clinical outcomes such as:  cytogenetic test status (normal vs. abnormal), tumor molecular subtype (BCR/ABL, NEG, ALL1/AF4, E2A/PBX1, p15/p16,  NUP-98), and patient survival. 
The main object in  \RPack{ALL} is \RObj{ALL}, an instance of the class \RClass{exprSet}, which contains the expression measures as well as various responses and gene annotation information.  
The \RObj{ALL} dataset includes 21 phenotypes and 12,625 Affymetrix gene expression measures (chip series HGU95Av2), for each of 128 ALL patients. For greater detail, please consult the \RPack{ALL} package documentation.
The genes-by-subjects matrix of expression measures is provided in the \RObj{exprs} slot of \RObj{ALL} and the phenotype data are stored in the \RObj{phenoData} slot. 
Note that the expression measures in \RObj{ALL} have been normalized using the three-step procedure RMA implemented in the package \RPack{affy}. In particular, the expression measures have been subject to a base 2 logarithmic transformation.

\begin{Schunk}
\begin{Sinput}
> library(ALL)
> library(hgu95av2)
> data(ALL)
> class(ALL)
\end{Sinput}
\begin{Soutput}
[1] "exprSet"
attr(,"package")
[1] "Biobase"
\end{Soutput}
\begin{Sinput}
> slotNames(ALL)
\end{Sinput}
\begin{Soutput}
[1] "exprs"       "se.exprs"    "phenoData"   "description" "annotation" 
[6] "notes"      
\end{Soutput}
\begin{Sinput}
> show(ALL)
\end{Sinput}
\begin{Soutput}
Expression Set (exprSet) with 
        12625 genes
        128 samples
                 phenoData object with 21 variables and 128 cases
         varLabels
                cod:  Patient ID
                diagnosis:  Date of diagnosis
                sex:  Gender of the patient
                age:  Age of the patient at entry
                BT:  does the patient have B-cell or T-cell ALL
                remission:  Complete remission(CR), refractory(REF) or NA. Derived from CR
                CR:  Original remisson data
                date.cr:  Date complete remission if achieved
                t(4;11):  did the patient have t(4;11) translocation. Derived from citog
                t(9;22):  did the patient have t(9;22) translocation. Derived from citog
                cyto.normal:  Was cytogenetic test normal? Derived from citog
                citog:  original citogenetics data, deletions or t(4;11), t(9;22) status
                mol.biol:  molecular biology
                fusion protein:  which of p190, p210 or p190/210 for bcr/able
                mdr:  multi-drug resistant
                kinet:  ploidy: either diploid or hyperd.
                ccr:  Continuous complete remission? Derived from f.u
                relapse:  Relapse? Derived from f.u
                transplant:  did the patient receive a bone marrow transplant? Derived from f.u
                f.u:  follow up data available
                date last seen:  date patient was last seen
\end{Soutput}
\begin{Sinput}
> varLabels(ALL)
\end{Sinput}
\begin{Soutput}
$cod
[1] " Patient ID"

$diagnosis
[1] " Date of diagnosis"

$sex
[1] " Gender of the patient"

$age
[1] " Age of the patient at entry"

$BT
[1] " does the patient have B-cell or T-cell ALL"

$remission
[1] " Complete remission(CR), refractory(REF) or NA. Derived from CR"

$CR
[1] " Original remisson data"

$date.cr
[1] " Date complete remission if achieved"

$"t(4;11)"
[1] " did the patient have t(4;11) translocation. Derived from citog"

$"t(9;22)"
[1] " did the patient have t(9;22) translocation. Derived from citog"

$cyto.normal
[1] " Was cytogenetic test normal? Derived from citog"

$citog
[1] " original citogenetics data, deletions or t(4;11), t(9;22) status"

$mol.biol
[1] " molecular biology"

$"fusion protein"
[1] " which of p190, p210 or p190/210 for bcr/able"

$mdr
[1] " multi-drug resistant"

$kinet
[1] " ploidy: either diploid or hyperd."

$ccr
[1] " Continuous complete remission? Derived from f.u"

$relapse
[1] " Relapse? Derived from f.u"

$transplant
[1] " did the patient receive a bone marrow transplant? Derived from f.u"

$f.u
[1] " follow up data available"

$"date last seen"
[1] " date patient was last seen"
\end{Soutput}
\end{Schunk}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Filtering genes: \RPack{genefilter}}

A very common task in microarray data analysis is gene-by-gene selection. 
One might be interested in filtering genes based on the following criteria: 
data quality, e.g., absolute intensity or variance; 
subject matter knowledge;
ability to differentiate cases from controls;
spatial or temporal expression patterns.
Depending on the experimental design, some highly-specialized filters may be required and applied sequentially.
In clinical trials, one might filter genes based on association with survival, e.g., using a Cox proportional hazards model.
In a factorial experiment, one might filter genes based on interaction between two treatments, e.g., using 2-way ANOVA.
In a time-course experiment one might filter genes based on periodicity of expression pattern, e.g., using Fourier transform.

The \RPack{genefilter} package provides tools to sequentially apply filters to the rows (genes) of a matrix or of an \RClass{exprSet} object.
It separates the filtering process into two main tasks, {\em assembling} and {\em applying the filters}. 
Accordingly, the package has two main functions, \RObj{filterfun} and \RObj{genefilter}, one for each of the two main tasks of assembling and filtering, respectively. 

Any number of functions for specific filtering tasks can be defined and supplied to \RObj{filterfun}, e.g., Cox model $p$-values, coefficient of variation.
The basic idea is to rely on {\em lexical scoping} to provide values, or {\em bindings}, for the variables that are needed to do the filtering. 
Specifically, each filter is implemented using a function {\em closure}, i.e., the {\em body} of the function along with an {\em enclosing environment} containing all variable bindings needed for evaluating the function. Closures facilitate software modularity and extensibility and are used in other Bioconductor packages, such as \RPack{marray} and \RPack{multtest}. 
In \RPack{genefilter}, closures allow the user to pass functions with the same arguments, i.e., footprint, to the main function \RObj{filterfun}; specialized arguments are passed in the enclosing environment.
In  \RPack{multtest}, closures are used to provide access to a broad class of test statistics. All the test statistics functions have a small number of common arguments and the values of additional, specific arguments are included in the enclosing environment of the function.

The main steps in using \RPack{genefilter} are as follows.
\begin{enumerate}
\item
Select/define functions for specific filtering tasks.
\item
Assemble the filters using the \RObj{filterfun} function.
\item
Apply the filters using the \RObj{genefilter} function. The result is a logical vector, where \RObj{TRUE} indicates genes that are retained.
\item
Apply this logical vector to the \RClass{exprSet} object to obtain a microarray object corresponding to the subset of interesting genes.
\end{enumerate}

The \RPack{genefilter} package provides a number of commonly-used filtering functions; these are essentially {\em wrappers} around general purpose R functions.
\begin{itemize}
\item
\RObj{kOverA}: select genes for which $k$ samples have expression measures larger than $A$.
\item
\RObj{gapFilter}: select genes with a large IQR or gap (jump) in expression measures across samples.
\item
\RObj{ttest}: select genes according to $t$-test nominal $p$-values.
\item
\RObj{Anova}: select genes according to ANOVA nominal $p$-values.
\item
\RObj{coxfilter}: select genes according to Cox proportional hazards model nominal $p$-values.
\end{itemize}
It is very simple to write your own custom filters --- use the supplied filtering functions as templates.

The code below filters genes in the ALL dataset as in the original article of Chiaretti et al. (2004). The gene selection criteria require that: 
(i) at least 20\% of the subjects have a measured intensity of at least 100 and
(ii) the ratio of the SD to the mean (i.e., coefficient of variation) intensity across samples be between 0.7 and 10. 

\begin{Schunk}
\begin{Sinput}
> X <- exprs(ALL)
> f1 <- pOverA(p = 0.2, A = 100)
> f2 <- cv(a = 0.7, b = 10)
> ff <- filterfun(f1, f2)
> geneSub <- genefilter(2^X, ff)
> subALL <- ALL[geneSub, ]
> show(subALL)
\end{Sinput}
\begin{Soutput}
Expression Set (exprSet) with 
        431 genes
        128 samples
                 phenoData object with 21 variables and 128 cases
         varLabels
                cod:  Patient ID
                diagnosis:  Date of diagnosis
                sex:  Gender of the patient
                age:  Age of the patient at entry
                BT:  does the patient have B-cell or T-cell ALL
                remission:  Complete remission(CR), refractory(REF) or NA. Derived from CR
                CR:  Original remisson data
                date.cr:  Date complete remission if achieved
                t(4;11):  did the patient have t(4;11) translocation. Derived from citog
                t(9;22):  did the patient have t(9;22) translocation. Derived from citog
                cyto.normal:  Was cytogenetic test normal? Derived from citog
                citog:  original citogenetics data, deletions or t(4;11), t(9;22) status
                mol.biol:  molecular biology
                fusion protein:  which of p190, p210 or p190/210 for bcr/able
                mdr:  multi-drug resistant
                kinet:  ploidy: either diploid or hyperd.
                ccr:  Continuous complete remission? Derived from f.u
                relapse:  Relapse? Derived from f.u
                transplant:  did the patient receive a bone marrow transplant? Derived from f.u
                f.u:  follow up data available
                date last seen:  date patient was last seen
\end{Soutput}
\end{Schunk}

Note that the \RObj{genefilter} function can be applied directly to an object of class \RClass{exprSet}. In our particular example, we needed to transform the matrix of expression measures before applying \RObj{genefilter}.

The \RPack{multtest} package provides code for the above filtering and the resulting object \RObj{filtALL} of class \RObj{exprSet}.

\begin{Schunk}
\begin{Sinput}
> data(filtALL)
> show(filtALL)
\end{Sinput}
\begin{Soutput}
Expression Set (exprSet) with 
        431 genes
        128 samples
                 phenoData object with 21 variables and 128 cases
         varLabels
                cod:  Patient ID
                diagnosis:  Date of diagnosis
                sex:  Gender of the patient
                age:  Age of the patient at entry
                BT:  does the patient have B-cell or T-cell ALL
                remission:  Complete remission(CR), refractory(REF) or NA. Derived from CR
                CR:  Original remisson data
                date.cr:  Date complete remission if achieved
                t(4;11):  did the patient have t(4;11) translocation. Derived from citog
                t(9;22):  did the patient have t(9;22) translocation. Derived from citog
                cyto.normal:  Was cytogenetic test normal? Derived from citog
                citog:  original citogenetics data, deletions or t(4;11), t(9;22) status
                mol.biol:  molecular biology
                fusion protein:  which of p190, p210 or p190/210 for bcr/able
                mdr:  multi-drug resistant
                kinet:  ploidy: either diploid or hyperd.
                ccr:  Continuous complete remission? Derived from f.u
                relapse:  Relapse? Derived from f.u
                transplant:  did the patient receive a bone marrow transplant? Derived from f.u
                f.u:  follow up data available
                date last seen:  date patient was last seen
\end{Soutput}
\end{Schunk}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Multiple testing: \RPack{multtest}}

\subsection{Overview}

The Bioconductor R package \RPack{multtest} implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates, in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics \cite{Dudoit&vdLaanMTBook,DudoitetalMT1SAGMB04,Pollard&vdLaanJSPI04,vdLaanetalMT3SAGMB04,vdLaanetalMT2SAGMB04}. 
The current version of \RPack{multtest} provides MTPs for null hypotheses concerning means, differences in means, and regression parameters in linear and Cox proportional hazards models.
Both bootstrap and permutation estimators of the test statistics ($t$- or $F$-statistics) null distribution are available. 
Procedures are provided to control Type I error rates defined as tail probabilities and expected values of arbitrary functions of the numbers of Type I errors, $V_n$, and rejected hypotheses, $R_n$. 
These error rates include: 
the generalized family-wise error rate, $gFWER(k) = Pr(V_n > k)$, or chance of at least $(k+1)$ false positives (the special case $k=0$ corresponds to the usual family-wise error rate, FWER); 
tail probabilities $TPPFP(q) = Pr(V_n/R_n > q)$ for the proportion of false positives among the rejected hypotheses;
the false discovery rate, $FDR=E[V_n/R_n]$.
Single-step and step-down common-cut-off (maxT) and common-quantile (minP) procedures, that take into account the joint distribution of the test statistics, are implemented to control the FWER. 
In addition, augmentation procedures are provided to control the gFWER and TPPFP, based on {\em any} initial FWER-controlling procedure.
The results of a multiple testing procedure are summarized using rejection regions for the test statistics, confidence regions for the parameters of interest, and adjusted $p$-values.
The modular design of the \RPack{multtest} package allows interested users to readily extend the package functionality by inserting additional functions for test statistics and testing procedures. 
The S4 class/method object-oriented programming approach was adopted to summarize the results of a MTP.

%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Background on multiple testing procedures}

\subsubsection{Multiple hypothesis testing framework}
\label{anal:mult:s:framework}

{\em Hypothesis testing} is concerned with using observed data to test hypotheses, i.e.,  make decisions, regarding properties of the unknown data generating distribution. 
Below, we discuss in turn the main ingredients of a multiple testing problem, namely: data, null and alternative hypotheses, test statistics, multiple testing procedure (MTP) defining rejection regions for the test statistics, Type I and Type II errors, and adjusted $p$-values. 
The crucial choice of a test statistics null distribution is addressed in Section \ref{anal:mult:s:nullDistn}. 
Specific proposals of MTPs are given in Sections \ref{anal:mult:s:SS} -- \ref{anal:mult:s:AMTP}.\\

\noindent
{\bf Data.} Let $X_1,\ldots,X_n$ be a {\em random sample} of $n$ independent and identically distributed (i.i.d.) random variables, $X \sim P\in {\cal M}$, where the {\em data generating distribution} $P$ is known to be an element of a particular {\em statistical model} ${\cal M}$ (i.e., a set of possibly non-parametric distributions).\\

\noindent
{\bf Null and alternative hypotheses.} 
In order to cover a broad class of testing problems, define $M$
null hypotheses in terms of a collection of {\em submodels}, ${\cal
  M}(m)\subseteq {\cal M}$,  $m=1,\ldots,M$, for the data generating
distribution $P$. The $M$ {\em null hypotheses} are defined as
$H_0(m) \equiv \mathrm{I}(P\in {\cal M}(m))$ and the corresponding {\em
  alternative hypotheses} as $H_1(m) \equiv \mathrm{I}(P \notin {\cal M}(m))$.

In many testing problems, the submodels concern {\em parameters}, i.e., functions of the data generating distribution $P$, $\Psi(P) = \psi= (\psi(m):m=1,\ldots,M)$, such as means, differences in means, correlations, and parameters in linear models, generalized linear models, survival models, time-series models, dose-response models, etc. One distinguishes between two types of testing problems: {\em one-sided tests}, where $H_0(m) = \mathrm{I}(\psi(m) \leq \psi_0(m))$, and {\em two-sided tests}, where $H_0(m) = \mathrm{I}(\psi(m) =
\psi_0(m))$.
The hypothesized {\em null values}, $\psi_0(m)$, are frequently zero.

 Let ${\cal H}_0={\cal H}_0(P)\equiv \{m:H_0(m)=1\} = \{m: P \in {\cal M}(m)\}$ be the set of $h_0 \equiv |{\cal H}_0|$ true null hypotheses, where we note that ${\cal H}_0$ depends on the data generating distribution $P$. Let ${\cal H}_1={\cal H}_1(P) \equiv {\cal H}_0^c(P) = \{m: H_1(m) = 1\} = \{m: P \notin {\cal M}(m)\}$
be the set of  $h_1 \equiv |{\cal H}_1|  = M-h_0$ false null hypotheses, i.e., true positives.  
The goal of a multiple testing
  procedure is to accurately estimate the set ${\cal H}_0$, and thus its
  complement ${\cal H}_1$, while controlling probabilistically the number
  of false positives at a user-supplied level $\alpha$.\\

\noindent
{\bf Test statistics.} A testing procedure is a data-driven rule for deciding whether or not to {\em reject}  each of the $M$ null hypotheses $H_0(m)$, i.e., declare that $H_0(m)$ is false (zero) and hence $P \notin {\cal M}(m)$. 
The decisions to reject or not the null hypotheses are based on an $M$--vector of
{\em test statistics}, $T_n
  =(T_n(m):m=1,\ldots,M)$, that are functions of the
data, $X_1, \ldots, X_n$. Denote the typically unknown (finite sample) {\em joint distribution} of the test statistics $T_n$ by $Q_n=Q_n(P)$. 


Single-parameter null hypotheses are commonly tested using {\em $t$-statistics}, i.e., standardized differences,
\begin{equation}\label{anal:mult:e:tstat}
T_n(m) \equiv \frac{\mbox{Estimator} - \mbox{Null value}}{\mbox{Standard error}} = \sqrt{n}\frac{\psi_n(m) - \psi_0(m)}{{\sigma_n(m)}}.
\end{equation}
In general, the $M$--vector $\psi_n = (\psi_n(m): m=1,\ldots, M)$ denotes an asymptotically linear {\em estimator} of the parameter $M$--vector $\psi = (\psi(m): m=1,\ldots,M)$ and $(\sigma_n(m)/\sqrt{n}:
m=1,\ldots, M)$ denote consistent estimators of the {\em standard errors} of the components of $\psi_n$. 
For tests of means, one recovers the usual one-sample and two-sample $t$-statistics, where the $\psi_n(m)$ and $\sigma_n(m)$ are based on sample means and variances, respectively.
In some settings, it may be appropriate to use (unstandardized) {\em difference statistics}, $T_n(m) \equiv \sqrt{n}(\psi_n(m) - \psi_0(m))$ \cite{Pollard&vdLaanJSPI04}.
Test statistics for other types of null hypotheses include $F$-statistics, $\chi^2$-statistics, and likelihood ratio statistics. \\


\noindent
{\bf Example.}
Suppose that, as in the analysis of the ALL dataset of  Chiaretti et al. \cite{Chiarettietal04} (Section \ref{anal:mult:s:appns}), one is interested in identifying genes that are differentially expressed in two populations of  ALL cancer patients, those with normal cytogenetic test status and those with abnormal test. 
The data consist of random $J$--vectors $X$, where the first $M$ entries of $X$ are microarray expression measures on $M$ genes of interest and the last entry, $X(J)$, is an indicator for cytogenetic test status (1 for normal, 0 for abnormal). 
Then, the parameter of interest is an $M$--vector of differences in mean expression measures in the two populations, $\psi(m) = E[X(m) | X(J)=0] - E[X(m) | X(J)=1]$, $m=1,\ldots,M$. 
To identify genes with higher mean expression measures in the abnormal compared to the normal cytogenetics subjects, one can test the one-sided null hypotheses $H_0(m) = \mathrm{I}(\psi(m) \leq 0)$ vs. the alternative hypotheses $H_1(m) = \mathrm{I}(\psi(m) > 0)$, using two-sample Welch $t$-statistics 
\begin{equation}
T_n(m) \equiv \frac{\bar{X}_{0,n_0}(m) - \bar{X}_{1,n_1}(m)}{\sqrt{\frac{\sigma_{0,n_0}^2(m)}{n_0} + \frac{\sigma_{1,n_1}^2(m)}{n_1}}},
\end{equation}
where $n_k$, $\bar{X}_{k,n_k}(m)$, and $\sigma_{k,n_k}^2(m)$ denote, respectively, the sample size, sample means, and sample variances, for patients with test status $k$, $k=0,\, 1$. The null hypotheses are rejected, i.e., the corresponding genes are declared differentially expressed, for large values of the test statistics $T_n(m)$.\\

%%% \textcolor{red}{*** CHECK: the code does with $-T_n(m)$ and rejects for small values of $T_n(m)$.

\noindent
{\bf Multiple testing procedure.} A {\em multiple testing procedure} (MTP) provides {\em rejection regions}, ${\cal C}_n(m)$, i.e., sets of values for the test statistics $T_n(m)$ that lead to the decision to reject the null hypotheses $H_0(m)$. 
In other words, a MTP produces a random (i.e., data-dependent) subset ${\cal R}_n$ of rejected hypotheses that estimates ${\cal H}_1$, the set of true positives,
\begin{equation}
{\cal R}_n={\cal R}(T_n, Q_{0n},\alpha) \equiv 
\{m:\mbox{$H_0(m)$ is rejected}\} = \{m: T_n(m) \in {\cal C}_n(m)\},
\end{equation}
where ${\cal C}_n(m)={\cal C}(T_n,Q_{0n},\alpha)(m)$, $m=1,\ldots,M$, denote possibly random rejection regions. The long notation ${\cal R}(T_n, Q_{0n},\alpha)$ and ${\cal C}(T_n, Q_{0n},\alpha)(m)$ emphasizes that the MTP depends on:
(i) the {\em data}, $X_1, \ldots, X_n$,
 through the $M$--vector of {\em test statistics}, $T_n = (T_n(m): m=1,\ldots,
 M)$;
 (ii) a test statistics {\em null distribution}, $Q_{0n}$ (Section \ref{anal:mult:s:nullDistn}); and 
(iii) the {\em nominal level} $\alpha$ of the MTP, i.e., the desired upper bound for a suitably defined false positive rate. 

Unless specified otherwise, it is assumed that large values of the test statistic $T_n(m)$ provide evidence against the corresponding null hypothesis $H_0(m)$, that is, we consider rejection regions of the form ${\cal C}_n(m) = (c_n(m),\infty)$, where $c_n(m)$ are to-be-determined {\em cut-offs}, or {\em critical values}.\\ 

\noindent
{\bf Type I and Type II errors.} In any
testing situation, two types of errors can be committed: a {\em false
positive}, or {\em Type I error}, is committed by rejecting a true
null hypothesis, and a {\em false negative}, or {\em Type
II error}, is committed when the test procedure fails to reject a false null
hypothesis. The situation can be summarized by Table \ref{anal:mult:t:TypeIandII}, below, where
the number of Type I errors is $V_n \equiv \sum_{m \in {\cal H}_0} \mathrm{I}(T_n(m) \in {\cal C}_n(m)) = |{\cal R}_n \cap {\cal H}_0|$ and the number
of Type II errors is $U_n \equiv \sum_{m \in {\cal H}_1} \mathrm{I}(T_n(m) \notin {\cal C}_n(m)) = |{\cal R}_n^c \cap {\cal H}_1|$. Note that both $U_n$
and $V_n$ depend on the unknown data generating distribution $P$ through
the unknown set of true null hypotheses ${\cal H}_0 = {\cal H}_0(P)$. The numbers $h_0=|{\cal H}_0|$ and $h_1 = |{\cal H}_1| = M-h_0$ of true and false null hypotheses are
{\em unknown parameters}, the number of rejected hypotheses $R_n \equiv \sum_{m=1}^M  \mathrm{I}(T_n(m) \in {\cal C}_n(m)) = |{\cal R}_n|$ is an {\em observable random variable}, and the entries in the body of the table, $U_n$, $h_1 -
U_n$, $V_n$, and $h_0-V_n$, are
{\em unobservable random variables} (depending on $P$, through ${\cal H}_0(P)$). 
\begin{table}[hhh]
\caption{Type I and Type II errors in multiple hypothesis testing.}
\label{anal:mult:t:TypeIandII}
\begin{tabular}{ll|cc|l}
\multicolumn{5}{c}{} \\
\multicolumn{2}{c}{} & \multicolumn{2}{c}{Null hypotheses} & \multicolumn{1}{c}{}\\
\multicolumn{2}{c}{} & \multicolumn{1}{c}{not rejected} & \multicolumn{1}{c}{rejected} & \multicolumn{1}{c}{} \\
%%% \multicolumn{5}{c}{}\\
\cline{3-4}
&&&&\\
& true & $| {\cal R}_n^c \cap {\cal H}_0 |$ &
$V_n = | {\cal R}_n \cap {\cal H}_0 |$ &
$h_0=| {\cal H}_0|$\\
&&&(Type I errors)&\\
Null hypotheses&&&&\\
& false & $U_n = | {\cal R}_n^c \cap {\cal H}_1 |$ & $| {\cal R}_n \cap {\cal H}_1 |$ & $h_1=| {\cal H}_1
|$\\
&&(Type II errors)&&\\
&&&&\\
\cline{3-4}
%%% \multicolumn{5}{c}{}\\
\multicolumn{2}{c}{}& \multicolumn{1}{c}{$M-R_n$} &
\multicolumn{1}{c}{ $R_n = | {\cal R}_n|$}
&\multicolumn{1}{l}{$M$}\\
\end{tabular}
\end{table}

Ideally, one would like to simultaneously minimize both the chances of committing a Type I error and a Type II error. Unfortunately, this is not feasible and one seeks a {\em trade-off} between the two types of errors. A standard approach is to specify an acceptable level $\alpha$ for the Type I error rate and derive testing procedures, i.e., rejection regions, that aim to minimize the Type II error rate, i.e., maximize {\em power}, within the class of tests with Type I error rate at most $\alpha$. \\


\noindent
{\bf Type I error rates.}
When testing multiple hypotheses, there are many possible definitions for the Type I error rate (and power). Accordingly, we adopt a general definition of Type I error rates, as parameters, $\theta_n = \theta(F_{V_n,R_n})$, of the joint distribution $F_{V_n,R_n}$ of the numbers of Type I errors $V_n$ and rejected hypotheses $R_n$. 
Such a general representation covers the following commonly-used Type I error rates.
\begin{enumerate}
\item 
{\em Generalized family-wise error rate} (gFWER), or 
 probability of at least $(k+1)$ Type I errors, $k=0,\ldots, (h_0-1)$,
\begin{equation}\label{anal:mult:e:gFWER}
gFWER(k) \equiv Pr(V_n \geq k+1) = 1 - F_{V_n}(k).
\end{equation}
When $k=0$, the gFWER is the usual {\em family-wise error rate}, FWER, controlled by the classical Bonferroni procedure.
\item
{\em Per-comparison error rate} (PCER), or expected 
proportion of Type I errors among the $M$ tests,
\begin{equation}\label{anal:mult:e:PCER}
PCER \equiv \frac{1}{M} E[V_n] = \frac{1}{M} \int v dF_{V_n}(v).
\end{equation}
\item
{\em Tail probabilities for the proportion of false positives} (TPPFP) among the rejected hypotheses,
\begin{equation}\label{anal:mult:e:TPPFP}
TPPFP(q) \equiv Pr(V_n/R_n > q) = 1 - F_{V_n/R_n}(q), \qquad q \in (0,1),
\end{equation}
with the convention that $V_n/R_n \equiv 0$, if $R_n=0$.
\item
{\em False discovery rate} (FDR), or  {\em expected value} of the proportion of false positives among the rejected hypotheses, 
\begin{equation}\label{anal:mult:e:FDR}
FDR \equiv E[V_n/R_n] = \int q dF_{V_n/R_n}(q),
\end{equation}
again with the convention that $V_n/R_n \equiv 0$, if $R_n=0$ \cite{Benjamini&Hochberg95}. 
\end{enumerate}
Note that while the gFWER is a parameter of only the {\em marginal} distribution $F_{V_n}$ for the number of Type I errors $V_n$ (tail probability, or survivor function, for $V_n$), the TPPFP is a parameter of the {\em joint} distribution of $(V_n,R_n)$ (tail probability, or survivor function, for $V_n/R_n$). 
 Error rates based on the {\em proportion} of false positives (e.g., TPPFP and FDR) are especially appealing for the large-scale testing problems encountered in genomics, compared to error rates based on the {\em number} of false positives (e.g., gFWER), as they do not increase exponentially with the number of hypotheses. 
The above four error rates are part of the broad class of Type I error rates considered in Dudoit \& van der Laan \cite{Dudoit&vdLaanMTBook} and defined as tail probabilities $Pr(g(V_n,R_n) > q)$ and expected values $E[g(V_n,R_n)]$ for an arbitrary function $g(V_n,R_n)$ of the numbers of false positives $V_n$ and rejected hypotheses $R_n$. The gFWER and TPPFP correspond to the special cases $g(V_n,R_n) = V_n$ and $g(V_n,R_n) = V_n/R_n$, respectively.\\


\noindent
{\bf Adjusted $p$-values.} The notion of $p$-value extends directly to multiple testing problems, as follows. 
Given a MTP, ${\cal R}_n = {\cal R}(T_n,Q_{0n}, \alpha)$, the {\em adjusted $p$-value}, $\widetilde{P}_{0n}(m) = \widetilde{P}(T_n,Q_{0n})(m)$, for null hypothesis $H_0(m)$, is defined as the smallest Type I error level at which one would reject $H_0(m)$, that is,
\begin{eqnarray}
\widetilde{P}_{0n}(m) &\equiv& \inf \left \{ \alpha \in [0,1]: \mbox{Reject $H_0(m)$ at MTP level $\alpha$}\right \}\\
&=& \inf\left \{\alpha \in [0,1]: m \in {\cal R}_n \right \}\nonumber \\
&=& \inf\left \{\alpha \in [0,1]: T_n(m) \in {\cal C}_n(m) \right \}, \qquad m=1,\ldots, M.\nonumber
\end{eqnarray}
As in single hypothesis tests, the smaller the $p$-value, the stronger the evidence against the corresponding null hypothesis. The main difference between unadjusted (i.e., for the test of a single hypothesis) and adjusted $p$-values is that the latter are defined in terms of the Type I error rate for the {\em entire} testing procedure, i.e., take into account the multiplicity of tests.
For example, the adjusted $p$-values for the classical Bonferroni procedure for FWER control are given by $\widetilde{P}_{0n}(m) = \min(M P_{0n}(m), 1)$, 
where $P_{0n}(m)$ is the unadjusted $p$-value for the test of single hypothesis $H_0(m)$.

We now have two representations for a MTP, in terms of rejection regions for the test statistics  and in terms of adjusted $p$-values 
\begin{equation}
{\cal R}_n = \{m: T_n(m) \in {\cal C}_n(m) \} = \{m: \widetilde{P}_{0n}(m) \leq \alpha\}.
\end{equation}
As in the single hypothesis case, an
advantage of reporting adjusted $p$-values, as opposed to only
rejection or not of the hypotheses, is that the level $\alpha$ of the test does
not need to be determined in advance, that is, results of the multiple
testing procedure are provided for all $\alpha$. 
 Adjusted $p$-values are convenient and flexible summaries of the strength of the evidence against each null hypothesis, in terms of the Type I error rate for the entire MTP (gFWER, TPPFP, FDR, or any other suitably defined error rate). \\

\noindent
{\bf Stepwise multiple testing procedures.} 
One usually distinguishes between two main classes of multiple testing
procedures, single-step and stepwise procedures.  
 In {\em single-step procedures}, each null hypothesis is
 evaluated using a rejection region that is  independent of the results of the tests of other hypotheses.
Improvement in power, while preserving Type I error rate
control, may be achieved by {\em stepwise procedures}, in which 
rejection of a particular null hypothesis depends on the outcome of
the tests of other hypotheses. 
That is, the (single-step) test procedure is applied to a sequence of successively smaller nested random (i.e., data-dependent) subsets of null hypotheses, defined by the ordering of the test statistics (common cut-offs) or unadjusted $p$-values (common-quantile cut-offs).
In {\em step-down procedures}, the hypotheses
corresponding to the {\em most significant} test statistics (i.e., largest absolute test
statistics or smallest unadjusted $p$-values) are considered successively, with further tests depending
on the outcome of earlier ones.
As soon as one fails to reject a null hypothesis, no further
hypotheses are rejected. 
In contrast, for {\em step-up procedures},
the hypotheses corresponding to the {\em least significant} test
statistics are considered successively, again with further tests
depending on the outcome of earlier ones. As soon as one hypothesis
is rejected, all remaining more significant hypotheses are rejected. \\


\noindent
{\bf Confidence regions.} 
For the test of single-parameter null hypotheses and for any Type I error rate of the form $\theta(F_{V_n})$, Dudoit \& van der Laan \cite{Dudoit&vdLaanMTBook} and Pollard \& van der Laan \cite{Pollard&vdLaanJSPI04} provide results on the correspondence between single-step MTPs and {\em confidence regions}.

%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Test statistics null distribution}
\label{anal:mult:s:nullDistn}

\noindent
{\bf Test statistics null distribution.}
One of the main tasks in specifying a MTP is to derive rejection regions for the test statistics such that the Type I error rate is controlled at a desired level $\alpha$, i.e., such that $\theta(F_{V_n,R_n}) \leq \alpha$, for finite sample control, or $\limsup_n \theta(F_{V_n,R_n}) \leq \alpha$, for asymptotic control.
However, one is immediately faced with the problem that the {\em true distribution} $Q_n=Q_n(P)$ for the test statistics $T_n$ is usually {\em unknown}, and hence, so are the distributions of the numbers of Type I errors, $V_n = \sum_{m \in {\cal H}_0} \mathrm{I}(T_n(m) \in {\cal C}_n(m))$, and rejected hypotheses, $R_n = \sum_{m=1}^M  \mathrm{I}(T_n(m) \in {\cal C}_n(m))$. 
In practice, the test statistics {\em true} distribution $Q_n(P)$ is replaced by a {\em null distribution} $Q_0$ (or estimate thereof, $Q_{0n}$), in order to derive rejection regions, ${\cal C}(T_n,Q_0,\alpha)(m)$, and resulting adjusted $p$-values, $\widetilde{P}(T_n,Q_0)(m)$. 

The choice of null distribution $Q_0$ is crucial, in order
to ensure that (finite sample or asymptotic) control of the Type I
error rate under the {\em assumed} null distribution $Q_0$ does indeed provide the required control under the {\em true} distribution $Q_n(P)$.
For proper control, the null distribution $Q_0$ must be such that the Type I error rate under this assumed null distribution {\em dominates} the Type I error rate under the true distribution $Q_n(P)$. That is, one must have $\theta(F_{V_n,R_n}) \leq \theta(F_{V_0,R_0})$, for finite sample control, and $\limsup_n \theta(F_{V_n,R_n}) \leq  \theta(F_{V_0,R_0})$, for asymptotic control, where $V_0$ and $R_0$ denote, respectively, the numbers of Type I errors and rejected hypotheses under the assumed null distribution $Q_0$.  


For error rates $\theta(F_{V_n})$, defined as arbitrary parameters of the distribution of the number of Type I errors $V_n$, \cite{Dudoit&vdLaanMTBook,DudoitetalMT1SAGMB04,Pollard&vdLaanJSPI04,vdLaanetalMT3SAGMB04,vdLaanetalMT2SAGMB04} propose as null distribution the asymptotic distribution $Q_0$ of the vector of null value shifted and scaled test statistics:
\begin{equation}
Z_n(m) \equiv 
 \sqrt{\min \left(1,
  \frac{\tau_0(m)}{Var[T_n(m)]}\right)} \Bigl( T_n(m) + \lambda_0(m) - E[T_n(m)] \Bigr).
\end{equation}
For the test of single-parameter null hypotheses using $t$-statistics, the null values are $\lambda_0(m)=0$ and $\tau_0(m)=1$. For testing the equality of $K$ population means using $F$-statistics, the null values are  $\lambda_0(m)= 1$ and $\tau_0(m) = 2/(K-1)$, under the assumption of equal variances in the different populations.
Dudoit et al. \cite{DudoitetalMT1SAGMB04} and van der Laan et al. \cite{vdLaanetalMT2SAGMB04} prove that this null distribution does indeed provide the desired asymptotic control of the Type I error rate $\theta(F_{V_n})$, for
 general data generating distributions (with arbitrary dependence structures among variables), null hypotheses (defined in terms of submodels for the data generating distribution), and test statistics (e.g., $t$-statistics, $F$-statistics).

For a broad class of testing problems, such as the test of single-parameter null hypotheses using $t$-statistics (as in Equation (\ref{anal:mult:e:tstat})), the null distribution $Q_0$ is an $M$--variate Gaussian distribution with mean vector zero and covariance matrix $\Sigma^*(P)$: $Q_0 = Q_0(P) \equiv N(0,\Sigma^*(P))$. 
For tests of means, where the parameter of interest is the $M$--dimensional mean vector $\Psi(P) = \psi = E[X]$, the estimator $\psi_n$ is simply the $M$--vector of sample averages and $\Sigma^*(P)$ is the correlation matrix of $X \sim P$, $Cor[X]$. More generally, for an asymptotically linear estimator $\psi_n$, $\Sigma^*(P)$ is the correlation matrix of the vector influence curve (IC).

Note that the following important points distinguish our approach from existing approaches to Type I error rate control. 
Firstly, we are only concerned with Type I error control under the {\em true data generating distribution} $P$. The notions of weak and strong control (and associated subset pivotality) are therefore irrelevant to our approach. 
Secondly, we propose a {\em null distribution for the test statistics} ($T_n \sim Q_0$), and not a data generating null distribution ($X \sim P_0\in \cap_{m=1}^M {\cal M}(m)$). 
The latter practice does not necessarily provide proper Type I error control, as the test statistics' {\em assumed} null distribution $Q_n(P_0)$ and their {\em true} distribution $Q_n(P)$ may have different dependence structures (in the limit) for the true null hypotheses ${\cal H}_0$.\\


\noindent
{\bf Bootstrap estimation of the test statistics null distribution.}
In practice, since the data generating distribution $P$ is unknown, then so is the null distribution $Q_0=Q_0(P)$.  Resampling procedures, such as bootstrap Procedure \ref{anal:mult:proc:boot}, below, may be used to conveniently obtain consistent estimators $Q_{0n}$ of the null distribution $Q_0$ and of the resulting test statistic cut-offs and adjusted $p$-values. 
\begin{center}
\fbox{\parbox{4.5in}{%
\begin{procedure}
\label{anal:mult:proc:boot}
{\bf [Bootstrap estimation of the null distribution $Q_0$]}
\begin{enumerate} 
\item
 Let $P_n^{\star}$ denote an estimator of the data generating distribution
$P$. For the {\em non-parametric bootstrap},  $P_n^{\star}$ is simply the
empirical distribution $P_n$, that is, samples of size $n$ are drawn
at random, with replacement from the observed data $X_1, \ldots, X_n$. For
the {\em model-based bootstrap}, $P_n^{\star}$ is based on a model ${\cal
  M}$ for the data generating distribution $P$, such
as the family of $M$--variate Gaussian distributions.
\item
Generate $B$ bootstrap samples, each consisting of $n$ i.i.d. realizations of a random variable $X^{\#} \sim P_n^{\star}$. 
\item
For the $b$th bootstrap sample, $b=1,\ldots, B$, compute an $M$--vector of test statistics, $T_n^{\#}(\cdot,b) = (T_n^{\#}(m,b): m=1,\ldots,M)$.  Arrange these bootstrap statistics in an $M \times B$ matrix, $\mathbf{T}_n^{\#} = \bigl(T_n^{\#}(m,b)\bigr)$, with rows corresponding to the $M$ null hypotheses and columns to the $B$ bootstrap samples.
\item
Compute row means, $E[T_n{^\#}(m,\cdot)]$, and row variances, $Var[T_n{^\#}(m,\cdot)]$, of the matrix $\mathbf{T}_n^{\#}$, to yield estimates of the true means $E[T_n(m)]$ and variances $Var[T_n(m)]$ of the test statistics, respectively.
\item
Obtain an $M \times B$ matrix, $\mathbf{Z}_n^{\#} = \bigl(Z_n^{\#}(m,b)\bigr)$, of
null value shifted and scaled bootstrap statistics $Z_n^{\#}(m,b)$, by row-shifting and scaling the matrix
$\mathbf{T}_n^{\#}$ using the bootstrap estimates of $E[T_n(m)]$ and
$Var[T_n(m)]$ and the user-supplied null values $\lambda_0(m)$ and
$\tau_0(m)$. That is, compute 
\begin{eqnarray}
Z_n^{\#}(m,b) &\equiv&  \sqrt{\min \left(1,
  \frac{\tau_0(m)}{Var[T_n{^\#}(m,\cdot)]}\right)}\\
&& \qquad \times \ \Bigl( T_n^{\#}(m,b) + \lambda_0(m) - E[T_n{^\#}(m,\cdot)] \Bigr)  \nonumber .
\end{eqnarray}
\item
The bootstrap
estimate $Q_{0n}$ of the null distribution $Q_0$ is the empirical distribution of the $B$ columns $Z_n^{\#}(\cdot,b)$ of matrix $\mathbf{Z}_n^{\#}$.
\end{enumerate}
\end{procedure}
}}
\end{center}
Dudoit et al. \cite{DudoitetalMT1SAGMB04} and van der Laan et al. \cite{vdLaanetalMT2SAGMB04} show that single-step and step-down procedures based on consistent estimators of the null distribution $Q_0$ also provide asymptotic control of the Type I error rate. The reader is referred to these two articles and to Dudoit \& van der Laan \cite{Dudoit&vdLaanMTBook} for details on the choice of null distribution and various approaches for estimating this null distribution.\\

Having selected a suitable test statistics null distribution, there remains the main task of specifying rejection regions for each null hypothesis, i.e., cut-offs for each test statistic. 
Among the different approaches for defining rejection regions, we distinguish between single-step vs. stepwise procedures, and common cut-offs (i.e., the same cut-off $c_0$ is used for each test statistic) vs. common-quantile cut-offs (i.e., the cut-offs are the $\delta_0$--quantiles of the marginal null distributions of the test statistics). 
The next three subsections discuss three main approaches for deriving rejection regions and corresponding adjusted $p$-values: single-step common-cut-off and common-quantile procedures for control of general Type I error rates $\theta(F_{V_n})$ (Section \ref{anal:mult:s:SS});  step-down  common-cut-off (maxT) and common-quantile (minP) procedures for control of the FWER (Section \ref{anal:mult:s:SD}); augmentation procedures for control of the gFWER and TPPFP, based on an initial FWER-controlling procedure (Section \ref{anal:mult:s:AMTP}).

%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Single-step procedures for control of general Type I error rates $\theta(F_{V_n})$}
\label{anal:mult:s:SS}


Dudoit et al. \cite{DudoitetalMT1SAGMB04} and Pollard \& van der Laan \cite{Pollard&vdLaanJSPI04} propose single-step common-cut-off and common-quantile procedures for controlling arbitrary parameters $\theta(F_{V_n})$ of the distribution of the number of Type I errors. 
The main idea is to substitute control of the parameter $\theta(F_{V_n})$, for the {\em  unknown, true distribution} $F_{V_n}$ of the number of Type I errors, by control of the corresponding parameter $\theta(F_{R_0})$, for the {\em known, null distribution} $F_{R_0}$ of the number of rejected hypotheses. 
That is, consider single-step procedures of the form ${\cal R}_n \equiv \{m: T_n(m)> c_n(m) \}$, 
where the cut-offs $c_n(m)$ are chosen so that $\theta(F_{R_0}) \leq
\alpha$, for $R_0 \equiv \sum_{m=1}^M \mathrm{I}(Z(m) >  c_n(m))$
and $Z \sim Q_0$.
Among the class of MTPs that satisfy $\theta(F_{R_0}) \leq \alpha$, 
Dudoit et al. \cite{DudoitetalMT1SAGMB04} propose two procedures, based on common cut-offs and common-quantile cut-offs, respectively. 
The procedures are summarized below and the reader is referred to the article for proofs and details on the derivation of cut-offs and corresponding adjusted $p$-values.\\

\noindent
{\bf Single-step common-cut-off procedure.} The set of rejected hypotheses for the {\em $\theta$--controlling single-step common-cut-off procedure} is of the form
${\cal R}_n \equiv \{m: T_n(m)> c_0 \}$, where the common cut-off $c_0$ is the {\em smallest}  (i.e., least conservative) value for which $\theta(F_{R_0}) \leq \alpha$.

For $gFWER(k)$ control (special case $\theta(F_{V_n}) = 1 - F_{V_n}(k)$), the procedure is based on the {\em $(k+1)$st ordered test statistic}.  
Specifically, the adjusted $p$-values are given by
\begin{equation}\label{anal:mult:e:SScut}
\widetilde{P}_{0n}(m) = Pr_{Q_0} \left(Z^{\circ}(k+1) \geq T_n(m) \right),  \qquad m=1,\ldots, M,
\end{equation}
where $Z^{\circ}(m)$ denotes the $m$th ordered component of $Z = (Z(m): m=1,\ldots,M) \sim Q_0$, so that $Z^{\circ}(1) \geq \ldots \geq Z^{\circ}(M)$. 
For FWER control ($k=0$), the procedure reduces to the  {\em single-step maxT}  procedure, based on the {\em maximum test statistic}, $Z^{\circ}(1)$.\\

\noindent
{\bf Single-step common-quantile procedure.} The set of rejected hypotheses for the {\em $\theta$--controlling single-step common-quantile procedure} is of the form
${\cal R}_n \equiv \{m: T_n(m)> c_0(m) \}$, where $c_0(m) = Q_{0,m}^{-1}(\delta_0)$ is the $\delta_0$--quantile of the marginal null distribution $Q_{0,m}$ of the $m$th test statistic, i.e., the smallest value $c$ such that $Q_{0,m}(c) = Pr_{Q_0}(Z(m) \leq c) \geq \delta_0$ for $Z \sim Q_0$. Here, $\delta_0$ is chosen as the {\em smallest} (i.e., least conservative) value for which $\theta(F_{R_0}) \leq \alpha$.

For $gFWER(k)$ control, the procedure is based on the {\em $(k+1)$st ordered unadjusted $p$-value}. 
Specifically, let $\bar{Q}_{0,m} \equiv 1 - Q_{0,m}$ denote the survivor functions for the marginal null distributions $Q_{0,m}$ and define unadjusted $p$-values $P_0(m) \equiv  \bar{Q}_{0,m}(Z(m))$ and $P_{0n}(m) \equiv  \bar{Q}_{0,m}(T_n(m))$, for $Z \sim Q_0$ and  $T_n \sim Q_n$, respectively. Then, the adjusted $p$-values for the common-quantile procedure are given by
\begin{equation}\label{anal:mult:e:SSquant}
\widetilde{P}_{0n}(m) = Pr_{Q_0} \left(P_0^{\circ}(k+1) \leq P_{0n}(m) \right),  \qquad m=1,\ldots, M,
\end{equation}
where $P_0^{\circ}(m)$ denotes the $m$th ordered component of the $M$--vector of unadjusted $p$-values $(P_0(m): m=1,\ldots,M)$, so that $P_0^{\circ}(1) \leq \ldots \leq P_0^{\circ}(M)$.  
For FWER control ($k=0$), one recovers the {\em single-step minP} procedure, based on the {\em minimum unadjusted $p$-value}, $P_0^{\circ}(1)$.

%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Step-down procedures for control of the family-wise error rate}
\label{anal:mult:s:SD}

van der Laan et al. \cite{vdLaanetalMT2SAGMB04} propose step-down common-cut-off (maxT) and common-quantile (minP) procedures for controlling the family-wise error rate, FWER. 
These procedures are similar in spirit to their single-step counterparts in Section \ref{anal:mult:s:SS} (special case $\theta(F_{V_n}) = 1 - F_{V_n}(0)$), with the important step-down distinction that hypotheses are considered successively, from most significant to least significant, with further tests depending on the outcome of earlier ones. 
That is, the test procedure is applied to a sequence of successively smaller nested random (i.e., data-dependent) subsets of null hypotheses, defined by the ordering of the test statistics (common cut-offs) or unadjusted $p$-values (common-quantile cut-offs). \\


\noindent
{\bf Step-down common-cut-off (maxT) procedure.}
Rather than being based solely on the distribution of the maximum test statistic over all $M$ hypotheses, the step-down common cut-offs and corresponding adjusted $p$-values are based on the distributions of maxima of test statistics over successively smaller nested random subsets of null hypotheses. 
Specifically, let $O_n(m)$ denote the indices for the ordered test statistics $T_n(m)$, so that $T_n(O_n(1)) \geq \ldots \geq T_n(O_n(M))$. 
The step-down common-cut-off procedure is then based on the distributions of maxima of test statistics over the nested subsets of ordered hypotheses $\overline{\cal O}_n(h) \equiv \{O_n(h),\ldots,O_n(M)\}$. 
The adjusted $p$-values for the {\em step-down maxT} procedure are given by
\begin{equation}\label{anal:mult:e:SDmaxT}
\widetilde{p}_{0n}(o_n(m)) =  \max_{h=1,\ldots, m}\ \left\{ Pr_{Q_0}\left(
  \max_{l \in \overline{\cal o}_n(h)} Z(l) \geq t_n(o_n(h))\right)
  \right \},
\end{equation}
where $Z=(Z(m): m=1,\ldots, M)  \sim Q_0$. 
Taking maxima of the probabilities over $h \in \{1, \ldots, m\}$ enforces monotonicity of the adjusted $p$-values and ensures that the procedure is indeed step-down, that is, one can only reject a particular hypothesis provided all hypotheses with
more significant (i.e., larger) test statistics were rejected beforehand.\\

\noindent
{\bf Step-down common-quantile (minP) procedure.}
Likewise, the step-down common-quantile cut-offs and corresponding adjusted $p$-values are based on the distributions of minima of unadjusted $p$-values over successively smaller nested random subsets of null hypotheses.
Specifically, let $O_n(m)$ denote the indices for the ordered unadjusted $p$-values $P_{0n}(m)$, so that $P_{0n}(O_n(1)) \leq \ldots \leq P_{0n}(O_n(M))$. 
The step-down common-quantile procedure is then based on the distributions of minima of unadjusted $p$-values over the nested subsets of ordered hypotheses $\overline{\cal O}_n(h) \equiv \{O_n(h),\ldots,O_n(M)\}$. 
The adjusted $p$-values for the {\em step-down minP} procedure are given by
\begin{equation}\label{anal:mult:e:SDminP}
\widetilde{p}_{0n}(o_n(m)) = \max_{h=1,\ldots, m}\ \left\{ Pr_{Q_0}\left(
  \min_{l \in \overline{\cal o}_n(h)} P_0(l) \leq p_{0n}(o_n(h))\right)
  \right \},
\end{equation}
where $P_0(m) = \bar{Q}_{0,m}(Z(m))$ and $Z=(Z(m): m=1,\ldots, M)  \sim Q_0$. 


%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Augmentation multiple testing procedures}
\label{anal:mult:s:AMTP}

Dudoit \& van der Laan \cite{Dudoit&vdLaanMTBook} and van der Laan et al. \cite{vdLaanetalMT3SAGMB04} discuss {\em augmentation multiple testing procedures} (AMTP), obtained by adding suitably chosen null hypotheses to the set of null hypotheses already rejected by an initial MTP. 
Specifically, given {\em any} initial procedure controlling the generalized family-wise error rate, augmentation procedures are derived for controlling Type I error rates defined as tail probabilities and expected values for arbitrary functions $g(V_n,R_n)$ of the numbers of Type I errors and rejected hypotheses (e.g., proportion $g(V_n,R_n)=V_n/R_n$ of false positives among the rejected hypotheses). 
Adjusted $p$-values for the AMTP are shown to be simply shifted versions of the adjusted $p$-values of the original MTP. 
The important practical implication of these results is that {\em any} FWER-controlling MTP and its
corresponding adjusted $p$-values, provide, without additional work, multiple testing procedures controlling a broad class of Type I error rates and their adjusted $p$-values.
One can therefore build on the large pool of available FWER-controlling procedures, such as the single-step and step-down maxT and minP procedures discussed in Sections \ref{anal:mult:s:SS} and \ref{anal:mult:s:SD}, above. 

Augmentation procedures for controlling tail probabilities of the number (gFWER) and proportion (TPPFP) of false positives, based on an initial FWER-controlling procedure, are treated in detail in van der Laan et al. \cite{vdLaanetalMT3SAGMB04} and summarized below. The gFWER and TPPFP correspond to the special cases $g(V_n,R_n) = V_n$ and  $g(V_n,R_n) = V_n/R_n$, respectively. 
Denote the adjusted $p$-values for the initial FWER-controlling procedure by $\widetilde{P}_{0n}(m)$. Order the $M$ null hypotheses according to these $p$-values, from smallest to largest, that is, define indices $O_n(m)$, so that $\widetilde{P}_{0n}(O_n(1))\leq \ldots \leq \widetilde{P}_{0n}(O_n(M))$. Then, for a nominal level $\alpha$ test, the initial FWER-controlling procedure rejects the $R_n$ null hypotheses 
\begin{equation}
{\cal R}_n \equiv \{m: \widetilde{P}_{0n}(m) \leq \alpha\}.
\end{equation}

\noindent
{\bf Augmentation procedure for controlling the gFWER.} For control of $gFWER(k)$ at level $\alpha$, given an initial FWER-controlling procedure, reject the $R_n$ hypotheses specified by this MTP, as well as the next $A_n = \min\{k, M-R_n\}$ most significant null hypotheses. 
The adjusted $p$-values $\widetilde{P}_{0n}^{+}(O_n(m))$ for the new gFWER-controlling AMTP are simply $k$--shifted versions of the adjusted $p$-values of the initial FWER-controlling MTP:
\begin{equation}\label{anal:mult:e:adjpgFWER}
\widetilde{P}_{0n}^{+}(O_n(m)) =
\begin{cases}
0, & \text{if $m=1,\ldots,k$},\\
\widetilde{P}_{0n}(O_n(m-k)), & \text{if $m=k+1, \ldots, M$}.
\end{cases}
\end{equation}
That is, the first $k$ adjusted $p$-values are set to zero and the remaining $p$-values are the adjusted $p$-values of the FWER-controlling MTP shifted by $k$. The AMTP thus guarantees at least $k$ rejected hypotheses.\\


\noindent
{\bf Augmentation procedure for controlling the TPPFP.} For control of $TPPFP(q)$ at level $\alpha$, given an initial FWER-controlling procedure, reject the $R_n$ hypotheses specified by this MTP, as well as the next $A_n$ most significant null hypotheses, 
\begin{eqnarray}
\label{anal:mult:e:augTPPFP}
A_n &=& \max\left\{m \in \{0,\ldots, M - R_n\}:\frac{m}{m+ R_n}\leq q\right\} \nonumber\\
&=& \min \left\{ \left \lfloor \frac{q R_n}{1-q} \right \rfloor, M-R_n \right\},
\end{eqnarray}
where the {\em floor} $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$, i.e., $\lfloor x \rfloor \leq x < \lfloor x \rfloor + 1$. That is, keep rejecting null hypotheses until the ratio of additional rejections to the total number of rejections reaches the allowed proportion $q$ of false positives. 
The adjusted $p$-values $\widetilde{P}_{0n}^{+}(O_n(m))$ for the new TPPFP-controlling AMTP are simply shifted versions of the adjusted $p$-values of the initial FWER-controlling MTP, that is,
\begin{equation}\label{anal:mult:e:adjpTPPFP}
\widetilde{P}_{0n}^{+}(O_n(m)) = \widetilde{P}_{0n}(O_n(\lceil(1-q)m\rceil)), \qquad m=1,\ldots,M,
\end{equation}
where the {\em ceiling} $\lceil x \rceil$ denotes the least integer greater than or equal to $x$, i.e., $\lceil x \rceil -1 < x \leq \lceil x \rceil$. 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Software implementation: \RPack{multtest}}
\label{anal:mult:s:software}

The above MTPs are implemented in the development version of the Bioconductor R package \RPack{multtest} (to be released on October 24th, 2004, as version 1.5.1, Bioconductor 1.5). 
New features include: expanded class of tests (e.g., for regression parameters in linear models and in Cox proportional hazards models);
control of a broader class of Type I error rates (e.g., gFWER, TPPFP, FDR); 
bootstrap estimation of the test statistics null distribution; 
augmentation multiple testing procedures;  
confidence regions for the parameter vector of interest.
Because of their general applicability and novelty, we focus in this section on MTPs that utilize a bootstrap estimated test statistics null distribution and that are available through the package's main user-level function: \RObj{MTP}.
Note that for many testing problems, MTPs based on permutation (rather than bootstrap) estimated null distributions are also available in the present and earlier versions of \RPack{multtest}.
In particular, permutation-based step-down maxT and minP FWER-controlling MTPs are implemented in the functions \RObj{mt.maxT} and \RObj{mt.minP}, respectively, and can also be applied directly through a call to the \RObj{MTP} function.

We stress that {\em all} the bootstrap-based MTPs implemented in \RPack{multtest} can be performed using the main user-level function \RObj{MTP}. 
Most users will therefore only need to be familiar with this function. 
Other functions are provided primarily for the benefit of more advanced users, interested in extending package functionality (Section \ref{anal:mult:s:design}).
For greater detail on \RPack{multtest} functions, the reader is referred to the package documentation, in the form of help files, e.g., \RObj{? MTP}, and vignettes, e.g., \RObj{openVignette("multtest")}. 

One needs to specify the following main ingredients when applying a MTP: 
the {\em data}, $X_1, \ldots, X_n$; 
suitably defined {\em test statistics}, $T_n$, for each of the null hypotheses under consideration (e.g., one-sample $t$-statistics, robust rank-based $F$-statistics, $t$-statistics for regression coefficients in Cox proportional hazards model); 
a choice of {\em Type I error rate}, $\theta(F_{V_n,R_n})$, providing an appropriate measure of false positives for the particular testing problem (e.g., $TPPFP(0.10)$);
a proper {\em joint null distribution}, $Q_0$ (or estimate thereof, $Q_{0n})$, for the test statistics (e.g., bootstrap null distribution of Procedure \ref{anal:mult:proc:boot}); 
given the previously defined components, a {\em multiple testing procedure}, ${\cal R}_n={\cal R}(T_n, Q_{0n},\alpha)$, for controlling the error rate $\theta(F_{V_n,R_n})$ at a target level $\alpha$.
Accordingly, \RPack{multtest} adopts a modular and extensible approach to the implementation of MTPs, with the following four main types of functions.

\begin{itemize}

\item 
Functions for computing the {\em test statistics}, $T_n$. These are internal functions (e.g., \RObj{meanX}, \RObj{coxY}), i.e., functions that are generally not called directly by the user. 
As shown in Section \ref{anal:mult:s:MTP}, below, the type of test statistic is specified by the \RObj{test} argument of the main user-level function \RObj{MTP}.  
Advanced users, interested in extending the class of tests available in \RPack{multtest}, can simply add their own test statistic functions to the existing library of such internal functions (see Section \ref{anal:mult:s:design}, below, for a brief discussion of the closure approach).

\item
Functions for obtaining the {\em test statistics null distribution}, $Q_0$ or $Q_{0n}$.  The main function currently available is the internal function \RObj{boot.resample}, implementing the non-parametric version of bootstrap Procedure \ref{anal:mult:proc:boot} (Section \ref{anal:mult:s:nullDistn}). 

\item
Functions for implementing the {\em multiple testing procedure}, ${\cal R}(T_n, Q_{0n},\alpha)$, i.e., for deriving rejection regions, confidence regions, and adjusted $p$-values. 
The main function is the  user-level wrapper function \RObj{MTP}, which implements the single-step and step-down maxT and minP procedures for FWER control. 
The functions  \RObj{fwer2gfwer} and \RObj{fwer2tppfp} implement, respectively, gFWER and TPPFP-controlling augmentation multiple testing procedures, based on adjusted $p$-values from {\em any} FWER-controlling procedure and can be called via the \RObj{typeone} argument to \RObj{MTP} (Section \ref{anal:mult:s:AMTP}).

\item
Functions for {\em numerical and graphical summaries} of a MTP. As described in Section \ref{anal:mult:s:summaries}, below, a number of summary methods are available to operate on objects of class \RClass{MTP}, output from the main \RObj{MTP} function.
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Resampling-based multiple testing procedures: \RObj{MTP} function}
\label{anal:mult:s:MTP}

The main user-level function for resampling-based multiple testing is \RObj{MTP}. Its arguments are as follows,

\begin{Schunk}
\begin{Sinput}
> args(MTP)
\end{Sinput}
\begin{Soutput}
function (X, W = NULL, Y = NULL, Z = NULL, Z.incl = NULL, Z.test = NULL, 
    na.rm = TRUE, test = "t.twosamp.unequalvar", robust = FALSE, 
    standardize = TRUE, alternative = "two.sided", psi0 = 0, 
    typeone = "fwer", k = 0, q = 0.1, alpha = 0.05, nulldist = "boot", 
    B = 1000, method = "ss.maxT", get.cr = FALSE, get.cutoff = FALSE, 
    get.adjp = TRUE, keep.nulldist = FALSE, seed = NULL) 
NULL
\end{Soutput}
\end{Schunk}

\noindent
{\bf  INPUT.}
\begin{description}

\item{\em Data.} 
The data, \RObj{X}, consist of a $J$--dimensional random vector, observed on each of $n$ sampling units (subjects, cell lines, mice, etc). 
These data can be stored in a $J \times n$ \RClass{matrix}, \RClass{data.frame}, or \RClass{exprs} slot from an object of class \RClass{exprSet}. 
In some settings,  a $J$--vector of weights may be associated with each observation, and stored in a $J \times n$ weight matrix, \RObj{W} (or an $n$--vector \RObj{W}, if the weights are the same for each of the $J$ variables). 
One may also observe a possibly censored continuous or polychotomous outcome, \RObj{Y}, for each sampling unit, as obtained, for example, from the \RClass{phenoData} slot of an object of class \RClass{exprSet}. 
In some studies, $L$ additional covariates may be measured on each sampling unit and stored in \RObj{Z}, an $n \times L$ \RClass{matrix} or \RClass{data.frame}.

When the tests concern parameters in regression models with covariates from \RObj{Z} (e.g., values \RObj{lm.XvsZ}, \RObj{lm.YvsXZ}, and \RObj{coxph.YvsXZ}, for the argument \RObj{test}, described below), the arguments \RObj{Z.incl} and \RObj{Z.test} specify, respectively, which covariates (i.e., which columns of \RObj{Z}, including \RObj{Z.test}) should be included in the model and which regression parameter is of interest.
The covariates can be specified either by a numeric column index or character string.
If \RObj{X} is an \RClass{exprSet}, \RObj{Y} can be a column index or character string referring to the variable in the \RClass{data.frame} \RObj{pData(X)} to use as outcome. 
Likewise, \RObj{Z.incl} and \RObj{Z.test} can be column indices or character strings referring to the variables in \RObj{pData(X)} to use as covariates.
The data components (\RObj{X}, \RObj{W}, \RObj{Y}, \RObj{Z}, \RObj{Z.incl}, and \RObj{Z.test}) are the first six arguments to the \RObj{MTP} function. 
Only \RObj{X} is a required argument; the others are by default \RObj{NULL}.
The argument \RObj{na.rm} allows one to control the treatment of "Not Available" or \RObj{NA} values. It is set to \RObj{TRUE}, by default, so that an
observation with a missing value in any of the data objects' $j$th component ($j=1,\ldots,J$) is excluded from computation of any of the relevant test statistics.


\item{\em Test statistics.} 

The test statistics should be chosen based on the parameter of interest (e.g., location, scale, or regression parameters of the data generating distribution $P$) and the hypotheses one wishes to test. In the current implementation of \RPack{multtest}, the following test statistics are available through the argument \RObj{test}, with default value \RObj{t.twosamp.unequalvar} for the two-sample Welch $t$-statistic. 
\begin{itemize}
\item 
\RObj{t.onesamp}: one-sample $t$-statistic for tests of means;
\item 
\RObj{t.twosamp.equalvar}: equal variance two-sample $t$-statistic for tests of differences in means;
\item 
\RObj{t.twosamp.unequalvar}: unequal variance two-sample $t$-statistic for tests of differences in means (two-sample Welch $t$-statistic); 
\item 
\RObj{t.pair}: two-sample paired $t$-statistic for tests of differences in means;
\item 
\RObj{f}: multi-sample $F$-statistic for tests of equality of population means; 
\item 
\RObj{f.block}: multi-sample $F$-statistic for tests of equality of population means in a block design;  
\item 

\RObj{lm.XvsZ}: 
$t$-statistic for tests of regression coefficients for variable \RObj{Z.test} in linear models, each with outcome $X(j)$ ($j=1,\ldots,J$), possibly adjusted by covariates \RObj{Z.incl} from the matrix \RObj{Z} (in the case of no covariates, one recovers the one-sample $t$-statistic, \RObj{t.onesamp}); 
\item 
\RObj{lm.YvsXZ}: 
$t$-statistic for tests of regression coefficients in linear models, with outcome $Y$ and each $X(j)$ ($j=1,\ldots,J$) as covariate of interest, with possibly other covariates \RObj{Z.incl} from the matrix \RObj{Z};
\item 
\RObj{coxph.YvsXZ}: $t$-statistic for tests of regression coefficients in Cox proportional hazards survival models, with outcome $Y$ and each $X(j)$ ($j=1,\ldots,J$) as covariate of interest, with possibly other covariates \RObj{Z.incl} from the matrix \RObj{Z}.
\end{itemize}


{\em Robust}, {\em rank-based} versions of the test statistics can be specified by setting the argument \RObj{robust} to \RObj{TRUE} (the default value is \RObj{FALSE}). 
Consideration should be given to whether {\em standardized} (Equation (\ref{anal:mult:e:tstat})) or {\em unstandardized} difference statistics are most appropriate (see Pollard \& van der Laan \cite{Pollard&vdLaanJSPI04} for a comparison). Both options are available through the argument \RObj{standardize}, by default \RObj{TRUE}. 
The type of alternative hypotheses is specified via the \RObj{alternative} argument: default value of \RObj{two.sided}, for two-sided test, and values of \RObj{less} or \RObj{greater}, for one-sided tests. 
The (common) null value for the parameters of interest is specified through the \RObj{psi0} argument, by default zero.  


\item{\em Type I error rate.} 
The \RObj{MTP} function controls by default the family-wise error rate (FWER), or probability of at least one false positive (argument \RObj{typeone="fwer"}). 
Augmentation procedures (Section \ref{anal:mult:s:AMTP}), controlling other Type I error rates such as gFWER, TPPFP, and FDR, can be specified through the argument \RObj{typeone} (and \RObj{k} and \RObj{q}, for the allowed number and proportion of false positives for control of $gFWER(k)$ and $TPPFP(q)$, respectively). 
The nominal level of the test is determined by the argument \RObj{alpha}, by default 0.05. 
Testing can be performed for a range of nominal Type I error rates by specifying a vector of levels \RObj{alpha}. 


\item{\em Test statistics null distribution.} 
In the current implementation of \RObj{MTP}, the test statistics null distribution is estimated by default using the non-parametric version of bootstrap Procedure~\ref{anal:mult:proc:boot} (argument \RObj{nulldist="boot"}). 
The bootstrap procedure is implemented in the internal function \RObj{boot.resample}, which calls C to compute test statistics for each bootstrap sample.
The values of the shift ($\lambda_0$) and scale ($\tau_0$) parameters are determined by the type of test statistics (e.g., $\lambda_0=0$ and $\tau_0=1$ for $t$-statistics). 
Permutation null distributions are also available via \RObj{nulldist="perm"}.
The number of resampling steps is specified by the argument \RObj{B}, by default 1,000. 

\item{\em Multiple testing procedures.} 
Several methods for controlling the chosen error rate are available in \RPack{multtest}. 
\begin{itemize}
\item
{\em FWER-controlling procedures.}
For FWER control, the \RObj{MTP} function implements the single-step and step-down (common-cut-off) maxT and (common-quantile) minP MTPs, described in Sections~\ref{anal:mult:s:SS} -- \ref{anal:mult:s:SD}, and specified through the argument \RObj{method} (internal functions \RObj{ss.maxT}, \RObj{ss.minP}, \RObj{sd.maxT}, and \RObj{sd.minP}).
The default MTP is the single-step maxT procedure (\RObj{method="ss.maxT"}), since it requires the least computation.
\item 
{\em gFWER- and TPPFP-controlling augmentation procedures.}
As discussed in Section \ref{anal:mult:s:AMTP}, any FWER-controlling MTP can be trivially augmented to control additional Type I error rates, such as the gFWER, TPPFP, and FDR.
These AMTPs are implemented in a collection of functions, \RObj{fwer2gfwer}, \RObj{fwer2tppfp}, and \RObj{fwer2fdr}, that take FWER adjusted $p$-values as input and return augmentation adjusted $p$-values for the chosen error rate. 
The AMTPs can also be applied directly via the \RObj{typeone} argument of \RObj{MTP}, provided \texttt{get.adjp = TRUE}, i.e., adjusted $p$-values are available for an initial FWER-controlling procedure.
\end{itemize}

\item{\em Output control.} 
Various arguments are available to control output, i.e., specify which combination of the following quantities should be returned: 
confidence regions (argument \RObj{get.cr}); 
cut-offs for the test statistics (argument \RObj{get.cutoff}); 
adjusted $p$-values (argument \RObj{get.adjp}); 
test statistics bootstrap null distribution  (argument \RObj{keep.nulldist}). 
Note that parameter estimates and confidence regions can only be computed for the test of single-parameter null hypotheses. 
In addition, in the current implementation of \RObj{MTP}, parameter confidence regions and test statistic cut-offs can only be computed when \texttt{typeone="fwer"}, so that \RObj{get.cr} and \RObj{get.cutoff} should be set to \RObj{FALSE} when using the error rates gFWER, TPPFP, or FDR.

\end{description}

Note that the \RPack{multtest} package also provides several simple, marginal FWER-controlling MTPs, such as the Bonferroni, Holm \cite{Holm79}, Hochberg \cite{Hochberg88}, and \v{S}id\'{a}k \cite{Sidak67} procedures, and FDR-controlling MTPs, such as the Benjamini \& Hochberg \cite{Benjamini&Hochberg95} and Benjamini \& Yekutieli \cite{Benjamini&Yekutieli01} procedures. 
These procedures are available through the \RObj{mt.rawp2adjp} function, which takes a vector of unadjusted $p$-values as input and returns the corresponding adjusted $p$-values.\\


\noindent
{\bf  OUTPUT.}\\

The {\em S4 class/method object-oriented programming} approach was adopted to summarize the results of a MTP (Section \ref{anal:mult:s:design}). 
Specifically, the output of the \RObj{MTP} function is an object of {\em class} \RClass{MTP}. 
A brief description of the class and associated methods is given next. Please consult the documentation for details, \texttt{class ? MTP}. 

\begin{Schunk}
\begin{Sinput}
> slotNames("MTP")
\end{Sinput}
\begin{Soutput}
 [1] "statistic" "estimate"  "sampsize"  "rawp"      "adjp"      "conf.reg" 
 [7] "cutoff"    "nulldist"  "call"      "seed"     
\end{Soutput}
\end{Schunk}


\begin{description}

\item{\RObj{statistic}:} The numeric $M$--vector of test statistics, specified by the values of the \RObj{MTP} arguments \RObj{test}, \RObj{robust}, \RObj{standardize}, and \RObj{psi0}. In many testing problems, $M = J = $ \RObj{nrow(X)}.

\item{\RObj{estimate}:} For the test of single-parameter null hypotheses using $t$-statistics (i.e., not the $F$-tests), the numeric $M$--vector of estimated parameters.

\item{\RObj{sampsize}:} The sample size, i.e., $n=$ \RObj{ncol(X)}.

\item{\RObj{rawp}:} The numeric $M$--vector of unadjusted $p$-values.

\item{\RObj{adjp}:} The numeric $M$--vector of adjusted $p$-values, computed only if the \RObj{get.adjp} argument is \RObj{TRUE}.

\item{\RObj{conf.reg}:}  For the test of single-parameter null hypotheses using $t$-statistics (i.e., not the $F$-tests), the numeric $M \times 2 \times$ \RObj{length(alpha)} array of lower and upper simultaneous confidence limits for the parameter vector, for each value of the nominal Type I error rate \RObj{alpha}, computed only if the \RObj{get.cr} argument is \RObj{TRUE}. 

\item{\RObj{cutoff}:} The numeric $M \times$ \RObj{length(alpha)} matrix of cut-offs for the test statistics, for each value of the nominal Type I error rate \RObj{alpha}, computed only if the \RObj{get.cutoff} argument is \RObj{TRUE}.

\item{\RObj{nulldist}:} The numeric $M \times B$ matrix for the estimated test statistics null distribution, returned only if the \RObj{keep.nulldist} argument is \RObj{TRUE}.
By default (i.e., for \RObj{nulldist="boot"}), these are the null value centered and scaled bootstrap test statistics, as defined by Procedure~\ref{anal:mult:proc:boot}.

\item{\RObj{call}:} The call to the function \RObj{MTP}.

\item{\RObj{seed}:} The integer value of the seed used by the random number generator to create the resampled datasets. The seed can be set to this value in a repeat call to \RObj{MTP}.

\end{description}

%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Numerical and graphical summaries}
\label{anal:mult:s:summaries}

The following {\em methods} are defined to operate on objects of class \RClass{MTP} and summarize the results of a MTP. 

\begin{description}

\item{\RObj{print}:} 
The \RObj{print} method returns a description of an object of class \RClass{MTP}, including 
the sample size $n$,
the number $M$ of tested hypotheses,
the type of test performed, 
the Type I error rate,
the nominal level of the test, 
the name of the MTP, 
the call to \RObj{MTP}.
In addition, this method produces a table with the class, mode, length, and dimension of each slot of the \RClass{MTP} instance. 

\item{\RObj{summary}:} 
The \RObj{summary} method provides numerical summaries of the results of a MTP and returns a list with the following three components.
\begin{itemize}
\item
A table with the number(s) of rejected hypotheses for the nominal Type I error rate(s) specified by the \RObj{alpha} argument of the function \RObj{MTP} 
(\RObj{NULL} values are returned if all three arguments \RObj{get.cr}, \RObj{get.cutoff}, and \RObj{get.adjp} are \RObj{FALSE}).
\item
A numeric $M$--vector of indices for ordering the hypotheses according to first \RObj{adjp}, then \RObj{rawp}, and finally the absolute value of \RObj{statistic} (not printed in the summary). 
\item
When applicable (i.e., when the corresponding quantities are returned by \RObj{MTP}), a table with six number summaries of the distributions of the adjusted $p$-values, unadjusted $p$-values, test statistics, and parameter estimates.
\end{itemize}

\item{\RObj{plot}:} 
The \RObj{plot} method produces the following graphical summaries of the results of a MTP. The type of display may be specified via the \RObj{which} argument.
\begin{itemize}
\item
Scatterplot of number of rejected hypotheses vs. nominal Type I error rate specified by the \RObj{alpha} argument of the function \RObj{MTP}.
\item
Plot of ordered adjusted $p$-values; can be viewed as a plot of Type I error rate vs. number of rejected hypotheses.
\item
Scatterplot of adjusted $p$-values vs. test statistics (``volcano plot'').
\item
Plot of adjusted $p$-values, in the original data order.  
\item
Plot of confidence regions for user-specified parameters, by default the 10 parameters corresponding to the smallest adjusted $p$-values.
\item
Plot of test statistic cut-offs for user-specified hypotheses, by default the 10 hypotheses corresponding to the smallest adjusted $p$-values.
\end{itemize}
The argument \RObj{logscale} (by default equal to \RObj{FALSE}) allows one to use the negative decimal logarithms of the adjusted $p$-values in the second, third, and fourth graphical displays.
Note that some of these plots are implemented in the older function \RObj{mt.plot}.


\item{\RObj{[}:} 
Subsets each slot to include only the specified hypotheses.

\item{\RObj{as.list}:} 
Converts an object of class \RClass{MTP} to an object of class \RClass{list}, with an entry for each slot. 

\end{description}


%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Software design}
\label{anal:mult:s:design}

The following features of the programming approach employed in \RPack{multtest} may be of interest to users, especially those interested in extending the functionality of the package. \\

\noindent
{\bf Closures.}  The use of {\em closures}, in the style of the \RPack{genefilter} package, allows uniform data input for all MTPs and facilitates the extension of the package's functionality by adding, for example, new types of test statistics. 
Specifically, for each value of the \RObj{MTP} argument \RObj{test}, a closure is defined which consists of a function for computing the test statistic (with only two arguments, a data vector \RObj{x} and a corresponding weight vector \RObj{w}, with default value of \RObj{NULL}) and its enclosing environment, with bindings for relevant additional arguments, such as null values \RObj{psi0}, outcomes \RObj{Y}, and covariates \RObj{Z}. 
This closure is first used to compute the vector of observed test statistics, and then, in each bootstrap iteration, to produce the estimated joint null distribution of the test statistics. 
Thus, new test statistics can be added to \RPack{multtest} by simply defining a new closure and adding a corresponding value for the \RObj{test} argument to \RObj{MTP} (existing internal test statistic functions are located in the file \texttt{R/statistics.R}).\\

\noindent
{\bf Class/method object-oriented programming.}  Like many other Bioconductor packages, \RPack{multtest}  has adopted the {\em S4 class/method object-oriented programming approach} of Chambers \cite{Chambers98}.
In particular, a new class, \RClass{MTP}, is defined to represent the results of multiple testing procedures, as implemented in the main \RObj{MTP} function. As discussed above in Section \ref{anal:mult:s:summaries}, several methods are provided to operate on objects of this class.\\

\noindent
{\bf Calls to C.} Because resampling procedures, such as the non-parametric bootstrap implemented in \RPack{multtest}, are computationally intensive, care must be taken to ensure that the resampling steps are not prohibitively slow. The use of closures for the test statistics, however, prevents writing the entire program in C. In the current implementation, we have chosen to define the closure and compute the observed test statistics in R, and then call C (using the R random number generator) to apply the closure to each bootstrap resampled dataset. This approach puts the for loop over bootstrap samples in the C environment, thus speeding up this computationally expensive part of the program. Further optimization for speed may be investigated for future releases. 


%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Application to ALL microarray dataset}
\label{anal:mult:s:appns}

%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Association of expression measures and cytogenetic test status: two-sample $t$-statistics}

\paragraph{FWER control, two-sample $t$-statistics, bootstrap null distribution}
The phenotype data include an indicator variable, \RObj{cyto.normal}, for cytogenetic test status (normal vs. abnormal). To identify genes with higher mean expression measures in the abnormal compared to the normal cytogenetics subjects, one-sided two-sample $t$-tests can be performed. We choose to use the Welch $t$-statistic and to control the FWER using the bootstrap-based step-down minP procedure with $B=500$ bootstrap iterations.

\begin{Schunk}
\begin{Sinput}
> set.seed(999)
> cyto.boot <- MTP(X = exprs(filtALL), Y = pData(filtALL)$cyto.normal, 
+     alternative = "less", B = 500, method = "sd.minP")
\end{Sinput}
\begin{Soutput}
running bootstrap...
iteration = 100 200 300 400 500
\end{Soutput}
\end{Schunk}

Let us examine the results of the MTP stored in the object \RObj{cyto.boot} of class \RClass{MTP}. Please take the time to experiment with the object and associated methods.
\begin{Schunk}
\begin{Sinput}
> class(cyto.boot)
\end{Sinput}
\begin{Soutput}
[1] "MTP"
attr(,"package")
[1] "multtest"
\end{Soutput}
\begin{Sinput}
> slotNames(cyto.boot)
\end{Sinput}
\begin{Soutput}
 [1] "statistic" "estimate"  "sampsize"  "rawp"      "adjp"      "conf.reg" 
 [7] "cutoff"    "nulldist"  "call"      "seed"     
\end{Soutput}
\begin{Sinput}
> print(cyto.boot)
\end{Sinput}
\begin{Soutput}
        Bootstrap-based Multiple Testing

Object of class:  MTP
sample size = 128 
number of hypotheses = 431 

test statistics = t.twosamp.unequalvar 
type I error rate = fwer 
nominal level alpha = 0.05 
multiple testing procedure = sd.minP 

Call: MTP(X = exprs(filtALL), Y = pData(filtALL)$cyto.normal, alternative = "less", 
    B = 500, method = "sd.minP")

Slots: 
            Class    Mode Length Dimension
statistic numeric numeric    431          
estimate  numeric numeric    431          
sampsize  numeric numeric      1          
rawp      numeric numeric    431          
adjp      numeric numeric    431          
conf.reg    array logical      0     0,0,0
cutoff     matrix logical      0       0,0
nulldist   matrix logical      0       0,0
call         call    call      6          
seed      integer numeric    626          
\end{Soutput}
\begin{Sinput}
> summ <- summary(cyto.boot)
\end{Sinput}
\begin{Soutput}
MTP:  sd.minP 
Type I error rate:  fwer 

  Level Rejections
1  0.05          7

            Min. 1st Qu.    Median     Mean 3rd Qu.   Max.
adjp       0.000  1.0000  1.000000  0.95970  1.0000 1.0000
rawp       0.000  0.1590  0.470000  0.47610  0.7700 1.0000
statistic -2.922 -0.7589 -0.038150 -0.11920  0.5946 2.5620
estimate  -1.083 -0.2015 -0.008719 -0.03249  0.1467 0.6709
\end{Soutput}
\begin{Sinput}
> class(summ)
\end{Sinput}
\begin{Soutput}
[1] "list"
\end{Soutput}
\begin{Sinput}
> names(summ)
\end{Sinput}
\begin{Soutput}
[1] "rejections" "index"      "summaries" 
\end{Soutput}
\begin{Sinput}
> summ
\end{Sinput}
\begin{Soutput}
$rejections
  Level Rejections
1  0.05          7

$index
  [1] 283 104 365 241 320 339 351  26 243 292  81 389 132  86 216 428 159 178
 [19] 228 197 325 310 240 274 261  82 397 143 385 164 298 368 110 142  25   5
 [37] 193 308 387 279 191 257 258 392 217 300 252  45 394 100 171  42 211   3
 [55]  13 107 361  67 337 231  90 117  34 367 170 379 205 373  36 324 138 386
 [73]  79 248 249   9 234 395 118 329  46 207 202 403 374 358 350 285 331 182
 [91] 184 194  80  98 405 294  15 140  50 340 147 148 313 174 253 119 206 237
[109] 267 400 247  28  61 136 106 295  78  76 333  73 169 259  99 233 321 343
[127] 408 230 264  32  33 214  65 381  68 410 383  24 250 125 357  62 335 375
[145] 417 377 157 376 180 348 415 315 431  48 286 288 362  53 198 314  77 139
[163] 396 213 201 134   2  74 326  17 398 382 273 289 236 144  75 266 271 390
[181] 332 424  94 372   6 312 129 364 307 296 173   8 146 179 305 254 272 260
[199] 303 276 297 418 413 238  31 353  14 411  84 177  16 251 277 232 175  19
[217] 270  35 359 200  29 131 281 158 222  30 225 404 338 406 284  59 220 327
[235]  60 168 153 208 306 422 263 416 423 172  88 371  58 290 302 160  85  70
[253]  96 204 203 141 255 293 399 291 187 209 155 152 156 380 219 102 407 120
[271] 150 137 154  40 127 126 265 188 122 419 309 421 282  23 369  27 402 318
[289]  49 239  39 121 355 363 244 227 304 223  97  41 334 101  63 186 163 344
[307]  92 111 145  93 425 356  87 317 166  56 162 354 426  47 105  38 391  91
[325]  22  18 196 226  21 409 352 427 378 114 278 299 151 336 256 287  57  69
[343]   7 189 165 218 429 349  55 301 366 224  54 116 109 185 269  66 124  51
[361] 316 115  44 268 183  64 346 262 176  95 412  20 181 108 128  89 199 212
[379] 149 280 388 130 341 235 401 113 245 328   4 347 123 384 414  37 311 135
[397] 330 161 246 430 195  12  43 420 229 319  83  52  11 112 190 393   1 133
[415]  10 221 322 370 360  71 345 342 167  72 242 103 275 192 215 210 323

$summaries
            Min. 1st Qu.    Median     Mean 3rd Qu.   Max.
adjp       0.000  1.0000  1.000000  0.95970  1.0000 1.0000
rawp       0.000  0.1590  0.470000  0.47610  0.7700 1.0000
statistic -2.922 -0.7589 -0.038150 -0.11920  0.5946 2.5620
estimate  -1.083 -0.2015 -0.008719 -0.03249  0.1467 0.6709
\end{Soutput}
\end{Schunk}

To obtain a list of genes with adjusted $p$-values less than 0.05,
\begin{Schunk}
\begin{Sinput}
> sum(cyto.boot@adjp <= 0.05)
\end{Sinput}
\begin{Soutput}
[1] 7
\end{Soutput}
\begin{Sinput}
> genes <- geneNames(filtALL)[cyto.boot@adjp <= 0.05]
> genes
\end{Sinput}
\begin{Soutput}
[1] "33232_at"   "37539_at"   "38578_at"   "39712_at"   "40202_at"  
[6] "40607_at"   "40888_f_at"
\end{Soutput}
\end{Schunk}

Functions from the \RPack{annotate} package may be used to retrieve PubMed abstracs for each of these genes and generate an HTML table with links to PubMed.
\begin{Schunk}
\begin{Sinput}
> hgu95av2()
\end{Sinput}
\begin{Soutput}
Quality control information for  hgu95av2 
Date built:  Wed Mar  3 20:53:22 2004  
Number of probes: 12625 
Probe number missmatch: None 
Probe missmatch: None 
Mappings found for probe based rda files: 
         hgu95av2ACCNUM found 12625 of 12625
         hgu95av2CHRLOC found 11425 of 12625
         hgu95av2CHR found 12145 of 12625
         hgu95av2ENZYME found 1642 of 12625
         hgu95av2GENENAME found 12172 of 12625
         hgu95av2GO found 9579 of 12625
         hgu95av2GRIF found 6706 of 12625
         hgu95av2HGID found 10968 of 12625
         hgu95av2LOCUSID found 12282 of 12625
         hgu95av2MAP found 12107 of 12625
         hgu95av2NM found 11276 of 12625
         hgu95av2NP found 11276 of 12625
         hgu95av2OMIM found 9583 of 12625
         hgu95av2PATH found 2258 of 12625
         hgu95av2PMID found 12058 of 12625
         hgu95av2SUMFUNC found 185 of 12625
         hgu95av2SYMBOL found 12172 of 12625
         hgu95av2UNIGENE found 11881 of 12625 
Mappings found for non-probe based rda files:
         hgu95av2ENZYME2PROBE found 566
         hgu95av2GO2ALLPROBES found 4661
         hgu95av2GO2PROBE found 3392
         hgu95av2PATH2PROBE found 124
         hgu95av2PMID2PROBE found 50570 
\end{Soutput}
\begin{Sinput}
> mget(genes, envir = hgu95av2GENENAME)
\end{Sinput}
\begin{Soutput}
$"33232_at"
[1] " cysteine-rich protein 1 (intestinal)"

$"37539_at"
[1] " RalGDS-like gene"

$"38578_at"
[1] " tumor necrosis factor receptor superfamily, member 7"

$"39712_at"
[1] " S100 calcium binding protein A13"

$"40202_at"
[1] " basic transcription element binding protein 1"

$"40607_at"
[1] " dihydropyrimidinase-like 2"

$"40888_f_at"
[1] " eukaryotic translation elongation factor 1 alpha 1"
\end{Soutput}
\begin{Sinput}
> mget(genes, envir = hgu95av2MAP)
\end{Sinput}
\begin{Soutput}
$"33232_at"
[1] "7q11.23"

$"37539_at"
[1] "1q25.2"

$"38578_at"
[1] "12p13"

$"39712_at"
[1] "1q21"

$"40202_at"
[1] "9q13"

$"40607_at"
[1] "8p22-p21"

$"40888_f_at"
[1] "6q14.1"
\end{Soutput}
\begin{Sinput}
> mget(genes, envir = hgu95av2PMID)
\end{Sinput}
\begin{Soutput}
$"33232_at"
[1] "12477932" "9480758"  "9126610"  "7999070" 

$"37539_at"
[1] "10760592" "10231032"

$"38578_at"
 [1] "14702039" "12743427" "12685844" "12477932" "12464570" "12324477"
 [7] "12197885" "12009595" "11976819" "11801693" "11062504" "9582383" 
[13] "9177220"  "8530100"  "7479974"  "2837508"  "2442250"  "1655907" 

$"39712_at"
[1] "14702039" "12645008" "12477932" "10722710" "8985590"  "8878558"  "7759097" 

$"40202_at"
[1] "8291025" "8257632" "8051167" "1356762"

$"40607_at"
[1] "14702039" "12951196" "12482610" "12477932" "11771764" "8973361"  "8640231" 
[8] "7637782" 

$"40888_f_at"
 [1] "14702039" "14623968" "12594214" "12477932" "12426393" "11689615"
 [7] "10364286" "9465091"  "8812466"  "8626763"  "7542776"  "3570288" 
[13] "3512269"  "3346208"  "2564392"  "2183196" 
\end{Soutput}
\end{Schunk}

\begin{Schunk}
\begin{Sinput}
> allAbsts <- pm.getabst(genes, "hgu95av2")
> pm.titles(allAbsts)
\end{Sinput}
\begin{Soutput}
[[1]]
[1] "Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences."                                                           
[2] "Mapping of the human cysteine-rich intestinal protein gene CRIP1 to the human chromosomal segment 7q11.23."                                                
[3] "Human cysteine-rich intestinal protein: cDNA cloning and expression of recombinant protein and identification in human peripheral blood mononuclear cells."
[4] "Isolation and characterization of a cDNA that codes for a LIM-containing protein which is developmentally regulated in heart."                             

[[2]]
[1] "The human RGL (RalGDS-like) gene: cloning, expression analysis and genomic organization."                                                                              
[2] "Prediction of the coding sequences of unidentified human genes. XIII. The complete sequences of 100 new cDNA clones from brain which code for large proteins in vitro."

[[3]]
 [1] "Complete sequencing and characterization of 21,243 full-length human cDNAs."                                                                                                                       
 [2] "Serum soluble CD27, but not thymidine kinase, is an independent prognostic factor for outcome in indolent non-Hodgkin's lymphoma."                                                                 
 [3] "CD148 and CD27 are expressed in B cell lymphomas derived from both memory and naïve B cells."                                                                                                     
 [4] "Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences."                                                                                                   
 [5] "Failing immune control as a result of impaired CD8+ T-cell maturation: CD27 might provide a clue."                                                                                                 
 [6] "CD27 and CD40 inhibit p53-independent mitochondrial pathways in apoptosis of B cells induced by B cell receptor ligation."                                                                         
 [7] "IL-10 enhances B-cell IgE synthesis by promoting differentiation into plasma cells, a process that is inhibited by CD27/CD70 interaction."                                                         
 [8] "CD27 but not CD70 and 4-1BB intragraft gene expression is a risk factor for acute cardiac allograft rejection in humans."                                                                          
 [9] "Increased levels of soluble CD27 in the cerebrospinal fluid are not diagnostic for leptomeningeal involvement by lymphoid malignancies."                                                           
[10] "Lack of CD27-CD45RA-V gamma 9V delta 2+ T cell effectors in immunocompromised hosts and during active pulmonary tuberculosis."                                                                     
[11] "CD27 is required for generation and long-term maintenance of T cell immunity."                                                                                                                     
[12] "CD27, a member of the tumor necrosis factor receptor superfamily, activates NF-kappaB and stress-activated protein kinase/c-Jun N-terminal kinase via TRAF2, TRAF5, and NF-kappaB-inducing kinase."
[13] "CD27, a member of the tumor necrosis factor receptor family, induces apoptosis and binds to Siva, a proapoptotic protein."                                                                         
[14] "Isolation and regional assignment of human chromosome 12p cDNAs."                                                                                                                                  
[15] "CD27-CD70 interactions regulate B-cell activation by T cells."                                                                                                                                     
[16] "S152 (CD27). A modulating disulfide-linked T cell activation antigen."                                                                                                                             
[17] "Tissue distribution and biochemical and functional properties of Tp55 (CD27), a novel T cell differentiation antigen."                                                                             
[18] "The T cell activation antigen CD27 is a member of the nerve growth factor/tumor necrosis factor receptor gene family."                                                                             

[[4]]
[1] "Complete sequencing and characterization of 21,243 full-length human cDNAs."                                                                                          
[2] "Differential expression of S100 proteins in the developing human hippocampus and temporal cortex."                                                                    
[3] "Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences."                                                                      
[4] "S100A13. Biochemical characterization and subcellular localization in different cell lines."                                                                          
[5] "Characterization of the human S100A12 (calgranulin C, p6, CAAF1, CGRP) gene, a new member of the S100 gene cluster on chromosome 1q21."                               
[6] "Characterization of the human and mouse cDNAs coding for S100A13, a new member of the S100 protein family."                                                           
[7] "Isolation of a YAC clone covering a cluster of nine S100 genes on human chromosome 1q21: rationale for a new nomenclature of the S100 calcium-binding protein family."

[[5]]
[1] "Chromosomal localization and cDNA sequence of human BTEB, a GC box binding protein."                                                           
[2] "Activation of the human immunodeficiency virus type 1 long terminal repeat by BTEB, a GC box-binding transcription factor."                    
[3] "Cell-specific translational control of transcription factor BTEB expression. The role of an upstream AUG in the 5'-untranslated region."       
[4] "Two regulatory proteins that bind to the basic transcription element (BTE), a GC box sequence in the promoter region of the rat P-4501A1 gene."

[[6]]
[1] "Complete sequencing and characterization of 21,243 full-length human cDNAs."                                                                  
[2] "No association between the dihydropyrimidinase-related protein 2 (DRP-2) gene and bipolar disorder in humans."                                
[3] "p80 ROKalpha binding protein is a novel splice variant of CRMP-1 which associates with CRMP-2 and modulates RhoA-induced neuronal morphology."
[4] "Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences."                                              
[5] "Aberrant expression of dihydropyrimidinase related proteins-2,-3 and -4 in fetal Down syndrome brain."                                        
[6] "A novel gene family defined by human dihydropyrimidinase and three related proteins with differential tissue distribution."                   
[7] "Characterization of DRP2, a novel human dystrophin homologue."                                                                                
[8] "Collapsin-induced growth cone collapse mediated by an intracellular protein related to UNC-33."                                               

[[7]]
 [1] "Complete sequencing and characterization of 21,243 full-length human cDNAs."                                                                           
 [2] "Translationally controlled tumor protein acts as a guanine nucleotide dissociation inhibitor on the translation elongation factor eEF1A."              
 [3] "The fragile X mental retardation protein FMRP binds elongation factor 1A mRNA and negatively regulates its translation in vivo."                       
 [4] "Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences."                                                       
 [5] "Exportin-5-mediated nuclear export of eukaryotic elongation factor 1A and tRNA."                                                                       
 [6] "Functional interactions of human immunodeficiency virus type 1 integrase with human and yeast HSP60."                                                  
 [7] "Translation elongation factor 1-alpha interacts specifically with the human immunodeficiency virus type 1 Gag polyprotein."                            
 [8] "Antisense inhibition of the PTI-1 oncogene reverses cancer phenotypes."                                                                                
 [9] "Assignment of human elongation factor 1alpha genes: EEF1A maps to chromosome 6q14 and EEF1A2 to 20q13.3."                                              
[10] "Identification of a group of cellular cofactors that stimulate the binding of RNA polymerase II and TRP-185 to human immunodeficiency virus 1 TAR RNA."
[11] "Identification of the human prostatic carcinoma oncogene PTI-1 by rapid expression cloning and differential RNA display."                              
[12] "Human elongation factor 1 alpha: a polymorphic and conserved multigene family with multiple chromosomal localizations."                                
[13] "The primary structure of the alpha subunit of human elongation factor 1. Structural aspects of guanine-nucleotide-binding sites."                      
[14] "Retinol-regulated gene expression in human tracheobronchial epithelial cells. Enhanced expression of elongation factor EF-1 alpha."                    
[15] "Isolation and characterization of the human chromosomal gene for polypeptide chain elongation factor-1 alpha."                                         
[16] "Retropseudogenes constitute the major part of the human elongation factor 1 alpha gene family."                                                        
\end{Soutput}
\begin{Sinput}
> oneAbst <- allAbsts[[1]][[1]]
> class(oneAbst)
\end{Sinput}
\begin{Soutput}
[1] "pubMedAbst"
attr(,"package")
[1] "annotate"
\end{Soutput}
\begin{Sinput}
> slotNames(oneAbst)
\end{Sinput}
\begin{Soutput}
[1] "pmid"         "authors"      "abstText"     "articleTitle" "journal"     
[6] "pubDate"     
\end{Soutput}
\begin{Sinput}
> oneAbst
\end{Sinput}
\begin{Soutput}
An object of class 'pubMedAbst':
Title:   Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. 
PMID:    12477932 
Authors: RL Strausberg, EA Feingold, LH Grouse, JG Derge, RD Klausner, FS Collins, L Wagner, CM Shenmen, GD Schuler, SF Altschul, B Zeeberg, KH Buetow, CF Schaefer, NK Bhat, RF Hopkins, H Jordan, T Moore, SI Max, J Wang, F Hsieh, L Diatchenko, K Marusina, AA Farmer, GM Rubin, L Hong, M Stapleton, MB Soares, MF Bonaldo, TL Casavant, TE Scheetz, MJ Brownstein, TB Usdin, S Toshiyuki, P Carninci, C Prange, SS Raha, NA Loquellano, GJ Peters, RD Abramson, SJ Mullahy, SA Bosak, PJ McEwan, KJ McKernan, JA Malek, PH Gunaratne, S Richards, KC Worley, S Hale, AM Garcia, LJ Gay, SW Hulyk, DK Villalon, DM Muzny, EJ Sodergren, X Lu, RA Gibbs, J Fahey, E Helton, M Ketteman, A Madan, S Rodrigues, A Sanchez, M Whiting, A Madan, AC Young, Y Shevchenko, GG Bouffard, RW Blakesley, JW Touchman, ED Green, MC Dickson, AC Rodriguez, J Grimwood, J Schmutz, RM Myers, YS Butterfield, MI Krzywinski, U Skalska, DE Smailus, A Schnerch, JE Schein, SJ Jones, MA Marra, M LastName 
Journal: Proc Natl Acad Sci U S A 
Date:    Dec 2002 
\end{Soutput}
\begin{Sinput}
> abstText(oneAbst)
\end{Sinput}
\begin{Soutput}
[1] "The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http:mgc.nci.nih.gov)."
\end{Soutput}
\begin{Sinput}
> allAbsts2 <- NULL
> for (l in allAbsts) allAbsts2 <- c(allAbsts2, l)
> pmAbst2HTML(allAbsts2, filename = "cytoBoot", title = "Chiaretti et al. (2004) ALL dataset: FWER-controlling bootstrap step-down minP MTP, cytogenetic test status", 
+     frames = TRUE)
\end{Sinput}
\end{Schunk}

Graphical summaries of the results may be obtained using the \RObj{plot} method (Figure \ref{f:cytoPlot}).
\begin{Schunk}
\begin{Sinput}
> par(mfrow = c(2, 2))
> plot(cyto.boot)
\end{Sinput}
\end{Schunk}

\begin{figure}
\begin{center}
\includegraphics[width=3in,height=3in,angle=0]{cytoPlot}
\end{center}
\caption{
{\em Graphical summaries of FWER-controlling step-down minP MTP --- Cytogenetic test status.} By default, four plots are produced by the \RFunc{plot} method for instances of the class \RClass{MTP}.}
\protect\label{f:cytoPlot}
\end{figure}

Given a vector of unadjusted $p$-values, the \RObj{mt.rawp2adjp} function computes adjusted $p$-values for the marginal FWER-controlling MTPs of Bonferroni, Holm \cite{Holm79}, Hochberg \cite{Hochberg88}, and  $\check{\rm S}$id\'{a}k \cite{Sidak67}, discussed in detail in Dudoit et al. \cite{DudoitetalStatSci03}.
\begin{Schunk}
\begin{Sinput}
> cyto.marg <- mt.rawp2adjp(rawp = cyto.boot@rawp, proc = c("Bonferroni", 
+     "Holm", "Hochberg", "SidakSS", "SidakSD"))
> compFWER <- cbind(cyto.boot@adjp, cyto.marg$adjp[order(cyto.marg$index), 
+     -1])
\end{Sinput}
\end{Schunk}

The \RObj{mt.plot} function may be used to compare the different procedures in terms of their adjusted $p$-values
\begin{Schunk}
\begin{Sinput}
> par(mfrow = c(1, 1))
> mt.plot(adjp = compFWER, teststat = cyto.boot@statistic, proc = c("SD minP", 
+     "Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD"), 
+     leg = c(0.1, 400), col = 1:6, lty = 1:6, lwd = 2)
> title("Comparison of FWER-controlling marginal MTPs and step-down minP MTP")
\end{Sinput}
\end{Schunk}

In this dataset, most of the FWER-controlling MTPs perform similarly, making very few rejections at nominal Type I error rates near zero. As expected, the bootstrap-based step-down minP method, which takes into account the joint distribution of the test statistics,  makes slightly more rejections than the other methods at higher allowed error rates (Figure \ref{f:mtplot}).

\begin{figure}
\begin{center}
\includegraphics[width=3in,height=3in,angle=0]{mtplot}
\end{center}
\caption{
{\em Marginal vs. joint FWER-controlling MTPs  --- Cytogenetic test status.} Plot of number of rejected hypotheses vs. nominal Type I error rate for comparing  bootstrap-based FWER-controlling marginal MTPs and step-down minP MTP.}
\protect\label{f:mtplot}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%
\paragraph{FWER control, two-sample $t$-statistics, permutation null distribution}

Because the sample sizes are not equal for the two cytogenetic groups, we expect the bootstrap and permutation null distributions to yield different sets of rejected hypotheses \cite{Pollard&vdLaanJSPI04}. To compare the two approaches, we apply the permutation-based step-down minP procedure, first using the old \RObj{mt.minP} function and then using the new \RObj{MTP} function (which calls \RObj{mt.minP}). 
Please note that while the \RFunc{MTP} and \RFunc{mt.minP} functions produce the same results, these are presented in a different manner. In particular, for the newer function \RFunc{MTP}, the results (e.g., test statistics, parameter estimates, unadjusted $p$-values, adjusted $p$-values, cut-offs) are given in the orginal order of the null hypotheses, while in the older function \RFunc{mt.minP}, the hypotheses are sorted  first according to their adjusted $p$-values, next their unadjusted $p$-values, and finally their test statistics. 
In addition, the function \RFunc{MTP} implements a broader range of MTPs and has adopted the S4 class/method design for representing and summarizing the results of a MTP.

\begin{Schunk}
\begin{Sinput}
> NAs <- is.na(pData(filtALL)$cyto.normal)
> set.seed(999)
> cyto.perm.old <- mt.minP(X = exprs(filtALL)[, !NAs], classlabel = pData(filtALL)$cyto.normal[!NAs], 
+     side = "lower", B = 500)
\end{Sinput}
\begin{Soutput}
B=500
b=5     b=10    b=15    b=20    b=25    b=30    b=35    b=40    b=45    b=50    
b=55    b=60    b=65    b=70    b=75    b=80    b=85    b=90    b=95    b=100   
b=105   b=110   b=115   b=120   b=125   b=130   b=135   b=140   b=145   b=150   
b=155   b=160   b=165   b=170   b=175   b=180   b=185   b=190   b=195   b=200   
b=205   b=210   b=215   b=220   b=225   b=230   b=235   b=240   b=245   b=250   
b=255   b=260   b=265   b=270   b=275   b=280   b=285   b=290   b=295   b=300   
b=305   b=310   b=315   b=320   b=325   b=330   b=335   b=340   b=345   b=350   
b=355   b=360   b=365   b=370   b=375   b=380   b=385   b=390   b=395   b=400   
b=405   b=410   b=415   b=420   b=425   b=430   b=435   b=440   b=445   b=450   
b=455   b=460   b=465   b=470   b=475   b=480   b=485   b=490   b=495   b=500   
r=4     r=8     r=12    r=16    r=20    r=24    r=28    r=32    r=36    r=40    
r=44    r=48    r=52    r=56    r=60    r=64    r=68    r=72    r=76    r=80    
r=84    r=88    r=92    r=96    r=100   r=104   r=108   r=112   r=116   r=120   
r=124   r=128   r=132   r=136   r=140   r=144   r=148   r=152   r=156   r=160   
r=164   r=168   r=172   r=176   r=180   r=184   r=188   r=192   r=196   r=200   
r=204   r=208   r=212   r=216   r=220   r=224   r=228   r=232   r=236   r=240   
r=244   r=248   r=252   r=256   r=260   r=264   r=268   r=272   r=276   r=280   
r=284   r=288   r=292   r=296   r=300   r=304   r=308   r=312   r=316   r=320   
r=324   r=328   r=332   r=336   r=340   r=344   r=348   r=352   r=356   r=360   
r=364   r=368   r=372   r=376   r=380   r=384   r=388   r=392   r=396   r=400   
r=404   r=408   r=412   r=416   r=420   r=424   r=428   
\end{Soutput}
\begin{Sinput}
> names(cyto.perm.old)
\end{Sinput}
\begin{Soutput}
[1] "index"    "teststat" "rawp"     "adjp"     "plower"  
\end{Soutput}
\begin{Sinput}
> sum(cyto.perm.old$adjp < 0.05)
\end{Sinput}
\begin{Soutput}
[1] 0
\end{Soutput}
\end{Schunk}

\begin{Schunk}
\begin{Sinput}
> set.seed(999)
> cyto.perm.new <- MTP(X = exprs(filtALL), Y = pData(filtALL)$cyto.normal, 
+     alternative = "less", nulldist = "perm", B = 500, method = "sd.minP")
\end{Sinput}
\begin{Soutput}
B=500
b=5     b=10    b=15    
b=20    b=25    b=30    b=35    b=40    b=45    b=50    b=55    b=60    b=65    
b=70    b=75    b=80    b=85    b=90    b=95    b=100   b=105   b=110   b=115   
b=120   b=125   b=130   b=135   b=140   b=145   b=150   b=155   b=160   b=165   
b=170   b=175   b=180   b=185   b=190   b=195   b=200   b=205   b=210   b=215   
b=220   b=225   b=230   b=235   b=240   b=245   b=250   b=255   b=260   b=265   
b=270   b=275   b=280   b=285   b=290   b=295   b=300   b=305   b=310   b=315   
b=320   b=325   b=330   b=335   b=340   b=345   b=350   b=355   b=360   b=365   
b=370   b=375   b=380   b=385   b=390   b=395   b=400   b=405   b=410   b=415   
b=420   b=425   b=430   b=435   b=440   b=445   b=450   b=455   b=460   b=465   
b=470   b=475   b=480   b=485   b=490   b=495   b=500   r=4     r=8     r=12    
r=16    r=20    r=24    r=28    r=32    r=36    r=40    r=44    r=48    r=52    
r=56    r=60    r=64    r=68    r=72    r=76    r=80    r=84    r=88    r=92    
r=96    r=100   r=104   r=108   r=112   r=116   r=120   r=124   r=128   r=132   
r=136   r=140   r=144   r=148   r=152   r=156   r=160   r=164   r=168   r=172   
r=176   r=180   r=184   r=188   r=192   r=196   r=200   r=204   r=208   r=212   
r=216   r=220   r=224   r=228   r=232   r=236   r=240   r=244   r=248   r=252   
r=256   r=260   r=264   r=268   r=272   r=276   r=280   r=284   r=288   r=292   
r=296   r=300   r=304   r=308   r=312   r=316   r=320   r=324   r=328   r=332   
r=336   r=340   r=344   r=348   r=352   r=356   r=360   r=364   r=368   r=372   
r=376   r=380   r=384   r=388   r=392   r=396   r=400   r=404   r=408   r=412   
r=416   r=420   r=424   r=428   
\end{Soutput}
\begin{Sinput}
> summary(cyto.perm.new)
\end{Sinput}
\begin{Soutput}
MTP:  sd.minP 
Type I error rate:  fwer 

  Level Rejections
1  0.05          0

            Min. 1st Qu.   Median    Mean 3rd Qu.  Max.
adjp       0.406  1.0000  1.00000  0.9920  1.0000 1.000
rawp       0.002  0.2210  0.49800  0.4848  0.7300 0.998
statistic -2.922 -0.7589 -0.03815 -0.1192  0.5946 2.562
estimate      NA      NA       NA     NaN      NA    NA
\end{Soutput}
\end{Schunk}

The permutation null distribution version of the step-down minP procedure rejects no hypotheses at nominal FWER level $\alpha=0.05$. 


%%%%%%%%%%%%%%%%%%%%%%%%%
\paragraph{FWER control, robust two-sample $t$-statistics, bootstrap null distribution}

The Wilcoxon rank sum test (also known as the Mann-Whitney test) is a robust alternative to the usual two-sample $t$-test. 

\begin{Schunk}
\begin{Sinput}
> set.seed(999)
> cyto.wilcox <- MTP(X = exprs(filtALL), Y = pData(filtALL)$cyto.normal, 
+     robust = TRUE, alternative = "less", B = 500, method = "sd.minP")
\end{Sinput}
\begin{Soutput}
running bootstrap...
iteration = 100 200 300 400 500
\end{Soutput}
\begin{Sinput}
> sum(cyto.wilcox@adjp < 0.05)
\end{Sinput}
\begin{Soutput}
[1] 4
\end{Soutput}
\begin{Sinput}
> sum(cyto.wilcox@adjp < 0.05 & cyto.boot@adjp < 0.05)
\end{Sinput}
\begin{Soutput}
[1] 3
\end{Soutput}
\end{Schunk}

The step-down minP MTP based on the Wilcoxon test statistic identifies fewer genes than the same MTP based on the less robust $t$-statistic.

%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Augmentation procedures for gFWER, TPPFP and FDR control, two-sample $t$-statistics, bootstrap null distribution}

In the context of microarray gene expression data analysis, one is often willing to accept some false positives, provided their number is small in comparison to the number of rejected hypotheses.
In this case, the FWER is not an appropriate choice of Type I error rate and one should consider other rates that lead to larger sets of rejected hypotheses.
The augmentation procedures implemented in \RObj{MTP} allow one to reject additional hypotheses, while controlling an error rate such as the generalized family-wise error rate (gFWER), the tail probability of the proportion of false positives (TPPFP), or the false discovery rate (FDR). 
We illustrate the use of the \RObj{fwer2gfwer} and \RObj{fwer2tppfp} functions, but note that the gFWER and TPPFP can also be controlled directly and more straightforwardly using the \RObj{MTP} function with appropiate choices of arguments \RObj{typeone}, \RObj{k}, and \RObj{q}  (Figures \ref{f:cytogfwer} and \ref{f:cytotppfp}).

\begin{Schunk}
\begin{Sinput}
> k <- c(10, 50, 100)
> cyto.gfwer <- fwer2gfwer(adjp = cyto.boot@adjp, k = k)
> comp.gfwer <- cbind(cyto.boot@adjp, cyto.gfwer)
> mtps <- paste("gFWER(", c(0, k), ")", sep = "")
> mt.plot(adjp = comp.gfwer, teststat = cyto.boot@statistic, proc = mtps, 
+     leg = c(0.1, 400), col = 1:4, lty = 1:4, lwd = 2)
> title("Comparison of gFWER(k)-controlling AMTPs based on SD minP")
\end{Sinput}
\end{Schunk}

\begin{Schunk}
\begin{Sinput}
> q <- c(0.05, 0.1, 0.5)
> cyto.tppfp <- fwer2tppfp(adjp = cyto.boot@adjp, q = q)
> comp.tppfp <- cbind(cyto.boot@adjp, cyto.tppfp)
> mtps <- c("FWER", paste("TPPFP(", q, ")", sep = ""))
> mt.plot(adjp = comp.tppfp, teststat = cyto.boot@statistic, proc = mtps, 
+     leg = c(0.1, 400), col = 1:4, lty = 1:4, lwd = 2)
> title("Comparison of TPPFP(q)-controlling AMTPs based on SD minP")
\end{Sinput}
\end{Schunk}

Examine numerical and graphical summaries of the MTP to assess the impact of the choice of Type I error rate.

Next, we compare the (conservative) FDR-controlling MTP implemented in the \RObj{MTP} function and discussed in van der Laan et al. \cite{vdLaanetalMT3SAGMB04}, to the marginal Benjamini \& Hochberg \cite{Benjamini&Hochberg95} and Benjamini \& Yekutieli \cite{Benjamini&Yekutieli01} procedures (Figure \ref{f:cytofdr}).

\begin{Schunk}
\begin{Sinput}
> set.seed(999)
> cyto.fdr <- MTP(X = exprs(filtALL), Y = pData(filtALL)$cyto.normal, 
+     alternative = "less", typeone = "fdr", B = 500, method = "sd.minP")
\end{Sinput}
\begin{Soutput}
running bootstrap...
iteration = 100 200 300 400 500
\end{Soutput}
\begin{Sinput}
> cyto.fdr.marg <- mt.rawp2adjp(rawp = cyto.fdr@rawp, proc = c("BY", 
+     "BH"))
> comp.fdr <- cbind(cyto.fdr@adjp, cyto.fdr.marg$adjp[order(cyto.fdr.marg$index), 
+     -1])
> mt.plot(adjp = comp.fdr, teststat = cyto.fdr@statistic, proc = c("AMTP", 
+     "BY", "BH"), leg = c(0.1, 400), col = 1:3, lty = 1:3, lwd = 2)
> title("Comparison of FDR-controlling MTPs")
\end{Sinput}
\end{Schunk}

We see that for controlling the FDR at nominal levels in the range of 0 to 0.1, all three methods are similar. For larger nominal error rates, the Benjamini \& Hochberg method rejects the most hypotheses, followed by the augmentation FDR method. The conservative Benjamini \& Yekutieli MTP  rejects the smallest number of hypotheses. 

%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}
\begin{center}
\includegraphics[width=3in,height=3in,angle=0]{cytogfwer}
\end{center}
\caption{
{\em gFWER-controlling AMTPs --- Cytogenetic test status.} Plot of number of rejected hypotheses vs. nominal Type I error rate for comparing gFWER-controlling AMTPs, based on the FWER-controlling bootstrap step-down minP procedure, with different allowed numbers $k$ of false positives. }
\protect\label{f:cytogfwer}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=3in,height=3in,angle=0]{cytotppfp}
\end{center}
\caption{
{\em TPPFP-controlling AMTPs --- Cytogenetic test status.} Plot of number of rejected hypotheses vs. nominal Type I error rate for comparing TPPFP-controlling AMTPs, based on the FWER-controlling bootstrap step-down minP procedure, with different allowed proportions $q$ of false positives. }
\protect\label{f:cytotppfp}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=3in,height=3in,angle=0]{cytofdr}
\end{center}
\caption{
{\em FDR-controlling MTPs --- Cytogenetic test status.} Plot of number of rejected hypotheses vs. nominal Type I error rate for comparing three FDR-controlling MTPs.}
\protect\label{f:cytofdr}
\end{figure}



%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Association of expression measures and tumor molecular subtype: multi-sample $F$-statistics}

To identify genes with differences in mean expression measures between different tumor molecular subtypes (BCR/ABL, NEG, ALL1/AF4, E2A/PBX1, p15/p16,  NUP-98), one can perform a family of $F$-tests. 
Tumor subtypes with fewer than 10 subjects are merged into one group. 
We compute adjusted $p$-values and cut-offs using the bootstrap-based single-step maxT procedure to control FWER at levels 0.01 and 0.1. We use only $B=100$ bootstrap iterations for speed in this example (more are recommended in practice).

\begin{Schunk}
\begin{Sinput}
> set.seed(999)
> mb <- as.character(pData(filtALL)$mol.biol)
> other <- c("E2A/PBX1", "NUP-98", "p15/p16")
> mb[mb %in% other] <- "other"
> mb.boot <- MTP(X = exprs(filtALL), Y = mb, test = "f", alpha = c(0.01, 
+     0.1), B = 100, get.cutoff = TRUE)
\end{Sinput}
\begin{Soutput}
running bootstrap...
iteration = 100
\end{Soutput}
\end{Schunk}

The set of genes with significant differences in mean expression measures between the tumor molecular subtypes, controlling FWER at nominal level $\alpha=0.01$, can be identified through either adjusted $p$-values or cut-offs for test statistics. 
\begin{Schunk}
\begin{Sinput}
> sum(mb.boot@adjp <= 0.01)
\end{Sinput}
\begin{Soutput}
[1] 265
\end{Soutput}
\begin{Sinput}
> sum(mb.boot@statistic > mb.boot@cutoff[, "alpha=0.01"])
\end{Sinput}
\begin{Soutput}
[1] 265
\end{Soutput}
\end{Schunk}


%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Association of expression measures and survival: Cox $t$-statistics}

The bootstrap-based MTPs implemented in \RObj{MTP} (\RObj{nulldist="boot"}) allow us to test hypotheses about regression parameters in models for which the subset pivotality condition may not hold. Examples include tests for association in logistic and Cox proportional hazards models. The phenotype information in the \RPack{ALL} package includes data about original remission status. There are 88 subjects who experienced complete remission and were followed up for remission status at a later date. 
For each gene identified as differentially expressed in the patients with different tumor molecular subtypes (nominal FWER of 0.01), we test for a significant association between expression measures and time to relapse amongst these 88 subjects, adjusting for sex. 
Note that most of the code below is concerned with extracting survival times and covariates from the \RClass{exprSet} object.


\begin{Schunk}
\begin{Sinput}
> genes <- mb.boot@adjp <= 0.01
> cr.ind <- pData(filtALL)$remission == "CR"
> crPheno <- pData(filtALL)[cr.ind, ]
> times <- strptime(crPheno$"date last seen", "%m/%d/%Y") - strptime(crPheno$date.cr, 
+     "%m/%d/%Y")
> times.ind <- !is.na(times)
> times <- times[times.ind]
> cens <- ((1:length(times)) %in% grep("CR", crPheno[times.ind, 
+     "f.u"]))
> rel.times <- Surv(times, !cens)
> patients <- (1:ncol(exprs(filtALL)))[cr.ind][times.ind]
> X <- exprs(filtALL[genes, patients])
> Z <- pData(filtALL[genes, patients])
\end{Sinput}
\end{Schunk}

\begin{Schunk}
\begin{Sinput}
> set.seed(999)
> cox.boot <- MTP(X = X, Y = rel.times, Z = Z, Z.incl = "sex", 
+     Z.test = 1, test = "coxph.YvsXZ", B = 200, get.cr = TRUE)
\end{Sinput}
\begin{Soutput}
running bootstrap...
iteration = 100 200
\end{Soutput}
\begin{Sinput}
> summary(cox.boot)
\end{Sinput}
\begin{Soutput}
MTP:  ss.maxT 
Type I error rate:  fwer 

  Level Rejections
1  0.05          3

             Min.  1st Qu.  Median    Mean 3rd Qu.   Max.
adjp       0.0000  0.90500 1.00000 0.89810  1.0000 1.0000
rawp       0.0000  0.02500 0.10000 0.14220  0.2350 0.4450
statistic -1.9930 -0.21250 0.50060 0.49380  1.2300 4.0490
estimate  -0.3532 -0.02527 0.04991 0.05407  0.1313 0.3835
\end{Soutput}
\begin{Sinput}
> geneNames(filtALL)[cox.boot@adjp <= 0.05]
\end{Sinput}
\begin{Soutput}
[1] "296_at"     "32773_at"   "33501_r_at" "39733_at"   "40688_at"  
[6] "41401_at"  
\end{Soutput}
\end{Schunk}

\begin{Schunk}
\begin{Sinput}
> plot(cox.boot, which = 5)
\end{Sinput}
\end{Schunk}

A plot of confidence regions illustrates that the level $\alpha=0.05$ confidence regions for the top two hypotheses in fact do not include the null value $\psi_0=0$, and those for the next two hypotheses barely cover $\psi_0=0$. 


\begin{figure}
\begin{center}
\includegraphics[width=3in,height=3in,angle=0]{coxphPlot}
\end{center}
\caption{
{\em FWER-controlling MTP --- Time to relapse.} Plot of Cox regression coefficient esimates and corresponding confidence intervals for the 10 genes with smallest adjusted $p$-values, based on the FWER-controlling bootstrap single-step maxT procedure (\RObj{plot} method, \texttt{which=5}).}
\protect\label{f:coxphPlot}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliography{../lab4DiffExp}

\end{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Sweave
setwd("C:/Documents and Settings/sandrine_dudoit/My Documents/Talks/MBI04/Labs/Lab4/lab4Out")

library(tools)
lab4<- file.path("../lab4DiffExp.Rnw")
Sweave(lab4,pdf=TRUE,eps=FALSE,stylepath=TRUE,driver=RweaveLatex())

Stangle(lab4)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


