Joint work with Ru-Fang Yeh, Mark Segal and Jean Yang
In this talk, I will present a discrete-algorithms approach to detecting population substructure. The algorithm is based on a dissimilarity measure between individuals, whose asymptotic behavior we have studied. In particular, we show rigorously that the algorithm converges rapidly to the correct classification as the genetic distance between the sub-populations increases. I will describe some theoretical aspects of this measure, as well as empirical results on detecting population substructure in HapMap data and in simulated data. We compared our method to two state-of-the-art methods (STRUCTURE and EIGENSTRAT). Our results indicate that the suggested algorithm is very efficient and more accurate than the other two algorithms.
This is joint work with Kamalika Chaudhuri, Satish Rao, Shuheng Zhou, and Srinath Sridhar.
I will introduce a method for the analysis of pathologic biology that unravels the disease characteristics of high-dimensional data. The method, Disease-Specific Genomic Analysis (DSGA), is intended to precede standard techniques such as clustering or class prediction, and to enhance their performance and ability to detect disease. DSGA measures the extent to which the disease deviates from a continuous range of normal phenotypes, and isolates the aberrant component of the data. I will show that DSGA outperforms standard methods on several microarray cancer datasets. I will then discuss novel results in breast cancer, highlighted by the use of DSGA. Although these examples focus on microarrays, DSGA generalizes to any high-dimensional genomic/proteomic data.
This is joint work with Robert Tibshirani, Anne-Lise Borresen-Dale, and
Stefanie Jeffrey.
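The idea of separating a disease profile into a part explained by normal tissue and an aberrant remainder can be pictured with a small sketch. It assumes, purely for illustration, that the "continuous range of normal phenotypes" is approximated by the linear span of a set of normal expression profiles and that the aberrant component is the least-squares residual; the abstract does not specify these details, and the function name `split_normal_and_disease` is hypothetical.

```python
import numpy as np

def split_normal_and_disease(normals, tumor):
    """Split a tumor profile into a normal-like part and a residual.

    normals: array of shape (genes, n_normal), one column per normal sample.
    tumor:   array of shape (genes,), one diseased sample.
    Assumption: the normal state is the linear span of the normal columns.
    """
    # Least-squares fit of the tumor profile onto the normal profiles.
    coef, *_ = np.linalg.lstsq(normals, tumor, rcond=None)
    normal_part = normals @ coef       # component explained by normal tissue
    disease_part = tumor - normal_part # aberrant component (residual)
    return normal_part, disease_part

# Toy data: 3 genes, 2 normal samples; the third gene is silent in normals.
normals = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.0]])
tumor = np.array([2.0, 3.0, 5.0])
normal_part, disease_part = split_normal_and_disease(normals, tumor)
```

In this toy case the residual is concentrated on the third gene, the only one that the normal samples cannot account for. Downstream clustering or class prediction would then be run on the residuals rather than on the raw profiles.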
Diverse information types from different studies may be combined on a number of levels: raw or adjusted data, parameter estimates, test statistics, p-values, or decisions. As no method is universally optimal, the choice of level and technique depends on the available data and the study goals. One overall statistic that detects similar deviations from the null across studies is the (possibly weighted) combined Z-score. This statistic provides a reasonably flexible means of combining results, as it does not require data of similar types or even common questions across studies, yet unlike combined log p-values it accumulates positive and negative evidence.
I will compare the combined Z-score with other methods of combination and illustrate its use on a set of microarray studies of breast cancer.
This is joint work with Pratyaksha Wirapati and Mauro Delorenzi of the Swiss Institute for Experimental Cancer Research and the Swiss Institute of Bioinformatics.
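As a concrete illustration of the weighted combined Z-score described above, here is a minimal sketch using the common form Z = sum(w_i * z_i) / sqrt(sum(w_i^2)), which is standard normal under the null when the per-study Z-scores are independent. The function names and the example weights are assumptions made for illustration, not taken from the talk.

```python
import math

def combined_z(z_scores, weights=None):
    """Weighted combined Z-score: sum(w_i * z_i) / sqrt(sum(w_i ** 2)).

    Assumes independent per-study Z-scores; weights default to 1.
    """
    if weights is None:
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("need exactly one weight per study")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided p-value of a Z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Four studies agreeing in direction reinforce each other.
agree = combined_z([2.0, 2.0, 2.0, 2.0])          # 4.0
# Equal and opposite evidence cancels.
conflict = combined_z([2.0, -2.0])                # 0.0
# Weights (hypothetical here) let some studies count for more.
weighted = combined_z([3.0, 1.0], [3.0, 4.0])     # 2.6
```

Because the sign of each Z-score is kept, studies pointing in opposite directions offset one another, whereas a combination of two-sided p-values would treat both as evidence against the null.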