PB HLTH 295, Section 003
Statistics and Genomics Seminar

Fall 2015





Thursday, August 27th


Multiple Testing Procedures for Large and Complex Dependent Data with Application to Study Human Brain Complex Network Properties
Dr. Djalel-Eddine Meskaldji
Medical Image Processing Lab (MIPLab), Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland

Given the large number of papers written over the last ten years on error control in high-dimensional multiple testing, it would be worthwhile to consider a single comprehensive technique that allows user flexibility in error control. We describe a new and comprehensive family of error rates (with a corresponding family of control procedures under different assumptions) that contains and generalizes most existing proposals. It offers the scientist a broad choice of how to properly control false discoveries. We also discuss the use of a particular choice that bridges the gap between two well-known error rates: the family-wise error rate (FWER) and the false discovery rate (FDR). The second part of the talk will be dedicated to how a screening and filtering strategy can benefit from positive dependence to increase the power of testing when data can be modeled as a complex network. As an illustration, the strategy is applied to compare topological differences between groups of human brain networks.
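For readers unfamiliar with the two error rates named in the abstract, the following minimal sketch contrasts the classical Bonferroni procedure (which controls the FWER) with the Benjamini-Hochberg step-up procedure (which controls the FDR under independence or positive dependence). It does not implement the speaker's new family of error rates; the function names and the example p-values are illustrative assumptions.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """FWER control: reject H_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """FDR control: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= k * alpha / m (step-up rule)."""
    m = len(pvalues)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:", sum(bonferroni_reject(pvals)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg_reject(pvals)))
```

On the same p-values, the FDR procedure rejects at least as many hypotheses as the FWER procedure, which is the gap the abstract refers to.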


Thursday, September 3rd


Genome-Wide Association Methods: Searching Through Individual Genomes for Association with Health and Lifespan Variation in Drosophila
Dr. Christopher Nelson
Buck Institute for Research on Aging, Novato, CA

Genome-wide association studies (GWAS) are an approach to identifying genetic variations that influence phenotypes of interest. I will discuss GWAS methods and practical insights from implementing them. To illustrate the utility of these methods, I will talk about our search for genetic variants in the Drosophila Genetic Reference Panel that modulate phenotypes and dietary responses. We have measured a panel of health and lifespan phenotypes in this collection of 205 genotyped fly strains, and we are using a variety of GWAS techniques to query this dataset. Studying genetic variation in model organisms allows validation and discovery of novel genes, pathways and principles that may influence human health. Specific topics will include allele frequency effects on GWAS statistical power, interaction of genotype with environmental covariates, multiple-testing correction methods, and methods for testing multiple polymorphisms at once or multiple phenotypes at once.


Thursday, September 10th


Genome Regulation by Long Noncoding RNAs
Professor Howard Y. Chang
Stanford University School of Medicine



Thursday, September 17th


Shape Analysis of High-Throughput Transcriptomics Experiment Data
Dr. Kwame Okrah
Genentech

We use L-moment statistics as the basis for exploratory data analysis, the assessment of distributional assumptions, and hypothesis testing of high-throughput transcriptomic data. In particular, we use L-moment ratios to assess the shape (skewness and kurtosis) of high-throughput transcriptome data. Based on these statistics, we propose an algorithm for identifying genes with distributions that are markedly different from the majority in the data. In addition, we illustrate the utility of this framework for characterizing the robustness of distributional assumptions.
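As background for the abstract above, here is a minimal sketch of how sample L-moment ratios (L-skewness and L-kurtosis) can be computed from a single gene's expression values, using the standard unbiased probability-weighted-moment estimators. The function name and the example inputs are assumptions made for illustration; this is not the speaker's algorithm for flagging atypical genes.

```python
def sample_l_moment_ratios(values):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    # i is the zero-based position in the sorted sample (rank minus one).
    for i, xi in enumerate(x):
        b0 += xi
        b1 += xi * i / (n - 1)
        b2 += xi * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += xi * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    # L-moments as linear combinations of the probability-weighted moments.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    if l2 == 0:
        raise ValueError("L-scale is zero; ratios are undefined")
    return l1, l2, l3 / l2, l4 / l2


if __name__ == "__main__":
    # Made-up values: a symmetric sample and a right-skewed sample.
    print(sample_l_moment_ratios([1, 2, 3, 4, 5]))
    print(sample_l_moment_ratios([1, 1, 1, 1, 10]))
```

Plotting L-skewness against L-kurtosis for every gene gives one possible view of which genes sit far from the bulk of the data.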


Thursday, September 24th


Improving the Quality of miRNA-mRNA Interaction Predictions
Dr. Angel Rubio
Centro de Estudios e Investigaciones Tecnicas (CEIT), University of Navarra, San Sebastian, Spain

microRNA target prediction remains challenging, since very few targets have been experimentally validated and the databases that include sequence-based predictions contain large numbers of false positives. Furthermore, because these databases of predicted interactions use different scoring rules, selecting the most reliable predictions requires extensive knowledge of each algorithm.

In this talk, I present a method [Tabas-Madrid et al. 2014] to measure the confidence of predicted interactions based on experimentally validated information. The output of this method has been used to create a combined database with a re-assigned score for each predicted interaction. The new scores allow the robust combination of several databases without low-performing algorithms dragging down well-performing ones. The combined database outperforms each of the existing predictive algorithms.

Daniel Tabas-Madrid, Ander Muniategui, Ignacio Sanchez-Caballero, Dannys Jorge Martinez-Herrera, Carlos Oscar S Sorzano, Angel Rubio* and Alberto Pascual-Montano*, "Improving miRNA-mRNA interaction predictions", BMC Genomics, 2014.


Thursday, October 8th


Assessing the Effect of Tumor Heterogeneity on Local Immunosuppression Via Single-Cell Transcriptomics and Genomics
Professor Aaron Diaz
Department of Neurological Surgery, UCSF

Glioblastoma multiforme is the most common and most aggressive form of primary brain tumor. To date, some of the most effective cancer therapies have been those that home in on specific molecular defects. However, in highly diverse tumors such as gliomas, clinical trials of promising targeted therapeutics often produce mixed results. This is at least partly due to intra-tumor heterogeneity in response to treatment. In particular, the glioma microenvironment contains a heterogeneous variety of inflammatory infiltrate that can play both tumor-supportive and anti-tumor roles. Single-cell sequencing approaches are being actively developed, which allow assessments of heterogeneity at unprecedented resolution. Our lab has been applying techniques from machine learning and systems biology to study intra-tumor signaling networks using single-cell sequencing data. I will present preliminary results from our work using single-cell transcriptomics, bulk exome sequencing and molecular assays to identify targets for blocking tumor-mediated immunosuppression.


Thursday, October 15th


Shrinkage of Dispersion Parameters in the Binomial Family, with Application to Differential Exon Skipping
Dr. Sean Ruddy
Department of Statistics, UC Berkeley

The prevalence of sequencing experiments in genomics has led to increased use of methods for count data in analyzing high-throughput genomic data. Shrinkage methods remain important for improving the performance of these statistical methods. A common example is gene expression data, where the counts per gene are often modeled as some form of over-dispersed Poisson. Shrinkage estimates of the per-gene dispersion parameter have led to improved estimation of dispersion, particularly when the number of samples is small. We address a different count setting introduced by the use of sequencing data: comparing differential proportional usage via an over-dispersed binomial model. We are motivated by our interest in testing for differential exon skipping in mRNA-Seq experiments. We introduce a novel shrinkage method that models the over-dispersion with the double binomial distribution proposed by Efron (1986). Our method (WEB-Seq) is an empirical Bayes strategy that produces a shrunken estimate of dispersion and effectively detects differential proportional usage; it has close ties to the weighted-likelihood strategy of edgeR developed for gene expression data (Robinson and Smyth, 2007; Robinson et al., 2010). We analyze its behavior on simulated data sets as well as real data and show that our method is fast, powerful and gives accurate control of the FDR compared to alternative approaches. We provide an implementation of our methods in the R package DoubleExpSeq, available on CRAN.


Thursday, October 22nd


Statistical Approaches for Inferring the 3D Structure of the Genome
Nelle Varoquaux
Center for Computational Biology, Mines ParisTech and Institut Curie, Paris, France

The spatial and temporal organization of chromosomes in 3D is thought to play an important role in genomic function, but is still poorly understood. Recent advances in chromosome conformation capture (3C) technologies, initially developed to assess interactions between specific pairs of loci, allow contacts to be measured simultaneously on a genome scale, paving the way to the reconstruction of a full 3D model of a genome.

Inferring the 3D structure remains, however, a challenging problem. Many approaches convert interaction frequencies into physical distances and solve a constrained optimization problem (often non-convex) akin to multidimensional scaling (MDS). Recent works have proposed probabilistic models of interaction frequencies and their relationship with physical distances, and use MCMC sampling procedures to produce an ensemble of 3D structures.

We propose two new formulations of the inference as a maximum likelihood problem based on statistical models (a Poisson model and a negative binomial model) of interaction frequencies in which the 3D structure is a latent variable. We demonstrate that these approaches reconstruct better structures than previous MDS-based methods, particularly at low coverage and high resolution.
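To make the idea of a count likelihood with a latent 3D structure concrete, here is a minimal sketch of a Poisson log-likelihood for contact counts given candidate coordinates. The abstract does not state how the Poisson intensity depends on distance; the power-law link `beta * distance ** alpha`, the default `alpha = -3`, and all names below are assumptions made for illustration, not the speaker's stated model.

```python
import math


def poisson_contact_loglik(coords, counts, alpha=-3.0, beta=1.0):
    """Log-likelihood of contact counts given 3D coordinates.

    Assumed model (hypothetical): counts[i][j] ~ Poisson(beta * d_ij ** alpha),
    where d_ij is the Euclidean distance between loci i and j.
    """
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            lam = beta * d ** alpha
            c = counts[i][j]
            # Poisson log-pmf: c*log(lam) - lam - log(c!)
            total += c * math.log(lam) - lam - math.lgamma(c + 1)
    return total


if __name__ == "__main__":
    # Made-up example: three loci and a symmetric contact-count matrix.
    counts = [[0, 8, 1],
              [8, 0, 1],
              [1, 1, 0]]
    close_pair = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
    far_pair = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
    print(poisson_contact_loglik(close_pair, counts))
    print(poisson_contact_loglik(far_pair, counts))
```

Maximizing such a function over the coordinates (and, optionally, over `alpha` and `beta`) with a numerical optimizer is one way to pose the reconstruction as maximum likelihood estimation.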


Thursday, November 12th


3D Genome Reconstruction and Functional Hotspot Identification
Professor Mark R. Segal
Division of Bioinformatics and Department of Epidemiology and Biostatistics, UCSF

The three-dimensional (3D) configuration of chromosomes within the eukaryotic nucleus is consequential for several cellular functions, including gene expression regulation, and is also strongly associated with cancer-causing translocation events. While visualization of such architecture remains limited to low resolutions (due to compaction, dynamics and scale), the ability to infer structures at high resolution has been enabled by recently devised chromosome conformation capture (CCC) techniques. In particular, when coupled with next-generation sequencing, these methods yield an inventory of genome-wide chromatin interactions or contacts. Various algorithms have been advanced to operate on CCC data to generate 3D configurations. For simple eukaryotes, these reconstructions have been shown to provide added value over raw contact data with respect to downstream biological analyses. However, such added value has yet to be fully realized for higher eukaryotes, since high-resolution, genome-wide reconstructions have been largely precluded by computational bottlenecks and organismal complexity. Here we propose, illustrate and evaluate a two-stage algorithm, deploying multi-dimensional scaling and Procrustes transformation, to address these issues. Conditional on having a 3D reconstruction with superposed functional data, we describe and showcase methods (patient rule induction, kNN regression) for identifying "3D hotspots": localized regions wherein the functional phenotype is extreme.


Thursday, November 19th


Uni- and Bi-Partite Stochastic Network Models with Applications to 'Omics Data
Dr. Tom Bartlett
Department of Statistical Science, University College London

Networks and other non-Euclidean relational datasets have become important applications in modern statistics: complex systems that can be modelled as networks are ubiquitous, and such methodology has been found to be particularly useful in the study of cell biology. However, cell biological processes are inherently stochastic and non-stationary, and empirical networks based on high-throughput assays suffer from technical noise. I develop computational and statistical network models with applications relevant to the study of developmental and disease processes, such as cancer. In this talk I will propose a number of models for inference of uni- and bi-partite network structure drawn from my recently completed doctoral work, and I will outline current challenges in, and preliminary results from, new collaborations with scientists at UCSF.

Website: https://www.ucl.ac.uk/statistics/people/thomas-bartlett


Thursday, December 3rd


Mapping Cell Differentiation Patterns by Single-Cell Transcriptomics
Kelly Street
Graduate Group in Biostatistics, UC Berkeley

The advent of single-cell RNA sequencing technology has allowed researchers to interrogate cellular processes such as differentiation at an unprecedented level of detail. Using samples from multiple time points to chart progression through these processes has become a common application. We characterized the gene expression profiles of hundreds of cells from the mouse olfactory epithelium, a regenerative tissue involved in smell. In particular, we surveyed basal cells, a type of stem cell that helps to replenish other cell types, over the course of two weeks after inducing differentiation by genetic ablation. The main goals of our analysis were to distinguish between differentiation processes originating from the same cell type and to identify the distinct genetic patterns of these processes.