Statistics and Genomics Seminar
STAT 278B, Section 004

Fall 2018





Thursday, September 6th

Genetic Insights into Human Evolutionary History
Professor Priya Moorjani
Department of Molecular and Cell Biology and Center for Computational Biology, UC Berkeley

Recent advances in technology -- for sequencing present-day and ancient genomes -- have opened up unprecedented opportunities to use genetic data to improve our understanding of human evolutionary genetics. To illustrate how we can use genetic data to learn about human origins and evolution, I will discuss two projects: a) Reconstructing the timing of human-Neanderthal mixture using ancient genomes, and b) An evolutionary perspective on the human mutation rate. These analyses have provided insights about the timeline of human evolution, as well as helped to uncover the determinants of mutation rate and learn about the complex interactions between hominins in the past.


Thursday, September 13th

Statistical and ML Challenges from Genetics to CRISPR Gene Editing
Professor Jennifer Listgarten
Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, UC Berkeley

Molecular biology, healthcare and medicine have been slowly morphing into large-scale, data-driven sciences dependent on machine learning and applied statistics. Many of the same challenges from other domains are applicable here: causality vs association; covariate shift; hidden confounders; heterogeneous target space; model validation; (multiple) hypothesis testing; feature engineering (owing to relatively small data sets). In this talk, I will go over domain-specific instantiations of some of these problems, along with proposed solutions. In particular, I will start by presenting modeling challenges in finding the genetic underpinnings of disease, which is important for screening, treatment, and drug development. Assuming that we have uncovered genetic causes, genome editing -- which is about deleting or changing parts of the genetic code -- will one day let us fix the genome in a bespoke manner. Editing will also help researchers understand mechanisms of disease and enable precision medicine and drug development, to name just a few more important applications. I will close this talk by discussing how machine learning can help with more effective CRISPR gene editing, and by presenting a general ML problem that arises in this setting which we are working to solve (group-specific regression spaces).


Thursday, September 20th

Unraveling Tissue Regeneration with Single-Cell RNA-Sequencing
Dr. Diya Das
Department of Molecular and Cell Biology, UC Berkeley, and Berkeley Institute for Data Science

Adult stem cells maintain structure and function of tissues under widely varying conditions, but the molecular and cellular mechanisms that underlie the development of diverse cell types during either tissue maintenance or injury-induced regeneration remain incompletely understood. I will discuss the combination of lineage tracing with single-cell RNA sequencing to address the regulation of these phenomena in the olfactory epithelium, one of only a few sites of ongoing adult neurogenesis. First, we consider the role of olfactory stem cells, known as horizontal basal cells (HBCs), in homeostatic conditions. By examining gene expression, we identify lineage trajectories arising from HBCs. We find that support cells arise by direct fate conversion of HBCs without cell division and also establish that multipotency of the HBC population arises from unipotent fate decisions of individual cells.

Second, we consider the contribution of HBCs to injury-induced regeneration of the olfactory epithelium, and whether this process differs from homeostatic maintenance. We discover activated olfactory stem cell states that are both transient and unique to regeneration. The activated stem cells express genes associated with epithelial wound repair in other stem cell niches. These cells are also heterogeneous, and at least part of this heterogeneity corresponds to cell fate choice.

Taken together, these findings contribute to our understanding of how neurogenesis and regeneration are accomplished in a fully developed, adult sensory tissue. Our integration of single-cell genomics with in vivo lineage tracing lays the groundwork for investigating outstanding questions relating to the development, maintenance, and repair of other tissues. This work may ultimately aid in the development of stem-cell-based therapies to replace specialized cell types and tissues lost to damage and disease.


Thursday, September 27th

Controlling FDR while Highlighting Distinct Discoveries, with Applications to GO Enrichment Analysis
Eugene Katsevich
Department of Statistics, Stanford University

The false discovery rate (FDR) is a popular error criterion for large-scale multiple testing problems. A notable pitfall of the FDR is that filtering (i.e. subsetting) the rejection set post hoc might invalidate the FDR guarantee. In some applied settings, however, filtering is standard practice. For example, post hoc filtering is often employed in gene ontology enrichment analysis (where hypotheses have a directed acyclic graph structure) to remove redundancy among the set of rejected hypotheses (for example, via the REVIGO software). We propose Focused BH, a filter-aware extension of the BH procedure. Assuming the filter can be specified in advance, Focused BH takes as input this filter as well as a set of p-values and outputs a rejection set. This rejection set, when filtered, provably controls the FDR. Existing domain-specific filters can be easily integrated into Focused BH, allowing scientists to continue the practice of filtering without sacrificing rigorous Type I error control.
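For readers unfamiliar with the baseline that the abstract extends, the following is a minimal sketch of the standard Benjamini-Hochberg (BH) step-up procedure. It is not Focused BH, whose details the abstract does not give; the function name and the default level `q` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return the indices rejected by the standard BH procedure at level q.

    This is plain BH, shown only as background; it is NOT the
    filter-aware Focused BH procedure described in the abstract.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= q * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


if __name__ == "__main__":
    pvals = [0.01, 0.03, 0.035, 0.5]
    print(benjamini_hochberg(pvals, q=0.05))
```

The FDR guarantee of BH applies to the full rejection set returned here; as the abstract notes, subsetting that set afterwards (for example, to remove redundant GO terms) may invalidate the guarantee.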


Thursday, October 4th

Predicting Cell-Fate Decisions to Non-Lethal Dose of Chemotherapy
Professor Lani Wu
Department of Pharmaceutical Chemistry, UC San Francisco

Chemotherapy is designed to induce cell death. However, cancer cells can experience non-lethal doses due to unavoidable declines in drug concentration. In these non-lethal ranges, cells can choose to remain proliferative or become senescent. Here, we discuss how single-cell analysis of dynamic signaling changes to cancer cells within hours of treatment can be used to predict cell-fate decisions that only manifest days later.


Thursday, October 25th

Title
Professor Sourav Bandyopadhyay
Department of Bioengineering and Therapeutic Sciences, UC San Francisco

Abstract.


Thursday, November 1st

Drug Discovery in the Era of Precision Medicine
Professor Marina Sirota
Bakar Computational Health Sciences Institute (BCHSI), UC San Francisco

Recent advances in genome typing and sequencing technologies have enabled quick generation of a vast amount of molecular data at very low cost. The mining and computational analysis of this type of data can help shape new diagnostic and therapeutic strategies in biomedicine. In this talk, I will discuss how such technological advances in combination with data science and integrative analysis can be applied to drug discovery in the context of drug target identification, computational drug repurposing and population stratification approaches.


Thursday, November 8th

Title
Kelly Street
Graduate Group in Biostatistics, UC Berkeley

Abstract.


Thursday, November 15th

Machine Learning for Health Care
Professor Katherine A. Heller
Department of Statistical Science and Center for Cognitive Neuroscience, Duke University

We will present multiple ways in which healthcare data is acquired and machine learning methods are currently being introduced into clinical settings. Current work in these areas will be presented and the future of machine learning contributions to the field will be discussed.


Thursday, November 29th

Unlocking RNA-seq Tools for Zero-Inflation and Single Cell Applications Using ZINB-WaVE Observation Weights
Dr. Fanny Perraudeau
Whole Biome

Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
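As a rough illustration of the weighting idea, one natural definition (assumed here; the abstract does not spell out the formula) is the posterior probability that an observed count comes from the negative binomial component of the ZINB mixture rather than from the excess-zero component. In the sketch below the parameters `mu`, `theta`, and `pi` are supplied by hand, whereas in practice they would be estimated per gene and cell by a fitted model; the function names are hypothetical.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu > 0 and dispersion (size) theta > 0."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_weight(y, mu, theta, pi):
    """Posterior probability that count y was drawn from the NB component.

    Assumed mixture: P(y) = pi * 1[y == 0] + (1 - pi) * NB(y; mu, theta),
    where pi is the probability of an excess (inflated) zero.
    """
    nb = nb_pmf(y, mu, theta)
    zinb = pi * (1.0 if y == 0 else 0.0) + (1.0 - pi) * nb
    return (1.0 - pi) * nb / zinb


if __name__ == "__main__":
    # A zero count is down-weighted; a positive count keeps full weight.
    print(zinb_weight(0, mu=2.0, theta=1.0, pi=0.5))
    print(zinb_weight(3, mu=2.0, theta=1.0, pi=0.5))
```

Under this definition only zero counts can receive a weight below one, which is how likely excess zeros end up contributing less to a downstream bulk RNA-seq DE model.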

I'll take ~15 minutes to talk about Whole Biome and working in the biotech industry after a PhD at UCB.


Thursday, December 6th

Data Science and the Evolution of Plant and Animal Morphology
Dr. Ciera Martinez
Department of Molecular and Cell Biology, UC Berkeley, and Berkeley Institute for Data Science

The field of Evolutionary Development (Evo-Devo) attempts to describe how the genetic differences between species can direct the astounding diversity of body architecture observed on this planet. In this talk I will explain the Evo-Devo approach and how I take advantage of genetic and morphological data to allow never-before-seen glimpses into the mechanisms guiding morphological evolution. I will also detail how the maturation of data science as a field has guided my research trajectory from studying the evolution of plant morphology to my current project exploring the mysterious non-coding regions of genomes in fruit flies.