PB HLTH 292, Section 020
Statistics and Genomics Seminar
Thursday, January 22nd
WikiPathways: Pathway Editing for the People
Dr. Bruce R. Conklin
Gladstone Institute of
Cardiovascular Disease, UCSF
To facilitate the contribution and maintenance of pathway information by the biology community, we established WikiPathways (www.wikipathways.org). WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways. WikiPathways thus presents a new model for pathway databases that enhances and complements ongoing efforts. Building on the same MediaWiki open source software that powers Wikipedia, we added a custom graphical pathway editing tool and integrated databases covering major gene, protein, and small-molecule systems. The familiar web-based format of WikiPathways greatly reduces the barrier to participate in pathway curation. More importantly, the open, public approach of WikiPathways allows for broader participation by the entire community, ranging from students to senior experts in each field. This approach also shifts the bulk of peer review, editorial curation, and maintenance to the community.
Thursday, January 29th
Full Transcriptome Analysis using the Illumina Genome Analyzer
Dr. Gary P. Schroth
The Illumina Genome Analyzer is a high throughput DNA sequencing platform that routinely generates several billion bases of very high quality sequence information from a large variety of genomic applications. We will show examples of how the instrument is being used for a large variety of applications in genome biology including eukaryotic and prokaryotic resequencing, SNP discovery, gene expression analysis, ChIP-SEQ, genome-wide mapping of DNA methylation sites, and miRNA discovery and analysis. We will present details of how the mRNA-Seq assay is being used to quantify gene expression levels with high specificity over a broad dynamic range. In addition to quantifying expression levels, this data is also being used to characterize thousands of novel alternative transcripts in the human transcriptome. We will also discuss the development of new software and analysis tools that can help users glean biological meaning from the massive amounts of data produced by the system.
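The abstract does not give the quantification formula; one widely used normalization for this kind of mRNA-Seq count data (an illustration, not necessarily the speaker's exact method) is RPKM, reads per kilobase of transcript per million mapped reads, which makes expression values comparable across transcripts of different lengths and across runs of different depths:

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kb = transcript_length_bp / 1e3
    millions = total_mapped_reads / 1e6
    return read_count / kb / millions

# e.g. 500 reads on a 2 kb transcript in a 10M-read run
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing out length and depth is what gives the assay its broad dynamic range in practice: both highly and weakly expressed transcripts land on a common scale.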
Thursday, February 5th
From expression profiling to putative master regulators
Professor Terry Speed
Department of Statistics, UC Berkeley
People conduct microarray gene expression experiments or studies in order to find out which genes are regulated, e.g. as a result of a treatment, or over time. The genes so identified can usually be validated by qRT-PCR. Of equal or even greater interest are the regulators of the genes found to be activated/differentially expressed. How do we identify these regulators, and once we have found some candidates and regard them as hypotheses, how are these hypotheses tested?
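One standard route from a differentially expressed gene list to candidate regulators (a sketch of a common approach, not necessarily the one discussed in the talk) is to test whether the known targets of each transcription factor are over-represented among the differentially expressed genes, e.g. with a hypergeometric test:

```python
from scipy.stats import hypergeom

def regulator_enrichment_p(n_genes, n_de, n_targets, n_de_targets):
    """P-value for seeing at least n_de_targets of a regulator's
    n_targets among n_de differentially expressed genes, out of
    n_genes total, under a hypergeometric null of no association."""
    return hypergeom.sf(n_de_targets - 1, n_genes, n_targets, n_de)

# hypothetical numbers: 10,000 genes, 200 DE, regulator with 100 targets
p_enriched = regulator_enrichment_p(10000, 200, 100, 10)  # well above chance
p_chance = regulator_enrichment_p(10000, 200, 100, 2)     # near expected overlap
```

A small p-value makes the regulator a candidate hypothesis, which can then be tested directly, e.g. by perturbing the regulator and profiling again.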
Thursday, February 12th
Preliminary Transcriptome Analysis of a Trio: Mother, Father, Daughter
Dr. Hugh Rienhoff
Unknown syndromes and sporadic cases with suspected genetic etiology are the most difficult cases to diagnose and manage, and yet the prevalence of such cases numbers in the hundreds of thousands. The identification of mutations causing novel sporadic genetic disease is an open-ended search guided by clinical similarity to known Mendelian diseases and the application of standard global genetic interrogations such as karyotyping, comparative genomic hybridization and genomic copy number variation. To extend the methodologies available to these patients we have pioneered the use of RNA sequencing by examining the RNA of white blood cells in a trio -- an affected daughter and her two parents. The RNA sequence has been complemented by low-pass whole genome sequencing and a 1.4M SNP chip for cross-platform validation to identify significant insertions and deletions. In toto, the dataset is rich in known and unsuspected phenomenology as well as offering hypotheses to test.
Thursday, February 19th
Analysis of 2D-DIGE protein expression data
Professor Elmer Fernandez
Catholic University of Cordoba
Nowadays it is possible to obtain a whole view of the proteome at a glance using high-throughput techniques such as 2D Differential In-Gel Electrophoresis (2D-DIGE). This technique produces large amounts of data with a complex structure that must be analyzed by means of appropriate analytical techniques. The primary goal of this kind of experiment is the detection of proteins showing a statistically significant difference in expression under different experimental conditions, or the identification of potential biomarkers that could be used for early diagnosis. In this talk we will show the fundamentals of 2D-DIGE technology as well as some statistical methods used to deal with this kind of data.
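As a concrete illustration of the kind of analysis mentioned above (a generic sketch, not necessarily the speaker's methods): spot volumes are typically log-transformed, compared spot by spot between conditions, and corrected for multiple testing, e.g. with a Welch t-test per spot and Benjamini-Hochberg FDR control:

```python
import numpy as np
from scipy.stats import ttest_ind

def dige_spot_tests(control, treated, alpha=0.05):
    """Welch t-test per spot on log-volumes, with Benjamini-Hochberg FDR.
    control, treated: (n_spots, n_gels) arrays of log-transformed volumes.
    Returns per-spot p-values and a boolean significance mask."""
    pvals = np.array([ttest_ind(c, t, equal_var=False).pvalue
                      for c, t in zip(control, treated)])
    m = len(pvals)
    order = np.argsort(pvals)
    # BH step-up: compare sorted p-values to the alpha * rank / m line
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    sig = np.zeros(m, dtype=bool)
    sig[order[:k]] = True
    return pvals, sig

# toy example: 5 spots, 6 gels per condition; spot 0 is truly shifted
rng = np.random.default_rng(0)
control = rng.normal(0.0, 0.1, size=(5, 6))
treated = rng.normal(0.0, 0.1, size=(5, 6))
treated[0] += 5.0
pvals, sig = dige_spot_tests(control, treated)
```

Real 2D-DIGE designs add complications this sketch ignores, notably the internal pooled standard on each gel and dye-swap effects.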
Thursday, February 26th
The Importance of Race/Ethnicity & Genetics in Biomedical Research and Clinical Practice; Lessons from the Genetics of Asthma in Latino Americans (GALA) Study
Professor Esteban Burchard
Department of Medicine, UCSF
A debate has recently arisen over the use of racial classification in medicine and biomedical research. In particular, with the completion of a rough draft of the Human Genome, some have suggested that racial classification may not be useful for biomedical studies since it reflects "a fairly small number of genes that describe appearance,"1 and that "there is no basis in the genetic code for race."2 Based in part on these conclusions, some have argued for the exclusion of racial and ethnic classification from biomedical research.3 In the United States, race and ethnicity have been a source of discrimination, prejudice, marginalization and even subjugation. Excessive focus on racial/ethnic differences runs the risk of undervaluing the great diversity that exists among individuals within groups. However, this risk needs to be weighed against the fact that in epidemiologic and clinical research, racial and ethnic categories are useful for generating and exploring hypotheses on environmental and genetic risk factors and interactions between risk factors for important medical outcomes. Erecting barriers to the collection of information such as race and ethnicity may provide protection against the aforementioned risks, however it will simultaneously retard progress in biomedical research and limit effectiveness in clinical decision-making.
Today I hope to convey the importance of Race & Ethnicity in Biomedical, Genetic and Clinical Research. I will begin by providing fundamental evidence of genetic differences between racial and ethnic populations. I will then demonstrate racially-specific differences in genetic risk for diseases including Alzheimer's Disease and ethnic-specific differences in drug responsiveness. Finally, I will present data from the ongoing Genetics of Asthma in Latino Americans (GALA) Study.
1. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature 2001; 409:860-921.
2. Venter C, quoted in N. Angier, "Do Races Differ? Not Really, Genes Show." New York Times, 2000.
3. Schwartz RS. Racial profiling in medical research. N Engl J Med 2001; 344:1392-3.
Thursday, March 5th
Methods for Allocating Ambiguous Short-Reads
Department of Statistics, UC Berkeley
With the rise in prominence of biological research using new short-read DNA sequencing technologies comes the need for new techniques for aligning and assigning these reads to their genomic location of origin. Until now, methods for allocating reads which align with equal or similar fidelity to multiple genomic locations have not been model-based, and have tended to ignore potentially informative data. Here, I will demonstrate that existing methods for assigning ambiguous reads can produce biased results. I will also present a new method for allocating ambiguous reads to the genome, developed within a framework of statistical modeling, which shows promise in alleviating these biases, both in simulated and real data.
This is joint work with my advisor, Terry Speed, and Doron Lipson from Helicos Biosciences.
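A common model-based baseline for the allocation problem described above (a generic sketch; the talk's method is more elaborate) is an EM scheme: fractionally assign each ambiguous read to its candidate loci in proportion to current abundance estimates, then re-estimate abundances from the fractional counts, and iterate:

```python
import numpy as np

def allocate_multireads(read_locs, n_loci, n_iter=100):
    """EM allocation of multi-mapping short reads.
    read_locs: list of lists; read_locs[r] = candidate loci for read r.
    Returns estimated expected read counts per locus."""
    theta = np.full(n_loci, 1.0 / n_loci)  # locus abundance estimates
    for _ in range(n_iter):
        counts = np.zeros(n_loci)
        for locs in read_locs:
            w = theta[locs]
            counts[locs] += w / w.sum()    # E-step: fractional assignment
        theta = counts / counts.sum()      # M-step: re-estimate abundances
    return theta * len(read_locs)

# 8 reads unique to locus 0, 2 unique to locus 1, 5 ambiguous between them
reads = [[0]] * 8 + [[1]] * 2 + [[0, 1]] * 5
counts = allocate_multireads(reads, n_loci=2)
```

Unlike discarding multireads or splitting them equally, this uses the unique reads as evidence, which is the sense in which naive schemes ignore informative data; here the 5 ambiguous reads converge to a 4:1 split matching the 8:2 unique-read evidence.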
Thursday, March 12th
A Method for the Analysis of Longitudinal Multi-factorial Microarray Data
Professor Wing Hung Wong
Department of Statistics, Stanford University
Time-course microarray experiments are capable of capturing the dynamic profiles of genomic response to multiple experimental factors. Analytic methods are needed to simultaneously handle the time-course (longitudinal) structure and multi-factorial structure in the data. We will introduce a robust non-parametric ANOVA (NANOVA) method for the analysis of multi-factor effects while accounting for multiple testing and the non-normal nature of microarray data. To incorporate time-course measurements, factor effects are evaluated based on information pooled across time. The proposed method can effectively extract gene-specific response features and provide quantitative information about the expression pattern of a gene. It has broader applicability to longitudinal factorial data in general and can be extended to cross-sectional time-course data. This method was applied to four data sets from a large-scale clinical study of burn injury. Our analysis identified age-related and gender-related burn-responsive genes and characterized their response features. T-cell and B-cell related immune systems, the insulin-related signaling pathway and various metabolic processes were found to be differentially perturbed in pediatric and adult burn patients. Gender differences in burn injury were detected in several sex chromosome genes. We also assessed age and burn effects across four tissues (blood, skin, muscle and fat) and identified muscle as the most differentially perturbed tissue in burn-injured children compared with adults. Finally, our analysis of the impact of age on adult survivability after burn suggests several metabolic processes as potential contributors to the increasing death rate in older burn patients.
Joint work with Baiyu Zhou.
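The abstract does not spell out the NANOVA test statistic; as an illustration of a robust, rank-based one-factor test on a single gene, with measurements pooled across time points as described above, one could use a Kruskal-Wallis test (a stand-in, not the speaker's actual method):

```python
import numpy as np
from scipy.stats import kruskal

def nonparam_factor_p(expr, factor):
    """Rank-based (Kruskal-Wallis) test of one factor, e.g. age group
    or gender, on one gene's expression pooled across time points."""
    groups = [expr[factor == level] for level in np.unique(factor)]
    return kruskal(*groups).pvalue

# one gene, 10 pooled measurements per group, clearly separated groups
expr = np.concatenate([np.arange(10.0), np.arange(10.0) + 100.0])
group = np.array(["A"] * 10 + ["B"] * 10)
p = nonparam_factor_p(expr, group)
```

Applied genome-wide, the per-gene p-values would then be corrected for multiple testing, which is the other requirement the abstract highlights.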
Thursday, April 2nd
Analysis of "DMET Plus" - a customized genotyping panel for simultaneous assessment of a wide variety of polymorphisms involved in absorption, distribution, metabolism and excretion of compounds in humans
Dr. Simon Cawley
Director, Algorithms & Data
Exploring the genetic determinants of variation in human response to drugs requires extensive monitoring of polymorphisms in genes involved in Absorption, Distribution, Metabolism and Excretion (collectively referred to as ADME). The current database of polymorphisms known to have functional effects includes a broad variety of polymorphism types, including SNPs, insertion/deletion events and variations in chromosome copy number. Some of the markers of interest involve more than two alleles, and many have proximal secondary polymorphisms. This diversity of polymorphism types makes it technically very challenging to develop a unified approach capable of high-throughput determination of all the underlying types. Affymetrix recently released DMET Plus, a solution enabling simultaneous interrogation of all the key types of polymorphism of interest for pharmacogenetic studies. Development of the array and assay required an analysis approach that was general enough to handle the diverse collection of polymorphism types, that delivered highly reliable genotype calls, and that could operate under the constraint that genotype calls be made based on analysis of a single sample at a time. This talk will focus on the analytical challenges that arose during development and will describe the genotype calling methods put in place for the final product.
Thursday, April 9th
Statistical Analysis of Histone Modifications
Professor Ping Ma
Department of Statistics, University of Illinois, Urbana-Champaign
Gene activities in eukaryotic cells are concertedly regulated by
transcription factors and chromatin structure. The basic repeating
unit of chromatin is the nucleosome, an octamer containing two copies
each of four core histone proteins. Recent high throughput studies
have begun to uncover the global regulatory role of nucleosome
positioning and modifications. While nucleosome occupancy in promoter
regions typically occludes transcription factor binding, thereby
repressing global gene expression, the mechanism of histone
modification is more complex. Histone tails can be modified in
various ways, including acetylation, methylation, phosphorylation,
and ubiquitination. Even the regulatory mechanism of histone
acetylation, the best characterized modification to date, is still
not fully understood.
In this talk, I will present some statistical methods to analyze
genome-wide histone modification datasets.
Thursday, April 16th
Algorithms for structure prediction and concentration estimation of alternatively spliced isoforms
Professor Angel Rubio
Centro de Estudios e
Investigaciones Tecnicas de Gipuzkoa (CEIT), University of Navarra
Exon and exon+junction microarrays are promising tools for studying alternative splicing. Current analytical tools applied to these arrays lack two relevant features: the ability to predict the structure of unknown spliced isoforms and the ability to quantify the concentration of known and unknown isoforms. We have developed an algorithm that is able to (1) estimate the number of different transcripts expressed under several conditions, (2) predict the precursor mRNA splicing structure and (3) quantify the transcript concentrations, including unknown forms. I will present results for real and simulated data. In addition, we have preliminary results from a new version that exploits the redundancy of the probes in the Affymetrix exon (or exon+junction) arrays.
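The concentration-estimation half of the problem can be sketched, under the simplifying assumption that the isoform structures are already known (the speaker's algorithm also infers unknown structures), as nonnegative least squares on a probe-by-isoform incidence matrix:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical design: rows are probes (exons and junctions), columns are
# isoforms; entry 1 means the probe interrogates that isoform.
A = np.array([[1., 1.],   # exon shared by both isoforms
              [1., 0.],   # exon unique to isoform 1
              [0., 1.],   # junction unique to isoform 2
              [1., 1.]])  # another shared exon
y = A @ np.array([3.0, 5.0])   # noiseless probe intensities

conc, resid = nnls(A, y)       # estimated isoform concentrations
```

With real, noisy intensities the fit is only approximate, and inferring the matrix A itself from data across conditions is the harder structure-prediction problem the abstract refers to.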
Thursday, April 23rd
A Novel Topology for Representing Protein Folds
Professor Mark Segal
Division of Biostatistics, UCSF
Various topologies for representing three dimensional protein structures have been advanced for purposes ranging from prediction of folding rates to ab initio structure prediction. Examples include relative contact order, Delaunay tessellations, and backbone torsion angle distributions. Here we introduce a new topology based on a novel means for operationalizing three dimensional proximities with respect to the underlying chain. The measure involves first interpreting a rank-based representation of the nearest neighbors of each residue as a permutation, then determining how perturbed this permutation is relative to an unfolded chain. We show that the resultant topology provides improved association with folding and unfolding rates as determined for a set of two-state proteins under standardized conditions. Furthermore, unlike existing topologies, the proposed geometry exhibits fine scale structure with respect to sequence position along the chain, potentially providing insights into folding initiation and/or nucleation sites.
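A toy rendering of the idea above (the exact perturbation measure used in the talk may differ): rank each residue's neighbors by 3-D distance, compare with the ranking induced by sequence separation alone, which is the unfolded-chain reference, and average a normalized Spearman-footrule distance between the two rankings:

```python
import numpy as np

def neighbor_perturbation(coords):
    """Average normalized footrule distance between each residue's 3-D
    nearest-neighbor ranking and the ranking implied by sequence
    separation alone (the unfolded-chain reference)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    idx = np.arange(n)
    d3 = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    seq = np.abs(idx[:, None] - idx[None, :])
    m = n - 1
    score = 0.0
    for i in range(n):
        others = np.delete(idx, i)
        r3 = np.argsort(np.argsort(d3[i, others]))   # ranks by 3-D proximity
        rs = np.argsort(np.argsort(seq[i, others]))  # ranks by chain separation
        score += np.abs(r3 - rs).sum() / (m * m / 2.0)  # footrule, normalized
    return score / n

straight = [[i, 0, 0] for i in range(6)]          # fully extended chain
hairpin = [[0, 0, 0], [1, 0, 0], [2, 0, 0],
           [2, 1, 0], [1, 1, 0], [0, 1, 0]]       # chain folded back on itself
```

An extended chain scores zero because 3-D proximity and chain separation agree exactly; folding brings sequence-distant residues close, perturbing the permutation and raising the score.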
Thursday, April 30th
Statistical methods for high-throughput phenotypic studies
Professor Jenny Bryan
Department of Statistics, University of British Columbia
Researchers in functional genomics can now obtain quantitative
phenotypes for large collections of organisms, each of which is
characterized by the deletion of an individual gene. By observing the
phenotypic consequence of deletion across diverse conditions, we
obtain specific information on the functional roles of the disrupted
gene. The repertoire of massively parallel perturbations being
applied to live cells and organisms extends well beyond the simple
knockout or knockdown of single genes. Recent examples include other
genomic modifications, such as the insertion of alternative regulatory
regions, and treatment with large libraries of well-characterized and
novel compounds. Finally, researchers may apply these interventions in
a combinatorial fashion, e.g., mating yeast deletion mutants to create
double knockouts or treating a panel of knockouts with a large
collection of drugs.
I will present statistical approaches I have developed for the
analysis of data from these high-throughput phenotypic studies, with
some coverage of low-level issues, such as normalization, and
high-level analyses, such as clustering and growth curve modelling on
a large scale.
Thursday, May 7th
Estimating recombination rates in microbial populations from metagenomic data
Philip L. F. Johnson
Graduate Group in Biophysics, UC Berkeley
Microbial populations exchange genetic information at widely varying rates, dramatically affecting the evolutionary potential of a population. Traditionally, microbial recombination rates have been calculated as a genome-wide average from multi-locus sequence typing at carefully chosen loci. New metagenomic sequencing projects, however, hold the potential to identify not only average rates of recombination, but also local "hotspots" of recombination because they generate short, overlapping fragments of DNA sequence, each deriving from a different individual, at random locations across the genome. We have developed a composite likelihood estimator that operates on these data. This method will help elucidate the rates of exchange of genetic material in microbial genomes.
Thursday, May 14th
Selective genotyping and phenotyping strategies in a complex trait context
Professor Saunak Sen
Division of Biostatistics, UCSF
Selective genotyping and selective phenotyping strategies, where a
subset of individuals are genotyped or phenotyped, can reduce the cost
of genetic studies. In experimental crosses (where two or more
strains are mated to form a segregating population), the efficiency of
these strategies has been evaluated in simplified settings where a
single locus contributes to the trait of interest, and when the trait
is normally distributed. Complex traits, where multiple loci
contribute to the trait, possibly with interactions, are incompatible
with this simplified setting; additionally such traits may not be
normally distributed. We analyze selective genotyping and phenotyping
under these complexities, which previous work has not considered, and
suggest approaches that will work better in more realistic scenarios. Our
approach is based on a general framework for calculating the expected
information content of experimental strategies.