PH 296
Fall 2001

 


Discussion - Fall 2001

Monday, November 26th

Pattern Discovery in Protein Sequences
Katerina Kechris
Department of Statistics, UC Berkeley

The rapid growth of sequence databases has motivated the development of techniques to identify similarities in related sequences. For example, conserved positions across a family of protein sequences may indicate sites which are functionally or structurally important. These sites may remain constant because of selective pressures, while other, non-essential sites are more likely to tolerate mutations over evolutionary time. Once common features are extracted from a family of structurally, functionally or evolutionarily related sequences, they can be used for classification of new examples.

There are a variety of methods for automatic pattern discovery. In this talk, I will discuss several approaches, divided into two sections according to pattern type. Deterministic patterns either match or do not match a sequence. These are expressed in the language of regular expressions and are found by enumerating the solution space of the defined pattern class. Probabilistic patterns assign a probability or score to the match between a sequence and the pattern. These may be found by fitting a statistical model. This talk is not meant to be an exhaustive survey of the different methods, but rather an introduction to the different approaches through several illustrative examples.
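The distinction between the two pattern types can be illustrated with a minimal Python sketch. It is not taken from the talk or the handout: the toy sequence family, the pattern `C.HC`, the pseudocount, and the uniform background are all assumptions chosen for illustration. The deterministic pattern is a regular expression that gives a yes/no answer; the probabilistic pattern is a simple position-specific frequency profile that gives a log-odds score.

```python
import math
import re

# Hypothetical toy family of aligned protein fragments (illustrative, not real data).
family = ["CAHC", "CGHC", "CSHC", "CAHC"]

# Deterministic pattern: a regular expression either matches or does not.
# Positions 1, 3 and 4 are conserved; position 2 accepts any residue.
deterministic = re.compile(r"C.HC")

def matches(seq):
    """Return True if the whole sequence matches the deterministic pattern."""
    return deterministic.fullmatch(seq) is not None

# Probabilistic pattern: per-position residue frequencies (a simple profile).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def fit_profile(seqs, pseudocount=1.0):
    """Estimate residue probabilities at each aligned position.

    The pseudocount (an assumed value) keeps unseen residues from
    receiving probability zero.
    """
    profile = []
    for i in range(len(seqs[0])):
        counts = {a: pseudocount for a in ALPHABET}
        for s in seqs:
            counts[s[i]] += 1
        total = sum(counts.values())
        profile.append({a: c / total for a, c in counts.items()})
    return profile

def log_odds(seq, profile, background=1.0 / len(ALPHABET)):
    """Score a sequence against the profile relative to a uniform background."""
    return sum(math.log(profile[i][r] / background) for i, r in enumerate(seq))

profile = fit_profile(family)
print(matches("CTHC"), matches("CTHA"))
print(log_odds("CTHC", profile), log_odds("WWWW", profile))
```

Under these assumptions, a sequence such as `CTHC` matches the regular expression even though T never occurs at position 2 in the family, while the profile gives it a graded score that is lower than the score of a family member but higher than that of an unrelated sequence.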

Handout: ps pdf

Monday, November 19th

Gene Finding with Hidden Markov Models
Marina Alexandersson
Department of Statistics, UC Berkeley

A fundamental task in analyzing genomes is to annotate various features of biological importance. While this is relatively straightforward for organisms with compact genomes (such as bacteria or yeast), it becomes much more challenging for large genomes (such as mammals) because the coding "signal" is scattered in a vast sea of non-coding "noise". Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. In this talk we discuss the various forms and algorithms of HMMs used in sequence analysis, including pair HMMs (PHMMs) and generalized HMMs (GHMMs), and we show the pros and cons of extending the theory to cross-species gene recognition.
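As a rough illustration of how an HMM separates coding "signal" from non-coding "noise", the following Python sketch runs the Viterbi algorithm on a plain two-state HMM. It is not the speaker's model: the state names and all start, transition, and emission probabilities are made-up values, and the pair and generalized HMMs mentioned in the abstract are more elaborate than this.

```python
import math

# Two hidden states; every probability below is an assumed, illustrative value.
states = ("noncoding", "coding")
start = {"noncoding": 0.9, "coding": 0.1}
trans = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
emit = {
    "noncoding": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}

def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string.

    Works in log space to avoid numerical underflow on long sequences.
    """
    # best[s] is the log-probability of the best path ending in state s.
    best = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # one dict of back-pointers per position after the first
    for symbol in seq[1:]:
        prev = best
        best = {}
        pointers = {}
        for s in states:
            # Choose the predecessor state that maximizes the path score into s.
            source = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            best[s] = (prev[source] + math.log(trans[source][s])
                       + math.log(emit[s][symbol]))
            pointers[s] = source
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

print(viterbi("ATATAT" + "GC" * 7))
```

With these assumed parameters, an AT-rich prefix is labeled non-coding and a sufficiently long GC-rich stretch is labeled coding, because the emission advantage eventually outweighs the cost of switching states.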

Monday, November 12th

No class, Veterans Day.

Monday, October 29th and November 5th

Talk on linkage analysis, Spring 2001: ps pdf

Monday, October 22nd

Reading:

A. P. Dempster, N. M. Laird, and D. B. Rubin. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Statist. Soc. B. 39(1): 1-38. Download from JSTOR.

Notes on the EM algorithm (ps)

Monday, October 15th

Reading:

Y. Benjamini and Y. Hochberg. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B. 57: 289-300. Download from JSTOR.

J. P. Shaffer. (1995). Multiple hypothesis testing. Annu. Rev. Psychol. 46: 561-584.

Monday, October 8th

Reading:

L. R. Rabiner. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE. 77 (2): 257-286.

H. M. Taylor and S. Karlin. (1984). An introduction to stochastic modeling. Academic Press.

Monday, October 1st

Lecture by Lior Pachter: more on Steiner trees and sequence alignment.

Monday, September 24th

Reading:

S. B. Needleman & C. D. Wunsch. (1970). A general method applicable to the search for similarities in the amino acid sequences of two proteins. Journal of Molecular Biology. 48: 443-453.

T. F. Smith & M. S. Waterman. (1981). Identification of common molecular subsequences. Journal of Molecular Biology. 147: 195-197.

L. R. Rabiner. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE. 77 (2): 257-286.

Monday, September 17th
Reading:

Robust local regression
W. S. Cleveland. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association. 74: 829-836.
Download from JSTOR.

Design of microarray experiments
M. K. Kerr & G. A. Churchill. (2001). Experimental Design for Gene Expression Microarrays. Biostatistics. 2: 183-201.
Download from Gary Churchill's webpage or Biostatistics website.

Thursday, September 13th
Slides: Pre-processing in DNA microarray experiments (ppt)
Links: 
Terry Speed's Microarray Data Analysis Group Page
Sandrine Dudoit's Homepage (more links ...)

Monday, September 10th
Slides: Introduction to the biology and technology of DNA microarrays (ppt)
Links:
Human Genome Project Education Resources
DNA Microarray Methodology Animation
The Chipping Forecast, Nature Genetics, Vol. 21, supp. p. 1-60. 



last updated November 27, 2001
sandrine@stat.berkeley.edu