The materials for lectures 1-3 are mainly from Statistics (3rd ed.) by Freedman, Pisani and Purves (publisher: Norton).
Midterm II [download], Due at 5pm, Friday 05/09/2008
Data [clustering] [TrueC.dat]
Please email your work to me, drop it in my office, or put a hard copy in my mailbox at 367 Evans Hall. Good luck!
HW and LAB assignments:
HW #1 [download], due at the end of the lab session, Wednesday 02/13/2008
[Solution]
HW #2 – part 1 [download], due at the end of the lab session, Wednesday 02/27/2008
part 2 [download], due at the end of the lab session, Wednesday 02/27/2008
[Solution]
LAB#1 – part 1 [download], due at the end of the lab session, Wednesday 03/12/2008
[123nt.dat] [purine100.txt] [purine1000.txt] [purine10000.txt] [gtattg_5kb_count.txt]
[gtattg_20kb_count.txt] [tataat_5kb_count.txt] [tataat_20kb_count.txt]
part 2 [download] [hmm.txt], due 5:00pm Friday 03/21/2008
HW #3 – [download], due in class, Thursday 03/13/2008
Data [f8i2.fasta] [f8i2_perm.fasta]
[Solution]
HW #4 – [download], due at the end of the lab session, Wednesday 04/23/2008
LAB#2 – [download], due at the end of the lab session, Wednesday 04/30/2008
Data [ko8.lab]
Midterm I [download], Due at the beginning of class, Thursday 03/06/2008
Note that there is no class on Tuesday 03/04/2008!
Note that there is no lecture on Tuesday 04/01/2008!
Please read the following materials: [1] and [PDF] for a review; [PDF] and [PDF] for False Discovery Rate
The Thursday lecture will cover hierarchical and K-means clustering analysis.
Lecture notes:
Jan 22 Introduction to the course and Bioinformatics [syllabus] [doc]
Jan 24 Introduction to Probability [doc] [doc]
Jan 29 Introduction to Probability and Statistics (I) [PDF]; More on distributions [PDF]
Jan 31 Introduction to Probability and Statistics (II) [PDF]; Mendel’s genetics [doc]
Feb 5, Feb 7, Feb 12: Statistics in Genome Assembly
Genome Sequencing [PDF];
Statistics in shotgun sequencing [PDF];
Some reading materials:
History of Genomic Sequencing [1], [2]
Re-sequencing [PDF]
Brief introduction to the biochemistry of sequencing [PDF]
The materials for “Statistics in shotgun sequencing” are from
(Hard copies of the above two sections will be distributed in class.)
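The coverage theorem and the mean number of contigs (both on the Feb 12 outline) are commonly stated in Lander–Waterman form. The following is a minimal Python sketch under the usual idealized assumptions: N reads of equal length L start uniformly at random on a genome of length G, and two reads can be joined only if they overlap by at least T bases. Under these assumptions the coverage is c = NL/G, the expected fraction of bases covered by no read is about e^(-c), and the expected number of contigs is about N·e^(-c(1-T/L)). The function names and parameters are hypothetical, chosen for illustration. The Monte Carlo check uses a circular genome to avoid edge effects.

```python
import math
import random


def lander_waterman(n_reads, read_len, genome_len, min_overlap=0):
    """Return (coverage c, expected uncovered fraction, expected number of contigs).

    Assumes read start positions are uniform, so per-base coverage is ~Poisson(c).
    """
    c = n_reads * read_len / genome_len              # coverage (redundancy)
    uncovered = math.exp(-c)                         # P(a given base is hit by no read)
    theta = min_overlap / read_len                   # overlap fraction needed to detect a join
    contigs = n_reads * math.exp(-c * (1 - theta))   # expected number of contigs (islands)
    return c, uncovered, contigs


def simulate_uncovered(n_reads, read_len, genome_len, seed=0):
    """Monte Carlo estimate of the fraction of bases covered by no read
    (circular genome, uniformly random start positions)."""
    rng = random.Random(seed)
    covered = [False] * genome_len
    for _ in range(n_reads):
        start = rng.randrange(genome_len)
        for i in range(read_len):
            covered[(start + i) % genome_len] = True
    return covered.count(False) / genome_len


if __name__ == "__main__":
    c, frac, n_contigs = lander_waterman(n_reads=500, read_len=500, genome_len=100_000)
    print(f"c = {c:.2f}, uncovered ~ {frac:.4f}, contigs ~ {n_contigs:.1f}")
    print(f"simulated uncovered = {simulate_uncovered(500, 500, 100_000):.4f}")
```

Raising c lowers the uncovered fraction exponentially. The contig count rises at first and then falls, because it grows with N but shrinks with e^(-c).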
Outline of the lecture on Feb 12:
1) Analysis of one sequence
a. Shotgun sequencing
i. coverage theorem
ii. mean number of contigs
b. Modeling signals in DNA (brief intro)
i. transcription factor binding sites identification
ii. Hidden Markov Model (HMM) for gene annotation (coding vs. noncoding regions)
2) Analysis of Multiple sequences
a. Sequence alignment (brief intro)
b. Alignment-free sequence comparison
i. Frequency comparison (Chi-square test)
ii. D2 statistic
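The sketch below illustrates the two alignment-free comparisons in part 2b. It assumes DNA over {A, C, G, T} and counts overlapping k-words. D2 is taken as the sum over all k-words w of X_w·Y_w, where X_w and Y_w are the counts of w in the two sequences; equivalently, it is the number of matching k-word pairs. The frequency comparison is a Pearson chi-square statistic on the 2 × (number of words) table of counts. For k > 1, overlapping words are not independent, so the usual chi-square reference distribution is only an approximation. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def kword_counts(seq, k):
    """Counts of overlapping k-words (k-mers) in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """D2 statistic: sum over k-words w of X_w * Y_w, i.e. the number of
    (position in A, position in B) pairs whose k-words match."""
    xa, xb = kword_counts(seq_a, k), kword_counts(seq_b, k)
    return sum(n * xb[w] for w, n in xa.items())


def chi_square_word_freq(seq_a, seq_b, k=1):
    """Pearson chi-square statistic for the 2 x (words) table of k-word counts.
    Returns (statistic, degrees of freedom); words absent from both are dropped."""
    xa, xb = kword_counts(seq_a, k), kword_counts(seq_b, k)
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    words = [w for w in words if xa[w] + xb[w] > 0]
    na = sum(xa[w] for w in words)
    nb = sum(xb[w] for w in words)
    total = na + nb
    stat = 0.0
    for w in words:
        col = xa[w] + xb[w]
        for observed, row_total in ((xa[w], na), (xb[w], nb)):
            expected = row_total * col / total  # expected count if both share one frequency profile
            stat += (observed - expected) ** 2 / expected
    return stat, len(words) - 1


if __name__ == "__main__":
    a, b = "ATGCGTACGTTAGC", "ATGCGTTCGTAAGC"
    print("D2 (k=3):", d2(a, b, 3))
    print("chi-square (k=1):", chi_square_word_freq(a, b, 1))
```

For example, d2("ACGT", "ACGA", 2) counts the shared 2-words AC and CG once each and returns 2.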
Feb 12, Feb 14, Feb 19, Feb 21, Feb 26, Feb 28: DNA or protein sequence alignment and comparison
Markov Chain and Hidden Markov Model; [reference link]
Viterbi algorithm [link1] [link2] and Dynamic programming [PDF]
(Also, please read Chapters 4 and 11 of Ewens and Grant’s book “Statistical Methods in Bioinformatics: An Introduction”. Hard copies of some sections from those chapters will be distributed in class on Feb 19.)
Random Walk and BLAST theory; [PDF]
Maximum Likelihood Estimates;
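The Viterbi algorithm is the dynamic-programming recursion that finds the single most probable hidden-state path of an HMM. The following is a minimal sketch in log space. The two-state model (N = noncoding, AT-rich; C = coding, GC-rich) and all of its probabilities are made-up toy values for illustration, not parameters from the course materials.

```python
import math


def _log(p):
    """log(p), with log(0) = -inf so impossible moves are never chosen."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log joint probability).

    start_p[s], trans_p[r][s] and emit_p[s][symbol] are probabilities."""
    # v[t][s]: best log-probability of any state path ending in state s at position t
    v = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at position t
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda r: v[t - 1][r] + _log(trans_p[r][s]))
            v[t][s] = (v[t - 1][best_prev] + _log(trans_p[best_prev][s])
                       + _log(emit_p[s][obs[t]]))
            back[t][s] = best_prev
    # trace back from the best final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, v[-1][last]


# Toy two-state model (hypothetical numbers): N = noncoding, C = coding
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
    "C": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35},
}

if __name__ == "__main__":
    path, logp = viterbi("ATATATGCGCGCGCATAT", STATES, START, TRANS, EMIT)
    print("".join(path), logp)
```

Working with log-probabilities turns products into sums and avoids numerical underflow on long sequences. The back-pointer table is what makes recovering the path a linear-time traceback.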
Mar 4 Midterm review
Mar 6 Midterm I
Mar 11, 13, 18, 20: Gene Expression Data Analysis (I)
SAGE data [PDF]; please read the introduction and methods sections
Reference [2]
Please watch the animation [1] on DNA arrays (highly recommended)
“Affymetrix arrays” [PDF]
“Statistics in Microarray gene expression analysis” [PDF]
“Microarray gene expression analysis (I)” [PDF]
“Statistical methods for finding differentially expressed genes” [PDF]
“False Discovery Rates, Maximum Likelihood Estimates, Bayes’s Rule” [PDF]
Outline of the lecture on March 18:
Two sample t-tests for finding differentially expressed genes
1. calculating the test statistics
a. samples with equal or unequal sizes
b. samples with equal or unequal variances
2. evaluating the statistics
a. Type I errors; False positives; p-values; False positive rate
b. Type II errors; False negatives; power; False negative rate
3. Q-Q plots
a. Comparing the distributions of two samples
b. Comparing the distribution of one sample against a particular distribution
c. Visualizing outliers (significant genes)
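The two test statistics in item 1 can be sketched with Python's standard library. The first is the pooled-variance t statistic, which assumes equal variances and allows the sample sizes to differ. The second is Welch's t statistic with the Welch–Satterthwaite degrees of freedom, which allows unequal variances. The expression values in the usage lines are invented for illustration. This sketch stops at the statistic and degrees of freedom. P-values require a t distribution, for example from a table or from scipy.stats.ttest_ind, whose equal_var argument switches between the two versions.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances (sizes may differ).
    Returns (t, degrees of freedom)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)  # pooled variance
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch's t statistic for unequal variances, with the
    Welch-Satterthwaite approximation to the degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors of each mean
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


if __name__ == "__main__":
    # made-up log-expression values for one gene under two conditions
    control = [7.1, 7.4, 6.9, 7.3]
    treated = [8.0, 8.6, 7.7, 8.9, 8.4]
    print("pooled:", pooled_t(control, treated))
    print("Welch: ", welch_t(control, treated))
```

When the two samples have equal sizes, the two t statistics coincide and only the degrees of freedom differ.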
“Microarray gene expression analysis (II)” [PDF] (optional)
More reading materials for False Discovery Rates [PDF] (optional)
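One common way to control the False Discovery Rate when testing thousands of genes is the Benjamini–Hochberg step-up procedure; the readings may present other FDR methods as well. The following is a minimal sketch, assuming independent p-values. Sort the m p-values, find the largest rank k with p_(k) ≤ kα/m, and reject the k smallest. The function also returns BH-adjusted p-values; its name and return format are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level alpha.

    Returns (sorted indices of rejected hypotheses, BH-adjusted p-values
    in the original order)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p-value
    # step-up: the largest rank k with p_(k) <= k * alpha / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = sorted(order[:k_max])
    # adjusted p-values: running minimum of p_(k) * m / k, from the largest rank down
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return rejected, adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.001, 0.03, 0.032, 0.9]))
```

The procedure is "step-up" because it scans for the largest qualifying rank. In the usage line, p = 0.03 is rejected even though it fails its own threshold (0.025), because a larger p-value (0.032) passes at rank 3.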
Mar 25, Mar 27 Spring break
Apr 1, 3, 8, 10, 15, 17, 22, 24: Gene Expression Data Analysis (II): clustering and classification
Note that there is no lecture on Tuesday 04/01/2008!
Please read the following materials: [1] and [PDF] for a review; [PDF] and [PDF] for False Discovery Rate
The Thursday (04/03) lecture will cover hierarchical and K-means clustering analysis.
Hierarchical clustering
K-means clustering
Self-organizing maps
Principal Component Analysis (PCA) [PDF];
Spectral Clustering [PDF];
Correspondence Analysis
Nonnegative Matrix Factorization
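As an illustration of one method from this list, the following is a minimal sketch of K-means clustering via Lloyd's algorithm. It assumes each gene is a tuple of expression values and uses Euclidean distance. Initialization picks k distinct data points at random with a fixed seed, which is one of several common choices. The function name and toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for K-means clustering.

    points: list of equal-length tuples (e.g. one expression profile per gene).
    Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init: k distinct data points
    labels = None
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # update step: each centroid moves to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # made-up expression profiles (3 conditions) for four genes
    profiles = [(1.0, 1.2, 0.9), (1.1, 1.0, 1.0), (5.0, 5.2, 4.8), (4.9, 5.1, 5.0)]
    print(kmeans(profiles, 2))
```

Each iteration cannot increase the within-cluster sum of squares, so the loop converges. It may converge to a local optimum, however, which is why K-means is usually rerun from several random starts.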
If time permits, we will also cover the methods for finding transcription factor binding motifs.
Apr 29 Midterm II review
May 1 Midterm II
May 6, 8, 13 Final Presentations