Haiyan Huang, PhD






Associate Professor
Department of Statistics

Interdepartmental Group in Biostatistics

Center for Computational Biology, PhD Program Chair & Head Graduate Advisor

University of California, Berkeley

CA, 94720, USA


[CV]    [Research]





Current Students & Postdocs:

·       Funnan Shi (PhD, Statistics)

·       Yuting Ye (PhD, Biostatistics)

·       Courtney Schiffman (PhD, Biostatistics)

·       Yun Zhou (PhD, Biostatistics)

·       Dr. Ke Liu (Postdoc)

·       Tom Hu (Undergraduate, Statistics)


Former Graduate Students:

·       Siew-leng Melinda Teng, PhD, 2007 Summer (Thesis Title: Statistical methods in integrative analysis of gene expression data with applications to biological pathways; Current Position: Statistician, Genentech, Inc.)

·       Na Xu, PhD, 2008 Summer, co-advised with Prof. Peter Bickel (Thesis Title: Transcriptome Detection by Multiple RNA Tiling Array Analysis and Identifying Functional Conserved Non-coding Elements by Statistical Testing; Current Position: Statistician, Genentech, Inc.)

·       Kyungpil Kim, PhD, 2013 (Thesis Title: Application of Statistical Methods to Integrative Analysis of Genomic Data; Current Position: Postdoctoral Research Fellow, Children’s Hospital Oakland Research Institute)

·       Daisy Yan Huang, PhD (Thesis Title: Overcoming the Small Sample Size Challenge in Differential Gene Expression Analysis Studies; Current Position: Lecturer, Princeton University)

·       Jingyi Jessica Li, PhD, 2013, co-advised with Prof. Peter Bickel (Thesis Title: Statistical Methods for Analyzing High-throughput Biological Data; Current Position: Assistant Professor, Department of Statistics, UC Los Angeles)

·       Y.X. Rachel Wang, PhD, 2015, co-advised with Prof. Peter Bickel (Thesis Title: Problems in Network Modeling: Estimating Edges and Community Detection; Current Position: faculty member, School of Mathematics and Statistics, University of Sydney, Australia)

·       Christine Ho, PhD, 2016 Winter, co-advised with Prof. Elizabeth Purdom (Thesis Title: Statistical Modeling and Analysis for Biomedical Applications; Current Position: Data Scientist, Pandora)



·       Hua Chen, Master 2008 (Thesis Title: Bayesian Method for Multi-Loci Association Study of Human Disease; Position right after graduation: Research Fellow, Harvard University)

·       Ling Meng, Master 2009 (Thesis Title: Learning Algorithm and Model Selection for Protein-Protein Interaction Inference in Arabidopsis; Position right after graduation: Research Fellow, UC Berkeley)

·       Harold Pimentel, Master 2013 (Thesis Title: Biclustering as an Extension of Sparse Canonical Correlation Analysis; Current Position: Postdoctoral Research Fellow, UCSF)

·       Courtney Schiffman, Master 2016 (Thesis Title: Single Cell RNA-Seq Study: A Study on Normalization and Sub-Population Identification Techniques; Current Position: Student in Biostatistics PhD Program, UC Berkeley)

·       Yuting Ye, Master 2017 (Thesis Title: Testing and Diagnosis of Neurological Disorders based on Neuroimaging; Current Position: Student in Biostatistics PhD Program, UC Berkeley)



Former Postdocs:

·       Ci-Ren Jiang, Postdoc Sept 2009 – Aug 2010 (Current Position: (Tenured) Associate Research Fellow of the Institute of Physics, Academia Sinica, Taipei, Taiwan) 

·       Qunhua Li, Postdoc Sept 2008 – July 2011, co-advised with Professor Peter Bickel (Current Position: Assistant Professor, Department of Statistics, Penn State University)

·       Hao Xiong, Postdoc Jan 2012 – Dec 2012, co-advised with Professor Peter Bickel



Major Collaborators:

·       Peter Bickel, Statistics Department, UC Berkeley

·       Marisa Medina, Children’s Hospital Oakland Research Institute

·       Lewis Feldman, Plant & Microbial Biology, UC Berkeley

·       Lydia Sohn, Mechanical Engineering, UC Berkeley

·       Amy Herr, Bioengineering, UC Berkeley




Undergraduate courses:

STAT 152: Survey Sampling (Falls 2003 – 2006)

BIOE/STAT C141: Statistics for Bioinformatics (Springs 2004 – 2008)

STAT 131A: Statistical Inferences for Social and Life Scientists (Spring 2009)

STAT 157: Seminar on Topics in Probability and Statistics (Fall 2009)

STAT 133: Concepts in Computing with Data (Spring 2013)


Master courses:

STAT 200B: Introduction to Probability and Statistics at an Advanced Level (Springs 2006, 2007, 2011, 2012)

STAT 201B: Introduction to Probability and Statistics at an Advanced Level (Falls 2013, 2014, 2016, 2017)


PhD courses:

STAT 215A: Statistical Models: Theory and Application (Fall 2011)

STAT 215B: Statistical Models: Theory and Application (Spring 2017)

STAT 210A: Theoretical Statistics (Falls 2008 – 2010)

STAT 246: Statistical Genetics (Spring 2009; co-taught with Prof. S Dudoit)

STAT C245E/F: Statistical Genomics (Springs 2010, 2012, 2013, 2014; co-taught with Prof. S Dudoit and Prof. R Nielsen)

STAT 272: Statistical Consulting (Fall 2010, Spring 2014, Fall 2014, Spring 2015, Spring 2016)


Pedagogical course:

STAT 375: Professional Preparation: Teaching of Probability and Statistics (Falls 2013, 2014)





* Corresponding or Co-corresponding author(s)


1.     Wang YXR, Liu K, Theusch E, Rotter JI, Medina MW, Waterman M*, Huang H* (2017). Generalized correlation measure using count statistics for gene expression data with ordered samples. Bioinformatics.  Accepted.

2.     Pimentel H, Hu ZT, Huang H* (2017). Biclustering by Sparse Canonical Correlation Analysis. Quantitative Biology. Accepted.

3.     Huang H*, Yu B* (2017). Data Wisdom in Computational Genomics Research. Statistics in Biosciences. 2017:1-16.

4.     Chang M, Womer FY, Edmiston KE, Bai C, Zhou Q, Jiang X, Wei S, Wei Y, Ye Y, Huang H$$, He Y, Xu K, Tang Y, Wang F (2017). Neurobiological Commonalities and Distinctions Among Three Major Psychiatric Diagnostic Categories: A Structural MRI Study. Schizophrenia Bulletin. 2017 Jun 13.

5.     Shi F, Huang H* (2017). Identifying Cell Subpopulations and Their Genetic Drivers from Single-Cell RNA-Seq Data Using a Biclustering Approach. Journal of Computational Biology. 2017 Jul 1;24(7):663-74.                                             

6.     Schiffman C, Lin C, Shi F, Chen L, Sohn L, Huang H* (2017). SIDEseq: A Cell Similarity Measure Defined by Shared Identified Differentially Expressed Genes for Single-cell RNA Sequencing data. Statistics in Biosciences. June 2017, Volume 9, Issue 1, pp 200–216.     

7.     Sinkala E, Sollier-Christen E, Renier C, Rosàs-Canyelles E, Che J, Heirich K, Duncombe TA, Vlassakis J, Yamauchi KA, Huang H$$, Jeffrey SS, Herr AE (2017). Profiling protein expression in circulating tumour cells using microfluidic western blotting. Nature Communications, Mar 23;8:14622.

8.     Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H* (2015). Inferring Gene-Gene Interactions and Functional Modules Using Sparse Canonical Correlation Analysis. Annals of Applied Statistics. 9(1): 300-323.

9.     National Research Council Committee on Predictive-Toxicology Approaches for Military Assessments of Acute Exposures (2015). “National Academy of Sciences: Application of Modern Toxicology Approaches for Predicting Acute Toxicity for Chemical Defense.” Washington, DC: The National Academies Press. (I served as an NRC Committee member)

10.    Wang YXR, Waterman MS*, Huang H* (2014). Gene Coexpression Measures in Large Heterogeneous Samples Using Count Statistics. Proc Natl Acad Sci. USA. 111(46):16371-6 [Paper Link] Commentary [Link]
11.    Kim K, Bolotin E, Theusch E, Huang H, Medina MW, Krauss RM (2014). Prediction of LDL Cholesterol Response to Statin using Transcriptomic and Genetic Variation. Genome Biology. 15(9): 460. [Paper Link]
12.    Li JJ, Huang H*, Bickel PJ*, Brenner S* (2014). Comparison of D. melanogaster and C. elegans Developmental Stages, Tissues, and Cells by modENCODE RNA-seq Data. Genome Research. 24: 1084-1101. [Paper Link]
13.    Jiang CR, Liu CC, Zhou XJ, Huang H* (2014). Optimal Ranking in Multi-label Classification Using Local Precision Rates. Statistica Sinica. 24: 1547-1570. [preprint Link]
14.    Gerstein M, et al. (2014). Comparative Analysis of the Transcriptome across Distant Species. Nature. 512: 445-448. doi:10.1038/nature13424. [Paper Link]
15.    Boyle AP, et al. (2014). Comparative Analysis of Regulatory Information and Circuits across Distant Species. Nature. 512: 453-456. doi:10.1038/nature13668. [Paper Link]
16.    Wang YXR, Huang H* (2014). Review on Statistical Methods for Gene Network Reconstruction Using Expression Data. Journal of Theoretical Biology. 362:53-61. doi:10.1016/j.jtbi.2014.03.040 [Paper Link]
17.    Xiong H, Brown JB, Boley N, Bickel PJ, Huang H* (2014). DE-FPCA: Testing Gene Differential Expression and Exon Usage through Functional Principal Component Analysis. In S. Datta and D. Nettleton, editors, "Statistical Analysis of Next Generation Sequence Data (Frontiers in Probability and Statistical Science)." Springer. New York. (ISBN-13: 978-3319072111; ISBN-10: 3319072110)
18.    Li JJ, Huang H, Qian M, Zhang X (2014). Transcriptome Analysis Using Next-Generation Sequencing. Chapter in “Advanced Medical Statistics”. 2nd Edition. To appear. (Publication Date: December 20, 2014 | ISBN-10: 9814583294 | ISBN-13: 978-9814583299) [Book Link]
19.    Chapman MR, Balakrishnan KR, Li J, Conboy MJ, Huang H, Mohanty SK, Jabart E, Hack J, Conboy IM, Sohn LL (2013). Sorting single satellite cells from individual myofibers reveals heterogeneity in cell-surface markers and myogenic capacity. Integrative Biology. 5(4):692-702.
20.    ENCODE Consortium Project (2012). An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature. 489, 57-74.
21.    Kim K, Teng S, Jiang K, Feldman L, Huang H* (2012). Using biologically interrelated experiments to identify pathway genes in Arabidopsis. Bioinformatics. 28(6), 815-822. [Paper Link]
22.    Gao Q, Ho C, Jia Y, Li JJ, Huang H* (2012). Biclustering of Linear Patterns in Gene Expression Data (CLiP). Journal of Computational Biology. 19(6), 619-631.
23.    Chapman MR, Balakrishnan K, Conboy MJ, Mohanty SK, Jabart E, Li J, Huang H, Hack J, Conboy IM, and Sohn LL (2012). Label-free screening of niche-to-niche variation in Satellite stem cells using functionalized pores. A chapter in Nanopores for Bioanalytical Applications: Proceedings of the International Conference, Eds. J. Edel and T. Albrecht, RSC Publishing, pp. 38-42.
24.    Li JJ, Jiang CR, Brown JB, Huang H*, Bickel PJ* (2011). Sparse Linear Modeling of RNA-seq Data for Isoform Discovery and Abundance Estimation. Proc Natl Acad Sci. USA. 108 (50) 19867-19872. [Paper Link]
25.    Li Q, Brown JB, Huang H, Bickel PJ (2011). Measuring Reproducibility of High-throughput Experiments. Annals of Applied Statistics. 5(3), 1752-1779. [Paper Link]
26.    Li Y, Huang H and Cai L (2011). Prediction of Transcriptional Regulatory Networks for Retinal Development. A chapter in the book “Computational Biology and Applied Bioinformatics”, edited by Lopes HS and Cruz LM.
27.    Durinck S, Ho C, Wang NJ, Liao W, Jakkula LR, Collisson EA, Pons J, Chan SW, Lam ET, Chu C, Park K, Hong S, Hur JS, Huh N, Neuhaus IM, Yu SS, Grekin RC, Mauro TM, Cleaver JE, Kwok P, LeBoit PE, Getz G, Cibulskis K, Aster JC, Huang H, Purdom E, Li J, Bolund L, Arron ST, Gray JW, Spellman PT, Cho RJ (2011). Temporal Dissection of Tumorigenesis in Primary Cancers. Cancer Discovery, 1:137-143.
28.    Xu N, Bickel PJ, Huang H* (2010). Genome-wide Detection of Transcribed Regions through Multiple RNA Tiling Array Analysis. International Journal of Systems and Synthetic Biology. 1(2) 155-170.
29.    Huang H*, Liu C, Zhou XJ* (2010). Bayesian Approach to Transforming Public Gene Expression Repositories into Disease Diagnosis Databases. Proc Natl Acad Sci. USA. 107 (15) 6823-6828. [Paper Link]

·     This paper was selected as an issue highlight by PNAS: http://www.pnas.org/content/107/15/6553.full.pdf+html (under the title "Gene databases mined for diagnoses").

·     This paper was selected for Faculty of 1000 Biology (http://www.f1000biology.com) and evaluated by Dr. Russ Altman from Stanford University: http://www.f1000biology.com/article/id/3925957.  (Faculty of 1000 Biology is an award-winning online service that highlights and evaluates the most interesting papers published in the biological sciences, based on the recommendations of over 2000 of the world's top researchers.)

·     This paper was also covered in the following news reports:

·       GenomeWeb daily news. “Team Develops Proof-of-Principle Diagnostic Database for Applying Public Gene Expression Data”. March 22, 2010.

·       National Cancer Institute. Research News. “Mathematical Modeling Turns Gene Expression Data into Disease Diagnostics” http://physics.cancer.gov/news/2010/april/po_news_b.asp

·       Biocentury and Nature publishing group. “GEO: world of diagnostic potential,” Haas, M.J. SciBX 3(14); April 8, 2010

30.    Bickel PJ, Boley N, Brown JB, Huang H, Zhang NR (2010). Subsampling Methods for Genomic Inference. Annals of Applied Statistics. 4(4) 1660-1697.

(authors ordered alphabetically)

31.    Jiang K, Zhu T, Diao Z, Huang H, Feldman LJ (2010). The Maize Root Stem Cell Niche: A Partnership between Two Sister Cell Populations. Planta, 231(2):411-24. [Paper Link]
32.    Bickel P, Brown B, Huang H, Li Q (2009). An overview of recent developments in genomics and associated statistical methods. Philosophical Transactions of the Royal Society A 367, 4313-4337. [Paper Link]

(authors ordered alphabetically)

33.    Teng S, Huang H* (2009). A statistical framework to infer functional gene associations from multiple biologically interrelated microarray experiments. Journal of the American Statistical Association, June 2009, Vol. 104, No. 486. [Paper Link]
34.    Wang F, Jiang T, Sun Z, Teng SL, Luo X, Zhu Z, Zang Y, Zhang H, Yue W, Hong N, Huang H, Blumberg H, Zhang D (2009). Neuregulin 1 genetic variation and anterior cingulum integrity in patients with schizophrenia and healthy controls. Journal of Psychiatry & Neuroscience, 2009 May;34(3):181-6. [Paper Link]
35.    Liu C, Hu J, Kalakrishnan M, Huang H*, Zhou XJ* (2009). Integrative disease classification based on cross-platform microarray data. BMC Bioinformatics, 2009 Jan;10 Suppl 1:S25. [Paper Link]
36.    Carbonaro A, Mohanty SK, Huang H, Godley LA and Sohn LL (2008). Cell characterization using a protein-functionalized pore. Lab Chip, 8(9):1478-85. [Paper Link]
37.    Huang H, Cai L, Wong WH (2008). Clustering analysis of SAGE transcription profiles using a Poisson approach. Methods Mol Biol. 2008; 387:185-98. (Book Chapter) [Paper Link]
38.    Huang Y, Li H, Hu H, Yan X, Waterman MS, Huang H, Zhou XJ (2007). Systematic discovery of functional modules and context-specific functional annotation of human genome. Bioinformatics, 23(13):i222-i229. [Paper Link]
39.    ENCODE Consortium (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 447, 799-816. [Paper Link]
40.    Kim K, Zhang S, Jiang K, Cai L, Lee IB, Feldman LJ, Huang H* (2007). Measuring similarities between gene expression profiles through data transformations. BMC Bioinformatics, 8:29 (highly accessed paper). [Paper Link]
41.    Jiang K, Zhang S, Lee S, Tsai G, Kim K, Huang H, Zhu T, Feldman LJ (2006). Transcription profile analyses identify genes and pathways central to root cap functions in Maize. Plant Molecular Biology, 60(3):343-63. [Paper Link]
42.    Huang H, Kim K (2006). Unsupervised clustering analysis of gene expression. Chance, Vol. 19, No. 3. [Paper Link]
43.    Zhou XJ, Kao MJ, Huang H, Wong A, Nunez-Iglesias J, Aparicio OM, Morgan TE, Wong WH (2005). Functional annotation and network reconstruction through cross-platform integration of microarray data. Nature Biotechnology, 23(2):238-43. [Paper Link]
44.    Zhao X, Huang H, Speed T (2005). Finding short DNA motifs using permuted Markov models. Journal of Computational Biology, 12(6): 894-906 (journal version of the 2004 RECOMB paper listed below). [Paper Link]
45.    Huang H, Kao MJ, Zhou X, Liu JS, Wong WH (2004). Determination of local statistical significance of patterns in Markov sequences with application to promoter element identification. Journal of Computational Biology, 11(1):1-14. [Paper Link]
46.    Cai L#, Huang H#, Blackshaw S, Liu JS, Cepko CL, Wong WH (2004). Clustering analysis of SAGE data using a Poisson approach. Genome Biology, 5(7):R51. [Paper Link]

#Joint first authors

47.    Zhao X, Huang H, Speed T (2004). Finding short DNA motifs using permuted Markov models. Proceedings of RECOMB 2004. [Paper Link]
48.    Blackshaw S, Harpavat S, Trimarchi J, Cai L, Huang H, Kuo W, Fraioli R, Cho S, Yung R, Asch E, Wong WH, Cepko CL (2004). Genomic analysis of mouse retinal development. PLoS Biol, 2(9):E247. [Paper Link]
49.    Allinen M, Beroukhim R, Cai L, Brennan C, Domenici CJ, Huang H, Porter D, Hu M, Chin L, Richardson A, Schnitt S, Sellers W, Polyak K (2004). Molecular characterization of the tumor microenvironment in breast cancer. Cancer Cell, 6(1):17-32. [Paper Link]
50.    Lippert RA, Huang H, Waterman MS (2002). Distributional regimes for the number of k-word matches between two random sequences. Proc Natl Acad Sci. USA, 99(22):13980-9. [Paper Link]
51.    Huang H (2002). Error bounds on multivariate normal approximations for word count statistics. Advances in Applied Probability, 34(3): 559-586. [Paper Link]



Books Edited


  1. “Research in Computational Molecular Biology” (11th Annual International Conference, RECOMB 2007), edited by Terry Speed and Haiyan Huang, Published by Springer. [Book Link]
  2. “Inaugural DahShu Data Science Symposium: Computational Precision Health (CPH 2017)”, edited by Haiyan Huang, Hua Tang, et al. Special Issue published by the Journal of Computational Biology.




Articles Submitted or Manuscripts in Preparation


  1. Chang M, Edmiston KE, Womer FY, Zhou Q, Wei S, Jiang X, Ye Y, Huang H$$, Xu K, Tang Y, Wang F (2017).  Spontaneous Low-Frequency Fluctuations in Neural System for Emotional Perception in Major Psychiatric Diagnostic Categories: Amplitude Similarities and Differences across Frequency Bands. Submitted.
  2. Kang CC, Ward TM, Bockhorn J, Schiffman C, Huang H, Pegram MD and Herr AE (2017). Clinical Electrophoretic Pathology Investigates Oncoprotein Isoforms. Submitted to npj (Nature Partner Journals) Precision Oncology.
  3. Liu K, et al. (2017). Bagged semi-supervised gene clustering for context-specific gene pathway analysis.
  4. Ho C, Jiang CR, Lee W, Huang H* (2017). Optimal decision making in hierarchical multi-label classification.
  5. Ren H, Qi C, Ma Q, Sun X, Wei Q, Liang JW, Li G, Zhang Z, Ren J, Yan J, Li G, Xu W, Li X, Bai C, She X, Wu G, Huang H and McCormick S (2017). Fixation of expression divergences by natural selection in Arabidopsis coding genes.