Frederick (a.k.a. Erick) A. Matsen
Miller Institute Fellow
UC Berkeley Dept. Statistics
367 Evans Hall #429
Berkeley, CA 94720-3860
USA
ph: +1 510 642 2450
fax: +1 510 642 7892


Research focus

I develop mathematical techniques and computer algorithms to improve our understanding of evolution. My current research is motivated by two main questions. First, how can we (better) reconstruct evolutionary history from present-day DNA sequences? Second, how do organisms diversify (e.g. speciate)?

How can we (better) reconstruct evolutionary history from present-day DNA sequences?

This is a very big and somewhat old question, with hundreds of scientists working on different aspects. The field that has developed around it is called phylogenetics. The general idea is that organisms with similar DNA sequences are usually more closely related than organisms with quite different DNA sequences. Making this formal and then running many computations on a computer leads to a tree diagram showing interrelationships; this diagram is called a phylogenetic tree. My contributions to this big project are in two areas: phylogenetic mixtures and theoretical analysis of Bayesian methods.
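
To make the "similar sequences" idea concrete, here is a minimal sketch of one simple way to formalize it: the proportion of aligned sites at which two sequences differ (often called the p-distance). The function name, species names, and sequences are all made up for illustration, and real phylogenetic methods rely on more refined evolutionary models than this raw count.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


# Hypothetical toy alignment (not real data).
toy_alignment = {
    "species_a": "ACGTACGTAC",
    "species_b": "ACGTACGTTC",
    "species_c": "TCGAACCTTG",
}

# Pairwise distances; smaller values suggest a closer relationship.
distances = {
    (x, y): p_distance(toy_alignment[x], toy_alignment[y])
    for x, y in combinations(toy_alignment, 2)
}
closest_pair = min(distances, key=distances.get)
```

In this toy alignment the pair with the smallest distance is the one that would be grouped together first by a distance-based tree-building method.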

Mixtures: It is well established from a theoretical perspective that if sequences evolve under a single (simple) model then a large amount of sequence data will reconstruct the tree correctly with high probability. However, it is now known that different parts of a sequence evolve in different ways; this is formulated statistically as a phylogenetic mixture model. In contrast to the single-process case, it is known that data from mixtures of processes does not uniquely determine a tree. Mike Steel and I recently realized that even more is true: it is possible to have a mixture of two processes on one tree such that the resulting data looks exactly like a single process on a different tree. I'm now interested in whether these sorts of issues really do pose a problem for phylogenetics researchers. Recent work with Mossel and Steel partially addresses this question through a combination of geometric and combinatorial means.

Bayesian: There are many different ways of building phylogenetic trees, and one class of these is known as Bayesian methods. One advantage of these methods is that they can give posterior probabilities, which are (more or less) an estimate of how correct certain parts of the tree are. However, it can happen that even if there is no actual evidence determining how a certain set of species evolved, the methods choose one scenario and attach a very high posterior probability to it. This problem is called the “star tree paradox,” and Mike Steel and I recently showed analytically that it can persist even when the methods are given arbitrarily long DNA sequences.

How do organisms diversify?

This is an even older question, with a correspondingly bigger literature. My focus is on only one approach, which is based on looking at the "shape" or overall structure of phylogenetic trees. A quick review of a couple of virus trees shows that different evolutionary scenarios can lead to different tree shapes.

In order to use tree shape in a scientific fashion, we need ways of quantifying it. So far I have written about tree shape in three ways: geometric, algebraic/combinatorial, and recursive. The recursive (optimization) approach has been the most productive for applications to data. I am currently applying this framework with Katherine St. John to search for evidence of bias in modern tree reconstruction algorithms. I am also collaborating with Alexei Drummond to apply these techniques to test for coalescent model mis-specification.
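
As a small illustration of what "quantifying tree shape" can mean, the sketch below computes the Colless imbalance index, a classical balance statistic defined recursively over a rooted binary tree: at each internal node, take the absolute difference between the number of leaves on the two sides, and sum over all internal nodes. This statistic is chosen only as a familiar example and is not claimed to be the one used in the work described above; the nested-tuple tree encoding and the function names are assumptions made for the sketch.

```python
from typing import Tuple, Union

# A rooted binary tree as nested pairs; any non-tuple value is a leaf.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves_and_colless(tree: Tree) -> Tuple[int, int]:
    """Return (number of leaves, Colless imbalance) for a binary tree."""
    if not isinstance(tree, tuple):
        return 1, 0  # a single leaf contributes no imbalance
    left, right = tree
    n_left, c_left = leaves_and_colless(left)
    n_right, c_right = leaves_and_colless(right)
    # Imbalance at this node is the difference in leaf counts below it.
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def colless(tree: Tree) -> int:
    """Colless imbalance index of a rooted binary tree."""
    return leaves_and_colless(tree)[1]


# Two hypothetical four-leaf trees with very different shapes.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
```

The perfectly balanced tree scores 0, while the fully unbalanced "caterpillar" tree on the same four leaves scores 3, so the statistic separates the two shapes with a single number.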

Other projects

In the past, John Wakeley and I investigated a class of models between the lattice model and the island model, and were able to show that these models converge back to the island model as the number of subpopulations goes to infinity. For this project I applied some nice theory about random walks on graphs.

I have also worked on the evolution of language with Martin Nowak. Rather than approach learning theory from the classical angle of an idealized teacher-learner pair, we investigated a model where the agents try to find a common language. We found some remarkably simple individual strategies which led to the population finding a common language with high probability given some constraints on the underlying space of languages.



Publications

[PDF] D. Ford, T. Gernhard, and F. A. Matsen. A method for investigating relative timing information on phylogenetic trees. arXiv:0803.1510, 2008.
[PDF] F. A. Matsen. Fourier transform inequalities for phylogenetic trees. arXiv:0711.3492, 2008.
[PDF] F. A. Matsen, E. Mossel, and M. Steel. Mixed-up trees: the structure of phylogenetic mixtures. arXiv:0705.4328 [q-bio.PE], 2007.
[PDF] F. A. Matsen and M. Steel. Phylogenetic mixtures on a single tree can mimic a tree of another topology. Systematic Biology, 56:767–775, October 2007.
[PDF] M. Steel and F. A. Matsen. The Bayesian star paradox persists for long finite sequences. Molecular Biology and Evolution, 24(4):1075–1079, April 2007.
[PDF] F. A. Matsen. Optimization over a class of tree shape statistics. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4:506–512, 2007.
[PDF] F. A. Matsen and S. N. Evans. Ubiquity of synonymity: almost all large binary trees are not uniquely identified by their spectra or their immanantal polynomials. arXiv:q-bio/0512010, 2006.
[PDF] F. A. Matsen. A geometric approach to tree shape statistics. Systematic Biology, 55(4):652–661, 2006.
[PDF] F. A. Matsen and J. Wakeley. Convergence to the island-model coalescent process in populations with restricted migration. Genetics, 172(1):701–708, January 2006.
[PDF] F. A. Matsen and M. A. Nowak. Win-stay, lose-shift in language learning from peers. PNAS, 101(52):18053–18057, December 2004. Commentary by K. Sigmund.

Software

simmons     My software to compute tree shape statistics.
alga, etc.     The source code for the genetic algorithm and related software described in Optimization…



Last modified March 12, 2008.


This document was translated from LaTeX by HEVEA.