My research interests have shifted in recent years from mathematical problems at the interface of stochastic processes and random combinatorial structures to more practical problems of academic information management. A bridge between these interests is provided by probabilistic models for random partitions and related structures such as random graphs, which may serve as the basis of machine-learning algorithms for automated classification and entity recognition in the bibliographic universe. The importance of connecting the combinatorics of large graphs to methods of organizing the world's information is amply demonstrated by the development and industrial application of the PageRank algorithm by Google. Much remains to be discovered about how best to structure and manage the deluge of academic information now available to researchers. I expect that ideas from combinatorics, probability and machine learning will continue to find important applications in this endeavor.
I have long been interested in interfaces between the traditional theory of stochastic processes and other areas of mathematics, especially combinatorics. I have studied various random combinatorial objects, such as permutations, partitions, and trees, and how the asymptotic behaviour of such structures over a large number of elements can be described in probabilistic terms, most often involving Brownian motion and related processes. This has led to the study of various measure-valued and partition-valued Markov processes whose behaviour may be understood in terms of combinatorial constructions involving random trees. I have been engaged in developing various ideas related to random partitions, random trees, irreversible processes of coalescence, and their time reversals which provide models for random splitting or fragmentation. I view this line of research largely as pure mathematics, but mathematics of a concrete kind which is often motivated and influenced by applications. Stochastic models with a natural probabilistic structure typically turn up in different disguises in diverse fields. The study of their mathematical structure allows ideas and results developed in one context to be transferred to another. Following is a selection of articles in this vein over the last 4 years.
Bénédicte Haas, Grégory Miermont, Jim Pitman and Matthias Winkel
Continuum tree asymptotics of discrete fragmentations and applications to phylogenetic models
Ann. Probab. Vol. 36 No. 5, 1790-1837 (2008). [arXiv] [DOI] [MR?] [GS?]
Bénédicte Haas, Jim Pitman and Matthias Winkel
Spinal partitions and invariance under re-rooting of continuum random trees
Ann. Probab. Vol. 37 No. 4, 1381-1411 (2009). [arXiv] [DOI] [MR?] [GS?]
Jim Pitman and Matthias Winkel
Regenerative tree growth: binary self-similar continuum random trees and Poisson-Dirichlet compositions
Ann. Probab. Vol. 37 No. 5, 1999-2041 (2009). [pdf] [Project Euclid] [arXiv] [DOI] [GS?]
Peter McCullagh, Jim Pitman and Matthias Winkel
Gibbs fragmentation trees
Bernoulli Vol. 14 No. 4, 988-1002 (2008). [arXiv] [DOI] [MR?] [GS?]
Jomy Alappattu and Jim Pitman
Coloured loop-erased random walk on the complete graph
Combinatorics, Probability and Computing Vol. 17, 727-740 (2008). [pdf] [arXiv] [MR?] [GS?]
Nathanaël Berestycki and Jim Pitman
Gibbs distributions for random partitions generated by a fragmentation process
J. Stat. Phys. Vol. 127 No. 2, 381-418 (2007). [arXiv] [DOI] [MR] [GS?]
In 2008 I organized the submission of a collaborative proposal to the NSF Cyber-enabled Discovery and Innovation (CDI) Program. The proposal was to develop a suite of tools and services that encourage the formation of virtual organizations in scientific communities of various sizes, such as conference groups and departmental research groups. These tools would allow such organizations to filter relevant documents from various input streams, select and enhance the quality of bibliographic data associated with the organization, and attract students, teachers and researchers to contribute to its activity. The main partners are the American Institute of Mathematics, a group at the Institute for Quantitative Social Science at Harvard University, and a group in Berkeley Statistics. Further individuals involved include Hadley Wickham at Rice and John Willinsky at Stanford.
This project was funded in September 2008, and work has been proceeding since then. See the Project Website for further details.
The project addresses three fundamental problems of knowledge management:
The idea is to solve these problems by gradually distilling the wealth of heterogeneous data now available in digital formats into an openly navigable network of websites, the Bibliographic Knowledge Network, each node of which is a website dedicated to a specific topic or field of knowledge. Each participating site will be maintained by some individual or Virtual Organization with a commitment to that field. Sites may be designed as guides for researchers, teachers, and students, or they may provide more specialized services, such as gateways to connect other internet resources.
Accomplishments of the project to date include convergence on the specification of BibJSON as a bibliographic data format, successful use of CouchDB as a bibliographic database compatible with the project's needs, and development of a preliminary system of bibliographic web services. I have also been working with Berkeley students Jeff Regier, James Long and Fei Yu on aspects of machine-aided disambiguation and matching of bibliographic entities, especially authors and subjects.
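To give a rough sense of what a JSON-based bibliographic record and a simple author-matching step might look like, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's actual schema or algorithm: the field names (`type`, `title`, `author`, `journal`, `year`) are only assumed to resemble BibJSON, and the helper functions `author_key` and `shared_authors` are hypothetical. The two records are taken from the publication list above.

```python
import json
import unicodedata


def author_key(name):
    """Reduce a personal name to (surname, first initial), ignoring accents and case.

    Hypothetical helper: a crude matching key, not a full disambiguation method.
    """
    # Strip accents so that "Nathanaël" and "Nathanael" compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    if "," in ascii_name:
        # "Surname, Given" form.
        surname, given = [part.strip() for part in ascii_name.split(",", 1)]
    else:
        # "Given Surname" form: treat the last word as the surname.
        parts = ascii_name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname.lower(), given[:1].lower()


def shared_authors(record_a, record_b):
    """Return the author keys that appear in both records."""
    keys_a = {author_key(person["name"]) for person in record_a["author"]}
    keys_b = {author_key(person["name"]) for person in record_b["author"]}
    return keys_a & keys_b


# Assumed BibJSON-like layout; the field names are illustrative only.
gibbs_trees = {
    "type": "article",
    "title": "Gibbs fragmentation trees",
    "author": [
        {"name": "Peter McCullagh"},
        {"name": "Jim Pitman"},
        {"name": "Matthias Winkel"},
    ],
    "journal": {"name": "Bernoulli", "volume": "14", "number": "4"},
    "pages": "988-1002",
    "year": "2008",
}

gibbs_partitions = {
    "type": "article",
    "title": "Gibbs distributions for random partitions generated by a fragmentation process",
    "author": [{"name": "Berestycki, Nathanaël"}, {"name": "Pitman, Jim"}],
    "journal": {"name": "J. Stat. Phys.", "volume": "127", "number": "2"},
    "pages": "381-418",
    "year": "2007",
}

if __name__ == "__main__":
    # Records are plain dictionaries, so they serialize directly to JSON.
    print(json.dumps(gibbs_trees, indent=2))
    print(shared_authors(gibbs_trees, gibbs_partitions))
```

A key of surname plus first initial is deliberately crude: it merges "Pitman, Jim" with "Jim Pitman", but it would also merge distinct people who share a surname and initial, which is the kind of ambiguity that motivates statistical approaches to disambiguation.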