Chris Paciorek - Web Philosophy

A bit (as in bits and pieces) of my philosophy about posting research-related materials on the web.

1.) Code: If you're proposing a new statistical method, you should absolutely post code on the web for carrying out the technique. This doesn't have to be commercial-level code, and could even be the working code you used to produce your results, with limited comments, but it should be enough that a knowledgeable statistician and programmer can figure out what you did, including all the detailed steps.

Ideally, I think this code should be in R. This allows people to try it out and figure out what you've done in a widely-known syntax. Splus or Matlab seem to me to be somewhat less desirable as they rely on commercial software, but the languages are widely known. C or C++ or Fortran seem to me to be quite a bit less desirable because such code can be rather obscure and because it may compile or behave differently from one computer system to another.

2.) Data: If you compare statistical methods or propose your own method and assess the method(s) on simulated or real datasets, you should post the datasets on the web so that people can replicate your results and use the same data in their own comparisons.

3.) MCMC: If you do an analysis or propose a method that makes use of Markov chain Monte Carlo, a major issue is whether the chain is mixing adequately. Most papers address this topic obliquely, often by mentioning that 'mixing was assessed by X and judged to be adequate ...'. Such statements make it hard for a reader to verify the claim and hard to compare mixing between methods. If point #1 above is followed, that will help in this regard, allowing others to replicate results and see the intricate details of the MCMC. One might also consider posting on the web the iterations of the chain for key parameters.
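
As a rough illustration of what a reader could do with posted chain iterations, here is a minimal sketch in Python (chosen only for illustration) that estimates an effective sample size from the draws of a single parameter. Everything in it is an assumption rather than part of any particular paper's method: the function names are hypothetical, the two AR(1) series are made-up stand-ins for real posted draws, and truncating the autocorrelation sum at the first non-positive value is just one simple rule among several in use.

```python
import random


def autocorr(chain, lag):
    """Sample autocorrelation of `chain` at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(chain, max_lag=200):
    """Crude ESS: n / (1 + 2 * sum of autocorrelations).

    The sum is truncated at the first non-positive autocorrelation
    (a simple rule assumed here for illustration).
    """
    n = len(chain)
    total = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorr(chain, lag)
        if rho <= 0:
            break
        total += rho
    return n / (1 + 2 * total)


def ar1_chain(n, phi, seed=0):
    """Hypothetical stand-in for posted MCMC draws: an AR(1) series."""
    rng = random.Random(seed)
    x = 0.0
    draws = []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        draws.append(x)
    return draws


# Two synthetic "chains" of equal length but very different mixing.
well_mixed = ar1_chain(2000, phi=0.1)
poorly_mixed = ar1_chain(2000, phi=0.95)

ess_well = effective_sample_size(well_mixed)
ess_poor = effective_sample_size(poorly_mixed)
print("ESS, weakly autocorrelated chain:  ", round(ess_well))
print("ESS, strongly autocorrelated chain:", round(ess_poor))
```

With the actual iterations in hand, a reader could replace the synthetic series with the posted draws and compare such a number across methods, rather than relying on a one-line statement that mixing was adequate.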

4.) Text: Posting your papers, technical reports, and presentations on the web is a great way to increase scientific communication. As a field, computer science, including machine learning, is much better about this than statistics, and I think that goes a long way towards making their field more dynamic than ours. (That and faster publishing via conference publications, but that's another issue.)

Once other researchers know who you are and what area you work in, they are likely to look on the web to get a feel for the sort of work you are currently doing. In particular, I suggest this to graduate students who are just starting to see projects come to fruition during their later years in graduate school. Because senior researchers have built up a professional network, others often know what they are up to, but for junior researchers this isn't the case, and the web is a great resource.

Last updated: November 2006.

Chris Paciorek's Professional Webpage