Jim Pitman
pitman@stat.berkeley.edu
Preliminary draft for comment. 12/9/02
Summary
This is a proposal to construct a new means of organizing, communicating and archiving mathematical knowledge, by a faithful representation of that knowledge in cyberspace. The purpose is first of all to provide a peer-reviewed survey of all of mathematics, professionally organized, fully searchable, navigable and retrievable, continuously archived and updated, and available free online to anyone with Internet access, in perpetuity. This is to be achieved by creation of an electronic journal, The Mathematics Survey (or MathSurvey for short), which would be a multi-layered network of richly interlinked electronic survey journals, one in each branch of mathematics.
This is a specific proposal to construct an electronic survey journal in the field of mathematics. See also Best Current Practices: Recommendations on Electronic Information Communication by the Committee on Electronic Information Communication (CEIC) of the International Mathematical Union (IMU), reproduced in Notices of the AMS Volume 49, Number 8, and the CEIC Recommendations on Information and Communication, for general background about the digital revolution in scholarly communication, and advice to mathematicians about how the mathematical literature should be created, disseminated, and preserved in the electronic era.
As discussed in detail elsewhere, the structure of the peer-reviewed journal is a robust one which has served the academic community well over the last century, and survived intact despite being largely taken over by commercial interests in the last half century, and despite the change of medium from paper to electronic. That acknowledged, it should be preserved and adapted to the new paradigm of open access communication. The way to do that has already been tested and proved. It is the concept of an overlay journal, first developed over Ginsparg's e-print arXiv. This was the first digital repository, used initially just for e-prints, but now used as the archive supporting a number of electronic journals in mathematics and physics, including
One novelty of the present proposal is that a foundation layer of the MathSurvey be structured as an overlay journal on public archives, over which arbitrarily many further overlays can be constructed, for the benefit of different classes of readers. The potential of this structure for the creative development of expository material appears to be limited only by the imagination of the subnetworks of editors responsible for the creation of various layers.
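To make the layering concrete, here is a minimal sketch of how overlays might refer downward to a foundation layer and, through it, to archived items. It is only an illustration under stated assumptions: the class names `Layer` and `Entry`, the method `resolve`, and identifiers such as `archive:item-1` are all hypothetical and not part of any existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """A survey page: a title plus links to items one level down."""
    title: str
    targets: List[str]


@dataclass
class Layer:
    """One overlay. `below` is None for the foundation layer, whose
    targets are taken to be identifiers in a public archive."""
    name: str
    below: Optional["Layer"] = None
    entries: Dict[str, Entry] = field(default_factory=dict)

    def add(self, key: str, title: str, targets: List[str]) -> None:
        self.entries[key] = Entry(title, list(targets))

    def resolve(self, key: str) -> List[str]:
        """Follow links downward until archive identifiers are reached,
        keeping first-seen order and dropping repeats."""
        entry = self.entries[key]
        if self.below is None:
            return list(entry.targets)
        found: List[str] = []
        for target in entry.targets:
            for item in self.below.resolve(target):
                if item not in found:
                    found.append(item)
        return found


# Hypothetical foundation layer sitting directly on archived items.
foundation = Layer("foundation")
foundation.add("survey-a", "Survey A", ["archive:item-1", "archive:item-2"])
foundation.add("survey-b", "Survey B", ["archive:item-2", "archive:item-3"])

# A further overlay for a different class of readers, built on the foundation.
students = Layer("student guide", below=foundation)
students.add("intro", "An introduction", ["survey-a", "survey-b"])

print(students.resolve("intro"))
```

The point of the design is that an upper layer stores no content of its own, only links, so any number of overlays can be added without touching the archive or the layers beneath.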
A second MSC layer of the MathSurvey would be based on the spanning tree structure of the 2000 Math Subject Classification (MSC), much like The Atlas of Mathematics developed by Dave Rusin. The differences would be that instead of the whole structure being controlled by a single individual,
The MathSurvey should also be developed in further layers, to organize and interlink the body of openly archived mathematics, in different ways limited only by the imagination of their creators. Most obviously, there could be layers which provided reorganizations of the MSC more agreeable to the editors. Across most fields there could be a theoretical layer lying above an applied layer, and further layers could be created for the benefit of different classes of users, who might be students or teachers at any level from primary school to graduate school, as well as postdoctoral researchers and advanced researchers. In fact, there already are digital libraries of mathematics at various levels, which could be incorporated into the MathSurvey as soon as they met an appropriate professional standard for content, and became openly archived. One such library is that developed by the Mathematical Association of America with a grant from NSF. This system meets desired standards for content, but is not yet open access, due to insistence on registration and login for many users. Some other open knowledge networks in various branches and at various levels of mathematics, which could be either incorporated into, or linked to various levels of, the MathSurvey, include the following:
On the web, to make reference without making a link is possible but ineffective - like speaking but with a paper bag over your head.
THREE RULES FOR EVOLVABLE SYSTEMS
Evolvable systems - those that proceed not under the sole direction of one centralized design authority but by being adapted and extended in a thousand small ways in a thousand places at once - have three main characteristics that are germane to their eventual victories over strong, centrally designed protocols.
- Only solutions that produce partial results when partially implemented can succeed. The network is littered with ideas that would have worked had everybody adopted them. Evolvable systems begin partially working right away and then grow, rather than needing to be perfected and frozen. Think VMS vs. Unix, cc:Mail vs. RFC-822, Token Ring vs. Ethernet.
- What is, is wrong. Because evolvable systems have always been adapted to earlier conditions and are always being further adapted to present conditions, they are always behind the times. No evolving protocol is ever perfectly in sync with the challenges it faces.
- Finally, Orgel's Rule, named for the evolutionary biologist Leslie Orgel - "Evolution is cleverer than you are" ....
Centrally designed protocols start out strong and improve logarithmically. Evolvable protocols start out weak and improve exponentially. It's dinosaurs vs. mammals, and the mammals win every time. The Web is not the perfect hypertext protocol, just the best one that's also currently practical. Infrastructure built on evolvable protocols will always be partially incomplete, partially wrong and ultimately better designed than its competition.
On the other hand, competition between different branches of the network, with regard to the rate of growth of quality content, and the rate of liberation of research ejournals from the control of commercial agents, is something that will be very beneficial for the public domain, and is therefore to be encouraged.
The scholarly communication crisis
The MathSurvey should grow in an organic way, as a creatively structured and universal network of mathematical knowledge, which would be the result of a major collaborative effort by the entire mathematical community. It should reach from the foundations of mathematics through its major trunks and well established branches to the twigs, leaves and flowers of current research achievements. That image in cyberspace of the entire body of mathematics should serve the community in numerous ways.
In view of recent initiatives to digitize the paper archive of mathematics, such as JSTOR and the Digital Mathematics Library, and especially if the latter project can be adequately funded to provide free access to digital representations of that part of the paper record referred to in articles of the MathSurvey, the only obvious limit to the range of the MathSurvey would be recent mathematical content excluded from it by improper archiving behind electronic toll gates, and even that may eventually be liberated by pressure on the gatekeepers. The content of openly archived mathematics currently includes most existing free electronic journals. But, as pointed out by John Ewing in Predicting the future of scholarly publishing, that is a very small fraction of the total research output of the mathematical community. Nonetheless, the quantity of mathematics published over open archives is growing at an exponential rate. That rate should increase as pressure mounts on the majority of traditional journals, which have gone to a paper plus electronic format, to go further to an electronic plus paper on demand format, with the likelihood that many such journals may become electronic only.
Once the MathSurvey starts growing, and this open knowledge network becomes used routinely as a supplement to existing databases such as MathSciNet, it should demonstrate beyond reasonable doubt the importance of open access archiving. The entire mathematical community should then become committed to the perpetual maintenance of electronic journals and their underlying digital repositories. The thousands of editors involved in construction and maintenance of the MathSurvey should by then be empowered to ease the chronic communication crisis in mathematics due to appropriation and gating of research by commercial publishers: articles not freely linked to the universal open access platform provided by the MathSurvey would become marginalized and invisible to the majority of mathematicians.
This rapid growth of open content, and the ease of access to such content afforded by the MathSurvey, should soon persuade both professional societies and commercial publishers of the inevitability of eventual victory of the Free Online Scholarship movement. That recognition will complete the digital revolution in academic communication. For then all publishers of scholarly work will be obliged to open up their digital archives, or risk marginalization. Then open knowledge networks like the MathSurvey should grow naturally in all branches of human knowledge, and some of the current systemic problems of academia may become things of the past.
The archiving problem
As explained by Raym Crow in the compelling Case for Institutional Repositories supported by SPARC (The Scholarly Publishing and Academic Resources Coalition), the current investment by the academic community in digital repositories provides the basis for a disaggregated scholarly communication system capable of addressing many of the systemic problems of the current system, especially the serials crisis and the archiving problem. Thus an initiative like the MathSurvey, which would create an intelligent overlay of those digital repositories, and hence encourage their use, should serve not only to ease the serials crisis, but also to help solve the archiving problem created by the exponential growth of mathematical literature.
The professional development problem
A third aim of the MathSurvey is to provide an outlet for motivational expositions which should assist in solving the professional development problem in mathematics. That problem, of how to motivate and train new researchers, is acknowledged by VIGRE (Grants for Vertical Integration of Research and Education in the Mathematical Sciences), a program of the NSF Division of Mathematical Sciences to increase the number of well-prepared U.S. citizens, nationals, and permanent residents who pursue careers in the mathematical sciences. One of the required components of the VIGRE program is for recipients to engage students at all levels "in activities that introduce them to the thrill of discovery and generate within them excitement for the mathematical sciences", and to disseminate accounts of VIGRE activities, experiences and insights. The structure of the MathSurvey would be ideal for the organization and archiving of such accounts, and should encourage similar activities to benefit the professional development of mathematicians all over the world.
The compartmentalization problem
The professional development problem is particularly acute in fields like probability and statistics, whose current vigor depends largely on their interactions with other branches of knowledge. For young researchers to get going in such interdisciplinary work, they have to work their way rapidly up two or more trees in the forest of knowledge, to understand what is happening where the leaves of different trees are intermingling. The problem of training people for interdisciplinary work is made difficult by the current compartmentalization of knowledge, and perpetuated by the departmental structure of universities, the privatization of intellectual resources, and the lack of high quality guide and review material in most fields. If adequate survey and review materials were available free online in neighbouring subjects, appropriate links between those surveys should help to break down the barriers between them, and reduce the compartmentalization of knowledge.
Both historically and currently, the development of mathematics has been stimulated by interactions between its branches at a high level, and applications to other subjects. As an example of current interest, involving various branches of both mathematics and physics, where many researchers would appreciate efficient exposition of relevant material in fields other than their own, I mention the stunning recent developments and connections made in the theory of stochastic Löwner equations, involving complex analysis, critical percolation, random spanning trees, Brownian motion, and conformal field theory. See for instance the recent St. Flour Lecture Notes by Wendelin Werner. A fourth aim of the MathSurvey is to further such exciting interactions and applications, and facilitate all kinds of interdisciplinary research involving mathematics, by efficient transfer of knowledge between workers in various branches of mathematics, and from mathematics to other fields.
In construction of the MathSurvey, the limiting commodity is time rather than money. Time must be spent by authors, editors and referees to produce a quality product, just as much for a survey journal as for a research journal. To build up and maintain the digital knowledge commons, scholars must be motivated to dedicate their time to that common enterprise, rather than to some other purpose. My motivation is simple enough. If I spend a month writing a survey article on a subject I know well, and I can persuade others to do the same in other subjects of interest to me but which I know less well, then I save myself many months of time spent struggling to learn other subjects with inadequate guidance. Initially it will take dedication to the cause by a few committed scholars who can motivate others to devote their time to the enterprise. But once even the bare skeleton of some tree of knowledge has been grown in cyberspace, for all to see, the merit of developing and maintaining such a tree in the public domain, and of propagating similar trees in other subjects, should become evident to all. By then, it will be a matter of some prestige to be editor of the survey journal webpage devoted to some particular branch of knowledge. For that webpage will be the most visible and conspicuous representation of that branch of knowledge in cyberspace: a professionally organized, peer-reviewed guide to what is currently regarded as important in that branch of knowledge, with links to other such guides in related branches.
Initially then, there is some risk involved in committing time to creation of a free electronic survey journal. Others might not contribute their share, and the project might falter. But once the journal has assembled a critical mass of well connected survey material, its further growth should be essentially self sustaining, as the journal becomes widely used and cited, and eventually acknowledged as the definitive survey of its field of knowledge. See the list of supporters of the MathSurvey for details of how you might be able to contribute to the MathSurvey.
In thinking about something as important as the archiving and perpetual preservation of some large body of human knowledge, we must identify and avoid mistakes of the past. Of these, there is no shortage.
The lesson of the Library of Alexandria is well known. Not all the books in one library. Not all the bits on one server. For records in digital repositories, this problem has been addressed by first making multiple copies in widely distributed mirror sites, such as the worldwide system of mirrors of the Mathematics ArXiv, including the Centre pour la Communication Scientifique Directe (CCSD) run by the French CNRS, followed by development of suitable comparison protocols to detect any loss or degradation of the data. See for instance LOCKSS (Lots Of Copies Keep Stuff Safe): A Permanent Web Publishing and Access System, and the article about this system in D-Lib Magazine, June 2001, by Vicky Reich and David S. H. Rosenthal.
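The idea of comparing copies held at different mirrors can be illustrated with a small sketch. This is not the LOCKSS protocol, which is considerably more elaborate; it is only a simple majority vote over SHA-256 digests, and the function names `digest` and `audit_copies` and the mirror names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def digest(data: bytes) -> str:
    """Fingerprint one stored copy of an archived item."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: Dict[str, bytes]) -> Tuple[Optional[str], List[str]]:
    """Compare the copies held by several mirrors.

    Returns the digest shared by a strict majority of mirrors together
    with the names of the mirrors that disagree with it. If no strict
    majority exists, returns None and lists every mirror as suspect.
    """
    if not copies:
        raise ValueError("no copies to compare")
    digests = {name: digest(data) for name, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None, sorted(digests)
    suspects = sorted(name for name, d in digests.items() if d != winner)
    return winner, suspects


# Three hypothetical mirrors, one of which has lost the final byte.
original = b"Theorem 1. Every archive needs more than one copy."
copies = {
    "mirror_a": original,
    "mirror_b": original,
    "mirror_c": original[:-1],
}
agreed, suspects = audit_copies(copies)
print(suspects)
```

A mirror flagged in this way could then be repaired from any mirror in the majority, which is why the scheme depends on having many independent copies rather than one authoritative one.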
One concern is the number of storage technologies besides paper which have now been discarded in favor of present methods of digital storage: e.g. microfiche, punch cards, .... However, as discussed earlier, the format for digital representation of scholarly work seems to have stabilized to the extent that the developer of any future format will be obliged to provide the necessary conversion tools. A more serious concern is that once a large amount of human knowledge has been committed to some form of digital storage, there is the fundamental issue of who has control over access, especially who might be able to change the conditions of use of software involved in that access. See Lawrence Lessig's book The future of ideas: the fate of the commons in a connected world for a full acknowledgement of closely related problems. The greatest danger is to allow the digital resource to have a single controlling agent: even a university or scholarly society or government, whose intentions at one time were good, might be inclined to impose a charge for the resource at some later time when the value of the resource had increased. The price of freedom is eternal vigilance. The only protection against such control is for the academic community to vigilantly ensure that at all times there are many universities and many governments involved with storage of copies of the archive, none of them having power to exclude any of the others. As long as enough parties maintain a commitment to storage of the archive, it should be possible in this way to maintain freedom of access to the archive in perpetuity.
In the end, at any given time, it is the current users and maintainers of the archive who have the main interest in its preservation. These users and maintainers must be vigilant to ensure that no third party ever gains a monopoly control over access, for example by charging for some software involved. In any case, such vigilance is a small price to pay to avoid what appears likely to happen if academia makes no serious attempt to build a knowledge commons over open archives. That would be an emerging dominance of control by privately owned archives such as ScienceDirect and Kluwer Journals, whose cost to the academic community would be unbounded by competitive forces as soon as the copyright to any substantial fraction of human knowledge was hoarded in those archives.