The Mathematics Survey Proposal

Jim Pitman
Departments of Mathematics and Statistics
The University of California, Berkeley

Preliminary draft for comment. 12/9/02

Summary   This is a proposal to construct a new means of organizing, communicating and archiving mathematical knowledge, by a faithful representation of that knowledge in cyberspace. The purpose is first of all to provide a peer-reviewed survey of all of mathematics, professionally organized, fully searchable, navigable and retrievable, continuously archived and updated, and available free online to anyone with Internet access, in perpetuity. This is to be achieved by creation of an electronic journal, The Mathematics Survey (or MathSurvey for short), which would be a multi-layered network of richly interlinked electronic survey journals, one in each branch of mathematics.

1  Introduction

In a companion article Two rules for scholarly communication: publish for the public, and keep the journals, I argue that the current scholarly communication crisis makes it imperative, for the preservation and continued growth of human knowledge, that the digital record of as much as possible of that knowledge be securely placed in an international collection of interoperable public archives, to make that knowledge freely accessible to anyone with an Internet connection, in perpetuity. To that end, the value of existing public archives should be compounded for posterity by building over them a well organized network of digital overlays which can be easily navigated for knowledge retrieval. This can be achieved, while preserving the valuable existing system of peer-reviewed scholarly journals, by the following two complementary strategies: The purpose of the survey journals would be to fill in the trunks, branches and stems of the forest of knowledge whose leaves and flowers are the research journals. In each subject whose knowledge is so represented by a vast tree in cyberspace, a surge of interest and activity in that subject and its applications should result, as its store of knowledge becomes accessible to students and scholars in all subjects, and to the broader public defined by the Internet.

This is a specific proposal to construct an electronic survey journal in the field of mathematics. The proposal is to construct a new means of organizing, communicating and archiving mathematical knowledge, by a faithful representation of that knowledge in cyberspace. The purpose is first of all to provide a peer-reviewed survey of all of mathematics, professionally organized, fully searchable, navigable and retrievable, continuously archived and updated, and available free online to anyone with Internet access, in perpetuity. This is to be achieved by creation of an electronic journal, The Mathematics Survey (or MathSurvey for short), which would be a multi-layered network of richly interlinked electronic survey journals, one in each branch of mathematics. See also Best Current Practices: Recommendations on Electronic Information Communication by the Committee on Electronic Information Communication (CEIC) of the International Mathematical Union (IMU), reproduced in Notices of the AMS Volume 49, Number 8, and the CEIC Recommendations on Information and Communication for general background about the digital revolution in scholarly communication, and advice to mathematicians about how the mathematical literature should be created, disseminated, and preserved in the electronic era.

2  Overlay Journals

As discussed in detail elsewhere, the structure of the peer-reviewed journal is a robust one which has served the academic community well over the last century, and survived intact despite being largely taken over by commercial interests in the last half century, and despite the change of medium from paper to electronic. That acknowledged, it should be preserved and adapted to the new paradigm of open access communication. The way to do that has already been tested and proved. It is the concept of an overlay journal, first developed over Ginsparg's e-print arXiv. This was the first digital repository, used initially just for e-prints, but now used as the archive supporting a number of electronic journals in mathematics and physics, including

A key feature of these free electronic journals is that the cost of their production has been borne by the academic community, supported in turn by government agencies such as NSF. These electronic journals are available for free to a broad public defined by access to the Internet. The benefits of such free access and wide distribution are obvious, and have been widely acclaimed by those who have embraced this electronic paradigm of scholarly communication. Beyond the investment in digital repositories, which has already been made by governments and universities, and the time spent in preparation of journal articles by authors, editors and referees, the cost of creation and maintenance of such electronic journals is negligible, as evidenced by the growing number of free electronic journals with no print version, for instance which are overlaid on university repositories by independent editorial boards of scholars, mostly with no financial support from their institutions or any other source, apart from their regular academic salaries and the computer facilities now commonly available in large universities. I refer also to a list of over 50 currently free mathematical ejournals and a list of hundreds of mathematical ejournals, both free and gated.

One novelty of the present proposal is that a foundation layer of the MathSurvey be structured as an overlay journal on public archives, over which arbitrarily many further overlays can be constructed, for the benefit of different classes of readers. The potential of this structure for the creative development of expository material appears to be limited only by the imagination of the subnetworks of editors responsible for the creation of various layers.

3  The user interface

The foundation layer of the MathSurvey would be a peer-reviewed electronic journal, with a basic user interface similar to that of many mathematics ejournals such as Geometry and Topology and The Electronic Journal of Probability. That is to say, articles of various sizes and scope, which might range from a one page list of important books and references in some subject, to a several hundred page monograph, would be posted on the web as accepted, with bundling into volumes of convenient size for web page display. This foundation layer would be an overlay journal built over one or more of the growing family of open access digital repositories now available, such as the Mathematics ArXiv and the California Digital Library (CDL).

A second MSC layer of the MathSurvey would be based on the spanning tree structure of the 2000 Math Subject Classification (MSC), much like The Atlas of Mathematics developed by Dave Rusin. The differences would be that instead of the whole structure being controlled by a single individual,

The opening page of the MSC layer would have a list of names and perhaps logos of currently organized subsurveys, say The Algebraic Geometry Survey, The Geometry Survey, The Probability Survey, and so on, according to the first two digits of the MSC. Another webpage with similar function is the page of Geometry and Topology which displays the logos of its collection of journals and monographs. From the opening page of the MSC layer, clicking on a subject line would lead to a major Branch Survey page, which would typically list further subsurveys. For instance, The Probability Survey would contain a link to a Survey of Stochastic Processes, which would in turn be linked to a Survey of Inference for Stochastic Processes. That survey, located three clicks away from the opening page of the MSC layer of the MathSurvey, might also be found two clicks away via The Statistics Survey. The structure of web pages of the MSC layer of the MathSurvey would thus respect the connections between different branches of mathematics. The page corresponding to each major branch of mathematics would contain pointers to selected content of that branch. These might be books, or links to survey articles in the foundation layer, or other articles located anywhere in the international system of open archives. This structure would be provided by editors according to their views of the subject. Those views would collectively define the global geometry of this faithful representation in cyberspace of the entire manifold of mathematics. The MathSurvey could thus provide a free online navigable encyclopedia of all of mathematics, for the benefit of all those interested in developing and extending their knowledge of the subject.
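As a minimal sketch of the navigation structure just described, the pages of the MSC layer can be modelled as a directed graph of links, in which one survey may be reachable along several routes. The link table below is a hypothetical illustration built only from the survey names used in the example above; the function names are assumptions made for this sketch, and none of it describes an existing implementation.

```python
from collections import deque

# Hypothetical link structure: each page maps to the pages it links to.
LINKS = {
    "MSC layer": ["The Probability Survey", "The Statistics Survey"],
    "The Probability Survey": ["Survey of Stochastic Processes"],
    "The Statistics Survey": ["Survey of Inference for Stochastic Processes"],
    "Survey of Stochastic Processes": ["Survey of Inference for Stochastic Processes"],
    "Survey of Inference for Stochastic Processes": [],
}


def fewest_clicks(links, start, target):
    """Breadth-first search: smallest number of clicks from start to target.

    Returns None if the target cannot be reached.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, clicks = queue.popleft()
        if page == target:
            return clicks
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, clicks + 1))
    return None


def click_paths(links, start, target):
    """List every link path from start to target, shortest first.

    A page is never visited twice on the same path, so cycles are harmless.
    """
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            paths.append(path)
            continue
        for nxt in links.get(path[-1], []):
            if nxt not in path:
                stack.append(path + [nxt])
    return sorted(paths, key=len)


if __name__ == "__main__":
    target = "Survey of Inference for Stochastic Processes"
    for path in click_paths(LINKS, "MSC layer", target):
        print(len(path) - 1, "clicks:", " > ".join(path))
```

With this table, the route through The Statistics Survey takes two clicks and the route through The Probability Survey takes three, matching the example in the text. Storing links per page, rather than as one global tree, reflects the idea that each editor maintains only the links on their own page.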

The MathSurvey should also be developed in further layers, to organize and interlink the body of openly archived mathematics, in different ways limited only by the imagination of their creators. Most obviously, there could be layers which provided reorganizations of the MSC more agreeable to the editors. Across most fields there could be a theoretical layer lying above an applied layer, and further layers could be created for the benefit of different classes of users, who might be students or teachers at any level from primary school to graduate school, as well as postdoctoral researchers and advanced researchers. In fact, there already are digital libraries of mathematics at various levels, which could be incorporated into the MathSurvey as soon as they met an appropriate professional standard for content, and became openly archived. One such is the digital library developed by the Mathematical Association of America with a grant from NSF. This system meets desired standards for content, but is not yet open access, due to insistence on registration and login for many users. Some other open knowledge networks in various branches and various levels of mathematics, which could be either incorporated into, or linked to various levels of the MathSurvey, include the following:

The material in these sources is of uneven quality; in particular, what is in the encyclopedias is mostly not sufficiently advanced to be meaningfully connected to what is in the ejournals. Some high quality content can be found in gated areas of cyberspace such as the Kluwer Encyclopedia of Mathematics and ScienceDirect, which provides access to more than 30 million abstracts and links to over 10,000 titles from various scientific, technical and medical publishers, through a gate which Reed Elsevier charges $8,000,000 per year to keep open to researchers at the University of California. But this, and other high cost services provided by global media conglomerates, are simply closed to most smaller universities in the U.S. and even to many larger universities in other countries. Even if you have access to them, they are not properly linked to the free online universe. Articles behind these gates can only be references, not links. As remarked by Tim Berners-Lee, creator of the World Wide Web, in Links and Law: Myths,
On the web, to make reference without making a link is possible but ineffective - like speaking but with a paper bag over your head.

4  The Network of Mathematics Editors

It is expected that the family of Branch Survey journals comprising the MathSurvey will develop gradually over time, as various individuals and institutions are motivated to organize different portions of a vast editorial network dedicated to the MathSurvey. Each editorial branch network, responsible for a particular branch of mathematics, should typically be organized by a single Branch Editor. The Branch Editor would be responsible for recruiting and maintaining an appropriately sized network of editors whose expertise encompassed that branch of mathematics, and would act as the chief editor of the survey journal in that branch. Typically, a Branch Editor might organize a group of 5-20 Core Editors dedicated to the organization of that branch of the survey, and a larger network of 50-100 Associate Editors, each responsible for one or more sub-branches. To set the initial standard of a branch of the editorial network, the Branch Editor should initially try to attract as many members as possible of the current editorial boards of top research journals in that branch of mathematics. It is expected that the core of editors in each branch should be in close communication with each other and the Branch Editor, and that each Associate Editor would maintain close contact with at least one Core Editor about organizational issues. Each Branch Survey should be run by its branch network of editors as an essentially autonomous journal, except that articles accepted by all branches of the survey would be subject to a common registration and certification procedure involving an interaction between the editors and the system of digital repositories over which the MathSurvey is laid.

5  Evolution of the survey

Essential to the eventual success of a massive network of public ejournals, in challenging the competing private systems offered by the global media conglomerates, is the idea that the free ejournal system be allowed to grow freely and locally without any global control apart from some agreement of compatibility between various parts of the network. That method of growth, exploited by Tim Berners-Lee in his enormously successful creation of the World Wide Web, can naturally be adapted to a web overlay like the free ejournal network. To summarize lessons learned from the phenomenal success of the World Wide Web, I quote the following, from Clay Shirky's In praise of evolvable systems:


Evolvable systems - those that proceed not under the sole direction of one centralized design authority but by being adapted and extended in a thousand small ways in a thousand places at once - have three main characteristics that are germane to their eventual victories over strong, centrally designed protocols.

Centrally designed protocols start out strong and improve logarithmically. Evolvable protocols start out weak and improve exponentially. It's dinosaurs vs. mammals, and the mammals win every time. The Web is not the perfect hypertext protocol, just the best one that's also currently practical. Infrastructure built on evolvable protocols will always be partially incomplete, partially wrong and ultimately better designed than its competition.

5.1  Freedom, quality and control

To adapt these rules to success of the MathSurvey, one subject must first grow its survey tree in cyberspace to demonstrate "proof of concept". That prototype should be supported and certified by professional organizations, but not owned or controlled by them, to ensure its quality, prestige, and independent existence. Then others should initiate similar growth of digital knowledge trees in their subjects, and link those trees as well as possible, as they are motivated to do so. This will be quite a chaotic growth process, proceeding rapidly in some subjects and slowly in others. It is futile to attempt to control it by some overarching authority. It must rather be controlled locally, throughout a vast network of individual editors, by each editor being responsible for the content of their own webpage, and its links to neighbouring pages. The process of appointment of editors is likely to be at first informal, through professional connections and mutual respect. But as time passes, and the shape of the digital knowledge forest stabilizes, it is to be expected that various trees and subtrees in the forest will become affiliated with professional societies and organizations, and that such affiliations will serve both to certify content of the knowledge network, and to assist the recruitment of new editors. Such affiliations may also serve to allow parts of the free knowledge network to make use of the expertise in both paper and electronic publication possessed by many professional organizations. But I urge that the editors and creators of the knowledge network resist any attempt by professional organizations to control or limit the growth of the network. The policy by professional organizations in the last century of limiting the growth of their paper journals led first to the commercialization of paper journals, then to the monopolistic control of those journals by the global media conglomerates.
It is precisely to reverse that trend, and diffuse the control of high quality digital representations of knowledge, that I advocate creation of the free global knowledge network built over public archives. To adequately grow and preserve that network in perpetuity, the editors with local control of that network must simply dedicate themselves to maintaining the quality of their portion of the network, and trust that others will do the same in their portions of the network.

5.2  Probability, Mathematics, and Science

The probability community forms a small but relatively important part of the broader mathematical community, which in turn is a small but very important part of the scientific community. Probability is important because it provides ideas which are used in other branches of mathematics such as combinatorics, statistics, and computer science, and in other branches of science such as physics and biology. Mathematics will always be important, as it provides the language for quantitative and qualitative theories of all kinds in all branches of science. Probability is a good place to start the revolutionary campaign. There is a small, tightly knit community of probabilists, with a strong sense of the identity and importance of their subject in the larger scheme of mathematics and science. It is my intention to mobilize the probability community behind The Probability Survey, a project to create a free digital knowledge network in probability theory. As this catches on, it should motivate scholars in other branches of mathematics to create The Statistics Survey, The Combinatorics Survey, and so on, all under a common umbrella of The Mathematics Survey. I leave it to motivated individuals in other subjects to initiate their own free electronic surveys to expose the infrastructure of various trunks, branches and stems of knowledge underlying the ensemble of journals in their subject, and finally to attach the leaves to this digital forest of knowledge, by liberation of the research journals one by one.

5.3  Cooperation and competition

The structure of independent survey journals in different fields, or in different parts of the same field, can simply be cloned and replicated, much as Kluwer has done in creation of its vast portfolio of Kluwer Journals across all of science. Different branches of the survey journal network could also use different designs or formats, as long as they were overlaid on public archives using the basic linking structure of the World Wide Web. Indeed, innovative designs and formats within the basic paradigm should be welcomed and encouraged, and the best of them copied in expanding areas of the network. Thus on technical matters of design and architecture of the knowledge network, cooperation should be the norm, with the aim of expanding the public knowledge network as rapidly as possible to compete with the existing private knowledge networks of the big information providers.

On the other hand, competition between different branches of the network, with regard to rate of growth of quality content, and the rate of liberation of research ejournals from the control of commercial agents, is something that will be very beneficial for the public domain, and is therefore to be encouraged.

6  Further Motivation

The MathSurvey should assist in solving four systemic problems of academia, at least within the field of mathematics: the scholarly communication crisis, the archiving problem, the professional development problem, and the compartmentalization problem. Moreover, if successful in mathematics, the basic strategy should be replicable in other fields of science and human knowledge, where the same problems abound.

The scholarly communication crisis   The MathSurvey should grow in an organic way, as a creatively structured and universal network of mathematical knowledge, which would be the result of a major collaborative effort by the entire mathematical community. It should reach from the foundations of mathematics through its major trunks and well established branches to the twigs, leaves and flowers of current research achievements. That image in cyberspace of the entire body of mathematics should serve the community in numerous ways.

In view of recent initiatives to digitize the paper archive of mathematics, such as JSTOR and the Digital Mathematics Library, and especially if the latter project can be adequately funded to provide free access to digital representations of that part of the paper record referred to in articles of the MathSurvey, the only obvious limit to the range of the MathSurvey would be recent mathematical content excluded from it by improper archiving behind electronic toll gates, and even that may be eventually liberated by pressure on the gate keepers. The content of openly archived mathematics currently includes most existing free electronic journals. But, as pointed out by John Ewing in Predicting the future of scholarly publishing, that is a very small fraction of the total research output of the mathematical community. Nonetheless, the quantity of mathematics published over open archives is growing at an exponential rate, which should increase as pressure mounts on the majority of traditional journals, which have gone to a paper plus electronic format, to go further to an electronic plus paper on demand format, with the likelihood that many such journals may become electronic only.

Once the MathSurvey starts growing, and this open knowledge network becomes used routinely as a supplement to existing databases such as MathSciNet, it should demonstrate beyond reasonable doubt the importance of open access archiving. The entire mathematical community should then become committed to the perpetual maintenance of electronic journals and their underlying digital repositories. The thousands of editors involved in construction and maintenance of the MathSurvey should by then be empowered to ease the chronic communication crisis in mathematics due to appropriation and gating of research by commercial publishers: articles not freely linked to the universal open access platform provided by the MathSurvey would become marginalized and invisible to the majority of mathematicians.

This rapid growth of open content, and the ease of access to such content afforded by the MathSurvey, should soon persuade both professional societies and commercial publishers of the inevitability of eventual victory of the Free Online Scholarship movement. That recognition will complete the digital revolution in academic communication. For then all publishers of scholarly work will be obliged to open up their digital archives, or risk marginalization. Then open knowledge networks like the MathSurvey should grow naturally in all branches of human knowledge, and some of the current systemic problems of academia may become things of the past.

The archiving problem   As explained by Raym Crow in the compelling Case for Institutional Repositories supported by SPARC (The Scholarly Publishing and Academic Resources Coalition), the current investment by the academic community in digital repositories provides the basis for a disaggregated scholarly communication system capable of addressing many of the systemic problems of the current system, especially the serials crisis and the archiving problem. Thus an initiative like the MathSurvey, which would create an intelligent overlay of those digital repositories, and hence encourage their use, should serve not only to ease the serials crisis, but also to help solve the archiving problem created by the exponential growth of mathematical literature.

The professional development problem   A third aim of the MathSurvey is to provide an outlet for motivational expositions which should assist in solving the professional development problem in mathematics. That problem, of how to motivate and train new researchers, is acknowledged by VIGRE (Grants for Vertical Integration of Research and Education in the Mathematical Sciences), a program of the NSF Division of Mathematical Sciences to increase the number of well-prepared U.S. citizens, nationals, and permanent residents who pursue careers in the mathematical sciences. One of the required components of the VIGRE program is for recipients to engage students at all levels "in activities that introduce them to the thrill of discovery and generate within them excitement for the mathematical sciences", and to disseminate accounts of VIGRE activities, experiences and insights. The structure of the MathSurvey would be ideal for the organization and archiving of such accounts, and should encourage similar activities to benefit the professional development of mathematicians all over the world.

The compartmentalization problem   The professional development problem is particularly acute in fields like probability and statistics, whose current vigor depends largely on their interactions with other branches of knowledge. For young researchers to get going in such interdisciplinary work they have to rapidly work their way up two or more trees in the forest of knowledge, to understand what is happening where the leaves of different trees are intermingling. The problem of training people for interdisciplinary work is made difficult by the current compartmentalization of knowledge, and perpetuated by the departmental structure of universities, the privatization of intellectual resources, and the lack of high quality guide and review material in most fields. If adequate survey and review materials were available free online in neighbouring subjects, appropriate links between those surveys should help to break down the barriers between them, and reduce the compartmentalization of knowledge.

Both historically and currently, the development of mathematics has been stimulated by interactions between its branches at a high level, and applications to other subjects. As an example of current interest, involving various branches of both mathematics and physics, where many researchers would appreciate efficient exposition of relevant material in fields other than their own, I mention the stunning recent developments and connections made in the theory of stochastic Löwner equations, involving complex analysis, critical percolation, random spanning trees, Brownian motion, and conformal field theory. See for instance the recent St. Flour Lecture Notes by Wendelin Werner. A fourth aim of the MathSurvey is to further such exciting interactions and applications, and facilitate all kinds of interdisciplinary research involving mathematics, by efficient transfer of knowledge between workers in various branches of mathematics, and from mathematics to other fields.

7  Wanted: a thousand good editors, and ten thousand good authors

In construction of the MathSurvey, the limiting commodity is time rather than money. Time must be spent by authors, editors and referees to produce a quality product, just as much for a survey journal as for a research journal. To build up and maintain the digital knowledge commons, scholars must be motivated to dedicate their time to that common enterprise, rather than to some other purpose. My motivation is simple enough. If I spend a month writing a survey article on a subject I know well, and I can persuade others to do the same in other subjects of interest to me but which I know less well, then I save myself many months of time spent struggling to learn other subjects with inadequate guidance. Initially it will take dedication to the cause by a few committed scholars who can motivate others to devote their time to the enterprise. But once even the bare skeleton of some tree of knowledge has been grown in cyberspace, for all to see, the merit of developing and maintaining such a tree in the public domain, and of propagating similar trees in other subjects, should become evident to all. By then, it will be a matter of some prestige to be editor of the survey journal webpage devoted to some particular branch of knowledge. For that webpage will be the most visible and conspicuous representation of that branch of knowledge in cyberspace: a professionally organized, peer-reviewed guide to what is currently regarded as important in that branch of knowledge, with links to other such guides in related branches.

Initially then, there is some risk involved in committing time to creation of a free electronic survey journal. Others might not contribute their share, and the project might falter. But once the journal has assembled a critical mass of well connected survey material, its further growth should be essentially self sustaining, as the journal becomes widely used and cited, and eventually acknowledged as the definitive survey of its field of knowledge. See the list of supporters of the MathSurvey for details of how you might be able to contribute to the MathSurvey.

8  Cautionary tales

In thinking about something as important as the archiving and perpetual preservation of some large body of human knowledge, we must identify and avoid mistakes of the past. Of these, there is no shortage.

The lesson of the Library of Alexandria is well known. Not all the books in one library. Not all the bits on one server. For records in digital repositories, this problem has been addressed by first making multiple copies in widely distributed mirror sites, such as the worldwide system of mirrors of the Mathematics ArXiv, including the Centre pour la Communication Scientifique Directe (CCSD) run by the French CNRS, followed by development of suitable comparison protocols to detect any loss or degradation of the data. See for instance LOCKSS (Lots Of Copies Keeps Stuff Safe) A Permanent Web Publishing and Access System, and the article about this system in D-Lib Magazine, June 2001 by Vicky Reich and David S. H. Rosenthal.
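To make the idea of a comparison protocol concrete, here is a minimal sketch, assuming each mirror can hand over the bytes of its copy of an article. It hashes every copy and flags those that disagree with the majority. This is a simplified illustration only: it is not the LOCKSS protocol, and the function and mirror names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the names of mirrors whose copy differs from the majority.

    Assumes most mirrors still hold an intact copy; with an even split
    there is no meaningful majority and the result is arbitrary.
    """
    digests = {name: digest(data) for name, data in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority)


if __name__ == "__main__":
    # Hypothetical mirrors; the third copy has one corrupted character.
    copies = {
        "mirror_a": b"Theorem 1. Every bounded sequence ...",
        "mirror_b": b"Theorem 1. Every bounded sequence ...",
        "mirror_c": b"Theorem l. Every bounded sequence ...",
    }
    print(find_damaged_copies(copies))
```

A flagged mirror could then be repaired from any mirror in the majority, which is why keeping many independent copies matters more than protecting any single one.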

One concern is the number of storage technologies besides paper which have now been discarded in favor of present methods of digital storage: e.g. microfiche, punch cards, .... However, as discussed earlier, the format for digital representation of scholarly work seems to have stabilized to the extent that the developer of any future format will be obliged to provide the necessary conversion tools. A more serious concern is that once a large amount of human knowledge has been committed to some form of digital storage, there is the fundamental issue of who has control over access, especially who might be able to change the conditions of use of software involved in that access. See Lawrence Lessig's book The future of ideas: the fate of the commons in a connected world for a full acknowledgement of closely related problems. The greatest danger is to allow the digital resource to have a single controlling agent: even a university or scholarly society or government, whose intentions at one time were good, might be inclined to impose a charge for the resource at some later time when the value of the resource had increased. The price of freedom is eternal vigilance. The only protection against such control is for the academic community to vigilantly ensure that at all times there are many universities and many governments involved with storage of copies of the archive, none of them having power to exclude any of the others. As long as enough parties maintain a commitment to storage of the archive, it should be possible in this way to maintain freedom of access to the archive in perpetuity.

In the end, at any given time, it is the current users and maintainers of the archive who have the main interest in its preservation. These users and maintainers must be vigilant to ensure that no third party ever gains a monopoly control over access, for example by charging for some software involved. In any case, such vigilance is a small price to pay to avoid what appears likely to happen if academia makes no serious attempt to build a knowledge commons over open archives. That would be an emerging dominance of control by privately owned archives such as ScienceDirect and Kluwer Journals, whose cost to the academic community would be unbounded by competitive forces as soon as the copyright to any substantial fraction of human knowledge was hoarded in those archives.
