StatLib
-------

StatLib is an archive of statistical software, data sets, and other information maintained at Carnegie Mellon University in Pittsburgh, PA. The materials available can be accessed via electronic mail [e-mail], file transfer protocol [ftp], gopher, and World Wide Web [WWW] browsers such as Netscape.

These materials are organized into the following collections:

apstat: consists of a nearly complete set of the FORTRAN, ALGOL, and Pascal algorithms published in the journal Applied Statistics; see also the entry for "griffiths-hill"
asaboard: contains material related to various activities of the American Statistical Association [ASA] Board of Directors
asacert: contains various versions of the ASA proposal for the certification of statisticians, the results of a survey of the ASA membership on certification, a list of frequently asked questions [FAQ] with answers on certification, and the archives of an electronic mailing list discussion of certification
asascs: contains the charter for the Statistical Computing Section of the ASA and software developed by the Section for searching the electronic version of the Current Index to Statistics [CIS]
blss: contains macros, fixes, and contributed code for the BLSS statistical package
cmlib: consists of the Core Mathematics Library (version 3.0, 1988) from the National Institute of Standards and Technology [NIST] (formerly the National Bureau of Standards), a set of FORTRAN subprogram packages for solving a variety of mathematical and statistical problems
cmu-stats: contains information about the Carnegie Mellon Department of Statistics including the graduate program, the undergraduate program, technical reports, and biographies and pictures of the faculty (accessible via WWW only); see also the entry for "Other Sites!"
crab: contains the Kodiak Island king crab survey data, distributed for the Data Analysis Exposition sponsored by the Statistical Graphics and Statistical Computing Sections at the 1990 ASA Joint Statistical Meetings in Anaheim, CA
csb: contains all data sets appearing in: Lange, N., Ryan, L., Billard, L., Brillinger, D., Conquest, L., & Greenhouse, J. (1994). CASE STUDIES IN BIOMETRY. New York: Wiley.
datasets: consists of a wide variety of interesting data sets including classics such as the Stanford Heart Transplant Data (two versions) and the complete data from several textbooks; see also the entry for "jasadata"
designs: consists of a collection of designs, and of programs and algorithms for creating designs, for statistical experiments
directory: contains several lists of addresses and/or e-mail addresses of statisticians, including the members of the Institute of Mathematical Statistics [IMS], the Allstat mailing list of the United Kingdom, the Statistical Society of Canada [SSC], and the Societa Italiana di Statistica [SIS], and a list of New Zealand statisticians
disease: contains National Notifiable Diseases Data (mumps, tuberculosis) from the special exposition "Statistics in Public Health Surveillance" sponsored by the Statistical Graphics Section at the 1991 ASA Joint Statistical Meetings in Atlanta, GA
DOS: contains software for the DOS and Windows operating systems, including software for Splus, tools for manipulating DOS archives, and software of general statistical interest (not accessible via e-mail)
general: consists of a large and varied collection of statistical software written in FORTRAN, C, Lisp, SAS, and Gauss, and some other odds and ends
genstat: contains material related to the GENSTAT statistical package
glim: contains material related to the GLIM statistical package including algorithms published in the GLIM Newsletter and macros and data from: Aitkin, M., et al. (1989). STATISTICAL MODELLING IN GLIM. New York: Oxford University Press.
griffiths-hill: contains the algorithms published in: Griffiths, P., & Hill, I. D. (Eds.). (1985). APPLIED STATISTICS ALGORITHMS. Chichester, England: Ellis Horwood for the Royal Statistical Society, London; see also the entry for "apstat"
imsbull: contains IMS Bulletins and related information, including the International Calendar of Statistical Events
Interface95: contains information about Interface '95 -- 27th Symposium on the Interface: Computing Science and Statistics -- held in Pittsburgh, PA, 21-24 June 1995
Interface96: contains the latest news about Interface '96 -- 28th Symposium on the Interface: Computing Science and Statistics -- to be held in Sydney, Australia, 8-12 July 1996
jasadata: contains data sets from articles published in the Journal of the American Statistical Association; see also the entry for "datasets"
jasasoftware: contains software from articles published in the Journal of the American Statistical Association
jcgs: contains data sets, software, and abstracts from articles published in the Journal of Computational and Graphical Statistics
joint94: contains abstracts for papers presented at the 1994 ASA Joint Statistical Meetings in Toronto
joint95: contains abstracts for papers presented at the 1995 ASA Joint Statistical Meetings in Orlando, FL
jqt: contains algorithms from articles published in the Journal of Quality Technology
maps: contains a world map in a compressed tar file (not accessible via e-mail; if retrieving via ftp, get the file in binary mode)
meetings: contains calendars and programs for various statistics and mathematics meetings, both domestic (USA) and international (non-USA)
minitab: contains macros for the statistical package Minitab, including the Minitab Industrial Statistics [MIS] macros, the Minitab User's Group [MUG] macros, and others
multi: contains an annotated directory and selected algorithms for classification and multivariate data analysis (correspondence analysis, clustering, discriminant analysis, principal components analysis, etc.); see also the entry for "general"
Other Sites!: provides information about degree programs, technical reports, and information servers at statistics departments and other organizations around the world (accessible via WWW only); see also the entry for "cmu-stats"
p-stat: contains software and extensions for the P-STAT statistical package (not accessible via e-mail)
poliscidata: contains data sets from political science journals and authors
R: contains pre-alpha binary versions of the language R, an S-like computing environment, for the Apple Macintosh (not accessible via e-mail; if retrieving via ftp, get the three executable files in binary mode)
S: contains lots of neat stuff (functions, device drivers, list of bugs, syntax summaries, FAQs, language conversions, etc.) for the S and Splus computing environments
s-news: contains the archives of the S-news electronic mailing list
sapaclisp: contains Common Lisp functions that can be used to perform many of the computations described in: Percival, D. B., & Walden, A. T. (1993). SPECTRAL ANALYSIS FOR PHYSICAL APPLICATIONS: MULTITAPER AND CONVENTIONAL UNIVARIATE TECHNIQUES. Cambridge, England: Cambridge University Press.
wc95: contains abstracts for papers presented at the ASA Winter Conference in Raleigh, NC, 6-9 January 1995
xlispstat: contains source code for the 1991 release (version 2.1) of XLISPSTAT, a statistical computing and dynamic graphics system for UNIX machines written by Luke Tierney at the University of Minnesota
1993.expo: contains data (oscillator time series, breakfast cereal) from the special exposition "Serial Correlation or Cereal Correlation??" sponsored by the Statistical Graphics Section at the 1993 ASA Joint Statistical Meetings

StatLib also provides access to the Netlib archive of mathematical software (accessible from StatLib via WWW only), the Journal of Statistics Education Service (accessible from StatLib via gopher and WWW only), and the Statistical Computing and Graphics Newsletter (accessible from StatLib via gopher and WWW only).

WWW
---

StatLib is accessible via WWW. Type

    netscape http://lib.stat.cmu.edu

from an XWindows screen, or enter http://lib.stat.cmu.edu in the "Location" window.

E-MAIL
------

The index of collections (from which the above list is adapted) is updated periodically. To obtain the most recent version of the index via e-mail, send the one-line e-mail message

    send index

to the address statlib@lib.stat.cmu.edu. The "Subject:" line of the e-mail message should be left blank.

On the Statistical Computing Facility [SCF] machines, statlib@lib.stat.cmu.edu has been aliased to statlib so that e-mail can be sent to the shorter address statlib.

To obtain the index for a specific collection, send the e-mail message

    send index from specific_collection

to the address statlib. For example, the index for the "general" collection can be obtained by sending the e-mail message

    send index from general

to the address statlib.

To request a specific data set, algorithm, etc., from a collection, send the e-mail message

    send specific_item from specific_collection

to the address statlib. For example, the program "cutoff" (a FORTRAN program for establishing optimal cutoff points for screening and diagnostic tests) can be obtained from the "general" collection by sending the e-mail message

    send cutoff from general

and the data set "rir" (from the Cook and Weisberg book RESIDUALS AND INFLUENCE IN REGRESSION) can be obtained from the "datasets" collection by sending the e-mail message

    send rir from datasets

to the address statlib.
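The request syntax above is regular enough to generate mechanically. The Python sketch below builds a message body with one request per line (StatLib accepts several requests in one message, one per line) plus an optional mailsize line, and wraps it in a message addressed to statlib@lib.stat.cmu.edu. The function names request_body and build_message are hypothetical, not part of StatLib. Placing the mailsize line first, and omitting the Subject header entirely rather than sending a blank one, are assumptions. The sketch only composes the message; sending it is left to your usual mailer.

```python
from email.message import EmailMessage
from typing import Iterable, Optional, Tuple

# Address given in this help file; on the SCF machines the alias "statlib" also works.
STATLIB_ADDRESS = "statlib@lib.stat.cmu.edu"


def request_body(requests: Iterable[Tuple[str, Optional[str]]],
                 mailsize: Optional[str] = None) -> str:
    """Return a StatLib request body with one request per line (hypothetical helper).

    Each (item, collection) pair becomes "send item from collection";
    a collection of None yields just "send item" (for example, "send index").
    If mailsize is given (e.g. "100k"), a "mailsize" line is added first;
    that position is an assumption, since the help file does not specify it.
    """
    lines = []
    if mailsize is not None:
        lines.append(f"mailsize {mailsize}")
    for item, collection in requests:
        if collection is None:
            lines.append(f"send {item}")
        else:
            lines.append(f"send {item} from {collection}")
    return "\n".join(lines) + "\n"


def build_message(body: str, to: str = STATLIB_ADDRESS) -> EmailMessage:
    """Wrap a request body in an e-mail message addressed to StatLib (hypothetical helper).

    No Subject header is set, on the assumption that StatLib treats a missing
    Subject the same as the blank one this help file asks for.
    """
    msg = EmailMessage()
    msg["To"] = to
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    body = request_body([("cutoff", "general"), ("rir", "datasets"),
                         ("splines", "S"), ("ims.chapelhill", "meetings")])
    # Prints the same four request lines as the multiple-request example.
    print(body, end="")
```

Keeping the body separate from the envelope means the same helper covers "send index", "send index from general", and item requests alike.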
You may include several requests in a single e-mail message if you type each on a separate line, as follows:

    send cutoff from general
    send rir from datasets
    send splines from S
    send ims.chapelhill from meetings

In the reply to your e-mail request you will notice repetitions of the phrase "CUT HERE". Materials retrieved from StatLib via e-mail are packaged as shar (SHell ARchive) files, each of which bundles a group of files and directories into a single text file. You can cut ("unpack") the shar file reply into separate pieces using a text editor (vi, emacs, etc.). Alternatively, you can strip the e-mail header and unpack the shar file reply by typing

    unshar < file

where file is the name of the file to which you saved the e-mail reply to your request. For more information, send the e-mail message

    send shar from general

to the address statlib.

StatLib will break large requests into several smaller pieces. You can control the size of the pieces by including a line such as

    mailsize 100k

in your e-mail message.

Response time is generally very quick. Replies will appear to come from the e-mail address statlibd@stat.cmu.edu or statlibd@temper.stat.cmu.edu; however, requests should *not* be sent to these addresses.

Many of the StatLib collections accept submissions of material for inclusion in the collection. If a collection accepts submissions, instructions for submission can be obtained by sending the e-mail message

    send submissions from specific_collection

to the address statlib.

FTP
---

StatLib can be accessed via ftp. Type

    ftp lib.stat.cmu.edu

At the Name prompt, type

    statlib

At the Password prompt, type your e-mail address,

    login@stat.berkeley.edu

where login is *your* login name. To leave ftp, type

    quit

ftp has a built-in but rudimentary help system. For more detailed information about ftp, type

    help ftp

or

    man ftp

at the UNIX prompt.

GOPHER
------

StatLib can also be accessed via gopher.
Type

    gopher lib.stat.cmu.edu 70

To leave gopher, type

    q

For more information about gopher, type

    man gopher

or, for XWindows,

    man xgopher

at the UNIX prompt.

WWW
---

StatLib is also accessible via WWW using Mosaic. Type

    mosaic http://lib.stat.cmu.edu

from an XWindows screen. If you retrieve materials from WWW using Mosaic, be sure to choose the "HTML" format rather than the "Plain Text" format for the saved file(s). This will preserve the tab characters that are present in many StatLib materials.

Detailed instructions for browsing the WWW with Mosaic aren't provided here. There is as yet no help or man file for Mosaic; it is a point-and-click system with a great deal of internal documentation and so is largely self-explanatory. If you need more information, consult one of the many reference books about the Internet, the World Wide Web, or Mosaic.

NOTES
-----

Not all materials are available via all four access methods (e-mail, ftp, gopher, and WWW). WWW provides the most complete access (and is the most fun!).

To avoid unnecessary demand on Internet and Carnegie Mellon resources, please take only what you need from StatLib. If you need something else later, you can always retrieve it then. Also, please check whether the software or data set that you plan to retrieve from StatLib is already available on the SCF machines. For example, there would be little point in requesting XLISPSTAT; it is already installed on the SCF SPARCstations. Typing

    help software_stat

at the UNIX prompt will provide a long list of statistical programs, packages, and data sets currently available on the SCF machines.

Materials obtained from StatLib are not guaranteed in any way; StatLib is simply a method of distributing statistically related materials to people who might find them of interest. Unless otherwise indicated, materials are exactly as submitted.
All questions, comments, suggestions, complaints, compliments, etc., should be sent to the person who submitted the material to StatLib; his or her name and e-mail address should be included somewhere within the material.

If you use an algorithm, data set, or other material from StatLib, please acknowledge both StatLib and the original contributor of the material.

This document was adapted from material written by the person who maintains StatLib:

    Michael (Mike) Meyer
    Department of Statistics
    Carnegie Mellon University
    Pittsburgh, PA 15213
    (412) 268-3108 or (412) 268-5627
    mikem@stat.cmu.edu

(A picture of Mike is available from StatLib.) Mike asks that you contact him only if *all* else fails, in which case please send details of your StatLib problem to him via e-mail and he will try to help.

help statlib
spector 02/04/92
revised caroln 04/02/95
revised caroln 07/07/95