Best Integrated Bibliographic Services
Joint Statistical Meetings: Introductory Overview Lecture August 5, 2008.
Contents
1 Traditional Subscription Services
2 Emerging Free Services
3 General Issues
4 Statistical Aspects
5 Some Problems
6 Bibliographic Knowledge Network
7 Development Program
8 Partners
9 Ongoing projects
10 BibServer
11 MathPeople
12 Statistics Topics
13 Summary
14 Conclusion
1 Traditional Subscription Services
2 Emerging Free Services
- Google Scholar
- Scirus (Elsevier)
- WorldCat (Library meta search engine)
- arXiv.org (Stat since April 2007), U.C. Davis, IOP
- CiteSeerX (Penn. State, Lee Giles)
- PubMed (NIH free archive: biomed and bio)
- Open Access Journals, e.g. Statistics Surveys
- Wikipedia, Google Knol, MedPedia, ...
- Social Bookmarking (Web 2.0):
del.icio.us (Yahoo),
CiteULike, Connotea (Nature)
- Semantic Web Services (Web 3.0): (API)
BibSonomy,
Freebase,
Google Docs,
Zotero (Firefox extension for browsing/organizing)
3 General Issues
Technical/Legal/Economic/Political/Statistical
Technical
- Architecture (Central/Distributed)
- Data Format/Structure (Objects, Types, BibTeX, XML, ...)
- Software (LAMP, P = Perl/PHP/Python ..., also RoR)
- Navigation (Compartments)
Legal
- Ownership/Control/Licensing
- Privacy/Identity/Security
Economic/Political
- Organization
- Business Model
- Software Development
- Maintenance
4 Statistical Aspects
Bibliometry: Quantitative analysis of bibliographic data: selection/scoring/ranking/network stats
Citation Statistics Report (IMS/IMU/ICIAM, 2008): scoring/ranking
Data Visualization, Machine Learning, Automated Classification, Collaborative Filtering (Netflix Prize)
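To make the scoring/ranking idea concrete, here is a minimal sketch using the h-index, one familiar citation-based score. The slides do not prescribe any particular score, so the choice of h-index is an assumption, and the author labels and citation counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-paper citation counts (not real data).
authors = {
    "author_a": [10, 8, 5, 4, 3],
    "author_b": [25, 8, 5, 3, 3],
    "author_c": [1, 1, 1],
}

# Rank authors by score, highest first.
ranking = sorted(authors, key=lambda name: h_index(authors[name]), reverse=True)
```

Note that author_b has the single most-cited paper but ranks below author_a here, which illustrates how much a ranking depends on the score chosen.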
5 Some Problems
- Compartmentalization: (silos, stovepipes)
- Organizational structure of disciplines
- Quality and presentation of info limited by providers
- Navigation: Students and scholars need guidance.
- How to map the landscape of fields?
- from the literature and from experts?
- how to combine taxonomy/folksonomy?
- how to connect researchers to literature they should know?
- something like Google Earth to explore fields of knowledge?
- Maintenance: Incentive to maintain bib data reduced by free search services. Need to
- create better maintenance tools
- engage individuals and organizations to apply them
- Types: How to deal with the proliferation of types of structured documents?
6 Bibliographic Knowledge Network
Proposal developed in collaboration with
and numerous other partners (listed later).
Goal: To create
- openly navigable network of websites
- each node a bib guide to a specific topic or field
- each node maintained by a virtual organization
- incorporate/improve existing subject sites
- establish collective knowledge systems
7 Development Program
Create software and bibliographic workflows to
- select, brand, maintain, and annotate collections of structured scientific content.
- engage many small and distributed organizations in this activity
- expose bib data in machine-readable formats
- use machine learning to automate selection/cataloging/ranking
- develop statistical analysis of bib data
- establish collective knowledge systems on various scales
- promote connections between systems and disciplines
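As a rough sketch of what "expose bib data in machine-readable formats" could look like in practice, the snippet below renders one record both as a BibTeX entry and as JSON. The record layout (`type`, `key`, `fields`), the function names, and the sample entry are all hypothetical; they are not taken from BibServer or any other project named in these slides.

```python
import json


def to_bibtex(record):
    """Render one record (a plain dict) as a BibTeX entry string."""
    fields = [f"  {key} = {{{value}}}" for key, value in record["fields"].items()]
    body = ",\n".join(fields)
    return f"@{record['type']}{{{record['key']},\n{body}\n}}"


def to_json(record):
    """Render the same record as JSON for programmatic consumers."""
    return json.dumps(record, indent=2, sort_keys=True)


# Invented placeholder entry, for illustration only.
record = {
    "type": "article",
    "key": "example2008",
    "fields": {
        "author": "A. Author",
        "title": "An Example Title",
        "year": "2008",
    },
}

bibtex_text = to_bibtex(record)
json_text = to_json(record)
```

Keeping one internal record structure and generating each output format from it means a correction to the data only has to be made once.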
8 Partners
9 Ongoing projects
- BibServer
- MathPeople
- StatTopics
10 BibServer
(maintained in part by IMS/VTEX)
11 MathPeople
- developed with Jaeyhun Paek (Dalhousie D-Drive) and Hadley Wickham
- supported by multiple organizations
- MathPeople leverages multiple sources of name data to provide a distributed name authority system for people in the mathematical sciences.
- aggregates data about the same person from many different data sources
e.g.
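The aggregation step can be illustrated with a deliberately crude sketch: records from several sources are grouped under a normalized name key. This is not MathPeople's actual method, which the slides do not describe. The key (last name plus first initial), the source labels, and the names are all assumptions made up for this example.

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Reduce 'Last, First' or 'First Last' to a (last, first-initial) key."""
    # Strip accents so that e.g. "Doé" and "Doe" compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    if "," in ascii_name:
        last, first = [part.strip() for part in ascii_name.split(",", 1)]
    else:
        parts = ascii_name.split()
        first, last = " ".join(parts[:-1]), parts[-1]
    return (last.lower(), first[:1].lower())


# Hypothetical records from three made-up sources.
records = [
    {"source": "source_a", "name": "Doe, Jane"},
    {"source": "source_b", "name": "Jane Doe"},
    {"source": "source_c", "name": "J. Doé"},
    {"source": "source_a", "name": "Roe, Richard"},
]

# Group the sources that mention each (normalized) person.
groups = defaultdict(list)
for rec in records:
    groups[name_key(rec["name"])].append(rec["source"])
```

A key this coarse will also merge distinct people who share a last name and first initial, which is why a real name authority needs additional evidence (affiliations, coauthors, identifiers) and human curation.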
12 Statistics Topics
- developed with Jeff Regier, supported by IMS.
- Stat Topics leverages multiple data sources to provide a comprehensive collection of topics in statistics
- provides scripted links to glossary and encyclopedia pages
- associates topics with people
- foundation for development of an open access Encyclopedia of Prob/Stat
- cf. Wikipedia, PlanetMath, Google Knol, MedPedia
- current initiative by Springer to engage editorial support from statistical societies (John Kimmel)
13 Summary
- Services for the management/analysis/delivery of bibliographic data are in rapid flux
- unique opportunities for statisticians to push towards more open services
- cf. R Project, BioConductor, Dataverse Network
- potential for improvement in scholarly communication is very great
- special potential for making statistical knowledge more accessible to researchers in other fields
14 Conclusion
What is most needed is human resources:
- individual researchers to make their bib data (including fulltext) available with open access
- individuals to persuade organizations of all sizes to make their aggregated bib data openly accessible
- software developers to provide data structures and workflows for large amounts of bib data
- editors and curators to improve the quality of bib data in their areas of expertise
- researchers to develop statistical analysis of bib data as a tool for advancement of knowledge
- senior statisticians to advise administrators about use of citation statistics in research assessment
Fiscal resources to attract the human ones are also needed.
Want to get involved? Please get in touch!