August 2007
The Coming Revolution in Scholarly Communications & Cyberinfrastructure
Paul Ginsparg, Cornell University


On the one-decade time scale, it is likely that more research communities will join some form of global unified archive system without the current partitioning and access restrictions familiar from the paper medium, for the simple reason that it is the best way to communicate knowledge and hence to create new knowledge. The genomic and related resources described above are naturally interlinked by virtue of their common hosting by a single organization, a situation very different from that described earlier for astronomy research. For most disciplines, the key to progress will be the development of common web service protocols, common languages (e.g., for manipulating and visualizing data), and common data interchange standards, to facilitate distributed forms of the above resources. The adoption of these protocols will be hastened as independent data repositories adopt dissemination of seamlessly discoverable content as their raison d'être. The test parsings described above have natural analogs in all fields: astronomical objects and experiments in astronomy; mathematical terms and theorems in mathematics; physical objects, terminology, and experiments in physics; chemical structures and experiments in chemistry; and so on. Many of the external databases that would provide targets for such automated markup already exist.

One of the surprises of the past two decades is how little progress has been made in the underlying document format employed. Equation-intensive physicists, mathematicians, and computer scientists now generally create PDF from TeX. This methodology is based on a pre-1980s print-on-paper mentality and is not optimized for network distribution. The implications of widespread usage of newer document formats, such as Microsoft's Office Open XML or the OASIS OpenDocument format, and the attendant ability to extract semantic information and modularize documents, are scarcely appreciated by the research communities. Machine learning techniques familiar from artificial intelligence research will assist in the extraction of metadata and classification information, helping authors and improving services based on the cleaned metadata. Semantic analysis of the document bodies will facilitate the automated interlinking to external resources described above and lead to improved navigation and discovery services for readers. A related question is what authoring tools and functions should be added to word processing software, both commercial and otherwise, to provide an optimal environment for scientific authorship. Many of the interoperability protocols for distributed database systems will equally accommodate individual authoring clients or their proxies, and we can expect many new applications beyond real-time automated markup and autonomous reference finding.
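To make the point about extractable structure concrete, here is a minimal sketch in Python, assuming an OpenDocument text file: a ZIP archive whose `content.xml` marks headings and paragraphs with `text:h` and `text:p` elements. The function names `extract_outline` and `build_sample_odt` are hypothetical and introduced only for illustration, and the sample archive is a stripped-down stand-in rather than a complete, valid document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used for text elements in OpenDocument content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def extract_outline(odt_file):
    """Return (kind, level, text) tuples for headings and paragraphs.

    odt_file may be a filesystem path or a binary file-like object.
    Headings carry their outline level; paragraphs use level 0.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    outline = []
    for element in root.iter():  # document order
        text = "".join(element.itertext()).strip()
        if element.tag == f"{{{TEXT_NS}}}h":
            level = int(element.get(f"{{{TEXT_NS}}}outline-level", "1"))
            outline.append(("heading", level, text))
        elif element.tag == f"{{{TEXT_NS}}}p" and text:
            outline.append(("paragraph", 0, text))
    return outline


def build_sample_odt():
    """Build an in-memory archive holding only a tiny content.xml.

    Hypothetical test fixture: a real .odt also contains a mimetype
    entry, a manifest, and style information, all omitted here.
    """
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        '<text:h text:outline-level="1">Introduction</text:h>'
        "<text:p>Open access changes <text:span>everything</text:span>.</text:p>"
        "</office:text></office:body>"
        "</office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for kind, level, text in extract_outline(build_sample_odt()):
        print(kind, level, text)
```

Because the document body is already structured markup, a few lines of standard-library code recover the section outline; doing the same from a PDF rendered from TeX would require layout heuristics instead.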

Every generation thinks it's somehow unique, but there are nonetheless objective reasons to believe that we are witnessing an essential change in the way information is accessed, the way it is communicated to and from the general public, and the way it is communicated among research professionals - fundamental methodological changes that will make the terrain 10-20 years from now differ more from that of 10-20 years ago than was the case over any comparable time period.

Reference this article
Ginsparg, P. "Next-Generation Implications of Open Access," CTWatch Quarterly, Volume 3, Number 3, August 2007. http://www.ctwatch.org/quarterly/articles/2007/08/next-generation-implications-of-open-access/

