August 2007
The Coming Revolution in Scholarly Communications & Cyberinfrastructure
Incentivizing the Open Access Research Web
Publication-Archiving, Data-Archiving and Scientometrics
Tim Brody, University of Southampton, UK
Les Carr, University of Southampton, UK
Yves Gingras, Université du Québec à Montréal (UQAM)
Chawki Hajjem, Université du Québec à Montréal (UQAM)
Stevan Harnad, University of Southampton, UK; Université du Québec à Montréal (UQAM)
Alma Swan, University of Southampton, UK; Key Perspectives

Open Access

Making articles freely accessible online is also called Open Access (OA). OA is optimal for research and hence inevitable. Yet even with all of P’s less exacting infrastructural demands already met, P-OA has been very slow in coming. Only about 15% of yearly research article output is being made OA spontaneously today. This article discusses what can be done to accelerate P-OA, to the joint advantage of R, D & P, using a very special hybrid example, based on the research corpus itself (P), serving as the database (D) for a new empirical discipline (R).

For “scientometrics” – the measurement of the growth and trajectory of knowledge – both the metadata and the full texts of research articles are data, as are their download and citation metrics. Scientometrics collects and analyzes these data by harvesting the texts, metadata, and metrics. P-OA, by providing the database for scientometrics, will allow scientometrics to better detect, assess, credit and reward research progress. This will not only encourage more researchers to make their own research publications P-OA (as well as encouraging their institutions and funders to mandate that they make them P-OA), but it will also encourage more researchers to make their data D-OA too, as well as to increase their online research collaborations (R). And although the generic infrastructure for making publications P-OA is already functionally ready, the specific infrastructure for treating P as D will be further shaped and stimulated by the requirements of scientometrics as R.

First, some potentially confusing details need to be made explicit and then set aside: publications (P) themselves sometimes contain research data (D). A prominent case is chemistry, where a research article may contain the raw data for a chemical structure. Some chemists have accordingly been advocating OA for chemical publications not just as Publications (P) but as primary research Data (D), which need to be made accessible, interoperable, harvestable and data-mineable for the sake of basic chemical research (R), rather than just for the usual reading of research articles by individual users. The digital processing of publication-embedded data is an important and valid objective, but it is a special case and hence will not be treated here, because the vast majority of research Publications (P) today do not include their raw data. It is best to consider the problem of online access to data that are embedded in publications as a special case of online access to D rather than as P. Similarly, the Human Genome Database [9], inasmuch as it is a database rather than a peer-reviewed publication, is best considered as a special case of D rather than P.

Here, however, in the special case of scientometrics, we will be considering the case of P as itself a form of D, rather than merely as containing embedded D within it. We will also be setting aside the distinction between publication metadata (author, title, date, journal, affiliation, abstract, references) and the publication’s full-text itself. Scientometrics considers both of these as data (D). Processing the full-text’s content is the “semiometric” component of scientometrics. But each citing publication’s reference metadata are also logically linked to the publications they cite, so as the P corpus becomes increasingly OA, these logical links will become online hyperlinks. This will allow citation metrics to become part of the P-OA database too, along with download metrics. (The latter are very much like weblinks or citations; they take the form of a “hit-and-run.” Like citations, however, they consist of a downloading site – identified by IP, although this could be made much more specific where the downloader agrees to supply more identifying metadata – plus a downloaded site and document.) We might call citation and download metrics “hypermetrics,” alongside the semiometrics, with which, together, they constitute scientometrics.
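The two kinds of hypermetric record described above can be pictured as simple data structures. The following is only a minimal sketch under stated assumptions: the class names, field names and the hypermetrics function are hypothetical illustrations, not part of any existing scientometric service or of the authors' own tools.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Citation:
    """A link from a citing publication to a cited one (hypothetical record)."""
    citing_doc: str
    cited_doc: str


@dataclass(frozen=True)
class Download:
    """A download 'hit': a downloading site plus a downloaded site and document."""
    downloader_ip: str                         # downloading site, identified by IP
    site: str                                  # downloaded site
    document: str                              # downloaded document
    downloader_metadata: Optional[str] = None  # optional extra identification


def hypermetrics(
    citations: Iterable[Citation], downloads: Iterable[Download]
) -> Dict[str, Dict[str, int]]:
    """Count citations received and downloads for every document seen."""
    cites = Counter(c.cited_doc for c in citations)
    hits = Counter(d.document for d in downloads)
    docs = set(cites) | set(hits)
    # Counter returns 0 for documents that appear in only one of the two sources.
    return {doc: {"citations": cites[doc], "downloads": hits[doc]} for doc in docs}
```

Given a list of Citation and Download records, hypermetrics returns one entry per document with its citation and download counts. A real system would also need to deduplicate repeated hits and resolve document identifiers, which this sketch leaves out.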


The objective of scientometrics is to extract quantitative data from P that will help the research publication output to be harvested, data-mined, quantified, searched, navigated, monitored, analyzed, interpreted, predicted, evaluated, credited and rewarded. To do all this, the database itself first has to exist and, preferably, it should be OA. Currently, the only way to do (digital) scientometrics is by purchasing licensed access to each publisher’s full-text database (for the semiometric component), along with licensed access to the Thomson ISI Web of Science [10] database for some of the hypermetrics. (Not the hypermetrics for all publications, because ISI only indexes about one third of the approximately 25,000 peer-reviewed research journals published across all fields, nations and languages.) Google Scholar [11] and Google Books [12] index still more, but are as yet very far from complete in their coverage – again because only about 15% of current annual research output is being made P-OA. But if this P-OA content can be raised to 100%, not only will doing scientometrics no longer depend on licensed access to its target data, but researchers themselves, in all disciplines, will no longer depend only on licensed access in order to be able to use the research findings on which they must build their own research.
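To make the scale of these coverage gaps concrete, the approximate figures quoted above can be worked through in a few lines. This is a rough sketch only: the inputs are the article's own round estimates, the function and variable names are hypothetical, and the journal count and the 15% figure measure different things (journals versus annual article output), so they are kept separate.

```python
TOTAL_JOURNALS = 25_000   # approximate number of peer-reviewed journals (from the text)
ISI_SHARE = 1 / 3         # "about one third" indexed by ISI (from the text)
OA_SHARE = 0.15           # about 15% of annual article output is P-OA (from the text)


def coverage_summary(total_journals: int, isi_share: float, oa_share: float) -> dict:
    """Return rough coverage figures derived from the article's estimates."""
    isi_indexed = round(total_journals * isi_share)
    return {
        "isi_indexed_journals": isi_indexed,
        "journals_outside_isi": total_journals - isi_indexed,
        # Share of annual article output still to be made OA, in percentage points.
        "oa_gap_percentage_points": round((1 - oa_share) * 100),
    }


summary = coverage_summary(TOTAL_JOURNALS, ISI_SHARE, OA_SHARE)
```

On these estimates, roughly 8,300 journals are indexed by ISI, roughly 16,700 are not, and 85 percentage points of annual article output remain to be made OA.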

Three things are needed to increase the target database from 15% to 100%: (1) functionality, (2) incentives, and (3) mandates [3]. The network infrastructure needs to provide the functionality, the metrics will provide the incentive, and the functionality and incentives together will induce researchers’ institutions and funders to mandate OA for their research output (just as they already mandate P itself: “publish or perish”).

Reference this article
Harnad, S., Brody, T., Carr, L., Gingras, Y., Hajjem, C., Swan, A. "Incentivizing the Open Access Research Web," CTWatch Quarterly, Volume 3, Number 3, August 2007. http://www.ctwatch.org/quarterly/articles/2007/08/incentivizing-the-open-access-research-web/

