The Coming Revolution in Scholarly Communications & Cyberinfrastructure
Clifford Lynch, Coalition for Networked Information (CNI)


Another alternative is for the authors to store the underlying data in an institutional repository. In some ways this is less desirable than using a disciplinary repository, which offers, for example, potential economies of scale, easy centralized searching of material on a disciplinary basis, and the development and maintenance of specialized discipline-specific software tools. Nevertheless, the institutional repository may be the only real option available for many researchers and for many types of data. Note that one function (and obligation) of each institutional repository is to provide the depositing researcher with a persistent accession identifier that can be used to reference the data.

Recognize that over time individual researchers may move from institution to institution and will ultimately die; technical systems evolve, organizations change mission and responsibilities, and funding models and funding agency interests and priorities shift. Any of these changes can force archived data to be migrated from one place to another or reorganized. The ability to resolve identifiers, to go from citation to data, is highly challenging when considered across long time horizons. The research library community collectively has made a clear commitment to the long-term preservation of access to the traditional scientific literature; the assumption of similar ultimate responsibility for scientific and scholarly data today is still highly controversial.

Just because a dataset has been deposited into a repository does not automatically mean that other researchers (or indeed the public broadly) can have access to it. This is a question ultimately of disciplinary norms, of requirements imposed by funding agencies, of university policies, and of law and public policy. What the e-science environment does is make these policies and norms much more explicit and transparent, and, I believe, advance a bias that encourages more rather than less access and sharing. And there is still work to be done on mechanisms and legal frameworks. For example, the analogs of the Creative Commons type licenses for datasets are under development by Science Commons and others, but are at a much less mature stage than those used for journal articles themselves, with part of the problem being that the copyright framework that governs articles is much more consistent globally than laws establishing rights in datasets and databases.[1] Also to be recognized here are certain trends in the research community that run very much counter to the bias towards greater openness: most notably, university interests in technology transfer as a revenue stream, and the increasing overreach of some Institutional Review Boards in restricting the collection, preservation and dissemination of materials dealing in any way with human subjects.

Setting aside the broad issue of the future of peer review, a particularly interesting set of questions involves the relationships between traditional peer review and the data that underlies an article under review. The extent to which peer review of an article extends to peer review of the underlying data is often unclear; even when the policies of a journal are explicit on this, I think it’s likely readers don’t have well-established expectations. Will there be a requirement that reviewers have access to underlying data as part of the review process, even if this data may not be immediately and broadly available when the article is published? A recent examination of editorial and refereeing policy by Science in the wake of a major incident of data falsification suggests that at least some journals may take a more aggressive and indeed even somewhat adversarial position with the authors of particularly surprising or high-visibility articles.[2] And post-publication, there’s a very formalized means of correcting errors in published articles (or even withdrawing them) that’s now integrated into the online journal delivery systems (though not necessarily into other open-access versions of articles that may be scattered around the net). Data correction, updating and maintenance, by contrast, take place (if at all) through separate curatorial mechanisms that are not synchronized with those for managing the article literature.

Lynch, C. "The Shape of the Scientific Article in The Developing Cyberinfrastructure," CTWatch Quarterly, Volume 3, Number 3, August 2007. http://www.ctwatch.org/quarterly/articles/2007/08/the-shape-of-the-scientific-article-in-the-developing-cyberinfrastructure/

