August 2007
The Coming Revolution in Scholarly Communications & Cyberinfrastructure
Timo Hannay, Nature Publishing

Open data and mashups

Another area with huge potential – but one that I have space to deal with only cursorily here – is that of open scientific data sets and forms of interoperability that allow these to be transferred not only between scientists but also between applications in order to create new visualizations and other useful transformations. There are numerous challenges, but there is also progress to report on each front. Too often scientists are unwilling to share data, whether for competitive or other reasons, though increasingly funders (and some publishers) are requiring them to do so. Even when the data are available, they usually lack the consistent formats and unambiguous metadata that would enable them to be efficiently imported into a new application and correctly interpreted by a researcher who was not present when they were collected. Yet data standards such as CML 66 and SBML 67 are emerging, as are metadata standards such as MIAME.68 As software applications also adopt these standards, we enter a virtuous circle in which there are increasing returns (at least at the global level) to openly sharing data using common standards.

For a glimpse of the benefits this can bring, witness the work of my colleague, Declan Butler, a journalist at Nature. While covering the subject of avian flu, it came to his attention that information about global outbreaks was fragmented, incompatible, and often confidential. So he took it upon himself to gather what data he could, merge them, and provide them in the form of a KML file, the data format used by Google Earth.69 Shortly afterwards he overlaid poultry density data.70 This not only put the information in one place but also made it much more readily comprehensible to experts and non-experts alike. Imagine the benefits if this approach, largely the work of one man, were replicated across all of science.
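To make the idea of such a mashup concrete, here is a minimal sketch of merging records from differently formatted sources into a single KML document. It is not a reconstruction of the method actually used for the avian flu maps. The field names, the `normalize` and `records_to_kml` helpers, the choice of the KML 2.2 namespace, and the sample records are all assumptions made for illustration, and the sample values are invented placeholders rather than real outbreak data.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace (an assumption; the article does not say which version was used).
KML_NS = "http://www.opengis.net/kml/2.2"


def normalize(record, name_key, lat_key, lon_key, source):
    """Map one source-specific record onto a common schema (hypothetical field names)."""
    return {
        "name": str(record[name_key]),
        "lat": float(record[lat_key]),
        "lon": float(record[lon_key]),
        "source": source,
    }


def records_to_kml(records):
    """Build a KML document string with one Placemark per normalized record."""
    ET.register_namespace("", KML_NS)  # serialize with KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for rec in records:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = rec["name"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = rec["source"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML writes coordinates as longitude,latitude (in that order).
        coords = f"{rec['lon']},{rec['lat']}"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


# Placeholder records standing in for two differently formatted sources.
# These values are invented for illustration and are not real outbreak data.
source_a = [{"site": "Example site A", "latitude": "10.5", "longitude": "20.25"}]
source_b = [{"location": "Example site B", "y": -5.0, "x": 30.0}]

merged = [normalize(r, "site", "latitude", "longitude", "source A") for r in source_a]
merged += [normalize(r, "location", "y", "x", "source B") for r in source_b]

kml_text = records_to_kml(merged)
```

Writing `kml_text` to a file with a `.kml` extension should give something a KML viewer such as Google Earth can open. The point of the sketch is the normalization step: once every source is mapped onto one schema, combining the sources and converting them to a shared format becomes nearly trivial.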

Whither the scientific web?

Over the last 10 years or so, much of the discussion about the impact of the web on science – particularly among publishers – has been about the way in which it will change scientific journals. Sure enough, these have migrated online with huge commensurate improvements in accessibility and utility. For all but a very small number of widely read titles, the day of the print journal seems to be almost over. Yet to see this development as the major impact of the web on science would be extremely narrow-minded – equivalent to viewing the web primarily as an efficient PDF distribution network. Though it will take longer to have its full effect, the web's major impact will be on the way that science itself is practiced.

The barriers to full-scale adoption are not only (or even mainly) technical but also social and psychological. This makes the timings almost impossible to predict, but the long-term trends are already unmistakable: greater specialization in research, more immediate and open information-sharing, a reduction in the size of the 'minimum publishable unit,' productivity measures that look beyond journal publication records, a blurring of the boundaries between journals and databases, reinventions of the roles of publishers and editors, greater use of audio and video, more virtual meetings. And most important of all, arising from this gradual but inevitable embrace of technology, an increase in the rate at which new discoveries are made and exploited for our benefit and that of the world we inhabit.

