May 2006
Designing and Supporting Science-Driven Infrastructure
Fran Berman and Reagan Moore, San Diego Supercomputer Center

1. Introduction

The 20th century brought about an “information revolution” that has forever altered the way we work, communicate, and live. In the 21st century, data is ubiquitous. Available in digital format via the web, desktop, personal device, and other venues, data collections both directly and indirectly enable a tremendous number of advances in modern science and engineering.

Today’s data collections span the spectrum in discipline, usage characteristics, size, and purpose. The life science community utilizes the continually expanding Protein Data Bank¹ as a worldwide resource for studying the structures of biological macromolecules and their relationships to sequence, function, and disease. The Panel Study of Income Dynamics (PSID),² a longitudinal study initiated in 1968, provides social scientists with detailed information about more than 65,000 individuals spanning as many as 36 years of their lives. The National Virtual Observatory³ is providing an unprecedented resource for aggregating and integrating data from a wide variety of astronomical catalogs, observation logs, image archives, and other resources for astronomers and the general public. Such collections have broad impact, are used by tens of thousands of individuals on a regular basis, and constitute critical and valuable community resources.

However, the collection, management, distribution, and preservation of such digital resources do not come without cost. Curation of digital data requires real support in the form of hardware infrastructure, software infrastructure, expertise, human infrastructure, and funding. In this article, we look beyond digital data to its supporting infrastructure and provide a holistic view of the software, hardware, human infrastructure, and costs required to support modern data-oriented applications in research, education, and practice.

