May 2006
Designing and Supporting Science-Driven Infrastructure
Fran Berman and Reagan Moore, San Diego Supercomputer Center

3. Preserving Data over Time

Some digital collections will continue to be valuable resources for the foreseeable future. These typically include irreplaceable collections (e.g., the Shoah Collection of Holocaust survivor testimony),5 valuable community reference collections (e.g., PDB, NVO, PSID), and historically valuable collections such as federal digital records.6 7 For these digital collections, lifetime is measured in decades, preservation is continuous and active, and new material is often added over time. Over a collection’s decades of existence, the media on which it is stored will go through tens of generations, standard encoding formats will evolve, and preservation staff and institutions may change. In short, everything involved with the collection may evolve, and that evolution must be planned and executed in a way that maintains the integrity of the data collection and minimizes disruption to access for its user community.

Because long-term digital collections are preserved over decades, the need for preservation environments is critical. At SDSC, some of the current data collections have been migrated over the last 20 years across six generations of storage technology. Over that period, tape media cost per byte has fallen exponentially, halving approximately every three years. If this trend continues and a collection is copied to new media at roughly that interval, each generation of media costs half as much as the one before it, so the total lifetime cost of media is only twice the original media cost, being

(1 + 1/2 + 1/4 + …) * (original cost).
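The geometric series above can be checked numerically. The following is a minimal sketch, not a cost model from the article: the function names are hypothetical, and it assumes one full media purchase per storage generation, with each generation costing a fixed fraction (here one half) of the previous one.

```python
def lifetime_media_cost(original_cost, generations, decay=0.5):
    """Total media spend over a number of storage generations.

    Assumption (for illustration only): the whole collection is bought
    again on new media once per generation, and each generation costs
    `decay` times the previous one.
    """
    return sum(original_cost * decay ** k for k in range(generations))


def limiting_media_cost(original_cost, decay=0.5):
    """Closed-form limit of the series: original_cost / (1 - decay)."""
    if not 0 <= decay < 1:
        raise ValueError("decay must be in [0, 1) for the series to converge")
    return original_cost / (1.0 - decay)


if __name__ == "__main__":
    # Costs are expressed as multiples of the original media cost.
    for n in (1, 2, 6, 20):
        print(n, lifetime_media_cost(1.0, n))
    print("limit", limiting_media_cost(1.0))
```

Under these assumptions, six generations already account for almost all of the limiting value of twice the original cost, which is why later migrations add little to the media bill.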

Of course, tape media are only a modest portion of the true cost of long-term storage. The labor for administering the storage system, in particular for managing the transitions between generations of storage technology, must also be incorporated into cost models (see below). Generally, after the initial period of implementation, the number of individuals managing the collections can stay constant even though both the size of the data files and the size of the storage media are growing. This means that costs related to storage management labor are increasing more slowly than costs related to collection building and maintenance.

Reference this article
Berman, F., Moore, R. "Designing and Supporting Data Management and Preservation Infrastructure," CTWatch Quarterly, Volume 2, Number 2, May 2006. http://www.ctwatch.org/quarterly/articles/2006/05/designing-and-supporting-data-management-and-preservation-infrastructure/

