November 2007
Software Enabling Technologies for Petascale Science
Dean N. Williams, Lawrence Livermore National Laboratory
David E. Bernholdt, Oak Ridge National Laboratory
Ian T. Foster, Argonne National Laboratory
Don E. Middleton, National Center for Atmospheric Research

1. Introduction

Climate research is inherently a multidisciplinary endeavor. As researchers strive to understand the complexity of our climate system, they form multi-institutional and multinational teams to tackle “Grand Challenge” problems. These multidisciplinary virtual organizations need a common software infrastructure to access the many large global climate model datasets and tools. It is critical that this infrastructure provide equal access to climate data, supercomputers, simulations, visualization software, whiteboards, and other resources. To this end, we established the Earth System Grid (ESG) Center for Enabling Technologies (ESG-CET) [1], a collaboration of seven U.S. research laboratories (Argonne, LANL, LBNL, LLNL, NCAR, NOAA/PMEL, and ORNL) and a university (USC/ISI) working together to identify and implement key computational and informational technologies for advancing climate change science. Sponsored by the Department of Energy (DOE) Scientific Discovery through Advanced Computing (SciDAC-2) program [2], through the Office of Advanced Scientific Computing Research (OASCR) [3] and the Office of Biological and Environmental Research (OBER) [4], ESG-CET utilizes and develops computational resources, software, data management, and collaboration technologies to support observational and modeling data archives.

Work on ESG began with the “Prototyping an Earth System Grid” (ESG I) project, initially funded under the DOE Next Generation Internet (NGI) program, with follow-on support from OBER and DOE’s Mathematical, Information, and Computational Sciences (MICS) office. In this prototyping project, we developed Data Grid technologies for managing the movement and replication of large datasets, and applied these technologies in a practical setting (an ESG-enabled data browser based on current climate data analysis tools), achieving cross-country transfer rates of more than 500 Mb/s. Having demonstrated the potential for remotely accessing and analyzing climate data located at sites across the U.S., we won the “Hottest Infrastructure” award in the Network Challenge event at the SC’2000 conference.

While the ESG I prototype provided a proof of concept (“Turning Climate Datasets into Community Resources”), the SciDAC Earth System Grid (ESG) II project [5, 6] made this a reality. Our efforts in that project targeted the development of metadata technologies [7] (standard schema, XML metadata extraction based on netCDF, and a Metadata Catalog Service), security technologies [8] (Web-based user registration and authentication, and community authorization), data transport technologies [9, 10] (GridFTP-enabled OPeNDAP-G for high-performance access, robust multiple file transport and integration with mass storage systems, and support for dataset aggregation and subsetting), and web portal technologies to provide interactive access to climate data holdings. At this point, the technology was in place and assembled, and ESG II was poised to make a substantial impact on the climate modeling community.

In 2004, the National Center for Atmospheric Research (NCAR), a premier climate science laboratory and lead institution for the Community Climate System Model (CCSM) modeling collaboration, began its first publication of climate model data into the ESG system, drawing on simulation data archived at LANL, LBNL, NCAR, and ORNL. Late that same year, the Program for Climate Model Diagnosis and Intercomparison (PCMDI), an internationally recognized climate data center at LLNL, launched a production service providing access to climate model data germane to the Intergovernmental Panel on Climate Change (IPCC) 4th Assessment Report (AR4) [11]. (Because of international data requirements, restrictions, and timelines, the NCAR and PCMDI ESG data holdings were separated.) ESG has since become a world-renowned leader in developing technologies that provide scientists with virtual access to distributed data and resources.

In their first full year of production (late 2005), the two ESG sites provided access to a total of 220 TB of data, served over 3,000 registered users, and delivered over 100 TB of data to users worldwide. Analysis of just one component of the ESG data holdings, those relating to the Coupled Model Intercomparison Project phase 3 (CMIP3), resulted in the publication of over 100 peer-reviewed scientific papers.

In 2006, we launched the current phase of the ESG effort, the ESG Center for Enabling Technologies (ESG-CET). The primary goal of this stage of the project is to broaden and generalize the ESG system to support a more widely distributed, more international, and more diverse collection of archive sites and types of data. An additional goal is to extend the services provided by ESG beyond access to raw data by developing “server-side analysis” capabilities that will allow users to request the output from commonly used analysis and intercomparison procedures. We view such capabilities as essential if we are to enable large communities to make use of petascale data. However, their realization poses significant resource management and security challenges.

Reference this article
Williams, D. N., Bernholdt, D. E., Foster, I. T., Middleton, D. E. "The Earth System Grid Center for Enabling Technologies: Enabling Community Access to Petascale Climate Datasets," CTWatch Quarterly, Volume 3, Number 4, November 2007. http://www.ctwatch.org/quarterly/articles/2007/11/the-earth-system-grid-center-for-enabling-technologies-enabling-community-access-to-petascale-climate-datasets/
