The term “cyberinfrastructure” is broadly defined to include computer applications, services, data, networks, and many other components supporting science.1 Here we discuss the underlying resources and integrative systems and software that together comprise a grid “facility” offering a variety of services to users and applications. These services can range from application execution services to data management and analysis services, presented in such a way that end-user applications can access these services separately or in combination (e.g., in a workflow).
We use the TeraGrid2 project to illustrate the functions and costs of providing national cyberinfrastructure. Developed and deployed in its initial configuration between 2001 and 2004, the TeraGrid is a persistent, reliable, production national facility that today integrates eighteen distinct resources at eight “resource provider” facilities.3 This facility supports over 1000 projects and several thousand users (Fig. 1) across the sciences. TeraGrid architecture, planning, coordination, operation, and common software and services are provided through the Grid Infrastructure Group (GIG), led by the University of Chicago. TeraGrid staff work with end-users, both directly and through surveys and interviews, to drive the technical design and evolution of the TeraGrid facility in support of science. In addition, TeraGrid is developing partnerships with major science facilities and communities to provide needed computational, information management, data analysis, and other services and resources, thus allowing those communities to focus on their science rather than on the creation and operation of services.
TeraGrid supports a variety of use scenarios, ranging from traditional supercomputing to advanced Grid workflow and distributed applications. In general terms, TeraGrid emphasizes two complementary types of use. TeraGrid “Deep” involves harnessing TeraGrid’s integrated high-capability resources to enable scientific discovery that would not otherwise be possible. TeraGrid “Wide” is an initiative that is adapting TeraGrid services and capabilities to be readily used by the broader scientific community through interfaces such as web portals and desktop applications. All of these use scenarios—even traditional supercomputing users—benefit from the common services that are operated across the participating organizations, such as uniform access to storage, common data movement mechanisms, facility-wide authentication, and distributed accounting and allocations systems that provide the basis for authorization.
Creating and operating a grid facility involves integrating resources, software, and user support services into a coherent set of services for users and applications. Resources are explored by Roskies,4 while Killeen and Simon5 discuss user and community support. We discuss here the software infrastructure and policies required to integrate these diverse components to create a persistent, reliable national-scale facility. While the federation of multiple, independent computing centers requires carefully designed federation, governance, and sociological policies and processes, in this article we focus only on the functional and technical costs of operating a national grid infrastructure.