May 2006
Designing and Supporting Science-Driven Infrastructure
Charlie Catlett, Pete Beckman, Dane Skow and Ian Foster, The Computation Institute, University of Chicago and Argonne National Laboratory

3.5 Catalytic Processes: Policy, Planning and Internal Coordination

The creation of a national-scale team composed of individuals from multiple independent institutions requires careful attention to collaboration systems and processes that support virtual, distributed teams. Two FTEs within the TeraGrid GIG maintain infrastructure (e.g., CVS repository, discussion forums, wiki and Bugzilla servers) that is used both for day-to-day collaboration and to “curate” important project data. For coordination of activities, TeraGrid relies on two types of virtual teams. Working groups are persistent groups of TeraGrid staff with a common mission, such as supporting TeraGrid software, networks, or accounting systems. These working groups are complemented by short-term planning teams called requirements analysis teams (affectionately, “RATs”). Working groups typically involve key members from each resource provider site and coordinators from the GIG, and they meet regularly on an ongoing basis. RATs are generally smaller (4-6 members) and work on a particular issue for 6-8 weeks to produce a recommendation or proposal for new policy or projects.

The resources and integrative software and services that make up a national-scale grid facility define one axis of its operations. However, there is also a distinct institutional axis, along which decisions are made regarding the facility’s operations, policies, and major changes to its services, resources, and software. TeraGrid formalizes these decision processes in a numbered, citable, persistent document series, not unlike those used by standards bodies. The initial document in the series (reference 28) lays out the roles and responsibilities of TeraGrid’s GIG and resource provider partners as well as a decision-making process.

While top-down, hierarchical management is feasible in a single organization, a federation of interdependent peer organizations requires a different model. At the same time, while democratic processes may work for loose collaborations, they are not appropriate for the operation of a production facility. TeraGrid decision-making relies on consensus among representatives of each resource provider, under the leadership of the principal investigator of the GIG, who serves as overall TeraGrid project director. This team of resource provider and GIG principals, called the Resource Provider Forum, meets weekly in an open Access Grid session and quarterly for face-to-face review and planning.

4. Summary and Conclusions

Figure 2 shows how the TeraGrid cyberinfrastructure facility allocates staff to provide high-capability, high-capacity, high-reliability computational, information management, and data analysis services on a national scale. Approximately 25% of the staff are allocated to common integration functions (TeraGrid GIG) and 75% to resource provider facility functions. User support and external communications are emphasized at similar levels in both the resource provider efforts and the common GIG effort. GIG effort accounts for the bulk of the software, policy and management, and operational services; resource provider effort accounts for the bulk of the resource integration and support functions. Note that even the “central” functions are distributed: the common services are largely staffed in a distributed fashion at the resource provider sites. TeraGrid’s GIG, operated by the University of Chicago, relies on subcontracts with resource provider facilities for more than 2/3 of the GIG staff, making even the common services team a distributed enterprise. What is important is that the GIG staff, and the services that they provide, are coordinated by a single entity.
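The proportions quoted above can be combined in a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that the 25% and 75% figures refer to the same staff total, that the subcontracted GIG staff are located at resource provider sites, and it treats "more than 2/3" as a lower bound of exactly 2/3. The variable names are hypothetical and do not come from any TeraGrid document.

```python
from fractions import Fraction

# Approximate figures quoted in the text.
gig_share = Fraction(25, 100)       # staff allocated to common GIG functions
rp_share = Fraction(75, 100)        # staff allocated to resource provider functions
gig_subcontracted = Fraction(2, 3)  # lower bound for "more than 2/3" of GIG staff

# GIG staff funded through subcontracts to resource provider facilities,
# expressed as a share of all staff (a lower bound).
gig_at_rp_sites = gig_share * gig_subcontracted

# GIG staff not covered by those subcontracts (an upper bound).
gig_central = gig_share - gig_at_rp_sites

# Staff at resource provider sites under either budget line (a lower bound,
# under the assumption that subcontracted GIG staff sit at those sites).
at_rp_sites = rp_share + gig_at_rp_sites

for label, value in [
    ("GIG staff at resource provider sites", gig_at_rp_sites),
    ("GIG staff outside the subcontracts", gig_central),
    ("All staff at resource provider sites", at_rp_sites),
]:
    print(f"{label}: {float(value):.1%}")
```

Under these assumptions, at least about one sixth of all staff are GIG staff working under subcontract, and no more than about one twelfth of all staff are GIG staff outside those subcontracts, which is consistent with the observation that even the common services team is a distributed enterprise.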

Figure 2. TeraGrid staffing distribution by functional area, April 2006.

Although the numbers for particular areas will differ from one national grid project to another, we believe that they are representative of the general balance of requirements, both among different functions and between “common” or centrally provided services and those provided by resource provider facilities.

1 Atkins, D.E., Droegemeier, K.K., Feldman, S.I., Garcia-Molina, H., Klein, M.L., Messina, P., Messerschmitt, D.G., Ostriker, J.P. and Wright, M.H. “Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Panel on Cyberinfrastructure,” 2003.
2 The TeraGrid 2006 - www.teragrid.org/
3 TeraGrid Resource Providers are Argonne National Laboratory / University of Chicago, Indiana University, the National Center for Supercomputing Applications, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, Purdue University, the San Diego Supercomputer Center, and the Texas Advanced Computing Center.
4 Roskies, R., Zacharia, T. “Designing and Supporting High-end Computational Facilities,” CTWatch Quarterly 2(2): May 2006.
5 Killeen, T. L., Simon, H. D. “Supporting National User Communities at NERSC and NCAR,” CTWatch Quarterly 2(2): May 2006.
6 Foster, I. “Globus Toolkit Version 4: Software for Service-Oriented Systems,” IFIP International Conference on Network and Parallel Computing, 2005, Springer-Verlag LNCS 3779, 2-13.
7 Novotny, J., Tuecke, S. and Welch, V. “An Online Credential Repository for the Grid: MyProxy,” 10th IEEE International Symposium on High Performance Distributed Computing, San Francisco, 2001, IEEE Computer Society Press.
8 Litzkow, M. and Livny, M. “Experience with the Condor Distributed Batch System,” IEEE Workshop on Experimental Distributed Systems, 1990.
9 Smallen, S., Olschanowsky, C., Ericson, K., Beckman, P. and Schopf, J.M. “The Inca Test Harness and Reporting Framework,” SC’2004 High Performance Computing, Networking, and Storage Conference, 2004.
10 NSF Middleware Initiative (NMI), 2006 - www.nsf-middleware.org/
11 NSF Middleware Initiative (NMI) Grid Research Integration Development and Support (GRIDS) Center, 2006 - www.grids-center.org/
12 Droegemeier, K. et al. “Linked Environments for Atmospheric Discovery (LEAD): Architecture, Technology Roadmap, and Deployment Strategy,” 21st Conference on Interactive Information Processing Systems for Meteorology, Oceanography, and Hydrology, 2005, American Meteorological Society.
13 Open Science Grid (OSG), 2006 - www.opensciencegrid.org/
14 Avery, P. and Foster, I. “The GriPhyN Project: Towards Petascale Virtual Data Grids,” 2001 - www.griphyn.org/
15 Avery, P., Foster, I., Gardner, R., Newman, H. and Szalay, A. “An International Virtual-Data Grid Laboratory for Data Intensive Science,” Technical Report GriPhyN-2001-2, 2001 - www.griphyn.org
16 Schissel, D.P., Keahey, K., Araki, T., Burruss, J.R., Feibush, E., Flanagan, S.M., Foster, I., Fredian, T.W., Greenwald, M.J., Klasky, S.A., Leggett, T., Li, K., McCune, D.C., Lane, P., Papka, M.E., Peng, Q., Randerson, L., Sanderson, A., Stillerman, J., Thompson, M.R. and Wallace, G. “The National Fusion Collaboratory Project: Applying Grid Technology for Magnetic Fusion Research,” Workshop on Case Studies on Grid Applications, 2004.
17 GEON: The Geosciences Network, 2006 - www.geongrid.org/
18 Bernholdt, D., Bharathi, S., Brown, D., Chanchio, K., Chen, M., Chervenak, A., Cinquini, L., Drach, B., Foster, I., Fox, P., Garcia, J., Kesselman, C., Markel, R., Middleton, D., Nefedova, V., Pouchard, L., Shoshani, A., Sim, A., Strand, G. and Williams, D. “The Earth System Grid: Supporting the Next Generation of Climate Modeling Research,” Proceedings of the IEEE, 93 (3). 485-495. 2005.
19 National Virtual Observatory, 2006 - www.us-vo.org/
20 Szalay, A. and Gray, J. “The World-Wide Telescope,” Science, 293. 2037-2040. 2001.
21 Nanotechnology Simulation Hub (NanoHub), 2006 - www.nanohub.org/
22 Ellisman, M. and Peltier, S. “Medical Data Federation: The Biomedical Informatics Research Network,” The Grid: Blueprint for a New Computing Infrastructure (2nd Edition), Morgan Kaufmann, 2004.
23 Cancer Bioinformatics Grid (caBIG), 2006 - cabig.nci.nih.gov/
24 Account Management Information Exchange (AMIE), 2006 - scv.bu.edu/AMIE/
25 Allcock, B., Bresnahan, J., Kettimuthu, R., Link, M., Dumitrescu, C., Raicu, I. and Foster, I. “The Globus Striped GridFTP Framework and Server,” SC’2005, 2005.
26 Baru, C., Moore, R., Rajasekar, A. and Wan, M. “The SDSC Storage Resource Broker,” 8th Annual IBM Centers for Advanced Studies Conference, Toronto, Canada, 1998.
27 Foster, I., Kesselman, C. and Tuecke, S. “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” International Journal of Supercomputer Applications, 15 (3). 200-222. 2001.
28 Catlett, C., Goasguen, S. and Cobb, J. “TeraGrid Policy Management Framework,” TeraGrid Report TG-1, 2006.

Reference this article
Catlett, C., Beckman, P., Skow, D., Foster, I. "Creating and Operating National-Scale Cyberinfrastructure Services," CTWatch Quarterly, Volume 2, Number 2, May 2006. http://www.ctwatch.org/quarterly/articles/2006/05/creating-and-operating-national-scale-cyberinfrastructure-services/
