May 2006
Designing and Supporting Science-Driven Infrastructure
Charlie Catlett, Pete Beckman, Dane Skow and Ian Foster, The Computation Institute, University of Chicago and Argonne National Laboratory

3.2 Coordinating User Support and User Support Infrastructure

User support is best done in a manner that can fully exploit all available human connections to users and their problem domains. The most frequent model is to have the user support staff local to the resource providers. This model is motivated in part by the historical organization of computing centers as vertically integrated, standalone facilities, and in part by the fact that close connection to the users and their issues is important to the centers, providing vital information for tuning, improving and designing next generation facilities.

TeraGrid leverages this model, coordinating the support staff across the sites to provide a set of support programs that give users a “one stop shop” whose major function (beyond basic “first aid”) is to establish the connection between the user and the appropriate local support. This approach also allows us to draw on the expertise and availability of peers across the full organization.

This coordinated, leveraged approach is essential when supporting a user community in the context of a distributed grid facility, where services and applications involve multiple components. Diagnosing and tuning applications in such an environment often requires the engagement of experts from multiple organizations. At the same time, it is important that a single responsible party “own” the task of getting a solution to the user. Often, providing a modest amount of focused attention, while drawing on specialists across the facility, allows researchers to make rapid, substantial progress in the efficiency and capabilities of their applications.

TeraGrid user support services comprise three FTEs who provide central coordination and 25 applications support and consulting FTEs from the resource provider facilities. A particular benefit to this distributed teaming approach is that TeraGrid can draw on a much more diverse group of experts than can be found in any single facility.

The TeraGrid GIG is also creating a team of experts whose role currently is to integrate a set of prototype science gateways. Consisting of 10 FTEs located at eight science gateway sites, this distributed support team will shift within 12-18 months from primarily integrating prototypes to becoming a general support team for the dozens of science gateways that we anticipate will emerge from these early pioneering efforts. Complementing the direct end-user support team, this team’s customers will be user support and technical staff associated with science gateways.

3.3 External Communications, Training and Documentation

As with a single-site facility, a national cyberinfrastructure requires focused effort on communications to key groups, including end-users, funding agencies, and other stakeholders. Each resource provider within a national grid facility will provide documentation and training for the resources and services locally provided, and these materials must be proactively integrated, in a similar fashion to the services and resources themselves. This task requires an overall communication architecture that provides structure and common interfaces and formats for the training and documentation materials, along with the curation – the analog to software verification and validation – of the overall systems.

TeraGrid coordinates these areas with two FTEs who work with three FTEs at resource provider facilities, as well as with the external relations, education, and training staff at those facilities who are not dedicated to TeraGrid.

A key strategy for communication, for user support, and for simplifying the use of TeraGrid is a user portal program that provides users with a web-based, customizable interface for training, documentation, and common user functions such as resource directories, job submission and monitoring, and management of authorization credentials across TeraGrid. The user portal project involves two FTEs who work closely with the communications, training, and documentation staff.

