November 2006 B
High Productivity Computing Systems and the Path Towards Usable Petascale Computing
Declan Murphy, Sun Microsystems, Inc.
Thomas Nash, Sun Microsystems, Inc.
Lawrence Votta, Jr., Sun Microsystems, Inc.
Jeremy Kepner, MIT Lincoln Laboratory


B. Productively Used Resources at the Job Level

As noted before, we define the job-level utility as the cost of the resources productively used:

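$$U_{\mathrm{job}} = c_{\mathrm{CPU}}E_{\mathrm{CPU}}R_{\mathrm{CPU}} + c_{\mathrm{mem}}E_{\mathrm{mem}}R_{\mathrm{mem}} + c_{\mathrm{BW}}E_{\mathrm{BW}}R_{\mathrm{BW}} + c_{\mathrm{IO}}E_{\mathrm{IO}}R_{\mathrm{IO}} \tag{7}$$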

and the job level efficiency as

$$E_{\mathrm{job}} = U_{\mathrm{job}} / C_{\mathrm{sys}} \tag{8}$$

Here, the subscripts CPU, mem, BW, and IO refer to the resource types: CPU, memory, inter-processor bandwidth, and I/O bandwidth, respectively. This can be generalized to include other resources with significant costs.

The c_r are the total costs attributable to each resource r per unit of that resource. The total lifetime cost is C_sys, described above:

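$$C_{\mathrm{sys}} = c_{\mathrm{CPU}}R_{\mathrm{CPU}} + c_{\mathrm{mem}}R_{\mathrm{mem}} + c_{\mathrm{BW}}R_{\mathrm{BW}} + c_{\mathrm{IO}}R_{\mathrm{IO}} \tag{9}$$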

The R_r are the total lifetime resources of type r used by all of the N project activities.¹⁸ For the purposes of this job-level variable, the resources are assumed to be 100% allocated and the time to be 100% available for one of the N activities.¹⁹ Note that costs enter Eq. 5 only to provide a relative weight for the different resources.
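Equations (7) through (9) can be checked numerically. The Python sketch below is illustrative only: the `Resource` container and the function names are hypothetical, and the numbers are made up rather than taken from any measured system. It computes U_job, C_sys, and E_job, and it lets one verify two properties stated above: when every E_r is 1, U_job equals C_sys (the normalization described in footnote 19), and multiplying all c_r by the same factor leaves E_job unchanged, because costs act only as relative weights.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One resource type r (hypothetical container, not defined in the article)."""
    unit_cost: float   # c_r: lifetime cost attributable per unit of resource [$/unit]
    total: float       # R_r: total lifetime amount of the resource [unit]
    efficiency: float  # E_r: fraction productively used [dimensionless]


def system_cost(resources: dict[str, Resource]) -> float:
    """C_sys = sum over r of c_r * R_r  (Eq. 9)."""
    return sum(r.unit_cost * r.total for r in resources.values())


def job_utility(resources: dict[str, Resource]) -> float:
    """U_job = sum over r of c_r * E_r * R_r  (Eq. 7)."""
    return sum(r.unit_cost * r.efficiency * r.total for r in resources.values())


def job_efficiency(resources: dict[str, Resource]) -> float:
    """E_job = U_job / C_sys  (Eq. 8)."""
    return job_utility(resources) / system_cost(resources)


if __name__ == "__main__":
    # Illustrative values only; units are abstract.
    example = {
        "CPU": Resource(unit_cost=2.0, total=100.0, efficiency=0.5),
        "mem": Resource(unit_cost=1.0, total=100.0, efficiency=0.8),
        "BW": Resource(unit_cost=0.5, total=100.0, efficiency=0.4),
        "IO": Resource(unit_cost=0.5, total=100.0, efficiency=0.2),
    }
    print(system_cost(example))     # 400.0
    print(job_utility(example))     # 210.0
    print(job_efficiency(example))  # 0.525
```

Written this way, E_job is a cost-weighted average of the E_r, so a resource that accounts for a large share of C_sys (CPU in this made-up example) dominates the result.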

The R_r are based on performance measurements and a choice of configuration options. Remember that the R_r include the lifetime, so the units for CPU, memory, bandwidth, and I/O are [ops] (not ops/sec), [byte-years], [bytes], and [bytes], respectively. We weight the resources by their relative costs, so U_job is just the cost ($) of the resources that the project teams could have used if they were perfectly efficient. The c_r provide the conversion from resource units to utility, expressed as cost, in [$/ops], [$/byte-years], [$/bytes], and [$/bytes], respectively.
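The unit conventions above can be made concrete. In this Python sketch, which illustrates one possible bookkeeping rather than the authors' procedure, lifetime totals R_r are built from rates and a system lifetime, and each c_r is obtained by dividing the dollars attributed to resource r (a hypothetical `cost_share` input) by R_r. With that choice, summing c_r R_r recovers the total lifetime cost, as Eq. (9) requires. The 365.25-day year, the function names, and all numbers are assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def lifetime_resources(ops_per_sec: float, memory_bytes: float,
                       bw_bytes_per_sec: float, io_bytes_per_sec: float,
                       lifetime_years: float) -> dict[str, float]:
    """Return R_r for each resource, integrated over the system lifetime."""
    seconds = lifetime_years * SECONDS_PER_YEAR
    return {
        "CPU": ops_per_sec * seconds,          # [ops], not ops/sec
        "mem": memory_bytes * lifetime_years,  # [byte-years]
        "BW": bw_bytes_per_sec * seconds,      # [bytes]
        "IO": io_bytes_per_sec * seconds,      # [bytes]
    }


def unit_costs(cost_share: dict[str, float],
               resources: dict[str, float]) -> dict[str, float]:
    """c_r = (dollars attributed to resource r) / R_r, e.g. [$/ops] for CPU."""
    return {r: cost_share[r] / resources[r] for r in resources}


if __name__ == "__main__":
    # Hypothetical system: placeholder values, not measurements.
    R = lifetime_resources(ops_per_sec=1e15, memory_bytes=1e14,
                           bw_bytes_per_sec=1e13, io_bytes_per_sec=1e11,
                           lifetime_years=4.0)
    shares = {"CPU": 40e6, "mem": 25e6, "BW": 20e6, "IO": 15e6}  # dollars
    c = unit_costs(shares, R)
    c_sys = sum(c[r] * R[r] for r in R)  # Eq. 9: equals the sum of the shares
    print(f"C_sys = ${c_sys:,.0f}")
```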

The E_r are the fractions of the total of each resource that are productively utilized, on average, by all the jobs in the N project activities. The variables E_r are efficiency estimators that Job-level Productivity Benchmark efforts can aim to measure (for specified workflows and job mixes) along with the average effort per activity, .

By “productively,” we mean that resources used only to support parallelization are not counted, and that the single-processor algorithm being used does not waste resources. This can be enforced either as a protocol rule in the benchmarking, or by having the benchmarks down-rate the utilization fractions E_r for resource usage that does not directly support the task and for algorithms that are less than optimal.²⁰

The E_r are [dimensionless].
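One way a benchmark might down-rate E_r is sketched below in Python, under assumptions of our own: each job reports how much of resource r it consumed and how much of that was unproductive in the sense of footnote 20 (for example, memory replicated only to reduce latency). The estimator sums the productive part over all jobs and divides by the lifetime total R_r. The `JobUsage` record and `estimate_efficiency` are hypothetical names, and deciding what counts as "unproductive" remains the judgment call the text describes.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    """Consumption of one resource r by one job (hypothetical record)."""
    used: float          # amount of r consumed, in the units of R_r
    unproductive: float  # portion of `used` that does not count (e.g. replicated memory)


def estimate_efficiency(jobs: list[JobUsage], lifetime_total: float) -> float:
    """Estimate E_r = (productively used amount of r) / R_r."""
    if lifetime_total <= 0:
        raise ValueError("lifetime_total (R_r) must be positive")
    productive = 0.0
    total_used = 0.0
    for job in jobs:
        if not 0 <= job.unproductive <= job.used:
            raise ValueError("need 0 <= unproductive <= used")
        productive += job.used - job.unproductive
        total_used += job.used
    if total_used > lifetime_total:
        raise ValueError("jobs cannot use more than R_r")
    return productive / lifetime_total


if __name__ == "__main__":
    # Memory in byte-years (illustrative): job A replicated 100 of its 400.
    jobs = [JobUsage(used=400.0, unproductive=100.0),
            JobUsage(used=300.0, unproductive=0.0)]
    print(estimate_efficiency(jobs, lifetime_total=1000.0))  # 0.6
```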

C. Project and System Level Efficiency

E_adm, the measurable part of system efficiency, may be understood as the effectiveness of the administrative staff in allocating resources efficiently, given the tools and the real environment of their system and the time they have available, as included in the cost c_adm. An estimator for this traditional measure of system utilization, E_adm, is what System-level and Administration Productivity Benchmarks can aim to measure, as discussed in Section 4.
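The traditional utilization measure mentioned above is often computed as the fraction of available node-hours that were allocated to jobs over a measurement window. The Python sketch below assumes that definition and a simple record of node allocations; the article does not prescribe this particular estimator, and `Allocation` and `utilization` are hypothetical names. Allocations that extend past the window are clipped to it.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Nodes held by one job over [start, end), in hours (hypothetical record)."""
    nodes: int
    start: float
    end: float


def utilization(allocs: list[Allocation], total_nodes: int,
                window_start: float, window_end: float) -> float:
    """Allocated node-hours divided by available node-hours in the window."""
    window = window_end - window_start
    if window <= 0 or total_nodes <= 0:
        raise ValueError("window and total_nodes must be positive")
    busy = 0.0
    for a in allocs:
        # Clip each allocation to the measurement window.
        overlap = max(0.0, min(a.end, window_end) - max(a.start, window_start))
        busy += a.nodes * overlap
    return busy / (total_nodes * window)


if __name__ == "__main__":
    allocs = [Allocation(4, 0.0, 10.0), Allocation(6, 5.0, 15.0)]
    print(utilization(allocs, total_nodes=10, window_start=0.0, window_end=10.0))  # 0.7
```

This captures only allocation; whether the allocated resources are then used productively is the job-level question addressed by E_job.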

E_proj is the subjective component of the figure of merit. It allows for the evaluation of issues that might be attributed to system hardware or software, such as project failures or delays, and the accessibility of the computing system environment to staff with different levels of computing skill. In general, it includes system or configuration factors that affect the effectiveness of programming teams at accomplishing their goals. This is where utility vs. time considerations may be included, as discussed in [1].

E_adm and E_proj are [dimensionless].

1 This concept is described a bit more rigorously in [2].
2 Murphy, D., Nash, T., Votta, L. Jr., “A System-wide Productivity Figure of Merit,” Sun Labs Technical Report, TR-2006-154, Sun Microsystems Laboratory (2006). sunlabs.com/techrep/2006/abstract-154.html
3 In particular, the important aspects that address user work productivity at the job-level reduce identically to Sterling's w-model of productivity [4].
4 Sterling, T., “Productivity Metrics and Models for High Performance Computing,” International Journal of High Performance Computing Applications 18(4):433-440 (2004).
5 These extensions may be seen as a response to Kepner's remark “that all necessary ensemble averaging over users, applications and systems can be performed without loss of generality.” In fact, we show in [2] that our figure of merit follows from ensemble averaging in this way, starting essentially at the synthesis expression in Section 2.2 of Kepner's paper [6].
6 Kepner, J., “HPC Productivity Model Synthesis,” International Journal of High Performance Computing Applications 18(4):505-523 (2004).
7 One can also come to the same formula by looking at the elephant as if we were the committee that designed it, with a completely parameterized model. From this complete model, by calling out assumptions and simplifications, we can reduce to our productivity figure of merit. This approach is pursued in an appendix of [2] and the methodology can be used as a basis to derive a custom figure of merit in cases where our assumptions and simplifications are not applicable to a site's needs.
8 However, a system approaching 100% allocation will have long queues, which would have a deleterious effect on the total utility coming out of the computer. Effects like long queues should be included as part of the environment conditions (the “operating point”) under which job-level productivity measurements of E_job and , defined below, are made.
9 For example, c_CPU is the fraction of the total lifetime cost, C_sys, that is attributable to CPU, in $/ops. For CPU, U_job can be understood (in Sterling's notation [4]) as S × T × c_CPU / C_sys summed over all jobs during the lifetime T, with S the attained performance [ops/sec].
10 Since E_sys is intended to compare the system-level resource utilization efficiency for different system designs, vendors, and configurations, it needs to be a ratio to some absolute standard. A reasonable definition, which we will use, is the “ideal” one, which delivers the sum of the values of all the projects.
11 The Sustained System Performance benchmark (SSP) [12], developed at NERSC, is the result of this type of analysis. It is used in practice to assist in procurement decisions.
12 Kramer, W., Shalf, J., Strohmaier, E., “The NERSC Sustained System Performance (SSP) Metric,” Lawrence Berkeley National Laboratory, Paper LBNL-58868 (2005). repositories.cdlib.org/lbnl/LBNL-58868
13 The Effective System Performance benchmark (ESP) [14], developed at NERSC, is an important step toward this type of evaluation.
14 Wong, A., Oliker, L., Kramer, W., Kaltz, T., Bailey, D., “ESP: A System Utilization Benchmark,” in Proceedings of the Supercomputing 2000 Conference (2000).
15 In the likely event that host environments wish to specify time-dependent values for projects, these may be parameterized functions or tables. In either case, the spreadsheet will have to be revised to accommodate time-dependent values and system level efficiencies since the present examples do not take time into account.
16 Value judgments may be implicit in some of our simplifications. If these are not consistent with those of a particular environment, they can obviously be adjusted using the basic framework of this approach and the spreadsheet. What is important is to attempt to include as many cost, performance, and productivity components as possible into the best possible single productivity figure of merit.
17 Note that electrical running costs may be dependent upon the job mix.
18 Sums over the jobs in a workflow and the projects at a computing center are shown explicitly in [2].
19 This assumption includes a simplification that the different resources are all available and allocated in the same proportions. It normalizes U_job so that, for perfect job-level utility optimization, U_job = C_sys.
20 It may be best to define “productive use of resources” with examples. If we replicate memory just to reduce latency by keeping data close to the CPUs in a parallelized problem, this use of the memory resource does not count as productive, although the CPU utilization may increase and be counted as a result. But if the parallelization allows us to expand the problem space (larger matrices, reduced granularity, etc.) or to increase the throughput in data-intensive situations, and so increases the utility of a successful result, this memory use does count. Similarly, bandwidth used to access non-local memory counts only if utility is increased as a result, not if it merely serves data replication or other “indirect” support of parallelization. The excess usage of a resource-wasteful algorithm (for example, one that burns resources unnecessarily on a single processor) or of wasteful localization of a scaled-up problem (for example, data placed so that unnecessary bandwidth is used) does not count as productive use. In the end, to paraphrase Justice Potter Stewart in the context of something else difficult to define, we know “productive resource utilization” when we see it.

Reference this article
"A System-wide Productivity Figure of Merit," CTWatch Quarterly, Volume 2, Number 4B, November 2006 B. http://www.ctwatch.org/quarterly/articles/2006/11/a-system-wide-productivity-figure-of-merit/
