November 2006 B
High Productivity Computing Systems and the Path Towards Usable Petascale Computing
Erich Strohmaier, Lawrence Berkeley National Laboratory

Model Time per access Parameters
0 Flat memory T = g = constant g
1 Two level memory T = P(c/M)*g1 + (1-P(c/M))*g2 g1, g2
2 Latency, Bandwidth T = (l+g*(L-1))/L l, g
3 L, B for two levels T = P(c/M)*(l1+g1*(L-1))/L + (1-P(c/M))*(l2+g*(L-1))/L2 l1, g1, l2, g2
Cache-hit probability: P(c/M) = (c/M)α
Table 2: Performance models used for the average time per data access of Apex-MAP and their performance parameters. M, L and α are Apex-MAP parameters and c is a system parameter reflecting the most important memory or system hierarchy level, which has to be set to different appropriate levels for serial and parallel models.

I use four models to characterize serial benchmark execution and parallel execution of Apex-MAP. The simplest model represents an ideal system on which all data accesses take equal time (flat global memory). My second model assumes a two level memory hierarchy, with constant access time for each level. M is the global memory utilized by Apex-MAP and c the amount of memory in the faster second level. The probability to find data in c can be derived from the properties of the temporal locality parameter α as (c/M)α. The third model assumes access time depends linearly on block length L with two parameters for latency l and gap g (inverse bandwidth). The fourth model finally combines the previous two by assuming a linear timing model for each level in a hierarchical model. All models and their timing equations are shown in Table 2.

Performance models should be calibrated with hardware specifications or low-level micro-benchmarks. Due to the simplicity of my models, predictions based on hardware specifications can be expected to be off by large margins. I use a back-fitting procedure to minimize SSE and the prediction error of the models. To check the validity of the models I inspect residual error plots and fitted latency and gap values to rule out any fictitious models or parameter values.

For complex parallel system hierarchies it is not always clear what the second most important hierarchy level is and what the value of c should be. It is therefore advisable to at least use a backfitting approach to confirm an initial choice, or to probe the performance signature of a system, to determine which level is most influential on performance.

Pages: 1 2 3 4 5 6 7

Reference this article
"Performance Complexity: An Execution Time Metric to Characterize the Transparency and Complexity of Performance," CTWatch Quarterly, Volume 2, Number 4B, November 2006 B. http://www.ctwatch.org/quarterly/articles/2006/11/performance-complexity-an-execution-time-metric-to-characterize-the-transparency-and-complexity-of-performance/

Any opinions expressed on this site belong to their respective authors and are not necessarily shared by the sponsoring institutions or the National Science Foundation (NSF).

Any trademarks or trade names, registered or otherwise, that appear on this site are the property of their respective owners and, unless noted, do not represent endorsement by the editors, publishers, sponsoring institutions, the National Science Foundation, or any other member of the CTWatch team.

No guarantee is granted by CTWatch that information appearing in articles published by the Quarterly or appearing in the Blog is complete or accurate. Information on this site is not intended for commercial purposes.