August 2005
The Coming Era of Low Power, High-Performance Computing — Trends, Promises, and Challenges
Satoshi Matsuoka, Tokyo Institute of Technology


The main theme of this issue of CTWatch Quarterly is the new trend within the high performance computing (HPC) community toward lower power requirements. Low power computing itself is not new — it has a long history in embedded systems, where battery life is at a premium. Moreover, the applicability of low power has widened in both directions on the power consumption scale. Lower power consumption in the microwatts arena — so-called “ultra low power” (ULP) — is necessary to enable applications such as wireless remote sensing, where a device may have to run on a single small battery for months while networked to collect data. In a more familiar context, most PCs have recently become Energy Star1 compliant. Indeed, a dramatic shift in design emphasis occurred around 2003-2004, when the industry began to move from the pursuit of desktop performance alone to the pursuit of desktop performance/power in combination. Processors initially designed for energy efficient notebooks, such as Intel’s Pentium-M, have started to find their way into desktop units, and there is strong speculation that future mainstream PC processors will be successors of the power efficient Pentium-M style of design.

But why do we want to save power in the HPC arena since the goal has always been to go faster at almost any cost? Certainly it is fair to say that performance/power has always been an engineering concern in designing HPC machines. For example, NEC claims to have achieved five times better performance/power efficiency in their SX-6 model over their previous generation SX-5.2 Where HPC machines function as large servers in datacenters, reducing power would also result in substantial cost savings in their operations. And of course, there are important social and economic reasons for reducing the extremely high power consumption of many HPC installations.

However, the recent attention to low power in HPC systems is not driven by such “energy-conscious” requirements alone. Recent research results, especially those spearheaded by the BlueGene/L3 group, seem to indicate that low power design may be fundamental to future system scalability, including future petascale systems, personalized terascale systems, and beyond. The purpose of the articles in this issue is to reveal these new trends and discuss the future of HPC from the perspective of low power computing.

In the remainder of this article, we will show how low power designs in the traditional arena of embedded computing, plus the very interesting ultra low power systems that are now receiving considerable attention, relate to low power HPC. In particular, we will discuss how technologies developed for low power embedded systems might be applicable to low power HPC and what the future holds for further research and development in this area that aims for greater performance in next generation HPC.


Wu-chun Feng, Los Alamos National Laboratory


Why should the high-performance computing community even care about (low) power consumption? The reasons are at least two-fold: (1) efficiency, particularly with respect to cost, and (2) reliability.

For decades, we have focused on performance, performance, and occasionally, price/performance, as evidenced by the Top500 Supercomputer List1 as well as the Gordon Bell Awards for Performance and Price/Performance at SC.2 To achieve better performance per compute node, microprocessor vendors have not only doubled the number of transistors (and speed) every 18-24 months, but they have also doubled the power densities, as shown in Figure 1. Consequently, keeping a large-scale high-performance computing (HPC) system functioning properly requires continual cooling in a large machine room, or even a new building, resulting in substantial operational costs. For instance, given that the cooling bill alone at Lawrence Livermore National Laboratory (LLNL) is $6M/year, and that for every watt (W) of power consumed by an HPC system at LLNL, 0.7 W of cooling is needed to dissipate it, the annual cost to both power and cool HPC systems at LLNL totals $14.6M, and this does not even include the costs of acquisition, integration, upgrading, and maintenance.3 Furthermore, when nodes consume and dissipate more power, they must be spaced out and aggressively cooled; otherwise, system temperature rises rapidly, and for every 10º C increase in temperature the failure rate doubles, as per Arrhenius’ equation as applied to microelectronics.4
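The dollar arithmetic above can be reconstructed in a few lines. The sketch below is illustrative only: the $6M cooling bill and the 0.7 W-per-W ratio come from the article, while the assumption that cost scales with watts (so that the power bill is the cooling bill divided by 0.7) is our inference, as is the Arrhenius-style helper.

```python
# Reconstructing the LLNL figures cited in the text (illustrative sketch).
COOLING_BILL = 6.0e6      # $/year for cooling alone (given in the article)
COOLING_PER_WATT = 0.7    # W of cooling per W of power consumed (given)

# Assuming cost scales with watts, the power bill itself is the cooling
# bill divided by 0.7, and the combined annual cost is their sum.
power_bill = COOLING_BILL / COOLING_PER_WATT
total = COOLING_BILL + power_bill
print(f"Total power + cooling: ${total / 1e6:.1f}M/year")  # ≈ $14.6M/year

# Arrhenius-style rule of thumb from the text: failure rate doubles
# for every 10 °C rise in temperature.
def relative_failure_rate(delta_c: float) -> float:
    """Failure rate relative to baseline after a delta_c (°C) rise."""
    return 2.0 ** (delta_c / 10.0)
```

Plugging in a 10 °C rise recovers the doubling cited in the text, and the two cost terms sum to the $14.6M/year figure within rounding.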

Figure 1. Moore's Law for Power Consumption

Our own informal empirical data from late 2000 to early 2002 indirectly supports Arrhenius’ equation. In the winter, when the temperature inside our warehouse-based work environment at Los Alamos National Laboratory (LANL) hovered around 21-23º C, our 128-CPU Beowulf cluster — Little Blue Penguin (LBP) — failed approximately once per week. In contrast, the LBP cluster failed roughly twice per week in the summer, when the temperature in the warehouse reached 30-32º C. Such failures led to expensive operational and maintenance costs: the technical staff time spent fixing the failures, the cost of replacement parts, and the staff productivity lost to the failures.

Perhaps more disconcerting is how our warehouse environment affected the results of the Linpack benchmark when running on a dense Beowulf cluster back in 2002: the cluster produced an answer outside the residual (i.e., a silent error) after only ten minutes of execution. Yet when the same cluster was placed in an 18-19º C machine-cooled room, it produced the correct answer. This experience loosely corroborated a prediction made by Graham et al.: “In the near future, soft errors will occur not just in memory but also in logic circuits.”5

Power (and its effect on reliability) is even more of an issue for larger-scale HPC systems, such as those shown in Table 1. Despite having exotic cooling facilities in place, the reliability of these large-scale HPC systems is measured in hours,6 and in all cases, the leading source of outage is hardware, with the cause often being attributed to excessive heat. Consequently, as noted by Eric Schmidt, CEO of Google, what matters most to Google “is not speed but power — low power, because data centers can consume as much electricity as a city.”7 That is, though speed is important, power consumption (and hence, reliability) is more so. By analogy, what Google, and arguably application scientists in HPC, desires is the fuel-efficient, highly reliable, low-maintenance Toyota Camry of supercomputing, not the Formula One race car of supercomputing with its energy inefficiency, unreliability, and exorbitant operational and maintenance costs. In addition, extrapolating today’s failure rates to an HPC system with 100,000 processors suggests that such a system would “spend most of its time checkpointing and restarting. Worse yet, since many failures are heat related, the [failure] rates are likely to increase as processors consume more power.”5

System        CPUs    Reliability
ASCI Q        8,192   MTBI: 6.5 hours. Leading outage sources: storage, CPU, memory.
ASCI White    8,192   MTBF: 5.0 hours ('01) and 40 hours ('03). Leading outage sources: storage, CPU, 3rd-party HW.
PSC Lemieux   3,016   MTBI: 9.7 hours.

Table 1. Reliability of Leading-Edge HPC Systems
MTBI: mean time between interrupts = wall clock hours / # downtime periods; MTBF: mean time between failures (measured).
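The MTBI definition in the table note can be sketched directly. The input numbers below (720 wall-clock hours, 74 interrupts) are hypothetical, chosen only to land near the Lemieux figure in the table; they are not from the article.

```python
# Sketch of the table note's definition:
# MTBI = wall-clock hours / number of downtime periods.
def mtbi(wall_clock_hours: float, downtime_periods: int) -> float:
    """Mean time between interrupts, per the table note's definition."""
    return wall_clock_hours / downtime_periods

# Hypothetical example: a month of wall-clock time (720 h) with 74
# interrupts yields an MTBI of roughly 9.7 hours, on the order of the
# figure reported for PSC Lemieux above.
print(f"MTBI: {mtbi(720, 74):.1f} hours")
```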


Jose Castanos, George Chiu, Paul Coteus, Alan Gara, Manish Gupta, Jose Moreira, IBM T.J. Watson Research Center


In Gulliver’s Travels (1726) by Jonathan Swift, Lemuel Gulliver traveled to various nations. One nation he traveled to, called Lilliput, was a country of weak pygmies. Another nation, called Brobdingnag, was one of mighty giants. When we build a supercomputer with thousands to hundreds of thousands of chips, is it better to choose a few mighty and powerful Brobdingnagian processors, or to start from many Lilliputian processors to achieve the same computational capability? To answer this question, let us trace the evolution of computers.

The first general purpose computer, ENIAC (Electronic Numerical Integrator And Calculator), was publicly disclosed in 1946. It took 200 microseconds to perform a single addition and it was built with 19,000 vacuum tubes. The machine was enormous, 30 m long, 2.4 m high and 0.9 m wide. Vacuum tubes had a limited lifetime and had to be replaced often. The system consumed 200 kW. ENIAC cost the US Ordnance Department $486,804.22.

In December 1947, John Bardeen, Walter Brattain, and William Shockley at Bell Laboratories invented a new switching technology called the transistor. This device consumed less power, occupied less space, and was more reliable than vacuum tubes. Impressed by these attributes, IBM built its first transistor based computer, the Model 604, in 1953. By early 1960, transistor technology had become ubiquitous. The further drive toward lower power, less space, higher reliability, and lower cost resulted in the invention of integrated circuits in 1959 by Jack Kilby of Texas Instruments. Kilby made his first integrated circuit in germanium. Robert Noyce at Fairchild used a planar process to make connections of components within a silicon integrated circuit in early 1959, which became the foundation of all subsequent generations of computers. In 1966, IBM shipped the System/360 all-purpose mainframe computer made of integrated circuits.

Within the transistor circuit families, the most powerful transistor technology was the bipolar junction transistor (BJT) rather than the CMOS (Complementary Metal Oxide Semiconductor) transistor. However, compared to CMOS transistors, the bipolar ones, using the fastest ECL (emitter coupled logic) circuits, cost more to build, had a lower level of integration, and consumed more power. As a result, the semiconductor industry moved en masse to CMOS in the early 1990s. From then on, CMOS became the entrenched technology, and supercomputers were built with the fastest CMOS circuits. This picture lasted until about 2002, when CMOS power and power density rose dramatically, to the point that they exceeded the corresponding bipolar numbers of the 1990s. Unfortunately, there was no lower power technology lying in wait to defuse the crisis. Thus, we find ourselves again at a crossroads in building the next generation supercomputer. According to the “traditional” view, the way to build the fastest and largest supercomputer is to use the fastest microprocessor chips as the building block. The fastest microprocessor is in turn built upon the fastest CMOS switching technology available to the architect at the time the chip is designed. This line of thought is sound provided that there are no other constraints on building supercomputers. However, in the real world there are many constraints (heat, component size, etc.) that make this reasoning unsound.

In the meantime, portable devices such as PDAs, cellphones, and laptop computers, developed since the 1990s, all require low power CMOS technology to maximize the battery recharge interval. In 1999, IBM foresaw the looming power crisis and asked whether we could architect supercomputers using low power, low frequency, and inexpensive (Lilliputian) embedded processors to achieve better collective performance than using high power, high frequency (Brobdingnagian) processors. While this approach had been successfully utilized for special purpose machines such as the QCDOC supercomputer, the counter-intuitive proposal was a significant departure from the traditional approach to supercomputer design. However, the drive toward lower power and lower cost remained a constant theme throughout.


Conference Report
Fran Berman, SDSC
Ruzena Bajcsy, UC Berkeley


At first blush, the technical issues involved with designing, implementing, and deploying Cyberinfrastructure seem to present the greatest challenges. Integrating diverse resources to deliver aggregate performance, engineering the system to provide both usability and reliability, building adequate user environments to monitor and debug the complex applications enabled by Cyberinfrastructure, and ensuring the security of Cyberinfrastructure resources are all immensely difficult technical challenges, and all are more or less still works in progress.

After ten years of experience since the I-Way Grid experiment at SC’95, and many more years of experience with team-oriented distributed projects such as the Grand Challenge program from the 1980s, NSF’s large-scale ITR projects, TeraGrid, etc., it is clear that some of the most challenging problems in designing, developing, deploying, and using Cyberinfrastructure arise from the social dynamics of making large-scale, coordinated projects and infrastructure work. From an increasingly substantive experience base with such projects, it is clear that the cultural, organizational, and policy dynamics, as well as the social impact of Cyberinfrastructure, will be critical to its success.

The expansion of the focus on social scientists from end users of Cyberinfrastructure to critical designers and process builders of Cyberinfrastructure motivated the organization of the NSF SBE-CISE Workshop on Cyberinfrastructure and the Social Sciences1 in March at Airlie House in Warrenton, Virginia. Targeted to a broad spectrum of decision makers and innovative thinkers in the Social Sciences and Computer Sciences, the workshop was organized by a multi-disciplinary team of SBE and CISE researchers: a Political Scientist (Henry Brady, UC Berkeley), an Economist (John Haltiwanger, University of Maryland), and two Computer Scientists (Ruzena Bajcsy of UC Berkeley and Fran Berman of SDSC and UC San Diego). The workshop strove to provide substantive, useful, and usable feedback to NSF on programs and activities through which the SBE and CISE communities could partner to build, deploy, and use Cyberinfrastructure. The Airlie workshop focused on two goals:

  1. To develop a Final Report that lays out a forward path of Cyberinfrastructure research, experimentation, and infrastructure for the SBE and CISE communities and provides a framework for projects and efforts in this integrated area.
  2. To provide a venue for community building within the SBE and CISE communities, and in particular, a venue for a multi-disciplinary synergistic community that leverages the perspectives and research of both SBE and CISE constituencies.


Opinion Editorial
Fran Berman Director, San Diego Supercomputer Center; HPC Endowed Chair, UC San Diego


Recent articles in community publications have focused on the critical need for capable high performance computing (HPC) resources for the open academic community. Compelling reports from the National Research Council, PITAC, the National Science Board, and others point to our current diminished ability to provide adequate computational and data management support for U.S. researchers, and the impact of insufficient technology capacity and capability on the loss of U.S. competitiveness and leadership.

As has been stated increasingly and compellingly, adequate capability and capacity in HPC is necessary, but it is not sufficient for leadership and competitiveness in science and engineering. Beyond the gear, concrete and strategic goals are critical to achieving competitiveness in science and engineering.

What do we want to accomplish as a nation in science and engineering? Competitiveness for many is reduced to an HPC “arms race” — who has the top spots on the Top500 list? For others, competitiveness amounts to U.S. dominance in the science and engineering world, represented by the number of awards, prizes, and other recognitions for U.S. researchers. For still others, competitiveness is represented by what researchers and educators see as averting a looming “perfect storm” — decreasing funding for science and engineering in the U.S., increasing outsourcing of people and ideas to Europe, Asia, and elsewhere, and decreasing numbers of students graduating in the sciences and engineering.

For any definition of competitiveness, the means to the end is a serious application of the Gretzky Rule: “Skate to where the puck will be.” It is clear that we need concrete goals and a plan, timetable, and resources to achieve them. But what should our goals be? Which goals should have priority over others? How should we accomplish our goals? More funding is an easy answer, and indeed, nothing substantive can be done without resources. But leadership, concrete goals, and a strategic plan for achieving these goals rank just as highly in ensuring that funding is well spent and our efforts are successful.

So how can we apply the Gretzky rule to the going definitions of competitiveness?

The Gretzky Rule and Competitiveness in the HPC “Arms Race”

These days, competitiveness in high performance computing is commonly measured by ranking on the Top500 list.1 This approach is inadequate to really measure architectural innovation, robustness, or even performance on applications that do not resemble the Linpack benchmark; however, it is an easy measure, and it has been effective in making the case for competitiveness beyond the scientific community. The current top spot on the list is occupied by Livermore’s Blue Gene; however, the emergence several years ago of the Japanese Earth Simulator (now at spot 4) provided a “wake up call” (Dongarra called it “computenik” in the New York Times) to the U.S.

The Earth Simulator provides a textbook application of the Gretzky Rule: Japan committed roughly 5 years and 500 million dollars to planning and executing the Earth Simulator, which stayed at the top spot on the Top500 list between June 2002 and June 2004 inclusive. Careful planning, investment, and commitment enabled the Earth Simulator to create an impact which is still being felt in the U.S. and Europe.

So what did we learn about competitiveness from the Earth Simulator? A concrete goal achieved by strategic planning, commitment, and resources over an appropriate timeframe made this a reality.


Reference this article
Berman, F. “On Perfect Storms, Competitiveness, and the ‘Gretzky Rule’,” CTWatch Quarterly, Volume 1, Number 3, August 2005. http://www.ctwatch.org/quarterly/articles/2005/08/on-perfect-storms-competitiveness-and-the-gretzky-rule/
