February 2007
The Promise and Perils of the Coming Multicore Revolution and Its Impact
Jack Dongarra, Oak Ridge National Laboratory; University of Tennessee


Over the past few years, the familiar idea that software packages can have a life of their own has been extended in a very interesting way. People are seriously exploring the view that we should think of software as developing in and depending on a kind of ecological system, a complex and dynamic web of other software, computing platforms, people and organizations.1 To the more technical segment of the cyberinfrastructure community, this concept of a software ecology may seem to be just a metaphor, more suited to marketing than to sound analysis and planning. Yet we all know that good metaphors are often essential when it comes to finding a productive way to think and talk about highly complex situations that are not well understood but must nonetheless be confronted. The dramatic changes in computing discussed in this issue of CTWatch Quarterly, "The Promise and Perils of the Coming Multicore Revolution and Its Impact," represent an extreme case of just such a situation. We should therefore expect that the heuristic value of the software ecology model will be put to a pretty severe test.

All of the articles in this issue mention one main cause of the multicore revolution that echoes, in a surprising way, an environmental concern very much in today's news: system overheating. The underlying physical system on which all software ecologies depend, i.e., the computer chip, has traditionally been designed in a way that required it to get hotter as it got faster, but now that process has reached its limit. The collision of standard processor designs with this thermal barrier, as well as with other stubborn physical limits, has forced processor architects to develop chip designs whose additional computing power can only be utilized by software that can effectively and efficiently exploit parallelism. Precious little of the software now in use has that capability. Precious few in the computing community have any idea of how to change the situation any time soon. As the hardware habitat for software adapted for serial processing declines, and the steep challenges of creating good parallel software become more and more evident, the consequences of the discontinuity thereby produced seem destined to reverberate through nearly every element of our software ecosystem, including libraries, algorithms, operating systems, programming languages, performance tools, programmer training, project organization, and so on.

Since the impacts of these changes are so broad, far-reaching and largely unknown, it is important in discussing them to have different points of view within the global software ecosystem represented. The sample of views presented here includes one from a group of academic researchers in high performance computing, and three from different industry niches. As one would expect, despite certain commonalities, each of them highlights somewhat different aspects of the situation.

The article from the more academic perspective, authored by Dennis Gannon, Geoffrey Fox, the late Ken Kennedy, and myself, focuses primarily on the relatively small but important habitat of Computational Science. Starting from the basic thesis that science itself now requires and has developed a software ecosystem that needs stewardship and investment, we provide a brief characterization of three main disruptors of the status quo: physical limits on clock rates and voltage, disparities between processor speed and memory bandwidth, and economic pressures encouraging heterogeneity at the high end. Since the HPC community has considerably more experience with parallel computing than most other communities, it is in a better position to communicate some lessons learned from science and engineering applications about scalable parallelism. The chief one is that scalable parallel performance is not an accident. We look at what these lessons suggest about the issues that commodity applications might face and draw out some of their future implications for the critical areas of numerical libraries and compiler technologies.

Jack Dongarra, Oak Ridge National Laboratory; University of Tennessee
Dennis Gannon, Indiana University
Geoffrey Fox, Indiana University
Ken Kennedy, Rice University

1. Introduction

The idea that computational modeling and simulation represents a new branch of scientific methodology, alongside theory and experimentation, was introduced about two decades ago. It has since come to symbolize the enthusiasm and sense of importance that people in our community feel for the work they are doing. But when we try to assess how much progress we have made and where things stand along the developmental path for this new "third pillar of science," recalling some history about the development of the other pillars can help keep things in perspective. For example, we can trace the systematic use of experiments back to Galileo in the early 17th century. Yet for all the incredible successes it enjoyed over its first three centuries, the experimental method arguably did not fully mature until the elements of good experimental design and practice were finally analyzed and described in detail by R. A. Fisher and others in the first half of the 20th century. In that light, it seems clear that while Computational Science has had many remarkable youthful successes, it is still at a very early stage in its growth.

Many of us today who want to hasten that growth believe that the most progressive steps in that direction require much more community focus on the vital core of Computational Science: software and the mathematical models and algorithms it encodes. Of course the general and widespread obsession with hardware is understandable, especially given exponential increases in processor performance, the constant evolution of processor architectures and supercomputer designs, and the natural fascination that people have for big, fast machines. But when it comes to advancing the cause of computational modeling and simulation as a new part of the scientific method, there is no doubt that the complex software "ecosystem" it requires must take its place on the center stage.

At the application level, the science has to be captured in mathematical models, which in turn are expressed algorithmically and ultimately encoded as software. Accordingly, on typical projects the majority of the funding goes to support this translation process that starts with scientific ideas and ends with executable software, and which over its course requires intimate collaboration among domain scientists, computer scientists and applied mathematicians. This process also relies on a large infrastructure of mathematical libraries, protocols and system software that has taken years to build up and that must be maintained, ported, and enhanced for many years to come if the value of the application codes that depend on it is to be preserved and extended. The software that encapsulates all this time, energy, and thought routinely outlasts (usually by years, sometimes by decades) the hardware it was originally designed to run on, as well as the individuals who designed and developed it.

Thus the life of Computational Science revolves around a multifaceted software ecosystem. But today there is (and should be) a real concern that this ecosystem of Computational Science, with all its complexities, is not ready for the major challenges that will soon confront the field. Domain scientists now want to create much larger, multi-dimensional applications in which a variety of previously independent models are coupled together, or even fully integrated. They hope to be able to run these applications on Petascale systems with tens of thousands of processors, to extract all the performance these platforms can deliver, to recover automatically from the processor failures that regularly occur at this scale, and to do all this without sacrificing good programmability. This vision of what Computational Science wants to become contains numerous unsolved and exciting problems for the software research community. Unfortunately, it also highlights aspects of the current software environment that are either immature or underfunded or both.1

John L. Manferdelli, Microsoft Corporation


Major changes in the commercial computer software industry are often caused by significant shifts in hardware technology. These changes are often foreshadowed by hardware and software technology originating from high performance, scientific computing research. Innovation and advancement in both communities have been fueled by the relentless, exponential improvement in the capability of computer hardware over the last 40 years. Much of that improvement came from the ability, keenly observed by Gordon Moore and widely known as "Moore's Law," to double the number of microelectronic devices that could be crammed onto a constant area of silicon (at a nearly constant cost) every two years or so. Further, virtually every analytical technique from the scientific community (operations research, data mining, machine learning, compression and encoding, signal analysis, imaging, mapping, simulation of complex physical and biological systems, cryptography) has become widely deployed, broadly benefiting education, health care and entertainment as well as enabling the worldwide delivery of cheap, effective and profitable services from eBay to Google.

In stark contrast to the scientific community, commercial application software programmers have not, until recently, had to grapple with massively concurrent computer hardware. While Moore's Law continues to be a reliable predictor of the aggregate computing power that will be available to commercial software, we can expect very little improvement in serial performance of general purpose CPUs. So if we are to continue to enjoy improvements in software capability at the rate we have become accustomed to, we must use parallel computing. This will have a profound effect on commercial software development, including the languages, compilers, operating systems, and software development tools, which will in turn have an equally profound effect on computer and computational scientists.

Computer Architecture: What happened?

Power dissipation in clocked digital devices is proportional to the clock frequency, imposing a natural limit on clock rates. While compensating scaling has enabled commercial CPUs to increase clock speed by a factor of roughly 4,000 since the first commercial microprocessors of the early 1970s, the ability of manufacturers to dissipate heat has reached a physical limit. Leakage power dissipation gets worse as gates get smaller, because gate dielectric thicknesses must decrease proportionately. As a result, a significant increase in clock speed is not possible without heroic (and expensive) cooling. Chips would simply melt. This is the "Power Wall" confronting serial performance, and our back is firmly against it: significant clock-frequency increases will not come without heroic measures or breakthroughs in materials technology.
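The proportionality in the first sentence of this paragraph refers to dynamic (switching) power, which is commonly modeled as P = αCV²f, where α is the fraction of gates switching per cycle, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below is illustrative only: the parameter values are made-up placeholders, the assumption that supply voltage must rise in proportion to frequency is a first-order simplification rather than a property of any particular chip, and the leakage power mentioned above is ignored.

```python
def dynamic_power(alpha: float, capacitance: float, voltage: float, freq: float) -> float:
    """Dynamic (switching) power in watts: P = alpha * C * V^2 * f."""
    return alpha * capacitance * voltage ** 2 * freq


# Hypothetical baseline values, chosen only for illustration.
ALPHA = 0.2       # fraction of gates switching per cycle
CAP = 1.0e-9      # switched capacitance in farads
V0 = 1.2          # supply voltage in volts
F0 = 2.0e9        # clock frequency in hertz

base = dynamic_power(ALPHA, CAP, V0, F0)

# Doubling frequency at fixed voltage doubles dynamic power.
fixed_v = dynamic_power(ALPHA, CAP, V0, 2 * F0)

# Assumption: if voltage must also scale with frequency, power grows
# with the cube of the frequency ratio (2 * 2**2 = 8 here).
scaled_v = dynamic_power(ALPHA, CAP, 2 * V0, 2 * F0)

print(f"baseline:             {base:.3f} W")
print(f"2x f, same V:         {fixed_v / base:.1f}x baseline")
print(f"2x f, 2x V (assumed): {scaled_v / base:.1f}x baseline")
```

Under these assumptions, a doubling of clock rate costs somewhere between 2x and 8x the dynamic power, which is why frequency scaling collides so quickly with the heat-removal limit.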

Not only does clock speed appear to be limited, but memory performance improvement increasingly lags behind processor performance improvement. This introduces a problematic and growing memory latency barrier to computer performance improvements. To improve the "average memory reference" time to fetch or write instructions or data, current architectures have ever-growing caches. Cache misses are expensive, causing delays of hundreds of (CPU) clock cycles. The mismatch in memory speed presents a "Memory Wall" for increased serial performance.
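One standard way to quantify the "average memory reference" time mentioned above is average memory access time, AMAT = hit time + miss rate × miss penalty, applied level by level through the cache hierarchy. The sketch below assumes a two-level cache; the latencies and miss rates are hypothetical round numbers, picked only so that a trip to main memory costs a few hundred cycles, in line with the paragraph above.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time in cycles: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical latencies in CPU cycles.
L1_HIT, L2_HIT, DRAM = 4, 12, 300


def two_level_amat(l1_miss_rate: float, l2_miss_rate: float) -> float:
    # An L1 miss costs the average time of the L2 level,
    # which in turn may miss all the way out to DRAM.
    l2_level = amat(L2_HIT, l2_miss_rate, DRAM)
    return amat(L1_HIT, l1_miss_rate, l2_level)


for l2_miss in (0.5, 0.2, 0.1):
    print(f"L1 miss 5%, L2 local miss {l2_miss:.0%}: "
          f"{two_level_amat(0.05, l2_miss):.1f} cycles")
```

With these numbers the average falls from 12.1 to 6.1 cycles as the L2 miss rate drops, even though the L1 hit rate is unchanged at 95%; the rare DRAM trips account for much of the average, which is the pressure behind ever-larger caches.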

In addition to the performance improvements that have arisen from frequency scaling, hardware engineers have also improved performance, on average, by having duplicate hardware speculatively execute future instructions before the results of current instructions are known, while providing hardware safeguards to prevent the errors that might be caused by out-of-order execution.1 This is called Instruction Level Parallelism (ILP). Unfortunately, branches must be "guessed" to decide what instructions to execute simultaneously (if you guess wrong, you throw away this part of the result), and data dependencies may prevent successive instructions from executing in parallel, even if there are no branches. A big benefit of ILP is that existing programs enjoy performance benefits without any modification. But ILP improvements are difficult to forecast since the "speculation" success is difficult to predict, and ILP causes a super-linear increase in execution unit complexity (and associated power consumption) without linear speedup. Serial performance acceleration using ILP has also stalled because of these effects.2 This is the "ILP Wall."
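The limit that data dependencies place on ILP can be made concrete with an idealized dataflow schedule. The sketch below assumes unlimited execution units, one-cycle latency, no branches, and register renaming (so only true read-after-write dependencies matter); real processors achieve none of these. Under those assumptions each instruction issues one cycle after its last input is ready, so the depth of the dependency chain sets the cycle count. The instruction encoding is a made-up toy, not any real instruction set; it compares a sum using a single accumulator with the same sum split across four independent accumulators.

```python
from typing import Dict, List, Tuple

# A toy instruction: (destination register, [source registers]).
Instr = Tuple[str, List[str]]


def schedule_depth(program: List[Instr]) -> int:
    """Cycles needed assuming unlimited units, unit latency and no branches."""
    ready: Dict[str, int] = {}  # register -> cycle its value becomes available
    depth = 0
    for dest, srcs in program:
        # Inputs never written by the program are available at cycle 0.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + 1
        depth = max(depth, ready[dest])
    return depth


def serial_sum(n: int) -> List[Instr]:
    # acc = acc + x[i]: every add must wait for the previous one.
    return [("acc", ["acc", f"x{i}"]) for i in range(n)]


def split_sum(n: int, k: int) -> List[Instr]:
    # Same sum with k independent accumulators, combined at the end.
    assert k >= 2, "need at least two accumulators"
    prog = [(f"acc{i % k}", [f"acc{i % k}", f"x{i}"]) for i in range(n)]
    prog.append(("acc", ["acc0", "acc1"]))
    for j in range(2, k):
        prog.append(("acc", ["acc", f"acc{j}"]))
    return prog


for n in (16, 64):
    print(f"n={n}: one accumulator {schedule_depth(serial_sum(n))} cycles, "
          f"four accumulators {schedule_depth(split_sum(n, 4))} cycles")
```

Splitting the accumulator executes a few more instructions but shortens the dependency chain (16 versus 7 cycles for n=16 in this model), which is exactly the kind of parallelism hardware can only find if the program exposes it. Because it reorders floating-point additions, and so can change rounding, this transformation is one that compilers may not apply automatically.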

John McCalpin, Advanced Micro Devices, Inc.
Chuck Moore, Advanced Micro Devices, Inc.
Phil Hester, Advanced Micro Devices, Inc.


The title of this issue, "The Promise and Perils of the Coming Multicore Revolution and Its Impact", can be seen as a request for an interpretation of what multicore processors mean now, or as an invitation to try to understand the role of multicore processors from a variety of different directions. We choose the latter approach, arguing that, over time, multicore processors will play several key roles in an ongoing, but ultimately fundamental, transformation of the computing industry.

This article is organized as an extended discussion of the meaning of the terms "balance" and "optimization" in the context of microprocessor design. After an initial review of terminology (Section 1), the remainder of the article is arranged chronologically, showing the increasing complexity of "balance" and "optimization" over time. Section 2 is a retrospective, "why did we start making multiprocessors?" Section 3 discusses ongoing developments in system and software design related to multicore processors. Section 4 extrapolates current trends to forecast future complexity, and the article concludes with some speculations on the future potential of a maturing multicore processor technology.1

1. Background
Key Concepts

Before attempting to describe the evolution of multicore computing, it is important to develop a shared understanding of nomenclature and goals. Surprisingly, many commonly used words in this area are used with quite different meanings across the spectrum of producers, purchasers, and users of computing products. Investigation of these different meanings sheds a great deal of light on the complexities and subtleties of the important trade-offs that are so fundamental to the engineering design process.


Webster's online dictionary lists twelve definitions for "balance."2 The two that are most relevant here are:

  • 5 a : stability produced by even distribution of weight on each side of the vertical axis b : equipoise between contrasting, opposing, or interacting elements c : equality between the totals of the two sides of an account
  • 6 a : an aesthetically pleasing integration of elements

Both of these conceptualizations appear to play a role in people's thinking about "balance" in computer systems. The former is quantitative and analytical, but attempts to apply this definition quickly run into complications. What attributes of a computer system should display an "even distribution"? Depending on the context, the attributes of importance might include cost, price, power consumption, physical size, reliability, or any of a host of (relatively) independent performance attributes. There is no obvious "right answer" in choosing which of these attributes to "balance" or how to combine multiple comparisons into a single "balanced" metric.

The aesthetic definition of "balance" strikes an emotional chord when considering the multidimensional design/configuration space of computer systems, but dropping the quantitative element eliminates the utility of the entire "balance" concept when applied to the engineering and business space.

The failure of these standard definitions points to a degree of subtlety that cannot be ignored. We will happily continue to use the word "balance" in this article, but with the reminder that the word only makes quantitative sense in the context of a clearly defined quantitative definition of the relevant metrics, including how they are composed into a single optimization problem.


Webster's online dictionary lists only a single definition for "optimization":3

  • : an act, process, or methodology of making something (as a design, system, or decision) as fully perfect, functional, or effective as possible; specifically : the mathematical procedures (as finding the maximum of a function) involved in this

For this term, the most common usage error among computer users draws from the misleadingly named "optimizers" associated with compilers. As a consequence of this association, the word "optimization" is often used as a synonym for "improvement." While the terms are similar, optimization (in its mathematical sense) refers to minimizing or maximizing a particular objective function in the presence of constraints (perhaps only those provided by the objective function itself). In contrast, "improvement" means to "make something better" and does not convey the sense of trade-off that is fundamental to the concept of an "optimization" problem. Optimization of a computer system design means choosing parameters that minimize or maximize a particular objective function, while typically delivering a "sub-optimal" solution for other objective functions. Many parameters act in mutually opposing directions for commonly used objective functions, e.g., low cost vs high performance, low power vs high performance, etc. Whenever there are design parameters that act in opposing directions, the "optimum" design point will depend on the specific quantitative formulation of the objective function – just exactly how much is that extra 10% in performance (or 10% reduction in power consumption, etc.) worth in terms of the other design goals?
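The closing question of this paragraph can be illustrated with a toy weighted objective. The three candidate design points and their performance and power numbers below are invented for illustration, and a linear "performance minus weighted power" score is only one of many possible formulations; the point is simply that the "optimum" moves as the weight assigned to power changes.

```python
from typing import Dict, Tuple

# Hypothetical design points: name -> (relative performance, power in watts).
DESIGNS: Dict[str, Tuple[float, float]] = {
    "low-power": (1.0, 50.0),
    "balanced": (1.3, 80.0),
    "high-clock": (1.5, 130.0),
}


def best_design(power_weight: float) -> str:
    """Maximize performance - power_weight * power over the candidate designs."""
    def score(name: str) -> float:
        perf, power = DESIGNS[name]
        return perf - power_weight * power
    return max(DESIGNS, key=score)


for w in (0.001, 0.005, 0.02):
    print(f"power weight {w}: best = {best_design(w)}")
```

With these made-up numbers, each of the three designs is "optimal" for one of the three weights, even though none of the design points changed: only the quantitative formulation of the objective did.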

Dave Turek, IBM


Over the past several years, public sector institutions (universities and government) and commercial enterprises have deployed supercomputing systems at an unparalleled rate, courtesy of favorable acquisition economics and compelling technological and business process innovation.

Manufacturers and producers of supercomputers have leveraged the price/performance improvements of commodity microprocessors, the innovation in interconnect technologies, and the rise of Linux and open source software to reach classes of consumers unheard of as recently as five years ago.

In June of 1997, the so-called "fastest computer in the world," the ASCI Red machine at Sandia National Laboratories, was the first system to exceed a teraflop of compute power.1 Today, the same amount of compute power could be acquired for around $200,000, making supercomputing affordable to small companies, single academic departments and, in some cases, even individual researchers.

While these dramatic improvements in system affordability have been taking place, the ability to extract value from the system is principally the consequence of well-written and effective software. Here, the story for the industry is not quite as sanguine. From our customers, we hear that these systems are still difficult to exploit completely, and that the applications they rely on must be ported, or even rewritten, to take full advantage of all the hardware innovation in modern supercomputers.

This view is widely shared and strikes at the heart of the economic or scientific competitiveness of the institution: "Today's computational science ecosystem is unbalanced, with a software base that is inadequate to keep pace with and support evolving hardware and application needs … The result is greatly diminished productivity for both researchers and computing systems."2

As we contemplate each new hardware innovation for supercomputing, we must understand at the most fundamental level that software is the key to unlocking the value of the system for the benefit of the enterprise or the researcher.

Reference this article
Turek, D. "High Performance Computing and the Implications of Multi-core Architectures," CTWatch Quarterly, Volume 3, Number 1, February 2007. http://www.ctwatch.org/quarterly/articles/2007/02/high-performance-computing-and-the-implications-of-multi-core-architectures/
