Jack Dongarra, Oak Ridge National Laboratory; University of Tennessee
CTWatch Quarterly
February 2007

Over the past few years, the familiar idea that software packages can have a life of their own has been extended in a very interesting way. People are seriously exploring the view that we should think of software as developing in and depending on a kind of ecological system, a complex and dynamic web of other software, computing platforms, people and organizations.1 To the more technical segment of the cyberinfrastructure community, this concept of a software ecology may seem to be just a metaphor, more suited to marketing than sound analysis and planning. Yet we all know that good metaphors are often essential when it comes to finding a productive way to think and talk about highly complex situations that are not well understood, but which none the less have to be confronted. The dramatic changes in computing discussed in this issue of CTWatch QuarterlyThe Promise and Perils of the Coming Multicore Revolution and Its Impact – represent an extreme case of just such a situation. We should therefore expect that the heuristic value of the software ecology model will be put to a pretty severe test.

All of the articles in this issue mention one main cause of the multicore revolution that echoes, in a surprising way, an environmental concern very much in today's news – system overheating. The underlying physical system on which all software ecologies depend, i.e., the computer chip, has traditionally been designed in a way that required them to get hotter as they got faster, but now that process has reached its limit. The collision of standard processor designs with this thermal barrier, as well as with other stubborn physical limits, has forced processor architects to develop chip designs whose additional computing power can only be utilized by software that can effectively and efficiently exploit parallelism. Precious little of the software now in use has that capability. Precious few in the computing community have any idea of how to change the situation any time soon. As the hardware habitat for software adapted for serial processing declines, and the steep challenges of creating good parallel software become more and more evident, the consequences of the discontinuity thereby produced seem destined to reverberate through nearly every element of our software ecosystem, including libraries, algorithms, operating systems, programming languages, performance tools, programmer training, project organization, and so on.

Since the impacts of these changes are so broad, far reaching and largely unknown, it is important in discussing them to have different points of view within the global software ecosystem represented. The sample of views presented here includes one from a group of academic researchers in high performance computing, and three from different industry niches. As one would expect, despite certain commonalities, each of them highlights somewhat different aspects of the situation.

The article from the more academic perspective, authored by Dennis Gannon, Geoffrey Fox, the late Ken Kennedy, and myself, focuses primarily on the relatively small but important habitat of Computational Science. Starting from the basic thesis that science itself now requires and has developed a software ecosystem that needs stewardship and investment, we provide a brief characterization of three main disruptors of the status quo: physical limits on clock rates and voltage, disparities between processor speed and memory bandwidth, and economic pressures encouraging heterogeneity at the high end. Since the HPC community has considerably more experience with parallel computing than most other communities, it is in a better position to communicate some lessons learned from science and engineering applications about scalable parallelism. The chief one is that scalable parallel performance is not an accident. We look at what these lessons suggest about the issues that commodity applications might face and draw out some of their future implications for the critical areas of numerical libraries and compiler technologies.

John Manferdelli of Microsoft explores the situation for commercial application and system programmers, where the relative paucity of experience with parallel computing is liable to make the "shock of the new," delivered by the multicore revolution, far more severe. After presenting David Patterson's compact formulation of how the performance of serial processing has been indefinitely barred from further improvements by the combined force of the "power wall," the "memory wall," and the "ILP wall," he describes several complementary approaches for getting more concurrency into programming practice. But to be successful in helping commercial developers across the concurrency divide, these new development tools and techniques will require improvements on other fronts, most especially in operating systems. He sketches an important preview of how we may expect operating systems to be adapted for concurrency, providing new approaches to resource sharing that allow different system subcomponents to have flexible access to dedicated resources for specialized purposes.

A view from the inside of the multicore revolution, offering an extended discussion of the critical factors of "balance" and "optimization" in processor design, is provided in the article by John McCalpin, Chuck Moore, and Phil Hester of Advanced Micro Devices (AMD). Their account of motivation for adopting a multicore design strategy is much more concrete and quantitative than the previous articles, as is only appropriate for authors who had to grapple with this historic problem at first hand and as a mission critical goal. They offer a fascinating peek inside the rationale for the movement to multicore, laying out the lines of reasoning and the critical tradeoffs and considerations (including things like market trends) that lead to different design points for processor manufacturers. Against this background, the speculations they offer about what we might expect in the near and mid-range future are bound to have more credibility than similar exercises by others not so well positioned.

Finally, the article by David Turek of IBM highlights a distinctly different but equally important strand in the multicore revolution: the introduction of new hybrid multicore architectures and their application to supercomputing architectures. A conspicuous example of this significant trend is use of the Cell Broadband Engine processor, created by an industry consortium to power Sony's next generation PlayStations, in the construction of novel supercomputer designs, like the Roadrunner system at Los Alamos National Laboratory. Such hybrid systems provide convincing illustrations of how unexpected combinations of technological and economic forces in the software ecosystem can combine to produce new innovation. The dark side of this trend toward heterogeneity, however, is that it severely complicates the planning process of small ISV's, who have only scarce resources to apply to the latest hardware innovations.

It is hard to read the articles in this issue of CTWatch Quarterly without coming to the conclusion, agreed upon by many other leaders in the field, that modern software ecosystems are about to be destabilized, not to say convulsed, by a major transformation in their hardware substrate. Over time, this may actually improve the health of the software by changing people's attitudes about the value of software. There have long been complaints, especially in the HPC community, that software is substantially undervalued, with inadequate investments, beyond the initial research phase, in software hardening, enhancement, and long term maintenance. New federal programs, such as the NSF's Software Development for Cyberinfrastructure (SDCI), represent a good first step in recognizing that good software has become absolutely essential to productivity in many areas. Yet I believe the multicore revolution, which is now upon us, will drive home the need to make that recognition into a guiding principle of our national research strategy. For that reason alone, but for many others as well, the CTWatch community, and the scientific computing community in general, will miss the incandescent presence of Ken Kennedy, who died earlier this month. The historic challenges we are about to confront will require the very kind of visionary leadership for which Ken had all the right stuff, as he showed over and over again during his distinguished career. I will miss my friend.

1 Messerschmitt, D. G., Szyperski, C. Software ecosystem : understanding an indispensable technology and industry. Cambridge, MA: MIT Press, 2003.

URL to article: