February 2007
The Promise and Perils of the Coming Multicore Revolution and Its Impact
John L. Manferdelli, Microsoft Corporation


David Patterson of Berkeley has a formulaic summary of the serial performance problem: "The power wall + the memory wall + the ILP wall = a brick wall for serial performance." Thus, the heroic line of development followed by materials scientists and computer designers to increase serial performance now yields diminishing returns. Computer architects have been forced to turn to parallel architectures to continue to make progress. Parallelism can be exploited by adding more independent CPUs, data-parallel execution units, additional registers sets (hardware threads), more independent memory controllers to increase memory bandwidth (this requires more output pins) and bigger caches. Computer architects can also consider incorporating different execution units, which dramatically improve some computations but not others (e.g., GPU like units that excel at structured data parallelism and streaming execution units with local memory as Cray did in many of its early machines3). Heterogeneity need not only mean completely different "abstract" execution unit models but may include incorporating computation engines with the same instruction set architecture, but different performance and power consumption characteristics. All of these take advantage of dramatically higher on-chip interconnect data rates.

Moore's law will grant computer architects ever more gates for the foreseeable future, and the challenge is to use them to deliver performance and power characteristics fit for their intended purpose. Figure 1 below illustrates a few hardware design choices. In 1(a), a client configuration might consist of two large "out of order" cores (OoC) incorporating all the ILP of current processors to run existing programs together with many smaller "in-order" cores (IoC) for programs that can take advantage of highly parallel software. Why many IoCs rather than correspondingly fewer of the larger OoCs? The reason is that spending gates on out-of order has poorer performance returns than simpler in-order cores, if parallel software can scale with cores. The server configuration in 1(b) incorporates many more IoCs and a "custom" core (say a crypto processor). Finally, multi-core computers are not just beneficial for raw performance but also reliability and power management so embedded processors will also undergo an architecture shift as illustrated in 1(c).

Figure 1

Figure 1. (a, b, c – clockwise from upper left). Sample client, server, embedded chips.

Pages: 1 2 3 4 5 6

Reference this article
Manferdelli, J. "The Many-Core Inflection Point for Mass Market Computer Systems," CTWatch Quarterly, Volume 3, Number 1, February 2007. http://www.ctwatch.org/quarterly/articles/2007/02/the-many-core-inflection-point-for-mass-market-computer-systems/

Any opinions expressed on this site belong to their respective authors and are not necessarily shared by the sponsoring institutions or the National Science Foundation (NSF).

Any trademarks or trade names, registered or otherwise, that appear on this site are the property of their respective owners and, unless noted, do not represent endorsement by the editors, publishers, sponsoring institutions, the National Science Foundation, or any other member of the CTWatch team.

No guarantee is granted by CTWatch that information appearing in articles published by the Quarterly or appearing in the Blog is complete or accurate. Information on this site is not intended for commercial purposes.