May 2006
Designing and Supporting Science-Driven Infrastructure
Ralph Roskies, Pittsburgh Supercomputing Center
Thomas Zacharia, Oak Ridge National Laboratory

On-going non-personnel expenses

Power costs must account not only for the power needs of the computer but also for the cost of cooling. As a rule of thumb, the required cooling consumes an additional 35-40% of the power drawn by the system alone. Today’s rates for power vary substantially across the country, ranging from under 3 cents/kWh to over 10 cents/kWh.
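The rule of thumb above can be turned into a quick estimate. The following is a minimal sketch, not a budgeting tool: the function name, the 1 MW system size, and the assumption that the machine draws constant power all year are illustrative choices, not figures from this article.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_power_cost(system_kw, rate_per_kwh, cooling_overhead=0.40):
    """Estimate the yearly electricity bill in dollars.

    system_kw        -- average draw of the computer alone, in kilowatts
    rate_per_kwh     -- electricity price in dollars per kWh
    cooling_overhead -- extra power for cooling as a fraction of system_kw
                        (0.35 to 0.40 per the rule of thumb)
    """
    total_kw = system_kw * (1.0 + cooling_overhead)
    return total_kw * HOURS_PER_YEAR * rate_per_kwh


if __name__ == "__main__":
    # Hypothetical 1 MW system priced at the low and high ends of the quoted range.
    for rate in (0.03, 0.10):
        cost = annual_power_cost(1000, rate)
        print(f"{rate * 100:.0f} cents/kWh: ${cost:,.0f} per year")
```

Because the cost scales linearly with the rate, the same machine costs more than three times as much to run at 10 cents/kWh as at 3 cents/kWh, which is why site location matters.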

First-year maintenance may be included in the price of a new system. After that, unless the purchase has explicitly included multi-year maintenance, annual maintenance costs seem to range between 4% and 8% of the purchase price of the machine. It is not necessary to get a maintenance contract with extremely rapid response. For a system with a large node count, it is much more important to be able to remove a node from the system rapidly, reconfigure, preferably with spares, and continue. Next-day service may then be adequate for the vendor to do any required hardware maintenance on the removed nodes. It is almost always better to negotiate maintenance options with the vendor while negotiating for the original system, for that is when you have the most leverage. It is wise to structure these as annual options so that you can cancel the maintenance contract if you find a better deal.

Operating expenses can be kept down by developing operator-free systems. For this, you need an extensive alerting infrastructure that relays system events to system administrators via pagers or text messages to their cell phones. Underlying it is a monitoring system extensive and reliable enough to report any of the anomalies that system operators would likely catch. You actually need a hierarchy of monitoring, from simple pass/fail checks on individual low-level devices such as nodes and disks, up to high-level tests that exercise several components in sequence and verify that the end-to-end results are correct.
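One way to picture such a hierarchy is sketched below. This is an illustration under stated assumptions rather than a description of any particular site's tooling: the `Check` class, the `chain` and `run_checks` helpers, the device names, and the stub probes are all hypothetical, and the `notify` callback stands in for whatever pager or text-message gateway a site actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    name: str
    level: str                 # "device" or "end-to-end"
    probe: Callable[[], bool]  # returns True when healthy


def chain(steps, payload, expected):
    """Build an end-to-end probe: pass the payload through each step in
    order and succeed only if the final result equals the expected value."""
    def probe():
        result = payload
        for step in steps:
            result = step(result)
        return result == expected
    return probe


def run_checks(checks, notify):
    """Run every check, send one alert per failure, return the failed names."""
    failed = []
    for check in checks:
        try:
            healthy = check.probe()
        except Exception:
            healthy = False  # a probe that crashes counts as a failure
        if not healthy:
            failed.append(check.name)
            notify(f"[{check.level}] {check.name} failed")
    return failed


# Stub probes standing in for real hardware and service queries (hypothetical).
alerts: List[str] = []
checks = [
    Check("node042 responds", "device", lambda: True),
    Check("disk17 healthy", "device", lambda: False),
    Check("encode/decode round trip", "end-to-end",
          chain([lambda s: s.encode("utf-8"), lambda b: b.decode("utf-8")],
                "probe data", "probe data")),
]
failed = run_checks(checks, alerts.append)
```

Here the alerts are simply appended to a list; in practice the same callback would hand the message to the alerting infrastructure. Treating a crashed probe as a failure matters in an operator-free setting, since nobody is watching for a monitor that silently stopped working.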

A new trend is that the four- to five-year operating cost of a major computer, including maintenance, space, power, and cooling, which for many years was a small part of the total cost of ownership, is now becoming a much more significant factor and may even exceed the original capital investment.
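To see how operating cost can overtake the purchase price, the components can be added up over the life of the machine. The sketch below is a rough model under assumptions: the function name and every dollar figure in the example are hypothetical, maintenance is taken as a flat fraction of the purchase price, and the first year of maintenance is assumed to be bundled with the purchase.

```python
def operating_cost(purchase_price, years, maintenance_rate,
                   annual_power_cooling, annual_space,
                   first_year_maintenance_included=True):
    """Total non-personnel operating cost in dollars over the given years.

    maintenance_rate is the annual maintenance fee as a fraction of the
    purchase price (for example 0.04 to 0.08).
    """
    paid_years = years - 1 if first_year_maintenance_included else years
    maintenance = purchase_price * maintenance_rate * max(paid_years, 0)
    return maintenance + years * (annual_power_cooling + annual_space)


if __name__ == "__main__":
    # Hypothetical inputs: a $10M machine kept for five years.
    price = 10_000_000
    total = operating_cost(price, years=5, maintenance_rate=0.08,
                           annual_power_cooling=1_226_400,
                           annual_space=200_000)
    print(f"Operating cost: ${total:,.0f}")
    print(f"Ratio to purchase price: {total / price:.2f}")
```

With these made-up inputs the five-year operating cost comes out slightly above the purchase price. Different rates, lifetimes, or facility charges can move the ratio well to either side, so the model is useful mainly for comparing scenarios.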


Increasingly, system software such as debuggers, mathematical libraries, job schedulers, performance analysis tools, and even compilers is provided by companies other than the hardware vendor. The cost of this required third-party software can be substantial, and often the suppliers do not have early access to hardware from the vendors. Make certain that you understand exactly what software will be supplied with the system, and what arrangements the vendor has with the independent software vendors who will supply these other needed tools. However, it is not always necessary to license tools such as debuggers for the full system. For example, debugging tools are not very effective above 100-200 tasks, so don’t bother to license the debugger for 2000 nodes. This can save a substantial amount of money. There are also high-quality, robust mathematical libraries available for free from universities and government laboratories as a result of many years of development supported by the NSF and DOE. Often, vendors have optimized versions of these libraries available for their systems.


Reference this article
Roskies, R., Zacharia, T. "Designing and Supporting High-end Computational Facilities," CTWatch Quarterly, Volume 2, Number 2, May 2006. http://www.ctwatch.org/quarterly/articles/2006/05/designing-and-supporting-high-end-computational-facilities/
