November 2007
Software Enabling Technologies for Petascale Science
E. Wes Bethel, Lawrence Berkeley National Laboratory
Chris Johnson, University of Utah
Cecilia Aragon, Lawrence Berkeley National Laboratory
Prabhat, Lawrence Berkeley National Laboratory
Oliver Rübel, Lawrence Berkeley National Laboratory
Gunther Weber, Lawrence Berkeley National Laboratory
Valerio Pascucci, Lawrence Livermore National Laboratory
Hank Childs, Lawrence Livermore National Laboratory
Peer-Timo Bremer, Lawrence Livermore National Laboratory
Brad Whitlock, Lawrence Livermore National Laboratory
Sean Ahern, Oak Ridge National Laboratory
Jeremy Meredith, Oak Ridge National Laboratory
George Ostrouchov, Oak Ridge National Laboratory
Ken Joy, University of California, Davis
Bernd Hamann, University of California, Davis
Christoph Garth, University of California, Davis
Martin Cole, University of Utah
Charles Hansen, University of Utah
Steven Parker, University of Utah
Allen Sanderson, University of Utah
Claudio Silva, University of Utah
Xavier Tricoche, University of Utah

Why Visualization Works So Well

One reason that scientific visualization and visual data analysis have proven so effective for knowledge discovery is that they leverage the human cognitive system, which is very good at perceiving patterns in images. Pseudocoloring, a staple visualization technique, maps data values to colors in an image to take advantage of this very ability. Figure 2 is a good example, where high data values are mapped to a specific color that attracts the eye. Additionally, a very clear 3D structure becomes apparent in this image; it would be virtually impossible to “see” such structure by looking at a large table of numbers. While Figure 2 shows a 3D example, we are all familiar with 2D versions of this technique; the weather report on the evening news often shows pseudocolored representations of temperature or levels of precipitation overlaid on a map.

Figure 2. Two types of “features” are immediately visible in this image showing the entropy field of a radiation/hydrodynamic simulation that models the accretion-induced collapse of a star, a phenomenon that produces supernovae. One “feature” is the “sandwiching” of high values of entropy between lower values. The other is an overall sense of 3D structure. (Simulation data courtesy of Adam Burrows, University of Arizona, SciDAC Science Application “The Computational Astrophysics Consortium”; image courtesy of the Visualization Group, Lawrence Berkeley National Laboratory.)
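
As a rough illustration of the pseudocoloring idea described above, the following Python sketch maps scalar values to RGB colors. The function name `pseudocolor`, the blue-to-red ramp, and the sample values are assumptions made for illustration; this is not the color map used to produce Figure 2.

```python
def pseudocolor(value, vmin, vmax):
    """Map a scalar to an (r, g, b) tuple on a simple blue-to-red ramp.

    Hypothetical helper for illustration only.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp the value to [vmin, vmax], then normalize to [0, 1].
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    # Linear blend: low values come out blue, high values come out red.
    return (round(255 * t), 0, round(255 * (1 - t)))


# A tiny made-up 2D scalar field turned into an image of RGB tuples.
field = [[0.0, 0.25], [0.5, 1.0]]
image = [[pseudocolor(v, 0.0, 1.0) for v in row] for row in field]
```

A two-color ramp is about the simplest possible choice; production tools typically offer many color maps, and the choice strongly affects which features attract the eye.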

Surviving the Data Tsunami

Many “tried and true” visualization techniques – such as using pseudocoloring to map scalar data values to color – do a great job of leveraging the human cognitive system to accelerate discovery and understanding of complex phenomena. However, we face difficult challenges when we try to use visualization as a knowledge discovery vehicle on very large datasets. One of these is limited human cognitive bandwidth, illustrated by the notional chart in Figure 3: while our ability to generate, collect, store and analyze data grows at a rate tracking the increase in processor speed and storage density, we as humans have a fixed cognitive capacity to absorb information. Given that our ability to generate data far exceeds what we can possibly understand, one major challenge for “petascale visual data exploration and analysis” is how to effectively “impedance match” between “limitless data” and a fixed human cognitive capacity.

Figure 3. While our ability to generate, collect, store, and analyze data grows at a rate that tracks the increase in processor speed and storage density, our ability as humans to absorb information remains fixed (Illustration adapted from a slide by J. Heer, PARC User Interface Research Group).

“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.” Herb Simon, as quoted by Hal Varian [5].

In the context of our work – namely, petascale visual data analysis – we are faced with several dilemmas. First, even if we could simply scale up our existing tools and algorithms so they would operate at the petascale rather than the terascale, would the results be useful for knowledge discovery? Second, if the answer to the first question is “no,” then how can we help to “allocate attention efficiently among the overabundance of information”?

Let’s examine the first question a bit more closely. First, let’s assume that we’re operating on a gigabyte-sized dataset (10^9 data points), and we’re displaying the results on a monitor that has, say, 2 million pixels (2 × 10^6 pixels). For the sake of discussion, let’s assume we’re going to create and display an isosurface of this dataset. Studies have shown that on the order of N^(2/3) grid cells in a dataset of size N^3 will contain any given isosurface [6]. In our own work, we have found this estimate to be somewhat low – our results have shown the number to be closer to N^0.8 for N^3 data. Also, we have found that an average of about 2.4 triangles per grid cell will result from the isocontouring algorithm. If we use these two figures as lower and upper bounds, then for our gigabyte-sized dataset, we can reasonably expect on the order of between about 2.1 and 40 million triangles for many isocontouring levels. At a display resolution of about 2 million pixels, the result is a depth complexity – the number of objects at each pixel along all depths – of between 1 and 20.
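
The arithmetic behind this estimate can be sketched in a few lines of Python. This is a back-of-the-envelope sketch, not the authors' actual calculation: the function name `isosurface_estimate` is hypothetical, the exponents are assumed to apply to the total number of data points (the reading that comes closest to the quoted figures), and depth complexity is taken to be simply triangles divided by pixels. Under these assumptions the gigabyte case comes out at roughly 2.4 million to 38 million triangles, close to but not exactly the figures quoted above.

```python
def isosurface_estimate(n_points, exponent, tris_per_cell=2.4, n_pixels=2e6):
    """Return (triangles, depth_complexity) for one isosurface.

    Assumption: `exponent` applies to the total number of data points.
    """
    cells = n_points ** exponent       # grid cells crossed by the isosurface
    triangles = tris_per_cell * cells  # about 2.4 triangles per such cell
    depth = triangles / n_pixels       # average number of triangles per pixel
    return triangles, depth


# Gigabyte-sized dataset: lower bound (2/3) and upper bound (0.8) exponents.
low = isosurface_estimate(1e9, 2 / 3)
high = isosurface_estimate(1e9, 0.8)
print(f"triangles: {low[0]:.3g} to {high[0]:.3g}")
print(f"depth complexity: {low[1]:.2g} to {high[1]:.2g}")
```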

With increasing depth complexity come at least two types of problems. First, more information is “hidden from view.” In other words, the nearest object at each pixel hides all the other objects that are further away. Second, if we do use a form of visualization and rendering that supports transparency – so that we can, in principle, see all the objects along all depths at each pixel – we are assuming that a human observer will be capable of distinguishing among the objects in depth. At best, this latter assumption does not always hold true, and at worst, we are virtually guaranteed the viewer will not be able to gain any meaningful information from the visual information overload.

If we scale up our dataset from gigabyte (10^9) to terabyte (10^12), then we can expect on the order of between 199 million and 9.5 billion triangles, representing a depth complexity ranging between about 80 and 4700, respectively. Regardless of which estimate of the number of triangles we use, we end up drawing the same conclusion: depth complexity and, correspondingly, scene complexity and human workload, grow steeply with the size of the source data. Even if we are able to somehow display all those triangles, we would be placing an incredibly difficult burden on the user, who faces the impossible task of visually trying to locate “smaller needles in a larger haystack.”
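
To see how the estimate changes with dataset size, the following Python sketch tabulates depth complexity for both the gigabyte and terabyte cases. The helper names `depth_complexity` and `scaling_table` are hypothetical, and the sketch assumes 2.4 triangles per intersected cell, a 2-million-pixel display, and exponents of 2/3 and 0.8 applied to the total number of data points. Under these assumptions the terabyte case gives a depth complexity of roughly 120 to 4,800. The upper value is close to the figure quoted above, while the lower value is somewhat higher than the quoted 80, which presumably reflects inputs not stated here.

```python
def depth_complexity(n_points, exponent, tris_per_cell=2.4, n_pixels=2e6):
    """Estimated average number of isosurface triangles per pixel."""
    return tris_per_cell * n_points ** exponent / n_pixels


def scaling_table(sizes, exponents=(2 / 3, 0.8)):
    """For each dataset size, list the depth complexity per exponent."""
    return [(n, [depth_complexity(n, e) for e in exponents]) for n in sizes]


# Compare the gigabyte (1e9) and terabyte (1e12) cases.
for n, (low, high) in scaling_table([1e9, 1e12]):
    print(f"{n:.0e} points: depth complexity {low:,.0f} to {high:,.0f}")
```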

The multi-faceted approach we’re adopting takes direct aim at the fundamental objective: helping scientific researchers do science more quickly and efficiently. One primary tactical approach that seems promising is to focus user attention on easily consumable images derived from the large data collection. We do not have enough space in this brief article to cover all aspects of our team’s effort in this regard. Instead, we provide a few details about a couple of especially interesting challenge areas.

Reference this article
Bethel, E. W., Johnson, C., Aragon, C., Prabhat, Rübel, O., Weber, G., Pascucci, V., Childs, H., Bremer, P.-T., Whitlock, B., Ahern, S., Meredith, J., Ostrouchov, G., Joy, K., Hamann, B., Garth, C., Cole, M., Hansen, C., Parker, S., Sanderson, A., Silva, C., Tricoche, X. "DOE's SciDAC Visualization and Analytics Center for Enabling Technologies - Strategy for Petascale Visual Data Analysis Success," CTWatch Quarterly, Volume 3, Number 4, November 2007. http://www.ctwatch.org/quarterly/articles/2007/11/does-scidac-visualization-and-analytics-center-for-enabling-technologies-strategy-for-petascale-visual-data-analysis-success/
