November 2007
Software Enabling Technologies for Petascale Science
Arie Shoshani, Lawrence Berkeley National Laboratory
Ilkay Altintas, San Diego Supercomputer Center
Alok Choudhary, Northwestern University
Terence Critchlow, Pacific Northwest National Laboratory
Chandrika Kamath, Lawrence Livermore National Laboratory
Bertram Ludäscher, University of California, Davis
Jarek Nieplocha, Pacific Northwest National Laboratory
Steve Parker, University of Utah
Rob Ross, Argonne National Laboratory
Nagiza Samatova, Oak Ridge National Laboratory
Mladen Vouk, North Carolina State University

Advanced I/O Infrastructure

As high-performance computing applications scale up and shift from simulation and computation toward data analysis, they become tremendously data-intensive, creating a potential bottleneck in the entire scientific discovery cycle. At the same time, it is well known that I/O access rates have not kept pace with the growth in overall computing performance. It is therefore increasingly important to extract the highest possible performance from the I/O hardware that is available. Even when the raw hardware capacity for storage and I/O is present, the complexity arising from scale and parallelism is daunting and requires significant advances in software to deliver the required performance to applications.

Figure 8. The I/O stack.

The Storage Efficient Access (SEA) component provides the software infrastructure necessary for efficient use of the I/O hardware by applications. This is accomplished through a sequence of tightly coupled software layers, shown in Figure 8, building on top of I/O hardware at the bottom and providing application-oriented, high-level I/O interfaces at the top. Three APIs are made available for accessing SEA components: Parallel netCDF at the high-level I/O library level, ROMIO at the MPI-IO level, and Parallel Virtual File System (PVFS) at the file level.
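To make the layering concrete, the following is a minimal sketch of how a request can pass down such a stack. A top layer lets the application name a variable, a middle layer turns typed values into bytes at explicit file offsets, and a bottom layer spreads those bytes across several I/O servers. The class names (HighLevelLibrary, ByteOffsetLayer, ParallelFileSystem), the 16-byte strip size, and the round-robin striping policy are all hypothetical simplifications chosen for illustration. They are not the Parallel netCDF, ROMIO, or PVFS APIs.

```python
import struct
from typing import Dict, List, Tuple


class ParallelFileSystem:
    """Bottom layer (hypothetical): stripes a file's bytes round-robin across I/O servers."""

    def __init__(self, num_servers: int, strip_size: int) -> None:
        self.num_servers = num_servers
        self.strip_size = strip_size
        # Each server maps global file offset -> stored byte (simple, not efficient).
        self.servers: List[Dict[int, int]] = [{} for _ in range(num_servers)]

    def _server_for(self, offset: int) -> int:
        # Strip k of the file lives on server k mod num_servers.
        return (offset // self.strip_size) % self.num_servers

    def write(self, offset: int, data: bytes) -> None:
        for i, byte in enumerate(data):
            self.servers[self._server_for(offset + i)][offset + i] = byte

    def read(self, offset: int, length: int) -> bytes:
        return bytes(
            self.servers[self._server_for(offset + i)][offset + i] for i in range(length)
        )


class ByteOffsetLayer:
    """Middle layer (hypothetical, MPI-IO-like): typed values at explicit byte offsets."""

    def __init__(self, fs: ParallelFileSystem) -> None:
        self.fs = fs

    def write_doubles(self, offset: int, values: List[float]) -> None:
        self.fs.write(offset, struct.pack(f"<{len(values)}d", *values))

    def read_doubles(self, offset: int, count: int) -> List[float]:
        return list(struct.unpack(f"<{count}d", self.fs.read(offset, 8 * count)))


class HighLevelLibrary:
    """Top layer (hypothetical, netCDF-like): named variables; the library tracks offsets."""

    def __init__(self, io: ByteOffsetLayer) -> None:
        self.io = io
        self.layout: Dict[str, Tuple[int, int]] = {}  # name -> (byte offset, element count)
        self.next_offset = 0

    def define_var(self, name: str, count: int) -> None:
        self.layout[name] = (self.next_offset, count)
        self.next_offset += 8 * count  # float64 elements

    def put_var(self, name: str, values: List[float]) -> None:
        offset, count = self.layout[name]
        if len(values) != count:
            raise ValueError(f"{name} expects {count} values, got {len(values)}")
        self.io.write_doubles(offset, values)

    def get_var(self, name: str) -> List[float]:
        offset, count = self.layout[name]
        return self.io.read_doubles(offset, count)


# The application only names a variable; offsets and striping stay hidden below it.
fs = ParallelFileSystem(num_servers=4, strip_size=16)
lib = HighLevelLibrary(ByteOffsetLayer(fs))
lib.define_var("temperature", 8)
lib.put_var("temperature", [float(i) for i in range(8)])
print(lib.get_var("temperature"))    # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print([len(s) for s in fs.servers])  # [16, 16, 16, 16]
```

The 64 bytes of the variable land evenly on the four simulated servers. This even spread is what lets a parallel file system serve a single file at the aggregate bandwidth of many servers.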

Figure 9. Serialization problems in the original netCDF, removed in Parallel netCDF to achieve a tenfold performance increase.

PVFS [9], the parallel file system layer of this stack, can provide aggregate parallel access rates of multiple GB/s and is freely available. Above the parallel file system is software designed to help applications access it more efficiently. Implementations of the MPI-IO interface are arguably the best example of this type of software: MPI-IO provides optimizations that map complex data movement into efficient parallel file system operations. Our ROMIO [10] implementation of MPI-IO is freely distributed and is the most popular MPI-IO implementation, both on clusters and on a wide variety of vendor platforms.

MPI-IO is a powerful but low-level interface that operates in terms of basic types, such as floating-point numbers, stored at offsets in a file. However, some scientific applications desire more structured formats that map more closely to the application's use, such as multidimensional datasets. The netCDF [11] API and portable file format are widely used, particularly in the climate simulation and data fusion communities. As part of the work in the SDM center, a parallel version of netCDF (pNetCDF) was developed. It provides a new interface for accessing netCDF datasets in parallel. This parallel API closely mimics the original API, but it is designed with scalability in mind and is implemented on top of MPI-IO. Performance evaluations using micro-benchmarks as well as application I/O kernels have shown major scalability improvements over previous efforts. Figure 9 shows schematically how adding a parallel netCDF layer eliminates the serialization of I/O through a single processor.
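The gap between a multidimensional view and MPI-IO's offset view can be illustrated with a small example. A subarray of a row-major variable, described by start and count vectors (the same style netCDF uses for its subarray calls), becomes a list of contiguous (offset, length) runs in the file. When each process writes a column block independently, it issues many small noncontiguous requests. Pooling all processes' requests, as collective I/O does, can reveal a single contiguous region. The sketch below is a simplified model of that idea, not ROMIO's actual two-phase implementation. The function names, the 4 x 8 array, and the four-process column split are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (byte offset, byte length) of one contiguous piece of the file


def hyperslab_runs(shape: Tuple[int, int], start: Tuple[int, int],
                   count: Tuple[int, int], elem_size: int, base: int = 0) -> List[Run]:
    """Map a 2-D (start, count) subarray of a row-major variable to file runs."""
    nrows, ncols = shape
    r0, c0 = start
    nr, nc = count
    if r0 + nr > nrows or c0 + nc > ncols:
        raise ValueError("subarray must lie inside the variable")
    # Each selected row segment is contiguous in a row-major layout.
    return [(base + (r * ncols + c0) * elem_size, nc * elem_size) for r in range(r0, r0 + nr)]


def coalesce(runs: List[Run]) -> List[Run]:
    """Merge runs that touch or overlap into maximal contiguous runs."""
    merged: List[Run] = []
    for off, length in sorted(runs):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            merged[-1] = (last_off, max(last_len, off + length - last_off))
        else:
            merged.append((off, length))
    return merged


# Hypothetical example: a 4 x 8 float64 array split by column blocks over 4 processes.
shape, elem = (4, 8), 8
per_process = [hyperslab_runs(shape, (0, 2 * p), (4, 2), elem) for p in range(4)]

# Independent I/O: every process issues its own small, noncontiguous requests.
independent_requests = sum(len(runs) for runs in per_process)

# Collective I/O: pooling all requests first exposes one contiguous region.
collective_requests = coalesce([run for runs in per_process for run in runs])

print(independent_requests)  # 16
print(collective_requests)   # [(0, 256)]
```

Sixteen 16-byte requests collapse into one 256-byte request. In a serialized scheme like the one in Figure 9, a single process would instead have to gather all of the data before writing any of it.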

Upcoming systems will incorporate hundreds of thousands of compute processors along with support nodes. On these systems, I/O operations issued through the POSIX and MPI-IO interfaces will be forwarded through a set of I/O nodes to the storage targets. Work is underway to develop efficient forwarding systems that match petascale architectures and connect well to the underlying file systems, including PVFS.
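The effect of forwarding on the file system can be sketched with simple arithmetic. The text does not say how compute processes are assigned to I/O nodes. The sketch below assumes a block mapping in which consecutive ranks share an I/O node, with a hypothetical ratio of 64 compute processes per I/O node and a hypothetical request pattern. Under those assumptions, the storage targets see one client per I/O node instead of one per compute process.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def io_node_for(rank: int, ranks_per_io_node: int) -> int:
    """Assumed block mapping: consecutive compute ranks share one I/O node."""
    return rank // ranks_per_io_node


def forward(requests: List[Tuple[int, int, int]],
            ranks_per_io_node: int) -> Dict[int, List[Tuple[int, int]]]:
    """Group (rank, offset, length) requests by the I/O node that forwards them."""
    batches: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for rank, offset, length in requests:
        batches[io_node_for(rank, ranks_per_io_node)].append((offset, length))
    return dict(batches)


# Hypothetical machine: 100,000 compute processes, 64 per I/O node,
# each writing one 4 KiB block at its own offset.
num_ranks, ratio = 100_000, 64
requests = [(rank, rank * 4096, 4096) for rank in range(num_ranks)]
batches = forward(requests, ratio)

print(len(batches))        # clients seen by the storage targets: 1563
print(len(batches[0]))     # requests forwarded by I/O node 0: 64
print(len(batches[1562]))  # the last I/O node serves the remaining 32 ranks
```

Because 100,000 is not a multiple of 64, the last I/O node serves only 32 ranks. A real forwarding layer would also have to balance such uneven groups.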

Active Storage

Despite recent advances in storage technologies, data analysis remains a serious bottleneck for many data-intensive applications. In traditional cluster systems, I/O-intensive tasks must be performed on the compute nodes, so the data must first be moved across the network from storage, producing a high volume of network traffic. One alternative is to perform analysis using resources on the storage side rather than the client side, an approach referred to as Active Storage. The original research efforts on active storage were based on the premise that modern storage architectures might include usable processing resources at the storage controller or disk; unfortunately, commodity storage has not yet reached this point. However, parallel file systems offer a similar opportunity. Because the servers in a parallel file system often include commodity processors similar to those used in compute nodes, many giga-operations per second (Gop/s) of aggregate processing power are often available in the parallel file system. As part of the SEA layer technology, the goal of our Active Storage project is to leverage these resources for data processing. Scientific applications that rely on out-of-core computation are likely candidates for this technique, because their data is already being moved through the file system. The Active Storage approach moves computations on data stored in a parallel file system from the compute nodes to the storage nodes. Benefits of Active Storage include reduced network traffic, local I/O operations, and better overall performance. The SDM center has implemented Active Storage on the Lustre and PVFS parallel file systems, and we plan to pursue its deployment in biology or climate applications.
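The network-traffic argument for Active Storage can be shown with a toy model. In the conventional path, every value is shipped to a compute node and then analyzed there. In the Active Storage path, each storage server analyzes its local chunk and ships back a single partial result. This is not the SDM center's Lustre or PVFS implementation. The storage servers are simulated as Python lists, and the analysis kernel (counting values above a threshold), the data sizes, and all names are hypothetical.

```python
import random
from typing import List, Tuple

BYTES_PER_VALUE = 8   # each stored value is a float64
BYTES_PER_RESULT = 8  # each partial result is returned as one 64-bit integer


def count_above(values: List[float], threshold: float) -> int:
    """Hypothetical analysis kernel: how many values exceed the threshold."""
    return sum(1 for v in values if v > threshold)


def client_side(servers: List[List[float]], threshold: float) -> Tuple[int, int]:
    """Conventional path: ship every value to a compute node, then analyze there."""
    shipped = [v for chunk in servers for v in chunk]
    return count_above(shipped, threshold), BYTES_PER_VALUE * len(shipped)


def storage_side(servers: List[List[float]], threshold: float) -> Tuple[int, int]:
    """Active Storage path: each server analyzes its local chunk, ships one number."""
    partials = [count_above(chunk, threshold) for chunk in servers]  # runs "on" each server
    return sum(partials), BYTES_PER_RESULT * len(partials)


# Hypothetical data: 8 storage servers holding 100,000 float64 values each.
random.seed(42)
servers = [[random.random() for _ in range(100_000)] for _ in range(8)]

result_c, bytes_c = client_side(servers, 0.99)
result_s, bytes_s = storage_side(servers, 0.99)
print(result_c == result_s)  # True: same answer either way
print(bytes_c, bytes_s)      # 6400000 64
```

The approach works here because the kernel decomposes into independent per-chunk partial results that are cheap to combine. Analyses without that property would need more data exchanged between the storage servers.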

Figure 10. The Active Storage architecture.

[1] sdmcenter.lbl.gov; contains extensive publication lists, with access to full papers.
[2] kepler-project.org/
[3] www.phy.ornl.gov/tsi/
[4] nstx.pppl.gov/
[5] www.llnl.gov/casc/sapphire/sapphire_home.html
[6] cran.r-project.org/doc/packages/RScaLAPACK.pdf
[7] sdm.lbl.gov/fastbit/
[8] Stockinger, K., Shalf, J., Bethel, W., Wu, K. “DEX: Increasing the Capability of Scientific Data Analysis Pipelines by Using Efficient Bitmap Indices to Accelerate Scientific Visualization,” International Conference on Scientific and Statistical Database Management (SSDBM 2005), Santa Barbara, California, USA, June 2005. Available at crd.lbl.gov/~kewu/ps/LBNL-57023.pdf
[9] www.parl.clemson.edu/pvfs2
[10] www.mcs.anl.gov/romio
[11] www.mcs.anl.gov/parallel-netcdf

Reference this article
Shoshani, A., Altintas, I., Choudhary, A., Critchlow, T., Kamath, C., Ludäscher, B., Nieplocha, J., Parker, S., Ross, R., Samatova, N., Vouk, M. "Scientific Data Management: Essential Technology for Accelerating Scientific Discoveries," CTWatch Quarterly, Volume 3, Number 4, November 2007. http://www.ctwatch.org/quarterly/articles/2007/11/scientific-data-management-essential-technology-for-accelerating-scientific-discoveries/

Any opinions expressed on this site belong to their respective authors and are not necessarily shared by the sponsoring institutions or the National Science Foundation (NSF).

Any trademarks or trade names, registered or otherwise, that appear on this site are the property of their respective owners and, unless noted, do not represent endorsement by the editors, publishers, sponsoring institutions, the National Science Foundation, or any other member of the CTWatch team.

No guarantee is granted by CTWatch that information appearing in articles published by the Quarterly or appearing in the Blog is complete or accurate. Information on this site is not intended for commercial purposes.