August 2006
Trends and Tools in Bioinformatics and Computational Biology
Rick Stevens, Associate Laboratory Director, Computing and Life Sciences – Argonne National Laboratory, Professor, Computer Science Department – The University of Chicago

What are some notable accomplishments in applying CI to biology research?

A handful of systems have fundamentally changed how biologists work. The most important is the one developed by the National Center for Biotechnology Information,[1] including Entrez, a Google-like search engine that supports searching across many types of biological data. There are similar systems in Europe[2] and Japan.[3] These systems, and others like them, began as outgrowths of genome and protein sequence databases and have given the global community access to sequence data and, more recently, to publications, annotations, linkage maps, expression data, phylogeny data, metabolic pathways, regulatory and signaling data, compounds, and molecular structures. Search techniques have expanded from keywords to computed properties (sequence similarity and, more generally, “associations”) that enable one to find connections between biological or chemical entities. While these systems have enormous user bases and require considerable computing capabilities for indexing and integration, they are essentially client/server in nature, and the computing that an end user can request is closely controlled.
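To make the client/server pattern concrete, here is a minimal sketch of how a client might phrase one keyword query against several Entrez databases. It assumes NCBI's public E-utilities web interface (the ESearch endpoint), which the article does not name; the helper `build_esearch_url` and the choice of databases and search term are hypothetical and purely illustrative. The sketch only constructs the request URLs and does not contact the server.

```python
from urllib.parse import urlencode

# Assumed base address of NCBI's public E-utilities service,
# the programmatic interface to Entrez (not named in the article).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Return an ESearch URL asking one Entrez database for matching record IDs.

    Hypothetical helper for illustration only.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# The same keyword query posed to several Entrez databases (illustrative choices).
for db in ("pubmed", "protein", "nuccore"):
    print(build_esearch_url(db, "Escherichia coli"))
```

The client controls only the query parameters; the indexing, integration, and search all run on the server, which is the sense in which the computing an end user can request is closely controlled.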

Approximately a decade ago, a number of groups began to produce more flexible tools that support a less structured workflow, enabling users to construct their own mini-environments for pursuing computational approaches to problems. One of the first such systems was the Biology Workbench, developed at the University of Illinois and now hosted at the University of California, San Diego.[4] Other systems were developed to provide access to a specific type of data (e.g., microbial genomes) through well-engineered data integrations. These systems are often associated with teams of curators. Three are particularly important: the Institute for Genomic Research’s Comprehensive Microbial Resource;[5] the SEED, an annotation system developed by the Fellowship for the Interpretation of Genomes at the University of Chicago;[6] and the DOE Joint Genome Institute’s Integrated Microbial Genomes resource.[7] These systems give the user an integrated view of hundreds of genomes and a rich environment for discovery.

Are there some good road-mapping documents available?

In the past couple of years, the community has written several worthwhile road-mapping documents. In general, these reports attempt to identify trends in the field and provide some structure for understanding its directions. The first is a report from the NSF committee for building a cyberinfrastructure for the biological sciences;[8] the second is the National Academy of Sciences report on computing and biology.[9] The third is more oriented toward systems biology: a program roadmap developed by the DOE for its Genomes to Life program,[10] which contains a section on the computing and infrastructure needed to support the build-out of systems biology, focused on microbial organisms, energy, and the environment. All three documents are worth reading to gain an understanding of where the field is going.

Reference this article
Stevens, R. "Trends in Cyberinfrastructure for Bioinformatics and Computational Biology," CTWatch Quarterly, Volume 2, Number 3, August 2006. http://www.ctwatch.org/quarterly/articles/2006/08/trends-in-cyberinfrastructure-for-bioinformatics-and-computational-biology/
