August 2006
Trends and Tools in Bioinformatics and Computational Biology
Rick Stevens, Associate Laboratory Director, Computing and Life Sciences – Argonne National Laboratory, Professor, Computer Science Department – The University of Chicago


The following table gives examples of high-impact problems that could be addressed in the next two to three years on an open-access petascale platform and that leverage methods already ported to the IBM BG/L platform.

| Biology Problem Area | @ 360 TF/s | @ 1000 TF/s | @ 5000 TF/s |
|---|---|---|---|
| Determining the detailed evolutionary history of each protein family ⇒ This will enable rational planning for structural biology initiatives and will provide a foundation for assessing protein function and diversity | 3,000 hours to build reference database | 300 hours to build reference database | 60 hours to build reference database |
| Determining the frequency and detailed nature of horizontal gene transfers in prokaryotes ⇒ This will shed light on the molecular and genetic mechanisms of evolution by means other than direct “Darwinian” descent and will contribute to our understanding of the acquisition of virulence and drug resistance in pathogens and the means by which prokaryotes adapt to the environment | 1,000 hours to study 200 gene families | 1,000 hours to study 2,000 gene families | 1,000 hours to study 10,000 gene families |
| Automated construction of core metabolic models for all the sequenced DOE genomes ⇒ This will enable dramatic acceleration of the promise of the GTL program and the use of microbial systems to address DOE mission needs in energy, environment, and science | One hour per organism, 100 hours per metagenome | 10 organisms per hour, 10 hours per metagenome | 50 organisms per hour, two hours per metagenome |
| Predict essential genes for all known sequenced micro-organisms ⇒ This will enable a broader class of genes and gene products to be targeted for potential drugs and to predict culturability conditions for environmental microbes | 300 hours for 1,000 organisms, 10 hours to predict culturability per organism | 30 hours for 1,000 organisms, one hour to predict culturability per organism | 30 hours for 5,000 organisms |
| Computational screening of all known microbial drug targets against the public and private databases of chemical compounds to identify potential new inhibitors and potential drugs ⇒ The resulting database would be a major national biological research resource that would have a dramatic impact on worldwide health research and the fundamental science of microbiology | 2 M ligands per day per target (1 year to screen all microbial targets) | 20 M ligands per day per target (~1 month to screen all microbial targets) | 1 machine year to screen all known human drug targets |
| Model and simulate the precise cellulose degradation and ethanol and butanol biosynthesis pathways at the protein/ligand level to identify opportunities for molecular optimization ⇒ This would result in a set of model systems to be further developed for optimization of the production of biofuels | Simulate in detail the directed evolution of individual enzymes | Simulate the co-evolution and optimization of a degradation or biosynthesis pathway of up to five enzymes | Simulate the optimization of a complete cellulose-to-ethanol or butanol production system of over a dozen enzymatic steps |
| Model and simulate the replication of DNA to understand the origin of and the repair mechanisms of genetic mutations ⇒ This would result in dramatic progress in the fundamental understanding of how nature manages mutations and understanding which molecular factors determine the broad range of organism susceptibility to radiation and other mutagens | 30 ns simulation of DNA polymerase | 10 ensembles of different DNA repair enzymes | Complete polymerase-mediated base pair addition step |
| Model and simulate the process of DNA transcription and protein translation and assembly ⇒ This would enable us to move forward on understanding post-transcription and post-translation modification and epigenetic regulation of protein synthesis | Validate current understanding of ribosomal function | Explore spliceosome function and the evolution of intron/exon functions | Model the complete coupled processes of DNA transcription to protein translation including regulatory processes |
| Model and simulate the interlinked metabolisms of microbial communities ⇒ This project is relevant to understanding the biogeochemical cycles of extreme, natural and disturbed environments and will lead to the development of strategies for the production of bio-fuels and the development of new bio-engineered processes based on exploiting communities rather than individual organisms | 20 organisms in a linked metabolic network | 100 organisms in a linked metabolic network | 200 organisms in a linked metabolic network |
| In silico prediction of mutations and activity, conformational changes, active site alterations | One enzyme | Five-enzyme pathway | Eight-enzyme pathway optimization |
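
As a reading aid for the table, the short Python sketch below computes the improvement factors implied by three of its rows and compares them with the raw ratio of the peak rates in the column headers. The numbers are copied from the table; the row labels, the `lower_is_better` flags, and the function names are this sketch's own shorthand, not anything defined in the article, and the sketch makes no claim about how the original estimates were derived.

```python
# Peak rates from the table header, in TF/s.
PEAK_TFLOPS = (360, 1000, 5000)

# Figures copied from three rows of the table. The short labels and the
# lower_is_better flags are assumptions made for this illustration.
ROWS = [
    ("hours to build protein family reference database", (3000, 300, 60), True),
    ("gene families studied in 1,000 hours", (200, 2000, 10000), False),
    ("organisms in a linked metabolic network", (20, 100, 200), False),
]


def gains(values, lower_is_better):
    """Return each value's improvement factor relative to the first column."""
    base = values[0]
    if lower_is_better:
        return [base / v for v in values]
    return [v / base for v in values]


def report():
    """Build one text line for the peak-rate ratios, then one per table row."""
    peak = ", ".join(f"{g:.1f}x" for g in gains(PEAK_TFLOPS, False))
    lines = [f"peak rate ratio: {peak}"]
    for label, values, lower_is_better in ROWS:
        factors = ", ".join(f"{g:.1f}x" for g in gains(values, lower_is_better))
        lines.append(f"{label}: {factors}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

On these figures the first two rows improve by 10x and 50x while the peak rate rises by roughly 2.8x and 13.9x, and the third row improves by 5x and 10x. The estimates therefore do not follow simple proportional scaling with peak rate; the article does not state the scaling assumptions behind them.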


Reference this article
Stevens, R. "Trends in Cyberinfrastructure for Bioinformatics and Computational Biology," CTWatch Quarterly, Volume 2, Number 3, August 2006. http://www.ctwatch.org/quarterly/articles/2006/08/trends-in-cyberinfrastructure-for-bioinformatics-and-computational-biology/
