August 2006
Trends and Tools in Bioinformatics and Computational Biology
Folker Meyer, Argonne National Laboratory


Figure 1 makes clear why new algorithms and more computational power are needed: as sequencing accelerates, we cannot generate annotations fast enough to keep up. New bioinformatics techniques, combined with high-throughput computing, provide a much-needed means of narrowing the growing gap between the number of sequences and the number of annotations.

This limitation is currently of interest mainly to people working in basic science. With the advent of more and more complete genomic sequences for crop plants, pathogens and ultimately individual human beings, however, the demand for precise and fast bioinformatics analysis of genomes, not only from bacteria but also from plants and humans, is going to grow fast.

Figure 1. Growth of sequence databases and annotations, plotted on a logarithmic scale. Numbers are taken from the release notes of the respective databases, GenBank (NCBI) and Swiss-Prot (EBI).
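To make the widening gap concrete, here is a minimal sketch that assumes both collections grow exponentially but with different doubling times. The doubling times below are hypothetical placeholders chosen for illustration; they are not taken from the GenBank or Swiss-Prot release notes, and the function names are likewise invented for this example.

```python
import math


def size_after(initial: float, doubling_time_years: float, years: float) -> float:
    """Size of an exponentially growing collection after `years`."""
    return initial * 2 ** (years / doubling_time_years)


def gap_ratio(years: float, seq_doubling: float, ann_doubling: float,
              seq0: float = 1.0, ann0: float = 1.0) -> float:
    """Ratio of sequences to annotations after `years`."""
    sequences = size_after(seq0, seq_doubling, years)
    annotations = size_after(ann0, ann_doubling, years)
    return sequences / annotations


# Hypothetical doubling times (assumptions, not measured values).
SEQ_DOUBLING_YEARS = 1.5
ANN_DOUBLING_YEARS = 3.0

if __name__ == "__main__":
    for year in range(0, 13, 3):
        ratio = gap_ratio(year, SEQ_DOUBLING_YEARS, ANN_DOUBLING_YEARS)
        # On a logarithmic axis the gap is the difference of the two curves,
        # i.e. log10 of the ratio, and it grows linearly with time.
        print(f"year {year:2d}: sequences/annotations = {ratio:6.1f}, "
              f"log10 gap = {math.log10(ratio):.2f}")
```

Under these assumptions the two curves appear as straight lines with different slopes on a logarithmic plot, so the distance between them keeps growing for as long as annotation doubles more slowly than sequencing.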

As daunting as our limited ability to generate annotations seems, we have so far only discussed a fraction of the challenges posed by biology. Annotations cover only the static components of the genome. They are a description of the gene load.

Ever since we learned that the human genome contains relatively few genes (estimates keep changing, but all are below 50,000), it has become clear that the dynamics of gene expression and its regulation hold the key to understanding the organisms in question.

As long as we are unable to fully enumerate, let alone describe, the functional elements in the respective genomes, we are a long way from understanding the full complexity hidden in the static and the dynamic components of the genome. Cyber technologies will play a key role in furthering our understanding of the data that we are currently amassing. While important insights into the respective organisms' lifestyles can already be obtained from studying the dynamic components of life (gene expression and regulation), we are at the beginning of another data deluge. The NCBI presents, as part of its training material, a comparison of the growth of sequence and gene expression data,7 highlighting the fact that both are growing dramatically.

The analysis of the growing volume of gene expression data becoming available from the various post-genomics technologies will present an even greater challenge than the annotation problem we are faced with right now. A single gene expression experiment can generate data for thousands of genes at a time. While gene expression studies have the potential to help us understand annotation much better, initially we are faced with more data that not only needs to be integrated with the annotations but also exceeds them in volume and complexity.

While we are currently faced with the problem of generating annotations for the sequences we are producing, the next steps are already well defined, and it is clear that there is a serious need for computational support of biology, which in turn will require large-scale computation. Biology is in the middle of a paradigm shift towards becoming a fully data-driven science.

1 en.wikipedia.org/wiki/Metabolic_network_reconstruction_and_simulation
2 Catlett, C., Beckman, P., Skow, D., Foster, I. Creating and Operating National-Scale Cyberinfrastructure Services. CTWatch Quarterly, 2(2), May 2006. www.ctwatch.org/quarterly/articles/2006/05/creating-and-operating-nati...
3 GOLD – Genomes Online Database - www.genomesonline.org/
4 National Center for Biotechnology Information - www.ncbi.nlm.nih.gov/
5 www.ncbi.nlm.nih.gov/BLAST/
6 TeraGrid - www.teragrid.org/
7 www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/Expression/exp45.html


Reference this article
Meyer, F. "Genome Sequencing vs. Moore's Law: Cyber Challenges for the Next Decade," CTWatch Quarterly, Volume 2, Number 3, August 2006. http://www.ctwatch.org/quarterly/articles/2006/08/genome-sequencing-vs-moores-law/
