With the advent of the Human Genome Project (HGP), a complete decoding of a cell's genetic makeup is within our reach. A major challenge of the HGP is to systematically manage the vast array of genomic data it generates in a way that maximizes its usefulness to the biomedical research community. This data, drawn from a large number of sources, is multifaceted, including genomic landmarks (DNA markers, map positions, and cytogenetic localizations), structural elements (large-insert clones and DNA sequence), and functional elements (genes, polymorphisms, and expression data). Any attempt to globally manage and deliver genomic information faces three key challenges: 1) the compilation, curation, and management of a comprehensive and representative set of primary data; 2) a method to interrelate the data relative to a common framework; and 3) the ability to deliver the information to users in a manner that is efficient, seamless, easily presentable, and simple to comprehend.

The scale and complexity of human genomic and functional genomic information prevent any single database or information source from managing it in its entirety. Most currently available genomic information sources thus present an in-depth look at a subset of data that is restricted in some way. For example, a website centered around genomic sequence may not adequately represent expressed elements. However, different biomedical disciplines conceptualize genomes in many different ways, and the point of entry into genomic data varies dramatically depending upon the circumstances. Thus, a clinical geneticist looking for a disease locus implicated in a complex genetic trait will approach genomic information very differently from a cancer researcher characterizing a chromosomal abnormality or an evolutionary geneticist searching for shared orthologs of a particular gene. Given the multidimensionality of both a genome and the approaches to understanding it, seamless integration of the underlying information is critical for proper presentation.

The eGenome project was started with a single objective: to provide a genomic resource that is simultaneously comprehensive and easy to use. eGenome's approach is to present all publicly available genomic information in two stages. The first stage presents critical, summarized information regarding a genomic feature or region. The second stage provides direct access to more in-depth information about the feature of interest. To do so, we have identified a large number of data sets that are available to biomedical researchers, integrated them relative to chromosomal position, managed them by cataloguing their critical components, and directly linked this information to external databases available elsewhere on the Internet. As an example, for a polymorphic marker that flanks a disease locus, eGenome can deliver the chromosomal localization of the marker, the DNA sequence, the corresponding cytogenetic position, a representative large-insert clone, juxtaposed markers and genes, primer sequences, and any corresponding expression data. In addition, eGenome creates a virtual data portal that provides direct website links to comprehensive information regarding this polymorphism. Over 50 additional databases are currently linked to eGenome. The information in eGenome can be accessed and viewed in a number of ways, including direct text searches, position-based searches, and graphical browsing. As additional genomic and post-genomic data sets are made available, they will be made accessible through eGenome. Our hope is for eGenome to serve as a genomic catalog that identifies and delivers the most relevant information to biomedical researchers in the fastest and most comprehensible manner possible.
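
To make the idea of integrating features relative to chromosomal position more concrete, the following is a minimal sketch of a position-indexed catalog that supports a name lookup, a position-based search, and retrieval of juxtaposed features. It is an illustration only and does not describe eGenome's actual implementation: the class names (`Feature`, `PositionCatalog`), the single integer coordinate, the feature names, and the external database labels are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One catalogued genomic feature (all fields are illustrative)."""
    name: str            # hypothetical identifier
    kind: str            # e.g. "marker", "gene", "clone"
    chromosome: str
    position: int        # hypothetical coordinate, arbitrary units
    links: tuple = ()    # hypothetical names of linked external databases


class PositionCatalog:
    """Index features by name and by sorted position on each chromosome."""

    def __init__(self, features):
        self._by_name = {}
        grouped = {}
        for feature in features:
            self._by_name[feature.name] = feature
            grouped.setdefault(feature.chromosome, []).append(feature)
        # Per chromosome: features sorted by position, plus a parallel
        # list of positions so that bisect can search it directly.
        self._sorted = {}
        for chromosome, items in grouped.items():
            items.sort(key=lambda f: f.position)
            self._sorted[chromosome] = (items, [f.position for f in items])

    def by_name(self, name):
        """Text-style lookup: return the feature with this name, or None."""
        return self._by_name.get(name)

    def in_region(self, chromosome, start, end):
        """Position-based search: features with start <= position <= end."""
        items, positions = self._sorted.get(chromosome, ([], []))
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        return items[lo:hi]

    def neighbors(self, name, window):
        """Juxtaposed features within `window` units of the named feature."""
        feature = self._by_name[name]
        nearby = self.in_region(
            feature.chromosome,
            feature.position - window,
            feature.position + window,
        )
        return [f for f in nearby if f.name != name]


# Placeholder data for illustration; these are not real eGenome records.
catalog = PositionCatalog([
    Feature("MARKER_A", "marker", "1", 1000, links=("external_db_1",)),
    Feature("GENE_B", "gene", "1", 1500),
    Feature("CLONE_C", "clone", "1", 5000),
    Feature("GENE_D", "gene", "2", 1200),
])
```

With this sketch, `catalog.in_region("1", 900, 2000)` corresponds to a position-based search, and `catalog.neighbors("MARKER_A", 600)` corresponds to asking which features are juxtaposed to a marker of interest. Keeping a sorted position list per chromosome is one simple design choice that lets both queries use binary search; a production resource would more likely rely on a database with interval indexing.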

For more information, the eGenome methodology is described in greater detail here, and in publication form here. A summary of the data in eGenome can be found here.

 


Except as otherwise indicated, Copyright 2005, The Children's Hospital of Philadelphia