With the advent of the Human Genome Project (HGP), a complete decoding
of the essence of a cell's genetic makeup is within our reach. A
major challenge of the HGP is to systematically manage the vast
array of genomic data generated in a way that maximizes its use
for the biomedical research community. This data, which has a large
number of sources, is multifaceted, including genomic landmarks
(DNA markers, map positions, and cytogenetic localizations), structural
elements (large-insert clones and DNA sequence), and functional
elements (genes, polymorphisms, and expression data). Inherent in
any attempt to globally manage and deliver genomic information are
three key challenges: 1) the compilation, curation, and management of
a comprehensive and representative set of primary data; 2) a method
to interrelate the data relative to a common framework;
and 3) the ability to deliver the information to users in a manner
that is efficient, seamless, easily presentable, and simple to comprehend.

The scale and complexity of human genomic and functional genomic
information preclude any single database or information source
from managing it in its entirety. Most currently available
genomic information sources thus present an in-depth look at a subset
of data that is restricted in some way. For example, a website centered
on genomic sequence may not adequately represent expressed elements.
However, different biomedical disciplines conceptualize genomes
in a myriad of ways, and entry into genomic data varies dramatically
depending upon the circumstances. Thus, a clinical geneticist looking
for a disease locus implicated in a complex genetic trait will approach
genomic information in a very different way than a cancer researcher
characterizing a chromosomal abnormality or an evolutionary geneticist
searching for shared orthologs of a particular gene. Given the multidimensionality
both of a genome and of the approaches to understanding it, seamless integration
of the underlying information is critical for proper presentation.

The eGenome project was started with a single objective: to provide a
genomic resource that is simultaneously comprehensive and easy to use.
eGenome's approach is to present all publicly available genomic
information in two stages. The first stage represents critical,
summarized information regarding a genomic feature or region. The
second stage represents direct access to more in-depth information
about the feature of interest. To do so, we have identified a large
number of data sets that are available to biomedical researchers,
integrated them relative to chromosomal position, managed them by
cataloguing their critical components, and directly linked this
information to external databases available elsewhere on the Internet.
As an example, for a polymorphic marker that flanks a disease locus,
eGenome can deliver the chromosomal localization of the marker,
the DNA sequence, the corresponding cytogenetic position, a representative
large-insert clone, juxtaposed markers and genes, primer sequences,
and any corresponding expression data. In addition, eGenome creates
a virtual data portal that provides direct website links to comprehensive
information regarding this polymorphism. Over 50 additional databases
are currently linked to eGenome. The information in eGenome can
be accessed and viewed in a number of ways, including direct text
searches, position-based searches, and graphical views. As additional
genomic and post-genomic data sets are made available, they will
be made accessible through eGenome. Our hope is to enable eGenome
to serve as a genomic catalog that identifies and delivers the most
relevant information to biomedical researchers in the fastest and
most comprehensible manner possible.
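
The two-stage approach described above can be pictured with a small sketch.
This is only an illustration under stated assumptions, not eGenome's actual
schema or software: the Feature record, its field names, the three helper
functions, and every identifier, position, and URL in the sample catalog are
hypothetical. Stage one is the summarized record for a feature, stage two is
the set of links out to external databases, and integration relative to
chromosomal position is what makes position-based searches and lookups of
juxtaposed markers and genes possible.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Stage one: summarized information about one genomic feature."""
    name: str         # hypothetical marker or gene identifier
    kind: str         # e.g. "marker" or "gene"
    chromosome: str
    position: float   # placeholder units along the chromosome
    # Stage two: external database name -> URL with in-depth information.
    links: dict = field(default_factory=dict)


def search_by_name(features, text):
    """Direct text search: case-insensitive substring match on the name."""
    needle = text.lower()
    return [f for f in features if needle in f.name.lower()]


def search_by_position(features, chromosome, start, end):
    """Position-based search: features within [start, end], in map order."""
    hits = [f for f in features
            if f.chromosome == chromosome and start <= f.position <= end]
    return sorted(hits, key=lambda f: f.position)


def neighbors(features, target, count=1):
    """Juxtaposed features: up to `count` features on each side of target."""
    ordered = sorted((f for f in features if f.chromosome == target.chromosome),
                     key=lambda f: f.position)
    i = ordered.index(target)
    return ordered[max(0, i - count):i] + ordered[i + 1:i + 1 + count]


# Illustrative records only; names, positions, and URLs are invented.
catalog = [
    Feature("MARKER_A", "marker", "1", 10.0,
            {"ExampleDB": "https://example.org/MARKER_A"}),
    Feature("GENE_X", "gene", "1", 12.5),
    Feature("MARKER_B", "marker", "1", 15.0),
    Feature("GENE_Y", "gene", "2", 11.0),
]
```

With this layout, a call such as neighbors(catalog, catalog[1]) would return
the records flanking GENE_X on the same chromosome, and each record's links
field would carry the second-stage pointers to external resources.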

For more information, the eGenome methodology is described in greater
detail here and in publication form here.
A summary of the data in eGenome can be found here.