As the Human Genome Program begins its final assault on determining the DNA sequence of all human chromosomes, an enormous and ever-increasing amount of genomic data is being generated. Because this data comes from a large number of laboratories, however, it has become difficult for researchers to manage and use it. With this in mind, we began a project (eGenome) to bring together relevant data from many sources for a given chromosome and to present it in a manner that is easy to use and understand. eGenome creates what we call views of individual chromosomes: multidimensional representations that encompass several perspectives (genetic, physical, functional, cytogenetic, and clinical). |
| What is eGenome? |
|
eGenome
is a sophisticated method for compiling, analyzing, and presenting information
about genomes. The result of this method is an integrated data set for
each specific chromosome. This
data set includes genomic elements; large-insert genomic clones; DNA sequences;
DNA variations; cytogenetic, genetic, and physical localizations of elements
on a chromosome; and information associated with each of these elements.
The data set resides in a relational database (CompDB), and
this data can be searched and viewed both textually and graphically.
In addition to the essential data for each genomic element that eGenome
itself contains, a large number of element-specific links to additional
information housed in other on-line databases are presented. Thus, eGenome
provides both an instantaneous summary of genes and other genomic elements,
and a comprehensive portal to additional information throughout the Internet.
This Website serves as an intermediary between the database itself and
the user. |
| Procedures |
|
The eGenome procedure consists of several linked methods: 1) compilation, 2) analysis, 3) integration, and 4) presentation.

1) COMPILATION
To calculate comprehensive views of chromosomes, eGenome first compiles genomic data from various existing sources. There is no new data generation, only new data analysis. The current sources of data that eGenome uses include:
Although several of the data sets listed above have been placed individually into separate resources, only eGenome integrates and delivers the totality of this information via a single interface. Note that these data sets are derived from several different experimental procedures, including radiation hybrid, genetic linkage, physical, cytogenetic, and sequence-based mapping techniques. As such, each represents a slightly different perspective of a given chromosome, with its own strengths and weaknesses. A major objective of eGenome is to integrate these perspectives smoothly and thus capture the strengths that each technique offers. Pooling these data sets also has several other benefits: greater marker and clone coverage, higher-resolution and better-supported maps, and systematic management of genomic information (such as keeping track of marker names).

2) ANALYSIS
Another goal of eGenome is to place all genomic data relative to a single unifying scale, which can be defined by the draft human genomic sequence. In addition, we triangulate localization data derived from other experimental techniques (genetic linkage, cytogenetic, and radiation hybrid analyses) relative to the sequence localizations. This procedure provides independent quality assurance that defined stretches of experimental or functional significance, such as a marker or gene, localize accurately within the genome. It also allows quick identification of elements whose independent localizations are discordant.

3) INTEGRATION
Once
sequence-based localizations were identified for most genomic elements,
we integrated these physical localizations with genetic linkage (GL),
radiation hybrid (RH), and cytogenetic-based positional information that
existed for many of these elements. For each of these additional localization
techniques, we created framework maps. Framework maps consist of a subset
of unique genomic markers whose linear order on the chromosome has been
determined with high statistical probability, usually 1,000:1 odds for
each adjacent pair. For both the RH and GL approach, a method ensuring
that the framework contains as many markers as possible (which maximizes
the overall resolution of the framework map and thus the entire view)
is used. This approach essentially builds a framework in successive rounds
of mapping, placing additional markers on the framework in each round
until no more can be placed with sufficient support. Cytogenetic frameworks
were built by correlating existing RH marker positions with experimentally
determined cytogenetic localizations. Cytogenetic bands were demarcated
based upon the distribution of known markers; cytogenetic band assignments for
all other markers were inferred based upon these predictions. For each
localization technique, the frameworks were then used to localize the
remainder of the elements relative to the appropriate framework. For example,
a polymorphic marker (X) could be placed between two adjacent framework
markers (A and B). This would mean that marker X is located between markers
A and B with odds of >1,000:1. In this way, a genomic element
could have as many as four independently derived localizations, and many
elements that could not be uniquely identified within the genomic sequence
could still be localized. Once the set of frameworks was established,
large-insert clones and sequence contigs could be easily annotated onto
the localization structure. Because BACs, PACs, and YACs have been identified both by using individual
markers as probes and by determining their base pair positions in
sequenced clones, these clones can be directly annotated to the appropriate
elements. The same is true for SNPs and for EST clusters.

4) PRESENTATION
This Website provides a link between CompDB, which stores the information comprising a chromosome view, and you, the user. The user defines a set of search criteria, and the database returns the requested information in the form of text or graphics. The best way to understand this process is to have some concept of how the data is organized in the database. Each genomic element (an RH marker, polymorphism, transcript, or bundle) has its own record, which also includes associated information such as name, primer sequences, cytolocation, associated large-insert clone, lab source, etc.

For any search that finds only a single element, that element's record is translated into a web page that displays all of the information available for that element. This web page is divided into a set of tabs, each of which displays a subset of the data pertaining to that element. The tabs correspond to specific data subcategories, such as "Position" and "Clones and Markers". These individual records also contain links to external databases, such as GenBank and UniGene, which lead directly to additional information about that specific element.

For searches that identify more than one element, a summary table of all of the elements is shown instead. Like the individual record, the summary table is divided into tabs that separate the elements by category. The summary table includes only some of the information available for each element, along with a link to each element's individual record.

Both of these searches are text-based. An alternative is a graphical return: a search defines a region, all or a subset of the element types within that region are found, and the data is translated into a graphical map of the region. These maps are viewed with the Java applet Chromoscape.
The graphics themselves can be customized by the viewer, and by clicking on an individual element within the graphic, the user can view the individual record for that element. Searches can be conducted through three different interfaces on this Website:
Simple text searches, where the user types information into a text
box (help) |
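The text-based return described under Presentation (one match yields a full tabbed record, several matches yield a tabbed summary table) can be illustrated with a minimal Python sketch. This is not the eGenome or CompDB implementation: the `Element` class, the `search` and `render` functions, the field names, and the demo data are all hypothetical. Only the tab names "Position" and "Clones and Markers" and the single-versus-multiple behavior come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    """One genomic element record (hypothetical field names)."""
    name: str
    category: str  # e.g. "RH marker", "polymorphism", "transcript"
    cytolocation: str = ""
    clones: List[str] = field(default_factory=list)


def search(elements: List[Element], text: str) -> List[Element]:
    """Simple text search: case-insensitive substring match on the name."""
    needle = text.lower()
    return [e for e in elements if needle in e.name.lower()]


def render(hits: List[Element]) -> Dict:
    """Choose the text-based return from the number of matching elements."""
    if not hits:
        return {"view": "none"}
    if len(hits) == 1:
        # A single match: show the full record, split into tabs.
        e = hits[0]
        return {
            "view": "record",
            "name": e.name,
            "tabs": {
                "Position": {"cytolocation": e.cytolocation},
                "Clones and Markers": {"clones": e.clones},
            },
        }
    # Several matches: a summary table, one tab per element category,
    # with partial information and a link back to each full record.
    tabs: Dict[str, List[Dict]] = {}
    for e in hits:
        tabs.setdefault(e.category, []).append(
            {"name": e.name, "cytolocation": e.cytolocation, "record_link": e.name}
        )
    return {"view": "summary", "tabs": tabs}


# Made-up demo data, not taken from eGenome.
ELEMENTS = [
    Element("demo-marker-1", "RH marker", "1p36", ["demo-clone-A"]),
    Element("demo-marker-2", "RH marker", "1p35"),
    Element("demo-transcript-1", "transcript", "1p36", ["demo-clone-B"]),
]

single = render(search(ELEMENTS, "demo-transcript"))
several = render(search(ELEMENTS, "demo-marker"))
print(single["view"], list(single["tabs"]))
print(several["view"], {k: len(v) for k, v in several["tabs"].items()})
```

In this sketch the summary rows carry only a subset of each record plus a link field, mirroring the description that the summary table shows partial information and points to the individual record.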
| How to use eGenome |
|
A description of eGenome's mission can be found in the Introduction section. A description of the actual computational process behind eGenome can be found in the Methods section. The eGenome Website consists of four sections. A complete site index can be found here. Detailed instructions for how to use eGenome are available in the help section. Navigation of the site is best accomplished using the title bar at the top of each page.
Database searches: Interfaces for extracting data from eGenome and CompDB.
eGenome information: Description of the site and detailed explanations of the eGenome process. Introduction | Overview | Methods | About us | Contact us
Data: Access to the raw data, numerical summaries of the chromosome view, and analysis of transcript density and cytogenetic banding patterns. Data summary | Data repository
Index: Site-specific information and navigation. Site index | What's new | Acknowledgments
eGenome Help: Detailed explanations of various site features. Help
For more information, contact the eGenome staff.
|