Data for version 1.00 is from RHdb version 16; UniGene build 146; UCSC
sequence assembly 04-01; WICGR human map release 12 and SNP release
1; CEPHdb v9; UWDMB, NCI-CCAP, CHORI, and CSMC large-insert clone data
sets as of 8-21-01; Genome Database, UWDMB, NCI-CCAP, RPCI, CSMC, and
MPIMG cytogenetic data sets as of 11-9-00; and the 11-06-01 NCBI Clone
Registry release.
Version 1.00 consists of eight file types, named elements, element_to_alias,
element_to_bundle, element_to_clone, element_to_cyto, element_to_genetic,
element_to_rh, and element_to_sequence. Each file type consists of a
subset of the entire eGenome data. The data sets are organized around
the common set of genomic elements. For each file type, the first two
columns list the eGenome genomic element internal identifier (the CVE
ID, consisting of the prefix CVE followed by the element number) and
the primary name chosen for the element (such as a D number or a gene
symbol). The contents and format of each file type are described in detail
below. Note that genomic element entries are not always unique within
a single file; if there are multiple entries in any column field for
an element, those entries are repeated on subsequent lines. For example,
a marker with two cytogenetic localizations will have two adjacent lines
listing the marker ID and name, but with different cytogenetic positions.
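Because multi-valued entries are repeated on adjacent lines, a reader of these files usually has to regroup rows by element. The following is a minimal Python sketch of one way to do that. The sample rows, identifiers, and the function name group_by_element are invented for illustration, and the sketch assumes the file has no header line (skip the first row if it does).

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: the CVE IDs, names, and band values are invented.
SAMPLE = (
    "CVE0000001\tD7S0001\t7q21\n"
    "CVE0000001\tD7S0001\t7q22\n"
    "CVE0000002\tGENE2\t7p15\n"
)

def group_by_element(handle):
    """Collect the repeated lines of each element under its (CVE ID, name) key."""
    grouped = defaultdict(list)
    # QUOTE_NONE keeps any quote characters in the data as literal text.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # ignore blank lines
        grouped[(row[0], row[1])].append(row[2:])
    return dict(grouped)

groups = group_by_element(io.StringIO(SAMPLE))
for (cve_id, name), values in groups.items():
    print(cve_id, name, len(values))
```

Keying on both of the first two columns keeps the primary name alongside the CVE ID, since both are repeated on every line for the same element.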
The eight file types have been generated on a whole-genome basis; these files
can be found in the subdirectory allgenome/. In addition, a set of files
has been generated for each specific chromosome and can be found in
chrN/ (e.g. chr1/). Within the allgenome and specific chromosome directories
are subdirectories for each of the eight file types. Within the file
type subdirectories are .zip, .hqx, and .tgz compressed versions of
the particular file type. For example, the PC, Macintosh, and Unix-compatible
versions of the chromosome 7 element_to_genetic file can be found in
chr7/element_to_genetic/ as element_to_genetic.zip, element_to_genetic.hqx,
and element_to_genetic.tgz, respectively. Each file is tab-delimited.
Each subdirectory also contains a text version of the file as .txt.
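The directory layout described above can be expressed as a small path-building helper. This is only a sketch: the function name release_path is hypothetical, the paths are relative to wherever the release has been unpacked, and it assumes the text version is named after the file type with a .txt extension.

```python
from pathlib import PurePosixPath

FILE_TYPES = (
    "elements", "element_to_alias", "element_to_bundle", "element_to_clone",
    "element_to_cyto", "element_to_genetic", "element_to_rh",
    "element_to_sequence",
)
# Extensions as listed in the text; "txt" naming is an assumption.
EXTENSIONS = ("zip", "hqx", "tgz", "txt")

def release_path(file_type, chromosome=None, extension="txt"):
    """Build the relative path of one file; chromosome=None means allgenome/."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type}")
    if extension not in EXTENSIONS:
        raise ValueError(f"unknown extension: {extension}")
    top = "allgenome" if chromosome is None else f"chr{chromosome}"
    return PurePosixPath(top) / file_type / f"{file_type}.{extension}"

print(release_path("element_to_genetic", chromosome=7, extension="zip"))
print(release_path("elements"))
```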
Description of files
elements
Tab-delimited file. Basic data associated with each genomic element.
Column 1 (CVEID) is the eGenome genomic element internal identifier
(the CVE ID, consisting of the prefix CVE followed by the element number).
Column 2 (Primary_name) is the primary name chosen for the element (such
as a D number or a gene symbol), selected by the eGenome naming algorithm.
Column 3 (Description) is the descriptive line of text associated with
elements that are known to be transcribed; source is UniGene. Column
4 (Element_type) is the type of element, either an RH marker, an RH
framework marker, or a polymorphism. Column 5 (Expression_status) is
the expression status of the element, either transcribed, not transcribed,
or unknown. Column 6 (EST_cluster) is the UniGene EST cluster ID (Hs.#####)
to which the element has been assigned, if any. Column 7 (SNP) lists
one or more single nucleotide polymorphisms associated with the element.
Columns 8 and 9 (Primer1 and Primer2) list the forward and reverse primer
sequences used to PCR-amplify the element, respectively.
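As an illustration of the nine-column layout, the sketch below maps one line of the elements file onto named fields. The field names are the column names given above; the ElementRow type, the parse_elements_line function, and the sample line are invented for the example.

```python
from collections import namedtuple

ElementRow = namedtuple("ElementRow", [
    "CVEID", "Primary_name", "Description", "Element_type",
    "Expression_status", "EST_cluster", "SNP", "Primer1", "Primer2",
])

def parse_elements_line(line):
    """Split one tab-delimited line into the nine documented columns."""
    # Strip only the line ending so that empty trailing columns survive.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(ElementRow._fields):
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    return ElementRow(*fields)

# Invented sample line; the values do not come from the release.
sample = ("CVE0000001\tGENE1\tHypothetical description\tRH marker\t"
          "transcribed\tHs.00000\t\tACGTACGT\tTGCATGCA\n")
row = parse_elements_line(sample)
print(row.Primary_name, row.Expression_status, repr(row.SNP))
```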
element_to_alias
Tab-delimited file. List of all aliases and external identifiers collected
for each element. Column 1 (CVEID) is the eGenome genomic element internal
identifier (the CVE ID, consisting of the prefix CVE followed by the
element number). Column 2 (Primary_name) is the primary name chosen
for the element (such as a D number or a gene symbol), selected by the
eGenome naming algorithm. Column 3 (Alias) lists an external identifier
representing the element, in the form Datasource:ID (e.g. GDB:D1S228). Note
that each identifier for an element is entered as a separate line in this
table.
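The Datasource:ID form of the Alias column can be split into its two parts as sketched below. The function name split_alias is hypothetical, and splitting on the first colon only is an assumption made in case an identifier itself contains a colon.

```python
def split_alias(alias):
    """Split 'Datasource:ID' at the first colon into (datasource, identifier)."""
    source, separator, identifier = alias.partition(":")
    if not separator or not source or not identifier:
        raise ValueError(f"not in Datasource:ID form: {alias!r}")
    return source, identifier

print(split_alias("GDB:D1S228"))
```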
element_to_bundle
Tab-delimited file. Description of the bundle assignments and RH positions
of each element grouped into a bundle. Column 1 (CVEID) is the eGenome
genomic element internal identifier (the CVE ID, consisting of the prefix
CVE followed by the element number). Column 2 (Primary_name) is the
primary name chosen for the element (such as a D number or a gene symbol),
selected by the eGenome naming algorithm. Column 3 (BundleID) is the
eGenome bundle internal identifier (the CVB ID, consisting of the prefix
CVB followed by the bundle number). Column 4 (Bundle_name) is the primary
name chosen for the bundle, selected by the eGenome naming algorithm.
Columns 5 and 6 (Cvmax_left and Cvmax_right) specify the maximum position
or interval span of bundles, recorded as left and right RH framework
positions, and listed by framework marker CVE IDs. Columns 7 and 8 (Max_left
and Max_right) specify the maximum position or interval span of bundles,
recorded as left and right RH framework positions, and listed by framework
marker primary names. Columns 9 and 10 (Max_cR_left and Max_cR_right)
specify the maximum position or interval span of bundles, recorded as
left and right RH framework positions, and listed by framework marker
centiRay positions. Columns 11 and 12 (Max_cytolocation_left and Max_cytolocation_right)
specify the maximum position or interval span of bundles, recorded as
left and right RH framework positions, and listed by the cytogenetic
positions of the framework markers. Columns 13 and 14 (Cvmin_left and
Cvmin_right) specify the minimum overlapping interval shared by all
markers within a bundle, recorded as left and right RH framework positions,
and listed by framework marker CVE IDs. Columns 15 and 16 (Min_left
and Min_right) specify the minimum overlapping interval shared by all
markers within a bundle, recorded as left and right RH framework positions,
and listed by framework marker primary names. Columns 17 and 18 (Min_cR_left
and Min_cR_right) specify the minimum overlapping interval shared by
all markers within a bundle, recorded as left and right RH framework
positions, and listed by framework marker centiRay positions. Columns
19 and 20 (Min_cytolocation_left and Min_cytolocation_right) specify
the minimum overlapping interval shared by all markers within a bundle,
recorded as left and right RH framework positions, and listed by the
cytogenetic positions of the framework markers. Note that some bundles
have no single overlapping position and are shown only with their maximum
positions.
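To make the maximum and minimum intervals concrete, the sketch below computes the centiRay width of each from columns 9 and 10 and columns 17 and 18. It assumes that a bundle without a single overlapping position has empty minimum columns, which the text does not state explicitly; the function name bundle_spans and all sample values are invented.

```python
def bundle_spans(fields):
    """Return (max_span_cR, min_span_cR) for one element_to_bundle row.

    fields is the list of 20 column values. min_span_cR is None when the
    Min_cR columns are empty (assumed encoding of "no minimum interval").
    """
    max_span = float(fields[9]) - float(fields[8])        # columns 10 and 9
    if fields[16].strip() and fields[17].strip():         # columns 17 and 18
        min_span = float(fields[17]) - float(fields[16])
    else:
        min_span = None
    return max_span, min_span

# Invented row: identifiers, marker names, and positions are placeholders.
row = [
    "CVE0000001", "MARKER1", "CVB0000001", "BUNDLE1",
    "CVE0000010", "CVE0000020", "FRAME10", "FRAME20",
    "100.0", "112.5", "7q21", "7q22",
    "CVE0000012", "CVE0000015", "FRAME12", "FRAME15",
    "104.0", "108.0", "7q21", "7q21",
]
row_without_min = row[:12] + [""] * 8

print(bundle_spans(row))
print(bundle_spans(row_without_min))
```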
element_to_clone
Tab-delimited file. Listing of all large-insert clones associated with
each element. Note that this file describes many-to-many relationships
between clones and elements. Many of the elements listed have multiple
clones associated with them. Therefore, each clone, clone type, clone
source, clone sequence, and element sequence position assignment for
an element is listed on a separate line with the identical element ID
values in columns 1 and 2. For example, an element with 2 clones has
two adjacent lines listing element "A" in the element ID column (column
1), the first with "clone 1" listed in column 3, and the second with
"clone 2" in column 3. Column 1 (CVEID) is the eGenome genomic element
internal identifier (the CVE ID, consisting of the prefix CVE followed
by the element number). Column 2 (Primary_name) is the primary name
chosen for the element (such as a D number or a gene symbol), selected
by the eGenome naming algorithm. Column 3 (Clone_name) is the name of
the large-insert clone(s) reported to contain this element. Column 4
(Clone_type) lists the type of large-insert clone, such as a BAC, PAC,
or YAC. Column 5 (Clone_source) lists the primary laboratory or group
from which the clone/element assignment was derived. Column 6 (Clone_sequence)
lists the GenBank sequence accession numbers for those clones whose
DNA sequences have been determined. Columns 7 and 8 (Sequence_position_in_clone_left
and Sequence_position_in_clone_right) list the left and right base pair
positions that the element matches in the clone sequence.
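Since the file records a many-to-many relationship, it can be useful to index it in both directions. The sketch below builds an element-to-clones map and a clone-to-elements map from rows that have already been split on tabs. The function name index_clones and the sample identifiers are invented for illustration.

```python
from collections import defaultdict

def index_clones(rows):
    """Return (clones_by_element, elements_by_clone) as dicts of sets."""
    clones_by_element = defaultdict(set)
    elements_by_clone = defaultdict(set)
    for fields in rows:
        cve_id, clone_name = fields[0], fields[2]   # columns 1 and 3
        if not clone_name:
            continue  # skip rows with no clone recorded
        clones_by_element[cve_id].add(clone_name)
        elements_by_clone[clone_name].add(cve_id)
    return dict(clones_by_element), dict(elements_by_clone)

# Invented rows showing only the first three columns.
rows = [
    ["CVE0000001", "MARKER1", "cloneA"],
    ["CVE0000001", "MARKER1", "cloneB"],
    ["CVE0000002", "MARKER2", "cloneA"],
]
by_element, by_clone = index_clones(rows)
print(sorted(by_element["CVE0000001"]))
print(sorted(by_clone["cloneA"]))
```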
element_to_cyto
Tab-delimited file. Listing of all eGenome-determined and externally assigned
cytogenetic localizations for elements in eGenome. Note that this file
describes many-to-many relationships between external cytogenetic assignments
and elements. Many of the elements listed have multiple external assignments
associated with them. Therefore, each external source, clone, and cytogenetic
position assignment for an element is listed on a separate line with
the identical element ID values in columns 1 and 2. For example, an
element with 2 external cytogenetic positions has two adjacent lines
listing element "A" in the element ID column (column 1), the first with
"left position 1" listed in column 7, and the second with "left position
2" in column 7. Column 1 (CVEID) is the eGenome genomic element internal
identifier (the CVE ID, consisting of the prefix CVE followed by the
element number). Column 2 (Primary_name) is the primary name chosen
for the element (such as a D number or a gene symbol), selected by the
eGenome naming algorithm. Columns 3 and 4 (Cytolocation_left and Cytolocation_right)
list the cytogenetic band or band range determined for each element
from the eGenome cytogenetic band assignment algorithm. Column 5 (Other_cytolocation_source)
lists the primary laboratory or group(s) from which an external
cytogenetic/element assignment was derived. Column 6 (Other_cytolocation_clone)
lists the large-insert clone(s) used for an external cytogenetic
assignment. Columns 7 and 8 (Other_cytolocation_left and Other_cytolocation_right)
list the cytogenetic band or band range(s) determined for each element
from an external cytogenetic assignment.
element_to_genetic
Tab-delimited file. Describes the genetic linkage map positions of all
eGenome polymorphic elements. Column 1 (CVEID) is the eGenome genomic
element internal identifier (the CVE ID, consisting of the prefix CVE
followed by the element number). Column 2 (Primary_name) is the primary
name chosen for the element (such as a D number or a gene symbol), selected
by the eGenome naming algorithm. Column 3 (Chromosome) lists the chromosome
that eGenome has assigned the element to by linkage grouping. Columns
4 and 5 (Rh_cvposition_left and Rh_cvposition_right) specify the RH
position or interval span of an element, recorded as left and right
RH framework positions, and listed by framework marker CVE IDs. Columns
6 and 7 (Rh_position_left and Rh_position_right) specify the RH position
or interval span of an element, recorded as left and right RH framework
positions, and listed by framework marker primary names. Columns 8 and
9 (cR_position_left and cR_position_right) specify the RH position or
interval span of an element, recorded as left and right RH framework
positions, and listed by framework marker centiRay positions. Columns
10 and 11 (GL_cvposition_left and GL_cvposition_right) specify the genetic
linkage position or interval span of an element, recorded as left and
right genetic linkage framework positions, and listed by genetic linkage
framework marker CVE IDs. Columns 12 and 13 (GL_position_left and GL_position_right)
specify the genetic linkage position or interval span of an element,
recorded as left and right genetic linkage framework positions, and
listed by genetic linkage framework marker primary names. Columns 14
and 15 (cM_position_left and cM_position_right) specify the genetic
linkage position or interval span of an element, recorded as left and
right genetic linkage framework positions, and listed by framework marker
centiMorgan positions.
element_to_rh
Tab-delimited file. Describes the radiation hybrid map positions and
associated RH data of eGenome RH elements. Note that for chromosomes
with complete sequences, the RH positions of only those elements that
could not be identified in the genomic sequence have been calculated,
and the element_to_rh files for these chromosomes will list only these
elements. Column 1 (CVEID) is the eGenome genomic element internal identifier
(the CVE ID, consisting of the prefix CVE followed by the element number).
Column 2 (Primary_name) is the primary name chosen for the element (such
as a D number or a gene symbol), selected by the eGenome naming algorithm.
Column 3 (Rh_panel) is the radiation hybrid panel that the RH data for
the element was generated from. Column 4 (RHdb_ID) is the Radiation
Hybrid database record identifier number for the element RH typing.
Column 5 (Rh_vector) is the RH typing data set (vector) used by eGenome
for RH mapping. Column 6 (Chromosome) lists the chromosome that eGenome
has assigned the element to by linkage grouping. Columns 7 and 8 (Rh_cvposition_left
and Rh_cvposition_right) specify the RH position or interval span of
an element, recorded as left and right RH framework positions, and listed
by framework marker CVE IDs. Columns 9 and 10 (Rh_position_left and
Rh_position_right) specify the RH position or interval span of an element,
recorded as left and right RH framework positions, and listed by framework
marker primary names. Columns 11 and 12 (cR_position_left and cR_position_right)
specify the RH position or interval span of an element, recorded as
left and right RH framework positions, and listed by framework marker
centiRay positions.
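One common use of the left and right centiRay columns is to test whether two elements could occupy the same region of the RH map. The sketch below reads columns 11 and 12 and checks two intervals for overlap. The function names and sample rows are invented, and treating the two values as a closed interval is an assumption.

```python
def cr_interval(fields):
    """Return (low, high) centiRay bounds from columns 11 and 12 of a row."""
    left, right = float(fields[10]), float(fields[11])
    return (min(left, right), max(left, right))

def intervals_overlap(a, b):
    """True when closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Invented rows: every value here is a placeholder.
row_a = ["CVE0000001", "M1", "PANEL", "1", "VECTOR", "7",
         "CVE0000010", "CVE0000020", "F10", "F20", "10.0", "20.0"]
row_b = ["CVE0000002", "M2", "PANEL", "2", "VECTOR", "7",
         "CVE0000015", "CVE0000030", "F15", "F30", "15.0", "30.0"]
row_c = ["CVE0000003", "M3", "PANEL", "3", "VECTOR", "7",
         "CVE0000025", "CVE0000030", "F25", "F30", "21.0", "30.0"]

print(intervals_overlap(cr_interval(row_a), cr_interval(row_b)))
print(intervals_overlap(cr_interval(row_a), cr_interval(row_c)))
```

A comparison like this is only meaningful for elements on the same chromosome (column 6), which the caller would need to check first.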
element_to_sequence
Tab-delimited file. Contains relationships between eGenome elements,
the sequences from which they were derived, and their positions in the
human genomic sequence assemblies. Column 1 (CVEID) is the eGenome genomic
element internal identifier (the CVE ID, consisting of the prefix CVE
followed by the element number). Column 2 (Primary_name) is the primary
name chosen for the element (such as a D number or a gene symbol), selected
by the eGenome naming algorithm. Column 3 (Source_sequence) lists the
GenBank sequence accession number(s) from which the element was created.
Column 4 (Chromosome) lists the chromosome to which the element has
been assigned based upon a genomic sequence assignment. Columns 5 and
6 (Sequence_position_left and Sequence_position_right) list the left
and right base pair positions that the element matches in the UCSC sequence
assembly.
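The chromosome and left-position columns are enough to put elements in sequence order. The sketch below sorts rows by chromosome and then by column 5. It assumes chromosome labels are bare values such as "7" or "X" and that positions are integers; the function names and sample rows are invented.

```python
def chromosome_rank(label):
    """Sort key placing numeric chromosomes first, then others alphabetically."""
    if label.isdigit():
        return (0, int(label), "")
    return (1, 0, label)

def sort_by_position(rows):
    """Order rows by chromosome (column 4), then left position (column 5)."""
    return sorted(rows, key=lambda f: (chromosome_rank(f[3]), int(f[4])))

# Invented rows: identifiers, accessions, and positions are placeholders.
rows = [
    ["CVE3", "M3", "ACC3", "X", "100", "200"],
    ["CVE2", "M2", "ACC2", "10", "50", "150"],
    ["CVE1", "M1", "ACC1", "2", "900", "990"],
    ["CVE4", "M4", "ACC4", "2", "10", "90"],
]
ordered = sort_by_position(rows)
print([fields[0] for fields in ordered])
```

Ranking numeric labels as integers avoids the string ordering in which "10" would sort before "2".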