ReadMe file for eGenome v2.3, June 2005
eGenome is a procedure which creates high resolution, high confidence
"views" of human chromosomes. This procedure combines genetic linkage,
cytogenetic, structural, and expression-based data together onto a
single platform. Version 2.3 covers the entire human
genome except for chromosome Y. Details regarding the methodology can be found
here.
Data for version 2.3 is from RHdb version 16; UniGene build 175;
UCSC sequence assembly hg17 (Build 35); WICGR human map
release 12 and SNP release 1; CEPHdb v9; UWDMB, NCI-CCAP, CHORI, and
CSMC large-insert clone datasets as of 2001-08-21; Genome
Database (GDB) STS dataset; UWDMB, NCI-CCAP, RPCI, CSMC, and MPIMG
cytogenetic datasets as of 2000-09-11; the NCBI Clone
Registry as of 2005-02-22; the NCBI UniSTS dataset as of 2004-09-30; and the
NCBI dbSNP dataset build 124.
Version 2.3 consists of eight file types, named elements,
element_to_alias, element_to_bundle, element_to_clone,
element_to_cyto, element_to_genetic, element_to_rh, and
element_to_sequence. Each file type consists of a subset of the entire
eGenome data. The datasets are organized around the common set of
genomic elements. For each file type, the first two columns list the
eGenome genomic element internal identifier (the CVE ID, consisting of
the prefix CVE followed by the element number) and the primary name
chosen for the element (such as a D number or a gene symbol). The
contents and format of each file type are described in detail
below. Note that genomic element entries are not always unique within
a single file; if there are multiple entries in any column field for
an element, those entries may be repeated on subsequent lines,
depending on the file. For example, a marker with two cytogenetic
localizations will have two adjacent lines listing the marker ID and
name, but with different cytogenetic positions.
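The repeated-line convention described above can be handled by grouping rows on the first column. The following is a minimal sketch, assuming tab-delimited rows with no header line (this README does not say whether the files carry one). The function name `group_by_element` and the sample rows are made up for illustration.

```python
from collections import defaultdict

def group_by_element(lines):
    """Group tab-delimited rows by CVE ID (column 1).

    Returns {cve_id: (primary_name, [remaining_columns, ...])}.
    """
    names = {}
    grouped = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        cve_id, primary_name, rest = fields[0], fields[1], fields[2:]
        names[cve_id] = primary_name
        grouped[cve_id].append(rest)
    return {cve: (names[cve], rows) for cve, rows in grouped.items()}

# Hypothetical sample: one marker with two cytogenetic localizations.
sample = [
    "CVE1\tD1S228\t1p36.1\t1p36.2\tGL\t\n",
    "CVE1\tD1S228\t1p35\t1p36\tepcr\t\n",
]
result = group_by_element(sample)
```

Each element then maps to its primary name plus one list of remaining columns per line on which it appeared.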
The eight file types have been generated on a whole genome basis,
which can be found in the subdirectory allgenome/. In addition, a set
of files has been generated for each specific chromosome and can be
found in chrN/ (e.g. chr1/). Within the allgenome and specific
chromosome directories are subdirectories for each of the eight file
types. Within the file type subdirectories are .zip and .gz
compressed versions of the particular file type. For example, the Windows-
and Unix-compatible versions of the chromosome 7
element_to_genetic file can be found in chr7/element_to_genetic/ as
element_to_genetic.zip and
element_to_genetic.gz, respectively. Each file is tab-delimited. Each
subdirectory also contains a text version of the file with the extension .txt.
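As an illustration of the directory layout described above, the sketch below builds the path to one compressed file and reads it row by row. The helper names `data_path` and `read_rows` are hypothetical, and the default suffix `.gz` follows the chromosome 7 example; pass a different `suffix` if the files in your copy are named otherwise.

```python
import csv
import gzip
import os

FILE_TYPES = (
    "elements", "element_to_alias", "element_to_bundle", "element_to_clone",
    "element_to_cyto", "element_to_genetic", "element_to_rh",
    "element_to_sequence",
)

def data_path(root, file_type, chromosome=None, suffix=".gz"):
    """Build the path to one compressed file.

    chromosome=None selects allgenome/; otherwise chr<chromosome>/.
    """
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type}")
    top = "allgenome" if chromosome is None else f"chr{chromosome}"
    return os.path.join(root, top, file_type, file_type + suffix)

def read_rows(path):
    """Yield each line of a gzip-compressed, tab-delimited file as a list."""
    with gzip.open(path, "rt", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary data.
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
```

For example, `data_path("egenome", "element_to_genetic", chromosome=7)` points at the chromosome 7 genetic file under a local directory named `egenome` (an assumed download location).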
Description of files
elements[.txt|.txt.gz|.txt.zip]
This file contains basic data associated with each genomic
element.
| Column | Field | Description |
| 1 | CVEID | eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number) |
| 2 | Primary_name | primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm |
| 3 | Description | descriptive line of text associated with elements that are known to be transcribed; source is UniGene |
| 4 | Element_type | type of element, either an RH marker, an RH framework marker, a polymorphism, or other marker |
| 5 | Expression_status | expression status of the element, either transcribed, not transcribed, or unknown |
| 6 | EST_cluster | UniGene EST cluster ID (Hs.#####) to which the element has been assigned, if any |
| 7 | SNP | unused; in the next release, this will be 'true' if any sequence localization of this marker contains at least one Single Nucleotide Polymorphism |
| 8 and 9 | Primer1 and Primer2 | forward and reverse primer sequences used to PCR-amplify the element, respectively |
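The nine columns above map naturally onto a small record type. This is a sketch only: the `Element` class and `parse_element` function are hypothetical names, and the sample line uses invented values.

```python
from typing import NamedTuple

class Element(NamedTuple):
    cve_id: str
    primary_name: str
    description: str
    element_type: str
    expression_status: str
    est_cluster: str
    snp: str          # unused in version 2.3
    primer1: str
    primer2: str

def parse_element(line):
    """Turn one tab-delimited line of the elements file into an Element."""
    values = line.rstrip("\n").split("\t")
    # Pad short rows so missing trailing columns become empty strings.
    values += [""] * (9 - len(values))
    return Element(*values[:9])

# Invented example row; real values will differ.
element = parse_element("CVE1\tD1S228\t\tpolymorphism\tunknown\t\t\tACGT\tTTGA\n")
```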
element_to_alias[.txt|.txt.gz|.txt.zip]
This file contains all aliases and external identifiers
collected for each element.
| Column | Field | Description |
| 1 | CVEID | eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number) |
| 2 | Primary_name | primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm |
| 3 | Alias | comma-separated list of external identifiers representing the element, in the form Datasource:ID (e.g. GDB:D1S228) |
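The Alias column can be split into (datasource, ID) pairs along the following lines. This is a minimal sketch; `parse_aliases` is a hypothetical helper, and the second alias in the example is invented.

```python
def parse_aliases(alias_field):
    """Split a comma-separated 'Datasource:ID' list into (source, id) pairs."""
    pairs = []
    for item in alias_field.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, in case an ID itself contains one.
        source, sep, identifier = item.partition(":")
        if not sep:
            # No datasource prefix; keep the raw value with an empty source.
            source, identifier = "", item
        pairs.append((source, identifier))
    return pairs

aliases = parse_aliases("GDB:D1S228,UniSTS:12345")
# aliases == [("GDB", "D1S228"), ("UniSTS", "12345")]
```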
element_to_bundle[.txt|.txt.gz|.txt.zip]
This file contains the bundle assignments and RH
positions of each element grouped into a bundle.
| Column | Field | Description |
| 1 | CVEID | eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number) |
| 2 | Primary_name | primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm |
| 3 | BundleID | eGenome bundle internal identifier (the CVB ID, consisting of the prefix CVB followed by the bundle number) |
| 4 | Bundle_name | primary name chosen for the bundle, selected by the eGenome naming algorithm |
| 5 and 6 | Max_left and Max_right | maximum position or interval span of bundles, recorded as left and right sequence positions |
| 7 and 8 | Max_cytolocation_left and Max_cytolocation_right | maximum position or interval span of bundles, recorded as left and right sequence positions, and listed by the cytogenetic positions of the sequence positions |
| 9 and 10 | Min_left and Min_right | minimum overlapping interval shared by all markers within a bundle, recorded as left and right sequence positions. Note that some bundles have no single overlapping position and are shown only with their maximum positions. |
| 11 and 12 | Min_cytolocation_left and Min_cytolocation_right | minimum overlapping interval shared by all markers within a bundle, recorded as left and right sequence positions, and listed by the cytogenetic positions of the framework markers |
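Because some bundles carry only maximum positions, code reading this file should allow the minimum interval to be absent. The sketch below assumes that sequence positions are integers and that a missing minimum interval appears as empty fields; neither detail is stated in this README. The function names and the sample row are hypothetical.

```python
def parse_interval(left, right):
    """Return (left, right) as ints, or None if either field is blank."""
    left, right = left.strip(), right.strip()
    if not left or not right:
        return None
    return int(left), int(right)

def bundle_intervals(fields):
    """Extract the maximum and minimum sequence intervals from one row.

    `fields` is the row split on tabs; columns 5-6 and 9-10 are
    zero-based indexes 4-5 and 8-9.
    """
    max_span = parse_interval(fields[4], fields[5])
    min_span = parse_interval(fields[8], fields[9])
    return max_span, min_span

# Invented row for a bundle with no single overlapping position.
row = ["CVE1", "D1S228", "CVB1", "B1", "1000", "5000",
       "1p36.2", "1p36.1", "", "", "", ""]
spans = bundle_intervals(row)
```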
element_to_clone[.txt|.txt.gz|.txt.zip]
This file contains all large-insert clones associated with
each element. Note that this file describes many-to-many relationships
between clones and elements. Many of the elements listed have multiple
clones associated with them. Therefore, each clone, clone type, clone
source, clone sequence, and element sequence position assignment for
an element is listed on a separate line with the identical element ID
values in columns 1 and 2. For example, an element with 2 clones has
two adjacent lines listing element "A" in the element ID column
(column 1), the first with "clone 1" listed in column 3, and the
second with "clone 2" in column 3.
| Column | Field | Description |
| 1 | CVEID | eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number) |
| 2 | Primary_name | primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm |
| 3 | Clone_name | name of the large-insert clone(s) reported to contain this element |
| 4 | Clone_type | type of large-insert clone, such as a BAC, PAC, or YAC |
| 5 | Clone_source | primary laboratory or group from which the clone/element assignment was derived |
| 6 | Clone_sequence | GenBank sequence accession numbers for those clones whose DNA sequences have been determined |
| 7 and 8 | Sequence_position_in_clone_left and Sequence_position_in_clone_right | left and right base pair positions, respectively, that the element matches in the clone sequence |
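Since the relationship between elements and clones is many-to-many, it can be convenient to index the file in both directions. A minimal sketch, with hypothetical names and invented sample rows:

```python
from collections import defaultdict

def index_clones(rows):
    """Build element->clones and clone->elements lookups from split rows."""
    clones_by_element = defaultdict(set)
    elements_by_clone = defaultdict(set)
    for fields in rows:
        cve_id, clone_name = fields[0], fields[2]
        if clone_name:
            clones_by_element[cve_id].add(clone_name)
            elements_by_clone[clone_name].add(cve_id)
    return clones_by_element, elements_by_clone

# Invented rows: element A has two clones; "clone 2" also contains element B.
rows = [
    ["CVE1", "A", "clone 1", "BAC"],
    ["CVE1", "A", "clone 2", "BAC"],
    ["CVE2", "B", "clone 2", "BAC"],
]
by_element, by_clone = index_clones(rows)
```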
element_to_cyto[.txt|.txt.gz|.txt.zip]
This file contains all eGenome-determined and externally
assigned cytogenetic localizations for elements in eGenome.
Many of the elements listed have multiple cytolocations associated
with them.
| Column | Field | Description |
| 1 | CVEID | eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number) |
| 2 | Primary_name | primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm |
| 3 and 4 | Cytolocation_left and Cytolocation_right | cytogenetic band or band range determined for each element from the eGenome cytogenetic band assignment algorithm, or an experimentally determined cytogenetic band assignment |
| 5 | Source | 'epcr' if derived from sequence position; 'RH' if derived from an RH position; 'GL' if derived from a GL position; otherwise, primary laboratory or group(s) from which an externally-derived cytogenetic/element assignment was derived |
| 6 | Clone | large-insert clone(s) used for an external cytogenetic assignment (if Source is not equal to epcr, RH, or GL) |
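The Source column separates eGenome-derived localizations from external ones, which a reader of the file might distinguish as follows. This sketch assumes the three reserved values appear exactly as written above (case-sensitive); `describe_source` is a hypothetical helper, and the laboratory name in the example is a placeholder.

```python
# Reserved Source values and what each localization was derived from.
COMPUTED_SOURCES = {
    "epcr": "sequence position",
    "RH": "RH position",
    "GL": "GL position",
}

def describe_source(source):
    """Classify a Source value as eGenome-derived or external."""
    if source in COMPUTED_SOURCES:
        return ("eGenome", COMPUTED_SOURCES[source])
    # Anything else names the laboratory or group behind the assignment.
    return ("external", source)

computed = describe_source("epcr")
external = describe_source("SomeLab")
```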
element_to_genetic[.txt|.txt.gz|.txt.zip]
This file contains the genetic linkage map positions of all
eGenome polymorphic elements.
| Column | Field | Description |
| 1 | CVEID | eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number) |
| 2 | Primary_name | primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm |
| 3 | Chromosome | chromosome that eGenome has assigned the element to by linkage grouping |
| 4 and 5 | Rh_cvposition_left and Rh_cvposition_right | RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker CVE IDs |
| 6 and 7 | Rh_position_left and Rh_position_right | RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker primary names |
| 8 and 9 | cR_position_left and cR_position_right | RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker centiRay positions |
| 10 and 11 | GL_cvposition_left and GL_cvposition_right | genetic linkage position or interval span of an element, recorded as left and right genetic linkage framework positions, and listed by genetic linkage framework marker CVE IDs |
| 12 and 13 | GL_position_left and GL_position_right | genetic linkage position or interval span of an element, recorded as left and right genetic linkage framework positions, and listed by genetic linkage framework marker primary names |
| 14 and 15 | cM_position_left and cM_position_right | genetic linkage position or interval span of an element, recorded as left and right genetic linkage framework positions, and listed by framework marker centiMorgan positions |
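With fifteen columns, addressing fields by name is less error-prone than addressing them by index. The sketch below maps one line onto the field names listed in the table. It assumes centiMorgan positions are decimal numbers and that absent values are empty strings; the function names and the sample line are hypothetical.

```python
GENETIC_FIELDS = [
    "CVEID", "Primary_name", "Chromosome",
    "Rh_cvposition_left", "Rh_cvposition_right",
    "Rh_position_left", "Rh_position_right",
    "cR_position_left", "cR_position_right",
    "GL_cvposition_left", "GL_cvposition_right",
    "GL_position_left", "GL_position_right",
    "cM_position_left", "cM_position_right",
]

def genetic_record(line):
    """Map one tab-delimited line onto the field names above."""
    values = line.rstrip("\n").split("\t")
    # Pad short rows so trailing empty columns still map to a field name.
    values += [""] * (len(GENETIC_FIELDS) - len(values))
    return dict(zip(GENETIC_FIELDS, values))

def cm_interval(record):
    """Return the (left, right) centiMorgan interval, or None if absent."""
    left, right = record["cM_position_left"], record["cM_position_right"]
    return (float(left), float(right)) if left and right else None

# Invented line with only genetic linkage data filled in.
line = "CVE1\tD1S228\t1\t\t\t\t\t\t\tCVE5\tCVE9\tD1S1\tD1S2\t10.5\t12.0\n"
record = genetic_record(line)
```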
element_to_rh[.txt|.txt.gz|.txt.zip]
This file contains the radiation hybrid map positions and
associated RH data of eGenome RH elements.
| Column | Field | Description |
| 1 | CVEID | eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number) |
| 2 | Primary_name | primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm |
| 3 | Rh_panel | radiation hybrid panel that the RH data for the element was generated from |
| 4 | RHdb_ID | Radiation Hybrid database record identifier number for the element RH typing |
| 5 | Rh_vector | RH typing dataset vector used by eGenome for RH mapping |
| 6 | Chromosome | chromosome that eGenome has assigned the element to by linkage grouping |
| 7 and 8 | Rh_cvposition_left and Rh_cvposition_right | RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker CVE IDs |
| 9 and 10 | Rh_position_left and Rh_position_right | RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker primary names |
| 11 and 12 | cR_position_left and cR_position_right | RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker centiRay positions |
element_to_sequence[.txt|.txt.gz|.txt.zip]
This file describes the relationships between eGenome elements, the
sequences from which they were derived, and their positions in the
human genomic sequence assemblies. Elements may have more than one
sequence position, as in the case where an STS is found more than once
in the assembly. Note that this file also contains the SNPs from
dbSNP in addition to the main eGenome marker dataset.
| Column | Field | Description |
| 1 | CVEID | eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number) |
| 2 | Primary_name | primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm |
| 3 | Source_sequence | source of sequence ('UCSC', for elements) or externally determined sequence position ('NCBI', for SNPs) used for reporting the sequence position in eGenome |
| 4 | Chromosome | chromosome to which the element has been assigned based upon a genomic sequence assignment |
| 5 and 6 | Sequence_position_left and Sequence_position_right | left and right base pair positions that the element matches in the UCSC sequence assembly. SNP elements have just a single sequence position; for such elements, column 6 is empty. |
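Because SNP rows leave column 6 empty, a reader of this file may want to normalize every row to a closed interval. A minimal sketch, assuming base pair positions are integers; `sequence_span` is a hypothetical helper and the sample rows are invented.

```python
def sequence_span(fields):
    """Return (chromosome, left, right) for one split row.

    A SNP row has an empty column 6, so its single position is used
    for both ends of the interval.
    """
    chromosome = fields[3]
    left = int(fields[4])
    right_raw = fields[5].strip() if len(fields) > 5 else ""
    right = int(right_raw) if right_raw else left
    return chromosome, left, right

# Invented rows: an STS with two coordinates and a SNP with one.
sts = sequence_span(["CVE1", "D1S228", "UCSC", "1", "1000", "1250"])
snp = sequence_span(["CVE2", "rs0000", "NCBI", "1", "2000", ""])
```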