ReadMe file for eGenome v2.3, June 2005

eGenome is a procedure that creates high-resolution, high-confidence "views" of human chromosomes. It combines genetic linkage, cytogenetic, structural, and expression-based data on a single platform. Version 2.3 covers the entire human genome except chromosome Y. Details regarding the methodology can be found here.

Data for version 2.3 is from RHdb version 16; UniGene build 175; UCSC sequence assembly hg17 (Build 35); WICGR human map release 12 and SNP release 1; CEPHdb v9; UWDMB, NCI-CCAP, CHORI, and CSMC large-insert clone datasets as of 2001-08-21; the Genome Database (GDB) STS dataset; UWDMB, NCI-CCAP, RPCI, CSMC, and MPIMG cytogenetic datasets as of 2000-09-11; the NCBI Clone Registry as of 2005-02-22; the NCBI UniSTS dataset as of 2004-09-30; and the NCBI dbSNP dataset build 124.

Version 2.3 consists of eight file types: elements, element_to_alias, element_to_bundle, element_to_clone, element_to_cyto, element_to_genetic, element_to_rh, and element_to_sequence. Each file type contains a subset of the entire eGenome data, and the datasets are organized around a common set of genomic elements. In every file type, the first two columns list the eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number) and the primary name chosen for the element (such as a D number or a gene symbol). The contents and format of each file type are described in detail below. Note that genomic element entries are not always unique within a single file: if an element has multiple entries for any column, those entries may be repeated on subsequent lines, depending on the file. For example, a marker with two cytogenetic localizations will have two adjacent lines listing the same marker ID and name but different cytogenetic positions.
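As a minimal sketch of how the repeated-line layout described above might be consumed, the following Python groups tab-delimited rows by CVE ID. The function name group_by_element and the sample rows (IDs, names, and band values) are invented for illustration. The check that column 1 is "CVE" followed by digits is an assumption, used here to skip blank lines and any header line the files may carry.

```python
import re
from collections import defaultdict

# Assumption: a data row starts with "CVE" followed by digits; anything else
# (blank lines, a possible header row) is skipped.
CVE_ID = re.compile(r"CVE\d+")

def group_by_element(lines):
    """Map each CVE ID to the list of rows (as lists of fields) that carry it."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if CVE_ID.fullmatch(fields[0]):
            grouped[fields[0]].append(fields)
    return dict(grouped)

# Hypothetical rows in the style of element_to_cyto: MARKER_A has two
# cytogenetic localizations, so it appears on two adjacent lines.
sample = [
    "CVE101\tMARKER_A\t1p36.1\t1p36.2\tepcr\t",
    "CVE101\tMARKER_A\t1p36\t1p36\tEXAMPLE_LAB\tcloneA",
    "CVE202\tMARKER_B\t7q31\t7q31\tRH\t",
]
grouped = group_by_element(sample)
print(len(grouped["CVE101"]))  # 2
```

Grouping on column 1 rather than assuming one line per element keeps the same code usable for every file type, whether or not a given file repeats elements.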

The eight file types have been generated for the whole genome; these files are in the subdirectory allgenome/. In addition, a set of files has been generated for each chromosome and can be found in chrN/ (e.g. chr1/). Within the allgenome and chromosome directories are subdirectories for each of the eight file types, and each file type subdirectory contains .zip and .gz compressed versions of that file. For example, the Windows- and Unix-compatible versions of the chromosome 7 element_to_genetic file can be found in chr7/element_to_genetic/ as element_to_genetic.zip and element_to_genetic.gz, respectively. Each subdirectory also contains a text version of the file with the extension .txt. Every file is tab-delimited.
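The directory layout above can be turned into a small path helper. This is a sketch only: egenome_relpath and read_rows are hypothetical names, the file name follows the chr7 example in the preceding paragraph (file type plus extension), and the latin-1 encoding is an assumption chosen because it cannot fail to decode.

```python
import gzip
from pathlib import PurePosixPath

FILE_TYPES = (
    "elements", "element_to_alias", "element_to_bundle", "element_to_clone",
    "element_to_cyto", "element_to_genetic", "element_to_rh", "element_to_sequence",
)

def egenome_relpath(file_type, chromosome=None, ext=".gz"):
    """Relative path of one file; chromosome=None selects allgenome/."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type}")
    top = "allgenome" if chromosome is None else f"chr{chromosome}"
    return PurePosixPath(top, file_type, file_type + ext)

def read_rows(source):
    """Yield each line of a gzip-compressed, tab-delimited file as a list of fields.

    `source` may be a path or an open binary file object.
    """
    # Assumption: latin-1, so that unexpected bytes never raise a decode error.
    with gzip.open(source, "rt", encoding="latin-1") as handle:
        for line in handle:
            yield line.rstrip("\r\n").split("\t")

print(egenome_relpath("element_to_genetic", 7))
# chr7/element_to_genetic/element_to_genetic.gz
```

If the distributed files turn out to be named with a .txt.gz suffix instead, passing ext=".txt.gz" covers that case without changing the helper.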

Description of files

elements[.txt|.txt.gz|.txt.zip]

This file contains basic data associated with each genomic element.

Column  Field                Description
1       CVEID                eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number)
2       Primary_name         primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm
3       Description          descriptive line of text associated with elements that are known to be transcribed; source is UniGene
4       Element_type         type of element, either an RH marker, an RH framework marker, a polymorphism, or other marker
5       Expression_status    expression status of the element, either transcribed, not transcribed, or unknown
6       EST_cluster          UniGene EST cluster ID (Hs.#####) to which the element has been assigned, if any
7       SNP                  unused; in the next release, this will be 'true' if any sequence localization of this marker contains at least one Single Nucleotide Polymorphism
8, 9    Primer1 and Primer2  forward and reverse primer sequences used to PCR-amplify the element, respectively

element_to_alias[.txt|.txt.gz|.txt.zip]

This file contains all aliases and external identifiers collected for each element.

Column  Field         Description
1       CVEID         eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number)
2       Primary_name  primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm
3       Alias         comma-separated list of external identifiers representing the element, in the form Datasource:ID (e.g. GDB:D1S228)
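
The Alias column packs several identifiers into one field, so it usually needs a second parsing step. The sketch below splits such a field into (datasource, identifier) pairs. The function name parse_aliases and the EXAMPLEDB:X123 entry are made up for illustration; only GDB:D1S228 comes from the description above.

```python
def parse_aliases(alias_field):
    """Split a comma-separated Alias field into (datasource, identifier) pairs."""
    pairs = []
    for item in alias_field.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, in case an identifier contains one.
        source, sep, identifier = item.partition(":")
        if not sep:
            # No "Datasource:" prefix; keep the bare token as the identifier.
            source, identifier = "", item
        pairs.append((source, identifier))
    return pairs

print(parse_aliases("GDB:D1S228,EXAMPLEDB:X123"))
# [('GDB', 'D1S228'), ('EXAMPLEDB', 'X123')]
```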

element_to_bundle[.txt|.txt.gz|.txt.zip]

This file contains the bundle assignments and RH positions of each element grouped into a bundle.

Column     Field                                             Description
1          CVEID                                             eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number)
2          Primary_name                                      primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm
3          BundleID                                          eGenome bundle internal identifier (the CVB ID, consisting of the prefix CVB followed by the bundle number)
4          Bundle_name                                       primary name chosen for the bundle, selected by the eGenome naming algorithm
5 and 6    Max_left and Max_right                            maximum position or interval span of bundles, recorded as left and right sequence positions
7 and 8    Max_cytolocation_left and Max_cytolocation_right  maximum position or interval span of bundles, recorded as left and right sequence positions, and listed by the cytogenetic positions of the sequence positions
9 and 10   Min_left and Min_right                            minimum overlapping interval shared by all markers within a bundle, recorded as left and right sequence positions. Note that some bundles have no single overlapping position and are shown only with their maximum positions.
11 and 12  Min_cytolocation_left and Min_cytolocation_right  minimum overlapping interval shared by all markers within a bundle, recorded as left and right sequence positions, and listed by the cytogenetic positions of the framework markers
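
Because some bundles carry only a maximum interval, code that reads this file has to handle a missing minimum interval. The sketch below prefers the minimum interval and falls back to the maximum one. It rests on two assumptions not stated in this document: that sequence positions are integers, and that a missing minimum interval appears as empty (or absent) fields in columns 9 and 10. The function name and sample rows are hypothetical.

```python
def bundle_interval(fields):
    """Return (kind, left, right) for one element_to_bundle row (a list of fields).

    kind is "min" when columns 9 and 10 are filled in, otherwise "max".
    """
    max_left, max_right = fields[4], fields[5]         # columns 5 and 6
    min_left = fields[8] if len(fields) > 8 else ""    # column 9
    min_right = fields[9] if len(fields) > 9 else ""   # column 10
    if min_left and min_right:
        return ("min", int(min_left), int(min_right))
    return ("max", int(max_left), int(max_right))

# Hypothetical rows: the first bundle has a shared overlap, the second does not.
row_with_min = ["CVE1", "M1", "CVB1", "B1", "1000", "9000",
                "1p36", "1p36", "4000", "5000", "1p36", "1p36"]
row_without_min = ["CVE2", "M2", "CVB2", "B2", "200", "800",
                   "2q11", "2q11", "", "", "", ""]

print(bundle_interval(row_with_min))     # ('min', 4000, 5000)
print(bundle_interval(row_without_min))  # ('max', 200, 800)
```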

element_to_clone[.txt|.txt.gz|.txt.zip]

This file contains all large-insert clones associated with each element. Note that this file describes many-to-many relationships between clones and elements, and many of the elements listed have multiple clones associated with them. Each clone, clone type, clone source, clone sequence, and element sequence position assignment for an element is therefore listed on a separate line, with identical element ID values in columns 1 and 2. For example, an element with two clones has two adjacent lines listing element "A" in the element ID column (column 1), the first with "clone 1" in column 3 and the second with "clone 2" in column 3.

Column   Field                                                                 Description
1        CVEID                                                                 eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number)
2        Primary_name                                                          primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm
3        Clone_name                                                            name of the large-insert clone(s) reported to contain this element
4        Clone_type                                                            type of large-insert clone, such as a BAC, PAC, or YAC
5        Clone_source                                                          primary laboratory or group from which the clone/element assignment was derived
6        Clone_sequence                                                        GenBank sequence accession numbers for those clones whose DNA sequences have been determined
7 and 8  Sequence_position_in_clone_left and Sequence_position_in_clone_right  left and right base pair positions, respectively, that the element matches in the clone sequence
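
Since the element/clone relationship is many-to-many, it can be convenient to index it in both directions. The following is one possible sketch; index_clones is a hypothetical name, and the rows are invented and truncated to the first four columns.

```python
from collections import defaultdict

def index_clones(rows):
    """Build element -> clones and clone -> elements lookups.

    `rows` is an iterable of element_to_clone rows, each a list of fields.
    """
    clones_by_element = defaultdict(set)
    elements_by_clone = defaultdict(set)
    for fields in rows:
        cve_id, clone_name = fields[0], fields[2]   # columns 1 and 3
        if clone_name:
            clones_by_element[cve_id].add(clone_name)
            elements_by_clone[clone_name].add(cve_id)
    return dict(clones_by_element), dict(elements_by_clone)

# Hypothetical rows: element A sits on two clones, and clone 2 holds two elements.
rows = [
    ["CVE1", "A", "clone 1", "BAC"],
    ["CVE1", "A", "clone 2", "BAC"],
    ["CVE2", "B", "clone 2", "BAC"],
]
by_element, by_clone = index_clones(rows)
print(sorted(by_element["CVE1"]))  # ['clone 1', 'clone 2']
print(sorted(by_clone["clone 2"]))  # ['CVE1', 'CVE2']
```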

element_to_cyto[.txt|.txt.gz|.txt.zip]

This file contains all eGenome-determined and externally assigned cytogenetic localizations for elements in eGenome. Many of the elements listed have multiple cytolocations associated with them.

Column   Field                                     Description
1        CVEID                                     eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number)
2        Primary_name                              primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm
3 and 4  Cytolocation_left and Cytolocation_right  cytogenetic band or band range determined for each element from the eGenome cytogenetic band assignment algorithm, or an experimentally determined cytogenetic band assignment
5        Source                                    'epcr' if derived from sequence position; 'RH' if derived from an RH position; 'GL' if derived from a GL position; otherwise, primary laboratory or group(s) from which an externally-derived cytogenetic/element assignment was derived
6        Clone                                     large-insert clone(s) used for an external cytogenetic assignment (if Source is not equal to epcr, RH, or GL)
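
The Source column mixes three fixed codes with free-text laboratory names, so a reader typically wants to separate eGenome-derived localizations from external ones. A minimal sketch follows; the function name is hypothetical, EXAMPLE_LAB is a made-up value, and matching the codes case-sensitively is an assumption.

```python
# Codes listed in the Source description; any other value names an external group.
DERIVED_FROM = {
    "epcr": "sequence position",
    "RH": "RH position",
    "GL": "GL position",
}

def describe_cyto_source(source):
    """Return (origin, detail) for a Source value from element_to_cyto."""
    if source in DERIVED_FROM:
        return ("eGenome", DERIVED_FROM[source])
    return ("external", source)

print(describe_cyto_source("epcr"))         # ('eGenome', 'sequence position')
print(describe_cyto_source("EXAMPLE_LAB"))  # ('external', 'EXAMPLE_LAB')
```

Only rows classified as external are expected to have a value in the Clone column.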

element_to_genetic[.txt|.txt.gz|.txt.zip]

This file contains the genetic linkage map positions of all eGenome polymorphic elements.

Column     Field                                       Description
1          CVEID                                       eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number)
2          Primary_name                                primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm
3          Chromosome                                  chromosome that eGenome has assigned the element to by linkage grouping
4 and 5    Rh_cvposition_left and Rh_cvposition_right  RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker CVE IDs
6 and 7    Rh_position_left and Rh_position_right      RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker primary names
8 and 9    cR_position_left and cR_position_right      RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker centiRay positions
10 and 11  GL_cvposition_left and GL_cvposition_right  genetic linkage position or interval span of an element, recorded as left and right genetic linkage framework positions, and listed by genetic linkage framework marker CVE IDs
12 and 13  GL_position_left and GL_position_right      genetic linkage position or interval span of an element, recorded as left and right genetic linkage framework positions, and listed by genetic linkage framework marker primary names
14 and 15  cM_position_left and cM_position_right      genetic linkage position or interval span of an element, recorded as left and right genetic linkage framework positions, and listed by framework marker centiMorgan positions

element_to_rh[.txt|.txt.gz|.txt.zip]

This file contains the radiation hybrid map positions and associated RH data of eGenome RH elements.

Column     Field                                       Description
1          CVEID                                       eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number)
2          Primary_name                                primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm
3          Rh_panel                                    radiation hybrid panel that the RH data for the element was generated from
4          RHdb_ID                                     Radiation Hybrid database record identifier number for the element RH typing
5          Rh_vector                                   RH typing dataset vector used by eGenome for RH mapping
6          Chromosome                                  chromosome that eGenome has assigned the element to by linkage grouping
7 and 8    Rh_cvposition_left and Rh_cvposition_right  RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker CVE IDs
9 and 10   Rh_position_left and Rh_position_right      RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker primary names
11 and 12  cR_position_left and cR_position_right      RH position or interval span of an element, recorded as left and right RH framework positions, and listed by framework marker centiRay positions

element_to_sequence[.txt|.txt.gz|.txt.zip]

This file describes the relationships between eGenome elements, the sequences from which they were derived, and their positions in the human genomic sequence assemblies. Elements may have more than one sequence position, as in the case where an STS is found more than once in the assembly. Note that this file also contains the SNPs from dbSNP in addition to the main eGenome marker dataset.

Column   Field                                               Description
1        CVEID                                               eGenome genomic element internal identifier (the CVE ID, consisting of the prefix CVE followed by the element number)
2        Primary_name                                        primary name chosen for the element (such as a D number or a gene symbol), selected by the eGenome naming algorithm
3        Source_sequence                                     source of sequence ('UCSC', for elements) or externally determined sequence position ('NCBI', for SNPs) used for reporting the sequence position in eGenome
4        Chromosome                                          chromosome to which the element has been assigned based upon a genomic sequence assignment
5 and 6  Sequence_position_left and Sequence_position_right  left and right base pair positions that the element matches in the UCSC sequence assembly. SNP elements have just a single sequence position; for such elements, column 6 is empty.
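
Because SNP rows leave column 6 empty, a parser for this file should not assume that both positions are present. The sketch below returns a uniform (chromosome, left, right) tuple. Treating a SNP as an interval whose right end equals its left end is a convenience chosen here, not something this document specifies, and the function name, sample rows, and integer positions are likewise assumptions.

```python
def sequence_span(fields):
    """Return (chromosome, left, right) for one element_to_sequence row.

    `fields` is the row as a list of strings. When column 6 is empty or
    absent (SNP rows), the right position is set equal to the left one.
    """
    chromosome = fields[3]                              # column 4
    left = int(fields[4])                               # column 5
    right_text = fields[5] if len(fields) > 5 else ""   # column 6
    right = int(right_text) if right_text else left
    return chromosome, left, right

# Hypothetical rows: an STS with two positions and a SNP with one.
sts_row = ["CVE1", "MARKER_A", "UCSC", "7", "1500", "1720"]
snp_row = ["CVE2", "SNP_B", "NCBI", "7", "98765", ""]

print(sequence_span(sts_row))  # ('7', 1500, 1720)
print(sequence_span(snp_row))  # ('7', 98765, 98765)
```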