Supplementary Information: Human Chromosome 13


Predictions are available in tabular format only:

  1. Combination of Genscan and similarity-based approach
  2. Combination of HMMgene and similarity-based approach


Annotations from the public domain are available here for the forward and reverse strands.


Introduction:

To demonstrate the capability of EGPred, we analyzed part of human chromosome 13, which has recently been sequenced (Dunham et al., 2004). Human chromosome 13 is the largest acrocentric human chromosome and is estimated to contain 633 genes with a total of 4266 exons, excluding those from pseudogenes (Dunham et al., 2004). An initial analysis was performed as described in Methods. A total of 96175021 bp, spanning the region from 17918001 to 114093021 bp, was analyzed to predict genes.
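The size of the analyzed region follows directly from its coordinates. The short sketch below checks that arithmetic, assuming the positions are 1-based and inclusive at both ends (an assumption, although the reported total is consistent with it); the function name `region_length` is purely illustrative.

```python
def region_length(start: int, end: int) -> int:
    """Length in bp of an interval, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    # Both endpoints are counted, hence the +1.
    return end - start + 1


# Coordinates of the analyzed region as given in the text.
analyzed_bp = region_length(17918001, 114093021)
print(analyzed_bp)  # 96175021
```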

Genes were predicted using the two combinations implemented in EGPred, one based on Genscan and one based on HMMgene. Applying these two strategies to human chromosome 13 produced four sets of putative genes (two for each strand), which are available above and summarized in the table below. The Genscan-based EGPred strategy predicted a total of 2125 multi-exon genes, 406 single-exon genes and 2065 partial genes. The HMMgene-based EGPred strategy produced 4000 multi-exon genes, 220 single-exon genes and 2705 partial genes. Surprisingly, more than 70% of the exons predicted by EGPred are not reported in the annotation. Since EGPred uses similarity to protein sequences, a large fraction of the predicted genes are likely to be protein coding. However, the results suggest that all predictions from the similarity-based approach are also predicted by the ab initio approach. A considerable proportion of genes is estimated to be absent from current databases; the predictions may therefore include novel protein-coding genes.

A direct computation of the sensitivity and specificity of the program from the available public domain annotation for human chromosome 13 is impossible for two main reasons. First, transcripts of different genes overlap (see the public domain annotation files above). Second, most public domain annotations are manually curated at a final stage based on available EST, cDNA or protein information. Since almost half of the genes and their products are not yet identified, such curation will inadvertently result in incomplete data.

While EGPred has been demonstrated to be reliable, the success of the program depends critically on the accuracy of the underlying programs, and continued improvements in gene prediction algorithms should improve future EGPred results.
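The proportion of predicted exons that are absent from the annotation could be estimated along the following lines. This is a minimal sketch, not the procedure used for the figures above: it assumes exons are given as 1-based inclusive `(start, end)` pairs on a single strand and treats any overlap with an annotated exon as "reported". The names `overlaps` and `unannotated_fraction` and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

# An exon as (start, end): assumed 1-based, inclusive, all on the same strand.
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def unannotated_fraction(predicted: List[Exon], annotated: List[Exon]) -> float:
    """Fraction of predicted exons that overlap no annotated exon."""
    if not predicted:
        return 0.0
    novel = sum(
        1 for p in predicted if not any(overlaps(p, a) for a in annotated)
    )
    return novel / len(predicted)


# Toy data (not taken from chromosome 13): only the first prediction
# overlaps the single annotated exon.
predicted = [(100, 200), (300, 400), (500, 600), (700, 800)]
annotated = [(150, 250)]
print(unannotated_fraction(predicted, annotated))  # 0.75
```

The pairwise comparison is quadratic in the number of exons; for chromosome-scale data, sorting the annotated exons and using an interval index would be a more practical choice.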