Abstracts of First-Prize Papers, Seventh Jiuyuan Scholarship (2)
4. ZCURVE: a new system for recognizing protein-coding genes
in bacterial and archaeal genomes (Published in: Nucleic Acids Res.
2003, 31: 1780-1789)
A new system, ZCURVE 1.0, is proposed for finding
genes in bacterial or archaeal genomes based on the Z curve.
The default minimum ORF length is 90 bp.
To evaluate the performance of the new system, ZCURVE 1.0 and
Glimmer 2.02 were each run on the 18 bacterial or archaeal
genomes available in GenBank Release 129.0 that had not been
annotated with the Glimmer system. On average, ZCURVE 1.0 and
Glimmer 2.02 found 98.41% and 98.21% of the annotated genes
(accuracy) and predicted 18.65% and 29.38% additional genes
(additional prediction rate), respectively.
The results show that the average accuracy of the two systems is
closely matched, but that ZCURVE 1.0 performs much better than
Glimmer 2.02 with respect to the additional prediction rate
(18.65% versus 29.38%). In addition, the accuracy of gene start
prediction of ZCURVE 1.0 is found to be better than, or at least
comparable to, that of some existing systems. Because the method
used by the new system emphasizes global statistical characteristics
of coding sequences, whereas Markov-model-based methods emphasize
local ones, the two approaches are essentially complementary. Joint
application of both systems is shown to greatly improve gene-finding
results.
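
The two evaluation measures quoted above can be illustrated with a minimal sketch. It assumes, since the abstract does not spell this out, that a predicted gene counts as "found" when it shares strand and stop position with an annotated gene, and that the additional prediction rate is the number of extra predictions divided by the number of annotated genes. The function name `evaluate` and the `(strand, stop_position)` tuple layout are hypothetical.

```python
def evaluate(annotated, predicted):
    """Return (accuracy, additional_prediction_rate) as percentages.

    Both arguments are iterables of (strand, stop_position) tuples.
    Matching genes by strand and stop position is an assumption of this
    sketch, not a rule stated in the abstract.
    """
    annotated = set(annotated)
    predicted = set(predicted)
    if not annotated:
        raise ValueError("at least one annotated gene is required")

    found = annotated & predicted        # annotated genes that were predicted
    additional = predicted - annotated   # predictions absent from the annotation

    accuracy = 100.0 * len(found) / len(annotated)
    additional_rate = 100.0 * len(additional) / len(annotated)
    return accuracy, additional_rate


if __name__ == "__main__":
    # Toy data: four annotated genes, three recovered, one extra prediction.
    annotated = [("+", 100), ("+", 200), ("-", 300), ("-", 400)]
    predicted = [("+", 100), ("+", 200), ("-", 300), ("+", 900)]
    acc, extra = evaluate(annotated, predicted)
    print(f"accuracy = {acc:.2f}%, additional prediction rate = {extra:.2f}%")
```

Under these assumptions, a lower additional prediction rate at equal accuracy means fewer predictions that the annotation does not support.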
For a typical genome, e.g., E. coli, ZCURVE 1.0 takes about
2 minutes on a Pentium III 866 PC running the Windows operating
system, without any human intervention. ZCURVE 1.0 is freely
available from the website: http://tubic.tju.edu.cn/Zcurve_B/
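
For readers unfamiliar with the Z curve, the sketch below shows the standard cumulative Z-curve transform of a DNA sequence, together with a simple forward-strand ORF scan that uses the 90 bp default minimum length mentioned in the abstract. It is illustrative only and is not the ZCURVE 1.0 implementation: the function names, the start-codon set (ATG, GTG, TTG) and the choice to count the stop codon in the ORF length are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def z_curve(seq):
    """Cumulative Z-curve coordinates (x_n, y_n, z_n) for each prefix of seq.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
    z_n = (A_n + T_n) - (G_n + C_n)   weak vs. strong hydrogen bonding
    where A_n, C_n, G_n, T_n are base counts in the first n positions.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base in counts:  # ambiguous bases such as N leave the counts unchanged
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points


def find_orfs(seq, min_len=90, start_codons=("ATG", "GTG", "TTG")):
    """Return (start, end) half-open intervals of forward-strand ORFs.

    Each ORF runs from the first start codon after the previous in-frame
    stop to the next in-frame stop codon, stop codon included (assumption).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end))
                start = None
    return orfs


if __name__ == "__main__":
    print(z_curve("ACGT"))
    print(find_orfs("ATG" + "GCT" * 30 + "TAA"))
```

A complete gene finder would also scan the reverse-complement strand and score each candidate ORF; this sketch stops at enumerating candidates.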