Diogenes
Reliable prediction of protein-encoding regions in short genomic sequences
 


Diogenes sequence analysis
Diogenes is a sequence analysis code developed for use in high-throughput operations. It targets short genomic sequences, searching for and reporting likely protein-encoding regions.

It does not attempt to replicate standard gene-finding software. Rather, it complements such software where it is weakest: short sequences on the order of 300 to 800 bp.

How it works
Diogenes operates on your sequences in the following way:
  • Identifies ORF candidates in all six reading frames
  • Assigns measures of coding potential to each candidate region
  • Decides whether the coding potential is high enough to call the candidate a likely protein-encoding region; if it is, the prediction is reported
Coding potential is measured using statistics gathered from organism-specific training sets.
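
As a rough illustration of the first step only, the sketch below enumerates ORF candidates in all six reading frames. It is not Diogenes's actual implementation. It assumes that a candidate is any stretch free of stop codons (short sequences often contain only part of a gene, so no start codon is required), that coordinates are zero-based and end-exclusive on the strand used, and that the standard stop codons apply. The names orf_candidates and min_len are hypothetical. Scoring coding potential needs the organism-specific training statistics and is not shown.

```python
# Illustrative sketch only; not taken from the Diogenes source code.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orf_candidates(seq, min_len=60):
    """Yield (strand, start, end) for stop-free stretches in all six frames.

    Coordinates are zero-based and end-exclusive, measured on the strand
    named in the tuple ("-" means the reverse complement was used).
    min_len is a hypothetical minimum candidate length in bases.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            pos = frame
            while pos + 3 <= len(s):
                if s[pos:pos + 3] in STOP_CODONS:
                    # A stop codon closes the current candidate.
                    if pos - start >= min_len:
                        yield strand, start, pos
                    start = pos + 3
                pos += 3
            # The stretch running to the end of the sequence is also a
            # candidate, since the gene may continue past the fragment.
            if pos - start >= min_len:
                yield strand, start, pos
```

Each tuple this yields would then be passed to the scoring step, which decides whether the candidate is reported.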

About the predictions
Predictions made by Diogenes are regions that appear to encode a portion of a protein. Each prediction includes the following parameters:
strand
"+" means the prediction is with respect to the sequence as you provide it; "-" means the reverse complement was used
from, to
Locations of the prediction on the sequence with respect to the strand indicated; zero-based. When strand = "+" the first base of the sequence has location 0; when strand = "-" the first base of the reverse complement sequence has location 0. NOTE: This will change in the next release when we switch over to GFF-style coordinates for reporting.
p-value
A significance measure; smaller is stronger. In practice p = 1.0e-3 indicates a very weak prediction that you'd probably ignore; p = 1.0e-6 is decent; p = 1.0e-10 is strong; p = 0.0 is strongest. The p-value is obtained by comparing the computed score with the scores obtained from regions that do not encode a protein. It is the probability that a candidate region that does not encode a protein would produce a score higher than the one reported.
predicted
The nucleotide sequence corresponding to the prediction
translation
The peptide sequence corresponding to the prediction
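
Two of the fields above involve a little arithmetic, which the following sketch works through. It is an illustration under stated assumptions, not Diogenes code. The first function maps a strand-relative, zero-based interval onto the sequence as you provided it, in 1-based coordinates. It assumes that "to" is inclusive, which the description above does not state. The second function computes an empirical p-value as the fraction of non-coding scores higher than the reported one. Diogenes may derive its p-values differently, for example from a fitted distribution. All function and parameter names are hypothetical.

```python
# Illustrative sketch only; not taken from the Diogenes source code.
from bisect import bisect_right


def to_forward_one_based(strand, start, end, seq_len):
    """Map a zero-based, strand-relative interval to 1-based inclusive
    coordinates on the sequence as submitted.

    ASSUMPTION: `end` is inclusive; the field description only says
    that locations are zero-based.
    """
    if not 0 <= start <= end < seq_len:
        raise ValueError("interval lies outside the sequence")
    if strand == "+":
        return start + 1, end + 1
    if strand == "-":
        # Position i on the reverse complement is position
        # seq_len - 1 - i on the submitted sequence, so the interval
        # flips end for end.
        return seq_len - end, seq_len - start
    raise ValueError("strand must be '+' or '-'")


def empirical_p_value(score, sorted_null_scores):
    """Fraction of non-coding (null) scores strictly higher than `score`.

    `sorted_null_scores` must be sorted in ascending order.
    """
    if not sorted_null_scores:
        raise ValueError("need at least one null score")
    n = len(sorted_null_scores)
    higher = n - bisect_right(sorted_null_scores, score)
    return higher / n
```

Under this empirical definition, a score that beats every non-coding score gets p = 0.0, which matches the note above that p = 0.0 is the strongest value reported.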
Citing
The Diogenes publication has not yet appeared. In the meantime please cite it in your work as:
Crow JA, Retzel EF. Diogenes -- Reliable prediction of protein-encoding regions in short genomic sequences, http://analysis.ccgb.umn.edu/diogenes (2005)