Frequently Asked Questions:
All pages were tested only with Netscape 7 and IE 6.
3. What is DNannotator good at?
4. What is annotated sequence, and what is un-annotated sequence?
5. Why do we need DNannotator?
6. What is the basic principle of DNannotator?
7. What is gb-header, why use gb-header?
8. What is the basic requirement for using this tool?
9. How can I view my annotation graphically?
10. Why can't I get the annotation I am looking for?
11. What are the limitations of DNannotator?
12. Why is there a limit on data size?
13. Can I have oversized data analyzed in DNannotator?
14. What is coming?
15. Subscribe to mailing list.
It's a tool for LARGE-SCALE annotation (> 100 annotations) of a SIZEABLE genomic region (> 1 Mb).
1. If you have sequence annotations in a draft sequence and would like the same annotations in a new sequence version, but don't want to repeat the whole tedious and delicate manual annotation procedure.
2. If you have hundreds of SNPs or primers to map/annotate onto genomic DNA sequences, no matter where the sequences came from.
Then you may need DNannotator.
If you have only one annotation to make, in a small single-gene region, don't bother learning how DNannotator works. I recommend using Vector NTI (not cheap, but easy to use), Sequin, or even a simple text editor to open your sequence and do the annotation manually. DNannotator will still work for you, however, and it should not introduce any "typo" into your annotation.
Then why can't "manual" tools do "large-scale annotation on a sizeable region"? 1) You may have problems opening the target sequence because of its size; 2) manual annotation is time-consuming, error-prone, and extremely boring.
It's not a replacement for tools such as Genotator, NIX, or the ORNL Genome Analysis Pipeline, which perform gene prediction and searches against static public databases.
It's not a replacement for public annotation resources such as NCBI map viewer, UCSC Genome Browser, or Sanger Center Ensembl, because DNannotator is not intended to search whole public databases or to perform whole-genome annotation.
DNannotator does not provide a graphical viewer. You have to take DNannotator's output and open it in a viewer such as Artemis or Genome Browser.
Annotation using your own source data and your own customized gDNA sequence.
Annotated sequence vs. Un-annotated sequence
An annotation is a label or note attached to a fragment of a sequence, specifying what the fragment is or what it does. For example, the following annotation in Genbank format:
     exon            67408..67569
                     /vntifkey="61"
                     /label=gene1\exon2
states that bases 67408 to 67569 are a region of exon2 of gene1.
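To make the structure of such an entry concrete, here is a minimal Python sketch that reads the three lines above into a dictionary. The function name `parse_feature` is hypothetical, and the sketch assumes a simple `start..end` location and one qualifier per line; it is not DNannotator's own parser.

```python
import re

def parse_feature(lines):
    """Parse one simple Genbank feature: a key/location line plus qualifier lines."""
    key, location = lines[0].split()
    match = re.fullmatch(r"(\d+)\.\.(\d+)", location)
    if match is None:
        # Assumption: only plain "start..end" locations are handled here.
        raise ValueError(f"unsupported location: {location}")
    qualifiers = {}
    for line in lines[1:]:
        # A qualifier looks like /name=value or /name="value".
        name, _, value = line.strip().lstrip("/").partition("=")
        qualifiers[name] = value.strip('"')
    return {
        "key": key,
        "start": int(match.group(1)),
        "end": int(match.group(2)),
        "qualifiers": qualifiers,
    }

feature = parse_feature([
    "exon 67408..67569",
    '/vntifkey="61"',
    "/label=gene1\\exon2",
])
print(feature["key"], feature["start"], feature["end"], feature["qualifiers"]["label"])
```

Genbank coordinates are 1-based and inclusive, so this exon is `end - start + 1` = 162 bases long.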
We use a narrowly defined concept here. Both "annotated sequence" and "un-annotated sequence" are defined relative to a particular annotation or feature.
For example, when we talk about mapping SNP "rs123" to "sequence A", if
there is already an annotation specifying where rs123 is located in sequence
A (specifically, a label in standard Genbank
format or in a feature table), then sequence A is regarded as "annotated" for SNP rs123.
Conversely, if sequence B does not contain an annotation for "rs123",
even though it may contain other annotations, such as exons of gene xxx,
it is still regarded as "un-annotated" for SNP rs123. Therefore,
sequence B can be annotated by DNannotator for rs123.
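The definition can be expressed as a small check. The Python sketch below assumes that a feature is identified by a `/label=` qualifier, as in the exon example earlier; the function name `is_annotated_for` and the two toy feature tables are made up for illustration.

```python
def is_annotated_for(feature_table_text, label):
    """Return True if some qualifier line carries exactly this label."""
    wanted = {f"/label={label}", f'/label="{label}"'}
    return any(line.strip() in wanted for line in feature_table_text.splitlines())

# Toy feature tables (illustrative only).
sequence_a_features = (
    "     variation       1200\n"
    "                     /label=rs123\n"
)
sequence_b_features = (
    "     exon            67408..67569\n"
    "                     /label=gene1\\exon2\n"
)

print(is_annotated_for(sequence_a_features, "rs123"))  # sequence A: annotated for rs123
print(is_annotated_for(sequence_b_features, "rs123"))  # sequence B: un-annotated for rs123
```

Sequence B has annotations, but none for rs123, so it still counts as un-annotated for that SNP.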
Three major purposes of sequence annotation:
1. Organize all sequence-related data, such as gene structures and expression regulation elements, especially data that can be submitted to public repositories and exchanged with other researchers.
2. Manage all lab-related data, such as oligos, primers, and amplicons used in the lab. These data are more project-specific, but they are important for the efficient administration of a research project such as gene mutation screening or association linkage screening.
3. Preserve annotations when a sequence is updated.
DNannotator helps you do batch annotation of any target gDNA sequence with your own data (primers, SNPs, exons). The target gDNA can be downloaded from a public database or assembled in your own way.
You may wish to use DNannotator:
when you have hundreds of primers, SNPs, exons, etc. scattered around your computer and want a common ground on which to organize them. With the common ground, you can clearly see all the positional relationships between the elements.
when you have hundreds of elements annotated in an old version of a draft sequence and a new version of the sequence comes out. You will want to move to the new version without re-doing all the carefully performed manual annotations.
potentially, when you want to do homologue mapping between sequences from different species, by adjusting the threshold of the BLAST filter.
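As an illustration of what adjusting the BLAST filter threshold could mean, the Python sketch below filters hits in the standard 12-column tabular BLAST output by percent identity. DNannotator's actual filter criteria are not described here, so the names `Hit`, `parse_hits`, `filter_hits`, and `min_identity`, as well as the two sample hits, are assumptions.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity
    length: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int

def parse_hits(tabular_text):
    """Parse tabular BLAST output (query, subject, %identity, length,
    mismatches, gap opens, q.start, q.end, s.start, s.end, e-value, bit score)."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9])))
    return hits

def filter_hits(hits, min_identity=100.0):
    """Keep only hits at or above the identity threshold."""
    return [h for h in hits if h.identity >= min_identity]

# Made-up sample hits for illustration.
sample = (
    "rs123\tseqA\t100.00\t51\t0\t0\t1\t51\t5001\t5051\t1e-20\t95.3\n"
    "rs456\tseqA\t92.16\t51\t4\t0\t1\t51\t8001\t8051\t1e-10\t70.1\n"
)
hits = parse_hits(sample)
strict = filter_hits(hits)                       # perfect matches only
relaxed = filter_hits(hits, min_identity=90.0)   # looser, e.g. cross-species
print(len(strict), len(relaxed))
```

A strict threshold suits mapping within one species; a lower one admits diverged matches at the cost of more ambiguous placements.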
For example, with one completely un-annotated Genbank format sequence
as a starting point, you can use DNannotator to add "features" for any SNP
to the sequence; later, you can use the output of that SNP mapping
analysis as the un-annotated sequence for annotating "features" of new SNPs,
primers, or gene exons. This cycle can be repeated as many times
as you wish. New features/annotations accumulate easily on this
single sequence-based platform.
There is no need to worry about losing the varieties of annotations that
were painstakingly accumulated, because the "annotation migration" function takes
care of this. All annotations in an old version of a sequence that are
not disrupted in the new version (a disrupted annotation is one where the new version
has a completely different organization at the annotated fragment)
will be kept in the new version. Therefore,
you can focus on creating new annotations based on the new version of the sequence
without wasting time repeating analyses already performed.
1. For SNP, primer mapping, and annotation migration:
BLAST results provide the homologous relationships between annotated and un-annotated sequence fragments. Generally speaking, the perfect matches alone indicate which elements correspond in the two sequences. An annotation can therefore be transferred to the un-annotated sequence by recalculating its coordinates from the matching relationship.
2. For exon mapping:
Basically, we just provide a handy parser for Sim4 results. The program here converts the Sim4 results into Genbank format feature data.
BLAT-based exon mapping is also provided.
3. For STS mapping:
e-PCR is used for STS mapping based on primer-pair information. So, if you have the primer pairs used to amplify the markers, you should use this method to map them. In many instances, this method is more sensitive than the BLAST-based approach. The tradeoff is that the annotation results rely on the accuracy of the primers supplied.
In annotation migration, features spanning a large region, such as "gene", use the e-PCR approach.
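The coordinate calculation behind the BLAST-based transfer in point 1 can be sketched as follows. This is a simplified Python illustration, not DNannotator's implementation: it assumes a single ungapped perfect match, and the function name `transfer_feature` and the example coordinates are hypothetical.

```python
def transfer_feature(feature_start, feature_end,
                     old_start, old_end, new_start, new_end):
    """Map a feature from the old sequence to the new one through one
    ungapped perfect match (old_start..old_end vs new_start..new_end).

    Coordinates are 1-based and inclusive. A minus-strand match is given
    with new_start > new_end. Returns (start, end) on the new sequence,
    or None when the feature is not fully inside the matched block.
    """
    if old_end - old_start != abs(new_end - new_start):
        raise ValueError("match lengths differ; this sketch needs an ungapped match")
    if not (old_start <= feature_start <= feature_end <= old_end):
        return None
    if new_start <= new_end:
        # Plus strand: shift by a constant offset.
        offset = new_start - old_start
        return feature_start + offset, feature_end + offset
    # Minus strand: positions run backwards on the new sequence.
    high = new_start - (feature_start - old_start)
    low = new_start - (feature_end - old_start)
    return low, high

# Exon 67408..67569 inside an old block 67000..68000 matching new 70000..71000.
print(transfer_feature(67408, 67569, 67000, 68000, 70000, 71000))
# The same block matching the reverse strand of the new sequence.
print(transfer_feature(67408, 67569, 67000, 68000, 71000, 70000))
# A feature that crosses the block boundary cannot be transferred this way.
print(transfer_feature(66990, 67010, 67000, 68000, 70000, 71000))
```

Returning `None` for a feature that is not covered by the match corresponds to an annotation that cannot be carried over automatically and needs manual attention.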
gb-header and its related utilities
gb-header is the beginning part of a Genbank format record. It includes LOCUS, DEFINITION, ACCESSION, VERSION, SOURCE, REFERENCE, FEATURES, and BASE COUNT, and ends with ORIGIN. In other words, only the sequence itself is excluded from gb-header.
A tool is provided to extract gb-header from standard Genbank format sequence data.
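Following the definition above, extraction amounts to cutting the record at the ORIGIN line. The Python sketch below shows the idea; the function name `split_genbank` and the toy record are illustrative, and this is not the tool provided by DNannotator.

```python
def split_genbank(record_text):
    """Split a Genbank record into (gb_header, sequence_body).

    The gb-header runs from the first line up to and including ORIGIN.
    """
    lines = record_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("ORIGIN"):
            return "".join(lines[: i + 1]), "".join(lines[i + 1:])
    raise ValueError("no ORIGIN line found")

# Toy record (abbreviated, for illustration only).
record = (
    "LOCUS       DEMO 20 bp DNA\n"
    "FEATURES             Location/Qualifiers\n"
    "     exon            5..12\n"
    "ORIGIN\n"
    "        1 acgtacgtac gtacgtacgt\n"
    "//\n"
)
gb_header, body = split_genbank(record)
print(gb_header, end="")
```

Joining `gb_header` and `body` gives back the original record unchanged.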
We use gb-header rather than the whole Genbank record for annotation because gb-header holds all the feature data. Sequence data are analyzed separately with BLAST, whose results are the basis for generating the new annotation.
Using gb-header alone means much less data has to be uploaded for processing, since the sequence part makes up the majority of a Genbank format record. This speeds up the procedure and reduces the amount of data transferred over the network. The GFF format proposed by Sanger, like gb-header, also provides only annotation data.
If you run annotation many times with many batches of source data, you will generate a series of gb-headers that differ only in their "FEATURES" contents. A tool is supplied for merging all gb-headers into one. You just need to put all the gb-header files into one gzip archive and submit it to DNannotator.
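Because the gb-headers differ only in their feature tables, merging can be pictured as concatenating those tables under one shared header. The Python sketch below assumes every header has a FEATURES line and that feature lines are indented; `split_header` and `merge_headers` are hypothetical names, and the sketch does not remove duplicates or sort features.

```python
def split_header(gb_header):
    """Return (lines up to FEATURES, feature lines, lines after the table)."""
    lines = gb_header.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("FEATURES"))
    # Feature lines are indented; the table ends at the next line that
    # starts in the first column (for example BASE COUNT or ORIGIN).
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i][:1] not in ("", " ")),
        len(lines),
    )
    return lines[: start + 1], lines[start + 1: end], lines[end:]

def merge_headers(headers):
    """Merge gb-headers that differ only in their feature tables."""
    top, merged, bottom = split_header(headers[0])
    for header in headers[1:]:
        merged.extend(split_header(header)[1])
    return "\n".join(top + merged + bottom) + "\n"

# Two toy gb-headers from separate annotation runs (illustrative only).
header_exons = (
    "LOCUS       DEMO\n"
    "FEATURES             Location/Qualifiers\n"
    "     exon            5..12\n"
    "ORIGIN\n"
)
header_snps = (
    "LOCUS       DEMO\n"
    "FEATURES             Location/Qualifiers\n"
    "     variation       9\n"
    "ORIGIN\n"
)
merged_header = merge_headers([header_exons, header_snps])
print(merged_header, end="")
```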
To get a clean, tidy gb-header, you can run a small DNannotator function that re-organizes and cleans the final combined gb-header file.
After all annotations are complete, you may need to merge the gb-header with its corresponding sequence body. A small utility is provided to do this for you.
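Re-attaching the sequence means writing it after the ORIGIN line in the usual Genbank layout: 60 bases per line in blocks of 10, each line prefixed with a right-justified position, and `//` closing the record. The Python sketch below illustrates this; `format_origin_body` and `attach_sequence` are hypothetical names, not the DNannotator utility.

```python
def format_origin_body(sequence):
    """Lay out a raw sequence the way Genbank prints it after ORIGIN."""
    seq = sequence.lower()
    out = []
    for i in range(0, len(seq), 60):
        chunk = seq[i: i + 60]
        blocks = " ".join(chunk[j: j + 10] for j in range(0, len(chunk), 10))
        # The position of the first base on the line, right-justified in 9 columns.
        out.append(f"{i + 1:>9} {blocks}")
    return "\n".join(out) + "\n//\n"

def attach_sequence(gb_header, sequence):
    """Append the formatted sequence body to a gb-header ending with ORIGIN."""
    return gb_header.rstrip("\n") + "\n" + format_origin_body(sequence)

body_text = format_origin_body("ACGT" * 20)  # 80 bases, so two lines
print(attach_sequence("LOCUS       DEMO\nORIGIN\n", "ACGT" * 20), end="")
```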
If you want to use Artemis to view the annotation and wish to put different categories of annotation into different layers, don't merge all the gb-header files. By reading them in separately as individual "entries", you get a better layer-by-layer view in Artemis. (More information about Artemis)
Artemis is recommended as a good choice for viewing annotation in Genbank format or gb-header.
To get further information about Artemis, please follow the link.
If you are doing custom annotation on Genome Browser's latest freeze, you can take the "custom track" data file and view it in Genome Browser.
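The exact layout of DNannotator's "custom track" file is not shown here. As a rough idea of what such a file can look like, the Python sketch below writes features as a BED-style custom track, a format Genome Browser accepts. The function name `to_bed_track`, the chromosome `chr22`, and the track name are assumptions chosen for illustration.

```python
def to_bed_track(features, chrom, track_name, region_offset=0):
    """Write features as a BED custom track.

    features: iterable of (name, start, end), 1-based inclusive positions
              on the annotated sequence.
    region_offset: 0-based chromosome position of the sequence's first base
                   (0 when the sequence starts at the chromosome start).
    """
    lines = [f'track name="{track_name}"']
    for name, start, end in features:
        # BED uses a 0-based start and an exclusive end.
        bed_start = region_offset + start - 1
        bed_end = region_offset + end
        lines.append(f"{chrom}\t{bed_start}\t{bed_end}\t{name}")
    return "\n".join(lines) + "\n"

track = to_bed_track([("gene1_exon2", 67408, 67569)], "chr22", "my_annotations")
print(track, end="")
```

The main pitfall is the coordinate convention: Genbank positions are 1-based and inclusive, while BED starts are 0-based, so the start shifts by one and the end does not.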
Possible reasons for receiving unexpected results:
Assuming you have already checked that the BLAST results contain proper hits and that all required input files are ready, the following errors may be the cause of the problem:
DNannotator either relies on the annotation you already have or starts from scratch, helping you annotate the fragments or elements you supply onto the DNA sequence. In both cases, the user must supply correct information to get correct annotation or annotation migration.
Because DNannotator relies on BLAST match data, certain annotations may not be made under strict matching conditions. Those difficult cases still require manual annotation.
With better computing resources, we will lift the limit.
Functions to be implemented or improved
1. Implement a complete data set to support the DAS system. Ideally, if you have DAS up and running, you will be able to import DNannotator output directly into your "reference sequence server" and "annotation server".
2. Gene prediction, transcription-factor binding site prediction, etc. will be implemented soon.
3. BLAT-match based annotation for STS.
4. Annotation migration that takes gb-header too.
5. Modified data merging, so that a .zip file rather than only .tar can be used to merge multiple gb-header files.
Merging the sequence at the same time will be an optional choice, so the user does not need to use two functions to do one thing.
6. Automatic update of UniSTS data.
7. Use Genbank accession number as cDNA annotation source.