DDBJ/GenBank/EMBL files
General Overview
DDBJ, GenBank, and EMBL share a common feature table format and closely related flat-file formats. Each record consists of a block of header information, followed by a list of features and then the sequence data.
Formal Definition / Specification
Versions
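- Every line in an EMBL sequence file begins with a two-character line type code (for example ID, AC, FT), followed by three blank spaces. GenBank files have no line codes; they use full keywords (for example LOCUS, DEFINITION, FEATURES) that start in the first column.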
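The line layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser; the function names `split_embl_line` and `guess_flatfile_flavor` are hypothetical, and the flavor check assumes the text starts at the first line of a single record (an `ID` line for EMBL, a `LOCUS` line for GenBank).

```python
def split_embl_line(line: str) -> tuple[str, str]:
    """Split an EMBL flat-file line into (line_code, content).

    Assumes the layout described above: a two-character line code,
    three blanks, then the content starting at the sixth column.
    """
    code = line[:2]
    content = line[5:].rstrip("\n")
    return code, content


def guess_flatfile_flavor(first_line: str) -> str:
    """Guess the flat-file flavor from the first line of a record.

    Assumption: the input starts at a record boundary, so an EMBL
    record opens with an ID line and a GenBank record with LOCUS.
    """
    if first_line.startswith("ID   "):
        return "embl"
    if first_line.startswith("LOCUS"):
        return "genbank"
    return "unknown"
```

A real file may contain many records or leading release notes, so this check is only a starting point.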
Description
Header
Features
The feature table starts after the line:
FEATURES Location/Qualifiers
Each feature has the general format:
Key Location
/qualifier
/qualifier
...
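The Key / Location / qualifier layout above can be read with a small column-based parser. The following is a sketch under stated assumptions, not a reference implementation: it targets the GenBank flavor, assumes the usual column layout (feature key starting in column 6, location and qualifiers starting in column 22), keeps only the last value when a qualifier is repeated, and joins wrapped qualifier values with a single space. The function name `parse_genbank_features` is hypothetical.

```python
def parse_genbank_features(text: str) -> list[dict]:
    """Collect features as dicts with 'key', 'location' and 'qualifiers'."""
    features = []
    in_table = False
    last_qualifier = None  # qualifier being continued on wrapped lines, if any
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if not line.startswith(" "):
            break  # the next top-level keyword (e.g. ORIGIN) ends the table
        if line[5:6] != " ":
            # Text in column 6 marks a new feature: key, then location.
            features.append({
                "key": line[5:21].strip(),
                "location": line[21:].strip(),
                "qualifiers": {},
            })
            last_qualifier = None
            continue
        if not features:
            continue
        body = line[21:].strip()
        current = features[-1]
        if body.startswith("/"):
            # /name="value" or a bare flag such as /pseudo
            name, sep, value = body[1:].partition("=")
            current["qualifiers"][name] = value.strip('"') if sep else True
            last_qualifier = name
        elif last_qualifier is None:
            current["location"] += body  # wrapped location
        else:
            previous = current["qualifiers"][last_qualifier]
            current["qualifiers"][last_qualifier] = f"{previous} {body}".strip('"')
    return features
```

Joining wrapped values with a space suits free-text qualifiers; values that must be concatenated without spaces, such as translated protein sequences, would need separate handling.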
Keys
Keys are terms such as gene, CDS, mRNA. Most, but not all, of these can be mapped directly onto Sequence Ontology terms. Keys shown below are relevant to bacteria. <protect>
| Key | Examples | SO equivalent(s) | Comment |
|---|---|---|---|
| attenuator | alternative RNA structure in the trp leader | SO: A sequence segment located within the five prime end of an mRNA that causes premature termination of translation. | This may need adjustment; attenuators could be in the middle of a polycistronic mRNA. |
| CDS | | | In SO, includes a start codon and a stop codon |
| conflict | | | SO shows this as being for polypeptide information |
| D-loop | | | |
| exon | | | |
| gap | | | |
| gene | | | |
| iDNA | | | |
| intron | | | |
| mat_peptide | | | |
| misc_binding | | | |
| misc_difference | | | |
| misc_feature | | | |
| misc_recomb | | | |
| misc_RNA | | | |
| misc_signal | | | |
| misc_structure | | | |
| modified_base | | | |
| mRNA | | | |
| ncRNA | | | |
| operon | | | |
| oriT | | | |
| precursor_RNA | | | |
| prim_transcript | | | |
| primer_bind | | | |
| promoter | | | |
| protein_bind | | | |
| RBS | | | |
| repeat_region | | | |
| rep_origin | | | |
| rRNA | | | |
| sig_peptide | | | |
| source | | | |
| stem_loop | | | |
| STS | | | |
| terminator | | | |
| tmRNA | | | |
| tRNA | | | |
| unsure | | | |
| variation | | | |
| 3'UTR | | | |
| 5'UTR | | | |
| -10_signal | | | |
| -35_signal | | | |
</protect>
Locations
The format supports several different kinds of locations: <protect>
| Location type | Example | Comments |
|---|---|---|
| Single base | 1000 | Single integer |
| Site between two bases | 123^124 | Used for things like nuclease cleavage sites |
| Sequence span | 340..565 | |
</protect>
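The three location types in the table can be recognized with simple patterns. The sketch below covers only those three forms and raises an error for anything else (operators such as complement or join are deliberately out of scope); the `Location` class and `parse_location` function are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    kind: str   # "base", "between", or "span"
    start: int
    end: int


# One pattern per location type listed in the table above.
_PATTERNS = [
    ("span", re.compile(r"^(\d+)\.\.(\d+)$")),     # e.g. 340..565
    ("between", re.compile(r"^(\d+)\^(\d+)$")),    # e.g. 123^124
    ("base", re.compile(r"^(\d+)$")),              # e.g. 1000
]


def parse_location(text: str) -> Location:
    """Parse a single-base, between-bases, or span location string."""
    text = text.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            numbers = [int(group) for group in match.groups()]
            # A single base starts and ends on the same position.
            return Location(kind, numbers[0], numbers[-1])
    raise ValueError(f"unsupported location: {text!r}")
```

Representing a single base as a span whose start equals its end keeps downstream code uniform, at the cost of having to check `kind` when the distinction matters.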
Qualifiers
Associated Bioinformatics Tools
Usage examples
Information on Other Websites
References