DDBJ/GenBank/EMBL files

From EcoliWiki


General Overview

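DDBJ, GenBank and EMBL share a common feature table format, although EMBL files lay out their lines differently from GenBank and DDBJ files. Each entry consists of a block of header information, followed by a list of features (the feature table), followed by the sequence itself, and ends with a line containing //.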
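As a rough illustration of this layout, the sketch below splits one GenBank/DDBJ-style record into header, feature-table and sequence lines. It assumes the GenBank layout, where a FEATURES line opens the feature table, an ORIGIN line opens the sequence block and a // line ends the record; EMBL files mark their sections with line codes instead and are not handled here. The function names and the toy record are hypothetical.

```python
def split_genbank_record(text: str) -> dict[str, list[str]]:
    """Split one GenBank/DDBJ-style record into its three main parts.

    Assumes the GenBank layout: a FEATURES line opens the feature table,
    an ORIGIN line opens the sequence block and a // line ends the record.
    """
    sections: dict[str, list[str]] = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end of this record
        if line.startswith("FEATURES"):
            current = "features"
            continue  # the FEATURES line itself is only a column heading
        if line.startswith("ORIGIN"):
            current = "sequence"
            continue
        sections[current].append(line)
    return sections


def sequence_from_lines(lines: list[str]) -> str:
    """Join ORIGIN lines into one string, dropping position numbers and spaces."""
    return "".join(ch for line in lines for ch in line if ch.isalpha())


# Hypothetical toy record, for illustration only.
record = """LOCUS       TOY1                      12 bp    DNA     linear   BCT
DEFINITION  Hypothetical example record.
FEATURES             Location/Qualifiers
     gene            1..12
                     /gene="abcA"
ORIGIN
        1 atgaaacgca tt
//
"""

parts = split_genbank_record(record)
print(len(parts["header"]), len(parts["features"]))  # 2 2
print(sequence_from_lines(parts["sequence"]))        # atgaaacgcatt
```

Reading line by line keeps the sketch simple, but it only handles a single record; a multi-record file would need to restart after each // line.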

Formal Definition / Specification

Versions

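  • Every line in an EMBL file begins with a two-character line type code (such as ID, DE, FT or SQ) followed by three blank spaces. GenBank and DDBJ files have no such codes; instead, sections start with keywords such as LOCUS, DEFINITION and FEATURES in column 1, and continuation lines are indented.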
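A minimal sketch of telling the two layouts apart from the first line of an entry, assuming a well-formed file in which EMBL entries open with an ID line and GenBank/DDBJ entries open with a LOCUS line. The function names are hypothetical, and the EMBL helper assumes the content always starts at column 6, right after the two-character code and three spaces.

```python
def detect_flatfile_format(first_line: str) -> str:
    """Guess the flat-file layout from the first line of an entry."""
    if first_line.startswith("ID   "):
        return "EMBL"          # two-letter line code + three spaces
    if first_line.startswith("LOCUS"):
        return "GenBank/DDBJ"  # keyword starting in column 1
    return "unknown"


def split_embl_line(line: str) -> tuple[str, str]:
    """Split an EMBL line into (line code, content).

    The code occupies columns 1-2; after three spaces the content
    begins in column 6 (index 5). A bare terminator line such as "//"
    yields ("//", "").
    """
    return line[:2], line[5:]


print(detect_flatfile_format("LOCUS       TOY1   12 bp    DNA"))  # GenBank/DDBJ
print(split_embl_line("DE   Hypothetical example entry."))     # ('DE', 'Hypothetical example entry.')
```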

Description

Header

Features

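In GenBank and DDBJ files, the feature table starts after the line:

FEATURES             Location/Qualifiers

and each feature has the general format:

     Key             Location
                     /qualifier=value
                     /qualifier=value
                     ...

The key starts in column 6, and the location and qualifiers start in column 22. In EMBL files the same feature lines carry the FT line code in columns 1-2.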
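The sketch below parses feature lines of this shape into a key, a location and a list of qualifier pairs. It assumes the fixed GenBank/DDBJ column layout, with the key starting in column 6 and the location or qualifier text starting in column 22, and a well-formed table whose first line is a key line. The Feature class and parse_feature_lines are hypothetical names. Multi-line qualifier values are joined with a single space, which suits free text such as /note but not sequence values such as /translation.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str
    location: str
    qualifiers: list[tuple[str, str]] = field(default_factory=list)


def parse_feature_lines(lines: list[str]) -> list[Feature]:
    """Parse GenBank-style feature table lines (the lines after FEATURES).

    Assumes keys start in column 6 (index 5) and that location and
    qualifier text starts in column 22 (index 21).
    """
    features: list[Feature] = []
    last_was_qualifier = False
    for line in lines:
        key_field = line[5:21].strip()
        text = line[21:].rstrip()
        if key_field:
            # A non-blank key column starts a new feature.
            features.append(Feature(key_field, text))
            last_was_qualifier = False
        elif text.startswith("/"):
            # "/name=value"; value-less qualifiers such as /pseudo get "".
            name, _, value = text[1:].partition("=")
            features[-1].qualifiers.append((name, value.strip('"')))
            last_was_qualifier = True
        elif last_was_qualifier:
            # Continuation of a long qualifier value, joined with a space.
            name, value = features[-1].qualifiers[-1]
            features[-1].qualifiers[-1] = (name, value + " " + text.strip('"'))
        else:
            # Continuation of a location that spilled onto the next line.
            features[-1].location += text
    return features
```

Slicing fixed columns avoids any guessing about where the key ends, at the cost of breaking on files that do not follow the standard column positions.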

Keys

Keys are terms such as gene, CDS and mRNA that name the type of each feature. Most, but not all, of them map directly onto Sequence Ontology (SO) terms. The keys listed below are those relevant to bacteria. <protect>

Key              Example(s)                 SO equivalent(s)              Comment
attenuator       alternative RNA structure  SO:0000140 attenuator         SO definition: "A sequence segment located within the
                 in the trp leader                                        five prime end of an mRNA that causes premature
                                                                          termination of translation." This may need adjustment;
                                                                          attenuators could be in the middle of a polycistronic mRNA.
CDS                                         SO:0000316 CDS                In SO, a CDS includes both the start codon and the stop codon.
conflict                                    SO:0001085 sequence_conflict  SO describes this term as applying to polypeptide sequences.
D-loop
exon
gap
gene
iDNA
intron
mat_peptide
misc_binding
misc_difference
misc_feature
misc_recomb
misc_RNA
misc_signal
misc_structure
modified_base
mRNA
ncRNA
operon
oriT
precursor_RNA
prim_transcript
primer_bind
promoter
protein_bind
RBS
repeat_region
rep_origin
rRNA
sig_peptide
source
stem_loop
STS
terminator
tmRNA
tRNA
unsure
variation
3'UTR
5'UTR
-10_signal
-35_signal


</protect>
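Because only some keys have an SO mapping, software that converts features to SO terms has to handle unmapped keys explicitly. A minimal sketch, assuming a lookup table that holds only the three mappings given in the Keys table on this page; every other key deliberately returns None rather than a guessed term. KEY_TO_SO and the helper names are hypothetical.

```python
from typing import Optional

# Only the mappings listed in the Keys table; other keys are left
# unmapped on purpose instead of guessing an SO term.
KEY_TO_SO: dict[str, tuple[str, str]] = {
    "attenuator": ("SO:0000140", "attenuator"),
    "CDS": ("SO:0000316", "CDS"),
    "conflict": ("SO:0001085", "sequence_conflict"),
}


def so_term_for_key(key: str) -> Optional[tuple[str, str]]:
    """Return (SO id, SO name) for a feature key, or None if unmapped.

    Keys are matched case-sensitively, exactly as written in the table.
    """
    return KEY_TO_SO.get(key)


def unmapped_keys(keys: list[str]) -> list[str]:
    """List the distinct keys that have no SO mapping, sorted."""
    return sorted({key for key in keys if key not in KEY_TO_SO})


print(so_term_for_key("conflict"))                     # ('SO:0001085', 'sequence_conflict')
print(unmapped_keys(["CDS", "gene", "gene", "tRNA"]))  # ['gene', 'tRNA']
```

Returning None instead of a fallback term makes gaps in the mapping visible, which matters here because some keys (for example those discussed in the Comment column) do not correspond exactly to an SO term.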

Locations

The format supports several kinds of location: <protect>

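Location type           Example                  Comments
Single base             1000                     A single base position, written as one integer
Site between two bases  123^124                  Used for things like nuclease cleavage sites
Sequence span           340..565                 Inclusive range from base 340 to base 565
                        <345..500                Partial: the feature begins at some base before 345
                        1..>888                  Partial: the feature continues past base 888
Joined spans            join(12..78,134..202)    Spans placed end to end to form one feature
Complement              complement(34..126)      Feature on the complementary strand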


</protect>
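A minimal recursive parser for the location forms in the table above, offered as a sketch rather than a full implementation of the feature table location syntax. It assumes locations contain only these forms and raises ValueError for anything else (for example order(...) or references to other entries). The Span class, its field names and parse_location are hypothetical. Positions are kept as written (1-based, inclusive), and complement(...) reverses the order of its inner spans so they read 5' to 3' on the complementary strand.

```python
import re
from dataclasses import dataclass, replace

# Simple forms: "1000", "123^124", "340..565", "<345..500", "1..>888".
_SIMPLE = re.compile(r"^(<?)(\d+)(?:(\.\.|\^)(>?)(\d+))?$")


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    strand: int = 1              # +1 = strand shown in the entry, -1 = complement
    lower_partial: bool = False  # "<" before the first position
    upper_partial: bool = False  # ">" before the second position
    between: bool = False        # "^": a site between two adjacent bases


def _split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    parts.append("".join(current))
    return parts


def parse_location(loc: str) -> list[Span]:
    loc = loc.replace(" ", "")
    if loc.startswith("complement(") and loc.endswith(")"):
        inner = parse_location(loc[len("complement("):-1])
        # Reverse so the spans are listed 5'->3' on the complementary strand.
        return [replace(s, strand=-s.strand) for s in reversed(inner)]
    if loc.startswith("join(") and loc.endswith(")"):
        spans: list[Span] = []
        for part in _split_top_level(loc[len("join("):-1]):
            spans.extend(parse_location(part))
        return spans
    m = _SIMPLE.match(loc)
    if m is None:
        raise ValueError(f"unsupported location: {loc!r}")
    lt, first, sep, gt, second = m.groups()
    if sep is None:  # single base, e.g. "1000"
        return [Span(int(first), int(first), lower_partial=bool(lt))]
    return [Span(int(first), int(second), lower_partial=bool(lt),
                 upper_partial=bool(gt), between=(sep == "^"))]


for span in parse_location("complement(join(1..10,20..30))"):
    print(span.start, span.end, span.strand)  # 20 30 -1, then 1 10 -1
```

Recursion mirrors the nesting of join(...) and complement(...), so combinations such as join(complement(...),complement(...)) are handled without extra cases.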

Qualifiers

Associated Bioinformatics Tools

Usage examples

Information on Other Websites

References
