Database Overview

From EcoliWiki

See also Category:Databases

Content & Features at different databases

These sections list the information we want to cover. Add text to each item to indicate where that information can be found.

Genomes

DNA sequences for complete and partial genomes are mostly in GenBank.

E. coli

Strain   GenBank Accession                      Other Sources   Notes
MG1655   RefSeq: NC_000913; GenBank: U00096                     Complete genome
W3110    RefSeq: AC_000091; GenBank: AP009048                   Complete genome
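
As a rough illustration of how the accessions listed here could be turned into download links, the sketch below builds NCBI E-utilities efetch URLs for the nucleotide database. It only constructs the URLs and does not contact NCBI. The helper name efetch_url and the GENOMES dictionary are made up for this example, and the choice of GenBank flat-file output (rettype "gb") is an assumption.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accessions copied from the genome list on this page.
GENOMES = {
    "MG1655": {"RefSeq": "NC_000913", "GenBank": "U00096"},
    "W3110": {"RefSeq": "AC_000091", "GenBank": "AP009048"},
}


def efetch_url(accession, rettype="gb"):
    """Build an efetch URL for one nucleotide record (hypothetical helper)."""
    params = {
        "db": "nuccore",       # nucleotide database
        "id": accession,
        "rettype": rettype,    # "gb" for flat file, "fasta" for sequence only
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


for strain, accessions in GENOMES.items():
    print(strain, efetch_url(accessions["GenBank"]))
```

Passing rettype="fasta" instead would request only the sequence.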

Phage

Plasmids

These should include F and F' plasmids (see Ch 129 of the E. coli Book).

Genetics

Genes

We use gene and locus interchangeably, but gene is perhaps a subset of locus (see Discussion page). Note that some customization will be needed for phage and plasmid genes vs. chromosomal genes.

Nomenclature

See also Help:Genetic_nomenclature

  • Standard name: e.g. lacZ, or rpoD
  • Synonyms: e.g. groP for dnaK, alt for rpoD
  • Accessions
    • To genomic sequence (bnum, JW, ECK)
    • To external databases (GenBank, Swiss-Prot, ASAP, EcoCyc, etc.)

History and references for the above, e.g. "lac for lactose metabolism (ref)"
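
One way the standard-name/synonym relationship above might be stored is sketched here: each gene is keyed by its standard name, and a reverse index maps every name (standard or synonym) back to the standard names it can refer to. The record layout and the function names build_index and resolve are assumptions for illustration only; the synonyms are the ones given in the list above.

```python
# Records keyed by standard name; synonyms taken from the examples above.
GENES = {
    "lacZ": {"synonyms": []},
    "dnaK": {"synonyms": ["groP"]},
    "rpoD": {"synonyms": ["alt"]},
}


def build_index(genes):
    """Map every standard name and synonym to a set of standard names."""
    index = {}
    for standard, record in genes.items():
        for name in [standard, *record["synonyms"]]:
            index.setdefault(name, set()).add(standard)
    return index


INDEX = build_index(GENES)


def resolve(name):
    """Return the standard names a query could mean (exact, case-sensitive)."""
    return sorted(INDEX.get(name, set()))


print(resolve("groP"))
print(resolve("alt"))
```

The lookup returns a list rather than a single name because the same synonym may have been used for more than one gene. Matching is case-sensitive here so that a gene name such as dnaK is not confused with the protein name DnaK.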


Gene Sequence

  • -10 signal
  • -35 signal
  • 5' UTR
  • 3' UTR
  • Attenuator
  • CDS
  • D-loop
  • Enhancer
  • Gene
  • Intron and exon
  • modified base
  • mRNA
  • operon
  • oriT
  • Precursor RNA
  • Primer Bind
  • Promoter
  • Protein Binding
  • RBS
  • Replication origin
  • Repeat Region
  • rRNA
  • stem loop
  • terminator (Rho dependent and independent)
  • tRNA

Comparisons

  • Conflict
  • Miscellaneous Difference
  • Old Sequence
  • Variation
  • Unsure


Non-SO Terms

  • 5' End of Message
  • att sites
  • Anti-terminator
  • nut sites
  • operator
  • oriC
  • 'shifty' sequences
  • translation start codon
  • transcriptional start site

Gene models

A gene model defines where a gene is on the genomic DNA. Different annotators may disagree about where a gene starts and stops. We need to look at Sequence Ontology features.
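
A minimal sketch of how competing gene models could be recorded and compared, assuming each model is stored with its coordinates, strand, and the annotator it came from. The class, the field names, the gene name "geneX", and the coordinates are all hypothetical and not taken from any real annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    gene: str
    start: int    # leftmost base, 1-based inclusive
    end: int      # rightmost base, 1-based inclusive
    strand: str   # "+" or "-"
    source: str   # who produced this model

    @property
    def length(self):
        return self.end - self.start + 1


def disagreements(a, b):
    """Return the attributes on which two models of the same gene differ."""
    diffs = {}
    for attr in ("start", "end", "strand"):
        if getattr(a, attr) != getattr(b, attr):
            diffs[attr] = (getattr(a, attr), getattr(b, attr))
    return diffs


# Made-up coordinates: two annotators pick different start positions.
m1 = GeneModel("geneX", 1000, 1900, "+", "annotator_A")
m2 = GeneModel("geneX", 1030, 1900, "+", "annotator_B")
print(disagreements(m1, m2))
```

Keeping the source on every model means the wiki can show all versions side by side instead of choosing one.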

Map information

  • Location in map units
  • Cotransduction frequencies to nearby markers
  • Presence or absence in different strains
    • Covered by deletions
    • Covered by F's
    • Hfr to transfer
    • Covered by ASKA clones
    • Covered by Kohara phage
    • Covered by Clarke-Carbon clones
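
To relate cotransduction frequencies to map distances, one commonly used approximation is the Wu formula, frequency = (1 - d/L)^3, where d is the distance between markers and L is the length of the transducing fragment, both in minutes. The sketch below assumes L = 2 minutes, a typical figure for P1 transduction. Neither the formula nor that value comes from this page, so treat both as assumptions to check.

```python
def cotransduction_frequency(distance_min, fragment_min=2.0):
    """Expected cotransduction frequency for markers distance_min apart."""
    if distance_min < 0:
        raise ValueError("distance must be non-negative")
    if distance_min >= fragment_min:
        return 0.0  # markers too far apart to share one fragment
    return (1.0 - distance_min / fragment_min) ** 3


def distance_from_frequency(frequency, fragment_min=2.0):
    """Invert the Wu formula to estimate distance (minutes) from a frequency."""
    if not 0.0 < frequency <= 1.0:
        raise ValueError("frequency must be in (0, 1]")
    return fragment_min * (1.0 - frequency ** (1.0 / 3.0))


print(cotransduction_frequency(1.0))
print(distance_from_frequency(0.125))
```

Storing the observed frequencies and computing distances on demand keeps the raw data separate from any particular mapping function.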

References and commentary

Alleles

Mutant alleles and links to strains containing these and references.

Phenotypes

Phenotypes for knockout and other alleles. Phenotypes should be annotated to a phenotype ontology to aid comparison among genes. Discussions have begun with SGD on a shared microbial phenotype ontology.

Gene Products

Proteins

Nomenclature

  • Protein name: just the gene product notation, e.g. DnaK (this will be the page name in the wiki)
    • Protein name synonyms
  • Product description/name, e.g. Hsp70-family chaperone
    • Protein family (Pfam, InterPro)
    • NCBI
    • From UniProt
    • Other synonyms
  • Accessions/IDs/Crossrefs
    • Protein GenBank ID (gi)
    • UniProt/Swiss-Prot/TrEMBL
    • Systematic gene name (PortEco ID)
    • Genome ID# (bnum, JW, ECK)
  • Source (organism, strain, substrain)

Sequence

  • Protein sequence/length
  • Annotation Version (Reference)
  • Codon adaptation index
  • Sequence features
    • Domains/Motifs (InterPro, Pfam, CDD)
    • Signal sequences
    • Other localization/post-translational modification motifs
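
For the codon adaptation index listed above, the usual definition is the geometric mean of per-codon relative adaptiveness weights, where each weight compares a codon's usage to that of the most-used synonymous codon in a reference set of highly expressed genes. The sketch below shows only the arithmetic. The weights in TOY_WEIGHTS are invented for illustration and are not real E. coli values.

```python
import math

# HYPOTHETICAL weights for a few codons; real weights must be derived
# from a reference set of highly expressed genes.
TOY_WEIGHTS = {"CTG": 1.0, "CTT": 0.1, "AAA": 1.0, "AAG": 0.25}


def cai(cds, weights):
    """Geometric mean of codon weights; codons without a weight are skipped."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


print(cai("ATGCTGAAA", TOY_WEIGHTS))
print(cai("CTTAAG", TOY_WEIGHTS))
```

Codons absent from the weight table (here ATG) are skipped, which mirrors the common practice of leaving out codons that have no synonymous alternative.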

References

Physical Properties

  • Calculated molecular weight/pI
  • Observed molecular weight/pI
  • Calculated extinction coefficient
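
The calculated values above can be derived directly from the protein sequence. As a sketch: molecular weight is the sum of average residue masses plus one water, and the extinction coefficient at 280 nm is commonly estimated as 5500 per Trp, 1490 per Tyr, and 125 per cystine (in M^-1 cm^-1). The function names are made up for this example, and the mass table uses commonly tabulated average values that should be checked against a trusted source before use.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(seq):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def extinction_coefficient(seq, cystines=0):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    seq = seq.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


print(round(molecular_weight("G"), 3))
print(extinction_coefficient("WWY"))
```

The number of cystines is passed in explicitly because disulfide bonds cannot be read off the sequence alone.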

Protein Structure

  • Structure (PDB)
  • Predicted
    • structure
    • coiled coils
    • other motifs
    • transmembrane segments/topology
  • Structure families (SCOP, CATH)

Protein Function

  • Function (GO has refs)
  • Enzyme activity EC number (add refs) - look at BRENDA
  • Pathway/process links
  • Ligands
  • Regulation (allosteric/feedback)
  • Interactions

Experimental Data

  • Proteomics study
  • Protein-protein interaction
  • Purification
  • Antibody
  • Localization

Evolutionary relationships

  • BLAST/FASTA/Homolog listing
  • Orthologous groups (COG)
  • Taxonomic distribution
    • Origin (horizontal vs vertical)
  • Ortholog lists (by whatever method)

RNAs

DNA elements

Operons

Regulons

Mobile Genetic Elements

Prophages

IS elements and Transposons

Plasmids

Experimental data

Expression

  • Microarrays
  • Proteomics
    • 2-D gels protein levels
    • Mass Spec
      • protein levels
      • protein modification
  • Metabolomics
  • Growth curves
  • Gene reporter fusions

Localization

  • GFP-fusions

Interactions

Ecology

  • Mouse colonization fitness assays

Query types

Query by gene name

Query inputs and outputs
Input Output How this is handled in different databases
gene name (e.g. lacY)
  • ASAP (participant of the NAR 06 paper): enter the query as a feature name from the query page (link is to MG1655; other genomes available); returns a page with a link to the features page.
  • GenoBase (participant of the NAR 06 paper)
  • EcoCyc (participant of the NAR 06 paper)
  • EcoGene (participant of the NAR 06 paper)
  • RegulonDB (?)
  • KEGG: AA and nucleotide sequences (imported from GenBank)
sequence
  • ASAP (participant of the NAR 06 paper)
  • GenoBase (participant of the NAR 06 paper)
  • EcoCyc (participant of the NAR 06 paper)
  • EcoGene (participant of the NAR 06 paper)
  • RegulonDB (?)
  • KEGG: AA and nucleotide sequences (imported from GenBank)
description

function

localization

process (GO definitions)

gene name (namespace1)

gene name (namespace2)
Some namespaces:

  • GenBank ID
  • PDB ID
  • EcoGene ID
gene name pathway
gene name experimental data
  • gene expression
  • protein interaction
gene name (species 1) gene name (species 2)
gene name
  • mutations
  • sequence difference
  • phenotype

Query by pathway

Query inputs and outputs
Input Output How this is handled in different databases
pathway
  • gene name
metabolic process
  • regulatory circuits
  • transcription factors/sites

Query by DNA location or sequence

Query inputs and outputs
Input          Output                     How this is handled in different databases
DNA location   Gene name
sequence       related sequence (BLAST)
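
The DNA location to gene name query amounts to an interval lookup. A minimal sketch, assuming genes are stored as (start, end, name) tuples with 1-based inclusive coordinates; the gene names and coordinates below are placeholders, not real annotations.

```python
# HYPOTHETICAL gene intervals: (start, end, name), 1-based inclusive.
GENE_INTERVALS = [
    (100, 500, "geneA"),
    (450, 900, "geneB"),
    (1200, 1500, "geneC"),
]


def genes_at(position, intervals=GENE_INTERVALS):
    """Names of all genes whose interval contains the given coordinate."""
    return [name for start, end, name in intervals if start <= position <= end]


def genes_in(region_start, region_end, intervals=GENE_INTERVALS):
    """Names of all genes overlapping the region [region_start, region_end]."""
    return [
        name
        for start, end, name in intervals
        if start <= region_end and end >= region_start
    ]


print(genes_at(475))
print(genes_in(850, 1250))
```

A linear scan is enough for a few thousand genes. An interval tree or a sorted index would be the natural replacement if many genomes or many queries have to be served. A position can return more than one gene because genes may overlap.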

Query by Function

Query inputs and outputs
Input Output How this is handled in different databases
gene function (GO term)

same for process and other GO branches

gene name

Query by strain

Query inputs and outputs
Input Output How this is handled in different databases
strain ID
  • gene name
  • mutation
  • source