Database Overview
See also Category:Databases
Content & Features at different databases
These sections list the information we want. Add text to indicate where each item can be found.
Genomes
DNA sequences for complete and partial genomes are mostly in GenBank.
E. coli
Strain | GenBank Accession | Other Sources | Notes |
---|---|---|---|
MG1655 | RefSeq: NC_000913; GenBank: U00096 | Complete genome | |
W3110 | RefSeq: AC_000091; GenBank: AP009048 | Complete genome | |
Phage
Plasmids
These should include F and F′ plasmids (see Ch. 129 of the E. coli Book).
Genetics
Genes
We use gene and locus interchangeably, but gene is perhaps a subset of locus (see Discussion page). Note that some customization will be needed for phage and plasmid genes vs. chromosomal genes.
Nomenclature
See also Help:Genetic_nomenclature
- Standard name: e.g. lacZ, or rpoD
- Synonyms: e.g. groP for dnaK, alt for rpoD
- Accessions
- To genomic sequence (bnum, JW, ECK)
- To external databases (GenBank, Swissprot, ASAP, EcoCyc, etc.)
History and references for the above, e.g. "lac for lactose metabolism (ref)"
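The nomenclature fields above could be held in a single record per gene. The following is a minimal sketch, not the schema of any existing database: the class name `GeneName`, its field names, and the helper `build_name_index` are all hypothetical, and only the example gene names and synonyms come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class GeneName:
    """Nomenclature record for one gene (field names are illustrative)."""
    standard_name: str
    synonyms: list = field(default_factory=list)
    accessions: dict = field(default_factory=dict)  # e.g. {"bnum": ..., "ECK": ...}
    name_origin: str = ""  # history note for the name


def build_name_index(records):
    """Map every standard name and synonym to its record.

    Names are kept case-sensitive, because gene names (lacZ) and
    protein names (LacZ) differ only in capitalization.
    """
    index = {}
    for rec in records:
        for name in [rec.standard_name, *rec.synonyms]:
            index[name] = rec
    return index


records = [
    GeneName("lacZ", name_origin="lac for lactose metabolism"),
    GeneName("dnaK", synonyms=["groP"]),
    GeneName("rpoD", synonyms=["alt"]),
]
index = build_name_index(records)
print(index["groP"].standard_name)  # dnaK
```

Indexing synonyms alongside standard names means a lookup by an old name such as groP resolves to the same record as dnaK. A real system would also need to handle a synonym shared by more than one gene, which this sketch does not.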
Gene Sequence
- -10 signal
- -35 signal
- 5' UTR
- 3' UTR
- Attenuator
- CDS
- D-loop
- Enhancer
- Gene
- Intron and exon
- modified base
- mRNA
- operon
- oriT
- Precursor RNA
- Primer Bind
- Promoter
- Protein Binding
- RBS
- Replication origin
- Repeat Region
- rRNA
- stem loop
- terminator (Rho dependent and independent)
- tRNA
Comparisons
- Conflict
- Miscellaneous Difference
- Old Sequence
- Variation
- Unsure
Non-SO Terms
- 5' End of Message
- att sites
- Anti-terminator
- nut sites
- operator
- oriC
- 'shifty' sequences
- translation start codon
- transcriptional start site
Gene models
A gene model defines where a gene lies on the genomic DNA. Different annotators may disagree about where a gene starts and stops. We need to look at Sequence Ontology features for this.
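One way to make disagreements between annotators explicit is to store each gene model as its own record and compare them field by field. This is only a sketch under assumptions: the `GeneModel` class, its fields, the 1-based inclusive coordinate convention, and the coordinates themselves are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    gene: str
    source: str   # who produced this annotation
    start: int    # 1-based, inclusive, leftmost coordinate
    end: int      # 1-based, inclusive, rightmost coordinate
    strand: str   # "+" or "-"

    @property
    def length(self):
        return self.end - self.start + 1


def model_differences(a, b):
    """Return the coordinate fields on which two models of one gene disagree."""
    return {
        name: (getattr(a, name), getattr(b, name))
        for name in ("start", "end", "strand")
        if getattr(a, name) != getattr(b, name)
    }


# Hypothetical coordinates, for illustration only.
m1 = GeneModel("geneX", "annotator A", start=1000, end=1900, strand="+")
m2 = GeneModel("geneX", "annotator B", start=1030, end=1900, strand="+")
print(model_differences(m1, m2))  # {'start': (1000, 1030)}
```

Keeping the source with each model lets several competing models for the same gene coexist instead of one overwriting another.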
Map information
- Location in map units
- Cotransduction frequencies to nearby markers
- Presence or absence in different strains
- Covered by deletions
- Covered by F′ plasmids
- Hfr to transfer
- Covered by ASKA clones
- Covered by Kohara phage
- Covered by Clark-Carbon clone
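For the first item in the list above, a location in map units can be approximated from a sequence coordinate by linear scaling. The sketch below assumes a circular map of 100 units that shares its origin with the sequence coordinates; the function name and the genome length in the example are hypothetical, and the genetic and physical maps do not agree exactly in practice.

```python
def bp_to_map_units(position_bp, genome_length_bp, map_length=100.0):
    """Linear conversion of a base-pair coordinate to map units.

    Assumes the map and the sequence share the same origin and direction.
    """
    if not 1 <= position_bp <= genome_length_bp:
        raise ValueError("position outside genome")
    return position_bp / genome_length_bp * map_length


# Hypothetical 4,000,000 bp genome, for illustration only.
print(bp_to_map_units(2_000_000, 4_000_000))  # 50.0
```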
References and commentary
Alleles
Mutant alleles and links to strains containing these and references.
Phenotypes
Phenotypes for knockout and other alleles. Phenotypes should be annotated to a phenotype ontology to aid comparison among genes. Discussions have begun with SGD on a shared microbial phenotype ontology.
Gene Products
Proteins
Nomenclature
- Protein Name - just gene product notation e.g. DnaK. (this will be pagename in wiki)
- Protein name synonyms
- Product description/name - e.g. Hsp70-family chaperone
- Protein family (pfam, Interpro)
- NCBI
- From Uniprot
- Other synonyms
- Accessions/IDs/Crossrefs
- Protein GenBank ID (gi)
- Uniprot/Swissprot/TrEmbl
- Systematic gene name (PortEco ID)
- Genome ID# (bnum JW ECK)
- Source (organism, strain, substrain)
Sequence
- Protein sequence/length
- Annotation Version (Reference)
- Codon adaptation index
- Sequence features
- Domains/Motifs (Interpro, Pfam, CDD)
- Signal sequences
- Other localization/post-translational modification motifs
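The codon adaptation index listed above is usually computed as the geometric mean of per-codon weights, where each weight is a codon's frequency in a reference gene set divided by the frequency of the most used synonymous codon. The following is a simplified sketch of that idea: the function names are hypothetical, the tiny reference set is invented for illustration, and codons absent from the reference are simply skipped rather than given a pseudocount.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds):
    """Split a coding sequence into complete codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds_list):
    """Weight of each codon relative to its most used synonym in the reference."""
    counts = Counter(c for cds in reference_cds_list for c in codons(cds))
    weights = {}
    for codon, aa in CODON_TABLE.items():
        if aa in "MW*":  # Met, Trp and stop codons are excluded
            continue
        best = max(counts[c] for c, a in CODON_TABLE.items() if a == aa)
        if best and counts[codon]:
            weights[codon] = counts[codon] / best
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of the scored codons in cds."""
    logs = [math.log(weights[c]) for c in codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Invented reference: lysine codon AAA used twice, AAG once.
w = relative_adaptiveness(["AAAAAAAAG"])
print(w)  # {'AAA': 1.0, 'AAG': 0.5}
print(cai("AAGAAG", w))  # 0.5
```

The result depends entirely on the choice of reference set, so a stored value should record which reference was used.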
References
Physical Properties
- Calculated molecular weight/pI
- Observed molecular weight/pI
- Calculated extinction coefficient
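The calculated values in the list above can be derived directly from the protein sequence. The sketch below assumes average residue masses and the commonly used 280 nm coefficients for Trp, Tyr and cystine; the function names are hypothetical, and post-translational modifications or removal of the initiator methionine are not considered.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def molecular_weight(seq):
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def extinction_coefficient_280(seq, disulfides=0):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    seq = seq.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * disulfides


peptide = "MKWY"  # arbitrary short peptide, for illustration only
print(round(molecular_weight(peptide), 1))   # 626.8
print(extinction_coefficient_280(peptide))   # 6990
```

Storing the calculated values next to the observed ones makes discrepancies, such as those caused by processing or modification, easy to spot.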
Protein Structure
- Structure (PDB)
- Predicted
- structure
- coiled coils
- other motifs
- transmembrane segments/topology
- Structure families (SCOP,CATH)
Protein Function
- Function (GO has refs)
- Enzyme activity EC number (add refs) - look at BRENDA
- Pathway/process links
- Ligands
- Regulation (allosteric/feedback)
- Interactions
Experimental Data
- Proteomics study
- Protein-protein interaction
- Purification
- Antibody
- Localization
Evolutionary relationships
- BLAST/FASTA/Homolog listing
- Orthologous groups (COG)
- Taxonomic distribution
- Origin (horizontal vs vertical)
- Ortholog lists (by whatever method)
RNAs
DNA elements
Operons
Regulons
Mobile Genetic Elements
Prophages
IS elements and Transposons
Plasmids
Experimental data
Expression
- Microarrays
- Proteomics
- 2-D gels protein levels
- Mass Spec
- protein levels
- protein modification
- Metabolomics
- Growth curves
- Gene reporter fusions
Localization
- GFP-fusions
Interactions
Ecology
- Mouse colonization fitness assays
Query types
Query by gene name
Input | Output | How this is handled in different databases |
---|---|---|
gene name (e.g. lacY) | sequence | |
gene name | description | |
gene name | function | |
gene name | localization | |
gene name | process (GO definitions) | |
gene name (namespace1) | gene name (namespace2) | |
gene name | pathway | |
gene name | experimental data | |
gene name (species 1) | gene name (species 2) | |
gene name | | |
Query by pathway
Input | Output | How this is handled in different databases |
---|---|---|
pathway | | |
metabolic process | | |
Query by DNA location or sequence
Input | Output | How this is handled in different databases |
---|---|---|
DNA location | Gene name | |
sequence | related sequence (BLAST) | |
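The first query in the table above, from a DNA location to gene names, amounts to an interval lookup. The following is a minimal sketch assuming features are stored as 1-based inclusive intervals; the feature names and coordinates are hypothetical, and a large dataset would normally use an interval tree or a database index instead.

```python
from bisect import bisect_right

# (start, end, gene) with 1-based inclusive coordinates; values are hypothetical.
FEATURES = sorted([
    (100, 400, "geneA"),
    (350, 900, "geneB"),
    (1200, 1500, "geneC"),
])


def genes_at(position, features=FEATURES):
    """Return the names of all features whose interval contains position."""
    starts = [f[0] for f in features]
    hi = bisect_right(starts, position)  # features starting at or before position
    return [gene for start, end, gene in features[:hi] if end >= position]


print(genes_at(375))   # ['geneA', 'geneB']
print(genes_at(1000))  # []
```

Returning a list rather than a single name accounts for overlapping genes and for positions that fall in intergenic regions.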
Query by Function
Input | Output | How this is handled in different databases |
---|---|---|
gene function (GO term); same for process and other GO branches | gene name | |
Query by strain
Input | Output | How this is handled in different databases |
---|---|---|
strain ID | | |