Clusters of Orthologous Groups (COGs)
Phylogenetic classification of proteins encoded in complete genomes
not being updated at NCBI
About Clusters of Orthologous Groups (COGs)
The Clusters of Orthologous Groups (COGs) of proteins were generated by comparing the protein sequences of complete genomes. Each cluster contains proteins or groups of paralogs from at least three lineages. The current COG database contains both prokaryotic clusters (COGs) and eukaryotic clusters (KOGs).
Although the COG categorization of orthologs is very popular, NCBI does not appear to be maintaining it. However, other sites use COG categories to classify genes from newly sequenced genomes. For example, see the Joint Genome Institute's Integrated Microbial Genomes site (EcoliWiki documentation).
The COG database contains clusters from both unicellular and eukaryotic organisms. There are currently 66 unicellular organisms, and a list can be found here. Note, however, that the newest update provides no website functionality for the unicellular organisms. Users can use the old version here, which contains COGs from 43 organisms. The eukaryotic organisms are new to the COG database, and KOGs are limited to seven organisms: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi.
The following statistics come from Tatusov RL et al. (2003).
The COG collection consists of 138,458 proteins from 66 genomes. These proteins form 4,873 COGs. As noted above, these COGs are listed in the new version but have no website functionality. The KOG set contains 4,852 clusters of orthologs comprising 59,838 proteins, which is approximately 54% of the analyzed eukaryotic gene products.
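The figures above can be turned into a few rough derived numbers. The following is a minimal sketch that uses only the counts quoted on this page; the variable and function names are illustrative, and because the 54% coverage figure is approximate, the implied total is an estimate only.

```python
# Counts quoted on this page from Tatusov RL et al. (2003).
cog_proteins, cog_clusters = 138_458, 4_873
kog_proteins, kog_clusters = 59_838, 4_852
kog_coverage = 0.54  # "approximately 54%", so anything derived from it is rough


def mean_cluster_size(proteins, clusters):
    """Average number of proteins per cluster."""
    return proteins / clusters


# If 59,838 proteins are about 54% of the analyzed eukaryotic gene products,
# the analyzed total is about 59,838 / 0.54.
implied_total = kog_proteins / kog_coverage

print(f"mean proteins per COG: {mean_cluster_size(cog_proteins, cog_clusters):.1f}")
print(f"mean proteins per KOG: {mean_cluster_size(kog_proteins, kog_clusters):.1f}")
print(f"implied analyzed eukaryotic gene products: {implied_total:,.0f}")
```

This works out to roughly 28 proteins per COG, roughly 12 proteins per KOG, and on the order of 111,000 analyzed eukaryotic gene products.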
The COGs and KOGs are grouped into functional categories. The functional categories are not listed on the new version of the COG website. They are designated in the papers and listed below for convenience.
|A||RNA processing and modification|
|B||Chromatin structure and dynamics|
|C||Energy production and conversion|
|D||Cell cycle control and mitosis|
|E||Amino acid metabolism and transport|
|F||Nucleotide metabolism and transport|
|G||Carbohydrate metabolism and transport|
|L||Replication and repair|
|M||Cell wall/membrane/envelope biogenesis|
|O||Post-translational modification, protein turnover, chaperone functions|
|P||Inorganic ion transport and metabolism|
|U||Intracellular trafficking and secretion|
|R||General functional prediction only|
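When working with COG annotations from other sites, it can help to translate the one-letter codes back into descriptions. The following is a minimal sketch that holds only the categories listed in the table above, which is not the complete set. The names `COG_CATEGORIES` and `describe_categories` are hypothetical, and the handling of multi-letter strings such as "EG" assumes that each letter is an independent category code.

```python
# Functional category codes as listed on this page (a partial list).
COG_CATEGORIES = {
    "A": "RNA processing and modification",
    "B": "Chromatin structure and dynamics",
    "C": "Energy production and conversion",
    "D": "Cell cycle control and mitosis",
    "E": "Amino acid metabolism and transport",
    "F": "Nucleotide metabolism and transport",
    "G": "Carbohydrate metabolism and transport",
    "L": "Replication and repair",
    "M": "Cell wall/membrane/envelope biogenesis",
    "O": "Post-translational modification, protein turnover, chaperone functions",
    "P": "Inorganic ion transport and metabolism",
    "U": "Intracellular trafficking and secretion",
    "R": "General functional prediction only",
}


def describe_categories(codes):
    """Return one description per letter in a category string.

    Letters missing from the table above are reported as unknown
    rather than guessed.
    """
    return [COG_CATEGORIES.get(c, f"unknown category {c!r}") for c in codes.upper()]


print(describe_categories("EG"))
```

A plain dictionary is enough here because the codes are single letters and the table is small; unknown letters are flagged explicitly so that gaps in this partial list are visible.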
Using Clusters of Orthologous Groups (COGs)
Note that the updated version of the COG database offers no interactive functionality for the COGs themselves; the KOG clusters, however, can be used interactively.
Browsing the COG database is easy because links are provided throughout. However, nothing explains what each link leads to.
References
The COG database: an updated version includes eukaryotes.
Tatusov RL et al. BMC Bioinformatics. 2003 Sep 11;4:41. Epub 2003 Sep 11.
The COG database: a tool for genome-scale analysis of protein functions and evolution.
Tatusov RL et al. Nucleic Acids Res. 2000 Jan 1;28(1):33-6.
A genomic perspective on protein families.
Tatusov RL et al. Science. 1997 Oct 24;278(5338):631-7.