OrthoMCL-DB

From EcoliWiki

<protect>

Link/URL:

OrthoMCL-DB

What:

A genome-scale algorithm for grouping orthologous protein sequences.

Who:

University of Pennsylvania.

</protect>


About OrthoMCL-DB

Home Page

OrthoMCL-DB is an orthology database[1]. The OrthoMCL tool[2] is also used by other groups including P-POD.

Content

OrthoMCL-DB houses ortholog group predictions for 55 species, including 16 bacterial and 4 archaeal genomes representing phylogenetically diverse lineages. The OrthoMCL software clusters proteins by sequence similarity: an all-against-all BLAST search of each species' proteome is followed by normalization of inter-species differences and then Markov clustering. OrthoMCL-DB provides a centralized warehouse for orthology predictions among multiple species.
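
The Markov clustering step can be illustrated with a toy sketch. This is not the OrthoMCL software or the MCL program; it is a minimal pure-Python version of the expansion and inflation loop, and it leaves out the BLAST search and the inter-species normalization entirely. The function names, the inflation value of 2.0, and the example similarity matrix are assumptions made for illustration.

```python
def normalize_columns(matrix):
    """Scale every column in place so that it sums to 1."""
    size = len(matrix)
    for col in range(size):
        total = sum(matrix[row][col] for row in range(size))
        for row in range(size):
            matrix[row][col] /= total
    return matrix


def markov_cluster(weights, inflation=2.0, iterations=30, threshold=1e-6):
    """Toy Markov clustering on a square, symmetric similarity matrix."""
    size = len(weights)
    matrix = [[float(value) for value in row] for row in weights]
    for i in range(size):
        # Self-loops keep every column sum positive.
        matrix[i][i] = max(matrix[i][i], 1.0)
    normalize_columns(matrix)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        matrix = [
            [sum(matrix[i][k] * matrix[k][j] for k in range(size))
             for j in range(size)]
            for i in range(size)
        ]
        # Inflation: raise entries to a power, then renormalize columns.
        matrix = [[value ** inflation for value in row] for row in matrix]
        normalize_columns(matrix)
    # Each row that still carries weight lists the members of one cluster.
    clusters = set()
    for row in matrix:
        members = frozenset(j for j, value in enumerate(row) if value > threshold)
        if members:
            clusters.add(members)
    return sorted(sorted(members) for members in clusters)


# Hypothetical similarity graph: proteins 0-2 match each other, as do 3-5.
# Identical weights keep the example easy to verify by hand.
similarities = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(similarities))
```

In this toy input the two groups of proteins share no similarity edges, so they end up in separate clusters.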

Using OrthoMCL-DB

Browsing

Searching

Usage examples

To compute E. coli K-12 and human orthologs:

 1. On the Phyletic Pattern Expression (PPE) query page, 
    submit the PPE query eco+hsa=2T to find all OrthoMCL groups that contain both eco and hsa 
    sequences (eco and hsa are the abbreviations for E. coli and H. sapiens).
 2. Go to the Group Query History page, select the PPE query eco+hsa=2T, 
    and click "GROUP QUERY INTO SEQUENCE QUERY". This converts all the sequences belonging to 
    these groups into a sequence query.
 3. Run a sequence accession query with eco as the taxon_abbreviation to find all 
    E. coli sequences.
 4. Repeat step 3 for human, using hsa.
 5. On the Sequence Query History page, select queries 2 and 3 and merge them by intersection to get 
    all eco genes satisfying your query.
 6. On the Sequence Query History page, select queries 2 and 4 and merge them by intersection to get 
    all hsa genes satisfying your query.
 7. On the Sequence Query History page, save the gene IDs together with their group IDs for queries 
    5 and 6.
 Once you have saved the gene IDs from step 7, write scripts to find the groups common to both 
 species and then identify the putative orthologs.
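
The final scripting step could be sketched as follows. The layout of the saved files is an assumption here (one gene ID and its group ID per line, separated by a tab); adjust the parsing to whatever the site actually exports. The function names and the identifiers in the example are made up for illustration.

```python
from collections import defaultdict


def read_gene_groups(lines):
    """Map group ID -> set of gene IDs from 'gene_id<TAB>group_id' lines.

    The tab-separated two-column layout is an assumption about the saved file.
    """
    groups = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        gene_id, group_id = line.split("\t")[:2]
        groups[group_id].add(gene_id)
    return groups


def putative_orthologs(eco_lines, hsa_lines):
    """Return {group_id: (eco genes, hsa genes)} for groups present in both species."""
    eco = read_gene_groups(eco_lines)
    hsa = read_gene_groups(hsa_lines)
    common = sorted(eco.keys() & hsa.keys())
    return {group: (sorted(eco[group]), sorted(hsa[group])) for group in common}


# Made-up identifiers standing in for the lists saved in step 7.
eco_saved = ["eco_gene_1\tGROUP_A", "eco_gene_2\tGROUP_B"]
hsa_saved = ["hsa_gene_1\tGROUP_A", "hsa_gene_2\tGROUP_A", "hsa_gene_3\tGROUP_C"]

result = putative_orthologs(eco_saved, hsa_saved)
for group_id, (eco_genes, hsa_genes) in result.items():
    print(group_id, eco_genes, hsa_genes)
```

To run this on real exports, pass open file handles instead of the in-memory lists, for example `putative_orthologs(open("eco.txt"), open("hsa.txt"))` with hypothetical file names.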

To link to the OrthoMCL-DB page for a protein of interest:

  1. Determine which identifier type (NCBI GI numbers, Ensembl IDs, etc.) goes with your species of 
     interest. OrthoMCL-DB uses GI numbers for E. coli and Ensembl protein IDs for human.
  2. For example, to check whether OrthoMCL-DB has an entry for E. coli GI 16130640, 
     build the OrthoMCL-DB link as:
  http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=sequenceList&in=Accession&q=gi|16130640
  3. An example with Ensembl ID ENSP00000317668:
  http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=sequenceList&in=Accession&q=ENSP00000317668
  4. A link built with an identifier that is not present returns a "No Results 
     Found" page. An example: http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=sequenceList&in=Accession&q=ENSP00000317800
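
When many identifiers need links, the pattern above can be generated programmatically. The sketch below only reproduces the URL layout shown in the examples; the function name is hypothetical, and the pipe character is left unescaped so the result matches the links as written on this page.

```python
from urllib.parse import urlencode

BASE_URL = "http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi"


def sequence_link(accession):
    """Build the accession lookup link for one identifier (GI or Ensembl protein ID)."""
    params = [("rm", "sequenceList"), ("in", "Accession"), ("q", accession)]
    # safe="|" keeps identifiers such as gi|16130640 readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe="|")


for accession in ["gi|16130640", "ENSP00000317668"]:
    print(sequence_link(accession))
```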

To find the ortholog cluster that contains a particular protein:

  1. Continuing with the E. coli GI 16130640 example above, 
     suppose we want to find the OrthoMCL cluster that contains this protein. 
  2. To build the link, append &groupredirect=1 to the OrthoMCL-DB link 
     that was used to check whether the protein exists in OrthoMCL-DB.
  3. The link to the OrthoMCL cluster containing E. coli GI 16130640 is therefore: 
  http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=sequenceList&in=Accession&q=gi|16130640&groupredirect=1
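
As a small sketch, the redirect parameter can be appended to an existing sequence link in code. The helper name is hypothetical; it does nothing beyond the string operation described in step 2.

```python
def group_link(sequence_url):
    """Append groupredirect=1 to a sequence link so it points at the protein's cluster."""
    separator = "&" if "?" in sequence_url else "?"
    return f"{sequence_url}{separator}groupredirect=1"


sequence_url = (
    "http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi"
    "?rm=sequenceList&in=Accession&q=gi|16130640"
)
print(group_link(sequence_url))
```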

Other sites with related content

Technology

Snipped from http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=orthomcl#Software

 To satisfy requests to run ortholog clustering without depositing data into the GUS database, a
 stand-alone version of OrthoMCL was developed as a Perl package, which is available at 
 http://orthomcl.cbil.upenn.edu/ORTHOMCL/.

 Stand-alone OrthoMCL requires a protein FASTA file for each genome, and it calls for an 
 all-against-all BLAST analysis. Alternatively, OrthoMCL can start its analysis by reading a BPO 
 (Blast Parsing Out) file, which describes genes paired by BLAST matches, the E-value, the percent 
 identity, and the related HSP information.
 
 As a Perl package, stand-alone OrthoMCL does not need compilation. However, it requires some software 
 and Perl modules to run:
    Software: 1. BLAST (NCBI-BLAST, WU-BLAST, etc.) 2. MCL (Markov Clustering algorithm), available
    at http://micans.org/mcl/
    Perl modules: 1. Bio::SearchIO (part of BioPerl, http://bioperl.org) 2. Storable 
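
Before running the stand-alone package, it can help to confirm that these prerequisites are installed. The sketch below is an illustration, not part of OrthoMCL: the executable names `blastall` and `mcl` are assumptions about how BLAST and MCL are installed on a given system, and the helper names are hypothetical. Perl modules are tested by asking Perl to load them with `perl -MModule -e 1`.

```python
import shutil
import subprocess


def find_executables(names):
    """Return {name: full path or None} for each executable looked up on PATH."""
    return {name: shutil.which(name) for name in names}


def perl_module_available(module):
    """True if Perl can load the module; False if Perl or the module is missing."""
    if shutil.which("perl") is None:
        return False
    result = subprocess.run(
        ["perl", f"-M{module}", "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Executable names are assumptions; substitute the names used on your system.
for name, path in find_executables(["blastall", "mcl"]).items():
    print(f"{name}: {path or 'not found on PATH'}")

for module in ["Bio::SearchIO", "Storable"]:
    status = "available" if perl_module_available(module) else "missing"
    print(f"{module}: {status}")
```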

Web Services/API

Discussion

External Links

OrthoMCL-DB URL: http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi

Discussion of OrthoMCL-DB on other websites

References

  1. Chen, F et al. (2006) OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 34 D363-8
  2. Li, L et al. (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13 2178-89