OrthoMCL-DB
<protect>
Link/URL: | |
---|---|
What: | A genome-scale algorithm for grouping orthologous protein sequences. |
Who: | University of Pennsylvania |
</protect>
About OrthoMCL-DB
OrthoMCL-DB is an orthology database[1]. The OrthoMCL tool[2] is also used by other groups including P-POD.
Content
OrthoMCL-DB houses ortholog group predictions for 55 species, including 16 bacterial and 4 archaeal genomes representing phylogenetically diverse lineages. The OrthoMCL software clusters proteins by sequence similarity in three stages: an all-against-all BLAST search of each species' proteome, normalization of inter-species differences, and Markov clustering. OrthoMCL-DB thus provides a centralized warehouse for orthology predictions across multiple species.
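The Markov clustering stage can be illustrated with a minimal pure-Python sketch. This is not the OrthoMCL implementation (stand-alone OrthoMCL relies on the external MCL program); the function names, the inflation value of 2.0, the fixed iteration count, and the toy similarity weights are all assumptions chosen for illustration. The sketch alternates expansion (matrix squaring) and inflation (element-wise power followed by column normalization) on a small similarity graph.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def expand(m):
    """Expansion step: multiply the matrix by itself."""
    n = len(m)
    return [[sum(m[r][k] * m[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def inflate(m, power):
    """Inflation step: raise each entry to `power`, then renormalize columns."""
    return normalize_columns([[v ** power for v in row] for row in m])


def markov_cluster(weights, n, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster n nodes given {(a, b): similarity} edge weights.

    Hypothetical helper: a fixed number of iterations is used instead of a
    convergence test, and clusters are read off the non-zero rows.
    """
    m = [[0.0] * n for _ in range(n)]
    for (a, b), w in weights.items():
        m[a][b] = m[b][a] = w      # undirected similarity graph
    for i in range(n):
        m[i][i] = 1.0              # self-loops, as is usual for MCL
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    clusters = set()
    for row in m:
        members = frozenset(c for c, v in enumerate(row) if v > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy input: proteins 0-2 are mutually similar, as are proteins 3-4.
toy_weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (3, 4): 1.0}
print(markov_cluster(toy_weights, 5))
```

In this toy graph the two unconnected sets of proteins come back as separate clusters. On real data the edge weights would come from the normalized BLAST scores rather than hand-picked values.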
Using OrthoMCL-DB
Browsing
Searching
Usage examples
To compute EcoliK12-Human orthologs:
1. On the Phyletic Pattern Expression (PPE) query page, submit the PPE query eco+hsa=2T to find all OrthoMCL groups that contain both eco and hsa (the abbreviations for E. coli and H. sapiens).
2. Go to the Group Query History page, select the PPE query eco+hsa=2T, and click "GROUP QUERY INTO SEQUENCE QUERY". This converts all sequences belonging to these groups into a sequence query.
3. Run a sequence accession query using eco as the taxon_abbreviation, which finds all E. coli sequences.
4. Repeat step 3 for human, using hsa.
5. On the sequence query history page, select queries 2 and 3 and merge them using intersection. This gives all eco genes satisfying your query.
6. On the sequence query history page, select queries 2 and 4 and merge them using intersection. This gives all hsa genes satisfying your query.
7. On the sequence query history page, save the gene IDs together with their group IDs for queries 5 and 6.
Once you have saved the gene IDs from step 7, write scripts to find the groups common to both species and then identify the putative orthologs.
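The final scripting step could look something like the following sketch. It assumes the saved results have already been parsed into (gene ID, group ID) pairs; the export format of OrthoMCL-DB is not described here, so the parsing is left out, and the function names and the placeholder IDs are hypothetical.

```python
from collections import defaultdict


def group_members(pairs):
    """Map each group ID to the set of gene IDs assigned to it."""
    groups = defaultdict(set)
    for gene_id, group_id in pairs:
        groups[group_id].add(gene_id)
    return groups


def putative_orthologs(eco_pairs, hsa_pairs):
    """Return {group_id: (eco_genes, hsa_genes)} for groups found in both lists."""
    eco = group_members(eco_pairs)
    hsa = group_members(hsa_pairs)
    shared = sorted(eco.keys() & hsa.keys())
    return {g: (sorted(eco[g]), sorted(hsa[g])) for g in shared}


# Placeholder data standing in for the saved results of queries 5 and 6.
eco_saved = [("eco_gene_1", "GROUP_A"), ("eco_gene_2", "GROUP_A"),
             ("eco_gene_3", "GROUP_B")]
hsa_saved = [("hsa_gene_1", "GROUP_A"), ("hsa_gene_2", "GROUP_C")]

for group_id, (eco_genes, hsa_genes) in putative_orthologs(eco_saved, hsa_saved).items():
    print(group_id, eco_genes, hsa_genes)
```

Genes that share a group across the two species are treated as putative orthologs. Groups with several members from one species contain paralogs that may need further inspection.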
To link to a page in OrthoMCL DB with the protein of your interest:
1. Determine which identifier type (NCBI GI numbers, Ensembl IDs, etc.) is used for your species of interest. OrthoMCL-DB uses GI numbers for E. coli and Ensembl Protein IDs for human.
2. To check whether OrthoMCL-DB has an entry for the E. coli GI 16130640, build the link as: http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=sequenceList&in=Accession&q=gi|16130640
3. An example with the Ensembl ID ENSP00000317668: http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=sequenceList&in=Accession&q=ENSP00000317668
4. If you build a link with an identifier that is not present, it returns a page reading "No Results Found". An example: http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=sequenceList&in=Accession&q=ENSP00000317800
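The link pattern above can be generated programmatically. The sketch below only assembles URL strings following the examples given here. The helper names are hypothetical, and keeping the "|" character unescaped is a choice made to reproduce the example links exactly.

```python
from urllib.parse import quote

# Base address taken from the example links on this page.
BASE_URL = "http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi"


def sequence_link(accession):
    """Build a sequence-list link for an accession such as an Ensembl protein ID."""
    # '|' is kept literal so that GI accessions look like the examples above.
    return f"{BASE_URL}?rm=sequenceList&in=Accession&q={quote(accession, safe='|')}"


def ecoli_link(gi_number):
    """Build a link for an E. coli protein from its NCBI GI number."""
    return sequence_link(f"gi|{gi_number}")


print(ecoli_link(16130640))
print(sequence_link("ENSP00000317668"))
```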
To find out the Ortholog Cluster in which a particular protein is found:
1. Continuing with the E. coli GI 16130640 example, suppose we want to find the OrthoMCL cluster that contains this protein.
2. To build the link, append &groupredirect=1 to the OrthoMCL link that was used to determine whether the protein exists in OrthoMCL-DB.
3. The link to the OrthoMCL cluster containing E. coli GI 16130640 is therefore: http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=sequenceList&in=Accession&q=gi|16130640&groupredirect=1
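Turning a sequence link into a cluster link is a small string operation, sketched below. The function name is hypothetical. The check for an existing parameter is an added convenience and not something the site requires.

```python
def group_link(sequence_url):
    """Append groupredirect=1 to a sequence link unless it is already present."""
    flag = "groupredirect=1"
    query = sequence_url.split("?", 1)[1] if "?" in sequence_url else ""
    if flag in query.split("&"):
        return sequence_url
    separator = "&" if "?" in sequence_url else "?"
    return f"{sequence_url}{separator}{flag}"


seq_url = ("http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi"
           "?rm=sequenceList&in=Accession&q=gi|16130640")
print(group_link(seq_url))
```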
Technology
Snipped from http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=orthomcl#Software
To satisfy requests to run ortholog clustering without depositing data into the GUS database, a stand-alone version of OrthoMCL was developed as a Perl package, which is available at http://orthomcl.cbil.upenn.edu/ORTHOMCL/. Stand-alone OrthoMCL requires protein FASTA files for each genome and calls for an all-against-all BLAST analysis. Alternatively, OrthoMCL can start its analysis by reading a BPO (Blast Parsing Out) file, which describes the genes paired by BLAST matches together with the E-value, the percent identity, and the related HSP information. As a Perl package, stand-alone OrthoMCL does not need compilation. However, it requires the following software and Perl modules to run:
Software:
1. BLAST (NCBI-BLAST, WU-BLAST, etc.)
2. MCL (Markov Clustering algorithm), available at http://micans.org/mcl/
Perl modules:
1. Bio::SearchIO (part of BioPerl, http://bioperl.org)
2. Storable
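Before installing the stand-alone package, it can be useful to check that these prerequisites are present. The sketch below is one possible check, not part of OrthoMCL. The executable names blastall and mcl are assumptions (they depend on which BLAST flavour is installed), and the function name is hypothetical. Perl modules are tested by asking perl to load them with its -M option.

```python
import shutil
import subprocess


def find_missing(executables, perl_modules, perl="perl"):
    """Return the names of executables and Perl modules that cannot be found."""
    # An executable counts as present if it is on the PATH.
    missing = [exe for exe in executables if shutil.which(exe) is None]
    perl_path = shutil.which(perl)
    for module in perl_modules:
        if perl_path is None:
            missing.append(module)  # no perl, so the module cannot be loaded
            continue
        # `perl -MModule -e 1` exits non-zero when the module fails to load.
        result = subprocess.run([perl_path, f"-M{module}", "-e", "1"],
                                capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing


# Assumed executable names; adjust for the BLAST installation in use.
print(find_missing(["blastall", "mcl"], ["Bio::SearchIO", "Storable"]))
```

An empty list means everything was found. Any names printed need to be installed or added to the PATH.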
Web Services/API
Discussion
External Links
OrthoMCL-DB URL: http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi
Discussion of OrthoMCL-DB on other websites
References