Help:Annotation Pipeline

From EcoliWiki

Overview

Each quarter EcoliWiki incorporates the E. coli annotations from EcoCyc into our website. Upon each EcoCyc release, the annotations are parsed and loaded into the wiki using automated scripts. The resulting changes are reflected in the monthly annotation file submitted to the Gene Ontology.

Detailed Description

Obtain the latest EcoCyc release
Quarterly annotations are fetched from EcoCyc and parsed for validity.
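The exact validity checks are not listed on this page. As a minimal sketch, assuming the release is available as tab-delimited GAF 1.0 lines (15 columns each), a per-line check might look like the following. The function name and the particular checks are illustrative assumptions, not the actual EcoliWiki code.

```python
import re

# GO identifiers are "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")
# GAF aspect column: P (process), F (function), C (component).
ASPECTS = {"P", "F", "C"}


def validate_gaf_line(line):
    """Return a list of problems found in one GAF 1.0 annotation line.

    Hypothetical helper: the checks chosen here are assumptions.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 15:
        return [f"expected 15 columns, found {len(cols)}"]
    problems = []
    if not cols[1]:
        problems.append("missing DB Object ID")
    if not GO_ID.fullmatch(cols[4]):
        problems.append(f"malformed GO ID: {cols[4]!r}")
    if cols[8] not in ASPECTS:
        problems.append(f"unknown aspect: {cols[8]!r}")
    return problems
```

An empty list means the line passed every check in this sketch. Comment lines starting with "!" would need to be skipped before calling it.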
Calculate the changes
The most recent release is compared to the previous release and two files are generated:
  1. differences.lost.gaf
  2. differences.gained.gaf
These two files contain the annotations (in GAF 1.0 format) that have been lost or gained since the last release, respectively.
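The comparison can be sketched as a set difference over annotation keys. This is only an illustration: the choice of key columns (DB Object ID, Qualifier, GO ID, Reference, Evidence Code) and the function names are assumptions, and the real pipeline may compare releases differently.

```python
def read_gaf(text):
    """Map an annotation key to its full line for each annotation in GAF text.

    The key columns are an assumption made for this sketch.
    """
    annotations = {}
    for line in text.splitlines():
        # Skip blank lines and GAF comment/header lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.split("\t")
        key = (cols[1], cols[3], cols[4], cols[5], cols[6])
        annotations[key] = line
    return annotations


def diff_releases(previous_text, latest_text):
    """Return (lost, gained) lists of GAF lines between two releases."""
    previous = read_gaf(previous_text)
    latest = read_gaf(latest_text)
    lost = [previous[k] for k in sorted(previous.keys() - latest.keys())]
    gained = [latest[k] for k in sorted(latest.keys() - previous.keys())]
    return lost, gained
```

Writing the two returned lists out, one line per annotation, would give the contents of differences.lost.gaf and differences.gained.gaf.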
Use a script to add/remove annotations from EcoliWiki
A PHP script adds and removes annotations in the wiki based on these two files.
Specifically, the script does the following:
  1. looks in the differences.lost.gaf file
  2. determines which gene products need annotations removed
  3. visits those pages in the wiki
  4. finds the annotation to remove from the Gene Ontology table
  5. saves the page to commit the edit
  6. repeats the same steps with the differences.gained.gaf file to add annotations
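The removal steps above can be sketched as follows. The production script is written in PHP and its code is not shown here, so this Python outline is only illustrative: the row layout (dicts with "go_id" and "evidence" keys) and the load_table and save_table callables are hypothetical stand-ins for reading and saving a gene product's Gene Ontology table.

```python
from collections import defaultdict


def group_by_gene_product(gaf_lines):
    """Map DB Object Symbol (GAF column 3) to its (GO ID, evidence) pairs."""
    grouped = defaultdict(list)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        grouped[cols[2]].append((cols[4], cols[6]))
    return dict(grouped)


def remove_annotations(table_rows, to_remove):
    """Return the table rows that are not listed in to_remove."""
    targets = set(to_remove)
    return [r for r in table_rows
            if (r["go_id"], r["evidence"]) not in targets]


def apply_lost(gaf_lines, load_table, save_table):
    """Remove lost annotations page by page (hypothetical page I/O)."""
    for symbol, annotations in group_by_gene_product(gaf_lines).items():
        rows = load_table(symbol)               # visit the page
        kept = remove_annotations(rows, annotations)
        if kept != rows:                        # save only when changed
            save_table(symbol, kept)
```

Adding annotations would follow the same shape, appending rows from differences.gained.gaf instead of filtering them out.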

Known Issues

  • A number of old annotations are still in EcoliWiki. These were added early in the life of EcoliWiki and were not tagged with the metadata that allows us to add or delete annotations automatically. Because they could be out of date or incorrect, they are treated as spurious and do not contribute to the total set of annotations.

See Also