CollecTF compiles data on experimentally validated, naturally occurring TF-binding sites across the Bacteria domain, placing a strong emphasis on the transparency of the curation process, the quality and availability of the stored data and fully customizable access to its records.
CollecTF is under continuous revision, operating on a dual curation process that combines in-house curation by undergraduate students and direct submissions by authors.
CollecTF can be leveraged for team-based, active learning activities in Genetics and Biochemistry courses. Please email the collecTF team for further information.
Data in CollecTF is periodically pushed to NCBI RefSeq complete genomes.
CollecTF compiles data on experimentally validated, naturally occurring TF-binding sites across the Bacteria domain, placing a strong emphasis on the transparency of the curation process, the quality and availability of the stored data and fully customizable access to its records. CollecTF integrates multiple sources of data automatically and openly, allowing users to dynamically redefine binding motifs and their experimental support base. Most importantly, CollecTF entries are periodically submitted to NCBI for integration into RefSeq complete genome records as db_xref link-out features embedded in genome annotations. Experimentally validated binding sites for transcription factors are therefore embedded in genome annotations as bound_moiety tags, providing information on the available experimental evidence, its published sources and a link to the full curation record on CollecTF. This approach maximizes the visibility of transcriptional regulation data by enriching the annotation of RefSeq files with regulatory information, addressing a decades-old deficit in the NCBI annotation pipeline. Hence, CollecTF operates both as a standalone database and as an open portal for entering regulatory information into RefSeq genomes, generating a sustainable model that encourages direct author submissions in combination with in-house validation and curation of published literature.
CollecTF draws exclusively on a single data source: published experimental evidence on TFBS. Sites may be supported by evidence of binding and/or evidence of their regulation of gene expression. In silico evidence is also collected, but only as a complement to experimental evidence. CollecTF defines three site types (motif-associated, variable motif-associated and non-motif-associated), based on the availability of known binding patterns associated with reported sites.
CollecTF can be browsed at three different levels:
NCBI Taxonomy - Browse the database using up-to-date NCBI taxonomy. Just click on each taxonomical unit to expand it and see its associated information, and link out to species-specific reports.
TF family - Browse the database by transcription factor families. Click on each family to expand it and see its associated information, and link out to TF-family or TF-based reports.
Technique - Browse CollecTF by experimental evidence. Click on different groups of experimental techniques to expand them and see their associated information, and link out to technique-specific reports.
Access to TFBS data in CollecTF is fully customizable. Users can select the specific types of experimental support for reported TFBS, aggregate information for multiple clades or TFs and compare the dynamically computed TF-binding motifs using different statistics.
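The idea of recomputing a motif after changing its experimental support base can be illustrated with a minimal sketch. This is not CollecTF's implementation: the site sequences, technique labels and function names below are hypothetical, and per-position information content is used only as one example of a statistic for comparing motifs.

```python
import math
from collections import Counter

# Hypothetical aligned sites, each with the set of techniques supporting it.
SITES = [
    ("ACGT", {"EMSA", "DNase footprinting"}),
    ("ACGA", {"EMSA"}),
    ("ACTT", {"ChIP-seq"}),
    ("ACGT", {"EMSA", "ChIP-seq"}),
]


def select_sites(records, required):
    """Keep sequences whose evidence includes every technique in `required`."""
    return [seq for seq, techniques in records if required <= techniques]


def position_frequency_matrix(sites, alphabet="ACGT"):
    """Return one {base: frequency} dict per alignment column."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        matrix.append({b: counts[b] / len(sites) for b in alphabet})
    return matrix


def information_content(matrix):
    """Per-column information content in bits, assuming a uniform background."""
    result = []
    for column in matrix:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        result.append(math.log2(len(column)) - entropy)
    return result


# Motif from all sites versus only the sites supported by EMSA.
all_sites = select_sites(SITES, set())
emsa_sites = select_sites(SITES, {"EMSA"})
ic_all = information_content(position_frequency_matrix(all_sites))
ic_emsa = information_content(position_frequency_matrix(emsa_sites))
```

Comparing `ic_all` with `ic_emsa` shows how restricting the evidence base changes the motif: in this toy data, the third position becomes fully conserved once the ChIP-seq-only site is excluded.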
Search in CollecTF is fully customizable. Select a taxonomic unit (e.g. the Vibrio genus), a transcription factor family or instance (e.g. LexA) and the set of experimental techniques that must support the reported sites, then click Search.
Reports and exporting - Search results can be viewed as individual reports (one report per TF/species) or as ensemble reports (multiple TFs/species). Click Export to access the data in machine-readable format.
Accurate curation of TFBS data is accomplished by using a well-defined submission pipeline in which submitters detail relevant data on the TF and the reported TFBS. Mapping between reported data and a reference genome in the NCBI RefSeq database is established and used to validate the precise location of reported sites and their regulatory effects on genes.
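The mapping step can be pictured as locating each reported site sequence in the reference genome on either strand. The sketch below assumes exact matching only, which is a simplification; the function names and the toy genome are hypothetical and do not describe CollecTF's actual pipeline.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def locate_site(genome, site):
    """Find every exact match of `site` on both strands.

    Returns (start, end, strand) tuples with 0-based, end-exclusive
    coordinates on the forward strand; strand is 1 or -1.
    """
    genome = genome.upper()
    hits = []
    for strand, query in ((1, site.upper()), (-1, reverse_complement(site))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, start + len(query), strand))
            start = genome.find(query, start + 1)  # allow overlapping matches
    return hits


# Toy genome (hypothetical): one forward-strand and one reverse-strand match.
toy_genome = "AAACTGTTTTTTACAGGG"
hits = locate_site(toy_genome, "CTGT")
```

A site with no hits, or with several, would need manual review before its position is recorded. Note that a palindromic site is reported once per strand at the same coordinates.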
Data in CollecTF is periodically submitted to the NCBI RefSeq database, where it is integrated into the annotation of complete genome sequences. CollecTF entries are annotated as protein_bind features within GenBank format files. These features contain a link to the full CollecTF record, as well as direct links to the PubMed articles providing experimental support for the TFBS.
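As a rough illustration of how such annotations could be read back out of a GenBank flat file, the sketch below extracts protein_bind features and their qualifiers from a feature-table fragment using only the standard library. The sample fragment is invented for this example: the coordinates, gene name and the db_xref value `CollecTF:EXAMPLE_ID` are placeholders, not real records. The parser handles only single-line qualifiers.

```python
import re

# Hypothetical feature-table fragment in GenBank flat-file layout
# (feature key at column 6, qualifiers at column 22). All values are placeholders.
SAMPLE = "\n".join([
    " " * 5 + "gene".ljust(16) + "1000..1500",
    " " * 21 + '/gene="exampleA"',
    " " * 5 + "protein_bind".ljust(16) + "complement(940..959)",
    " " * 21 + '/bound_moiety="LexA"',
    " " * 21 + '/db_xref="CollecTF:EXAMPLE_ID"',
])

FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)\s*$")
QUALIFIER_RE = re.compile(r'^ {21}/(\w+)="?(.*?)"?\s*$')


def parse_features(text):
    """Parse feature lines into dicts with key, location and qualifiers."""
    features = []
    for line in text.splitlines():
        feature = FEATURE_RE.match(line)
        if feature:
            features.append(
                {"key": feature.group(1), "location": feature.group(2), "qualifiers": {}}
            )
            continue
        qualifier = QUALIFIER_RE.match(line)
        if qualifier and features:
            name, value = qualifier.group(1), qualifier.group(2)
            features[-1]["qualifiers"].setdefault(name, []).append(value)
    return features


binding_sites = [f for f in parse_features(SAMPLE) if f["key"] == "protein_bind"]
```

For real RefSeq records, a full GenBank parser such as Biopython's `SeqIO.parse(handle, "genbank")` is a more robust choice than hand-written regular expressions.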
- Kiliç, S et al. (2014) CollecTF: a database of experimentally validated transcription factor-binding sites in Bacteria. Nucleic Acids Res. 42: D156-60.
[link CollecTF] URL:http://collectf.umbc.edu