PMID:3612791
Citation | Berg, OG and von Hippel, PH (1987) Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193:723-50 |
---|---|
Abstract | We present a statistical-mechanical selection theory for the sequence analysis of a set of specific DNA regulatory sites that makes it possible to predict the relationship between individual base-pair choices in the site and specific activity (affinity). The theory is based on the assumption that specific DNA sequences have been selected to conform to some requirement for protein binding (or activity), and that all sequences that can fulfil this requirement are equally likely to occur. In most cases, the number of specific DNA sequences that are known for a certain DNA-binding protein is very small, and we discuss in detail the small-sample uncertainties that this leads to. When applied to the binding sites for cro repressor in phage lambda, the theory can predict, from the sequence statistics alone, their rank order binding affinities in reasonable agreement with measured values. However, the statistical uncertainty generated by such a small sample (only 6 sites known) limits the result to order-of-magnitude comparisons. When applied to the much larger sample of Escherichia coli promoter sequences, the theory predicts the correlation between in vitro activity (k2KB values) and homology score (closeness to the consensus sequence) observed by Mulligan et al. (1984). The analysis of base-pair frequencies in the promoter sample is consistent with the assumption that base-pairs at different positions in the sites contribute independently to the specific activity, except in a few marginal cases that are discussed. When the promoter sites are ordered according to predicted activities, they seem to conform to the Gaussian distribution that results from a requirement for maximal sequence variability within the constraint of providing a certain average activity. The theory allows us to compare the number of specific sites with a certain activity to the number that would be expected from random occurrence in the genome. While strong promoters are "overspecified", in the sense that their probability of random occurrence is very low, random sequences with weak promoter-like properties are expected to occur in very large numbers. This leads to the conclusion that functional specificity is based on other properties in addition to primary sequence recognition; some possibilities are discussed. Finally, we show that the sequence information, as defined by Schneider et al. (1986), can be used directly (at least in the case of equilibrium binding sites) to estimate the number of protein molecules that are specifically bound at random "pseudosites" in the genome. (ABSTRACT TRUNCATED AT 400 WORDS) |
Keywords | Base Composition; Base Sequence; Binding Sites; Biological Evolution; DNA; DNA-Binding Proteins; Models, Genetic; Operator Regions, Genetic; Promoter Regions, Genetic; Repressor Proteins; Statistics as Topic; Viral Proteins; Viral Regulatory and Accessory Proteins |
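The abstract's central idea is that base-pair frequencies in a small set of known sites can be turned into predicted relative affinities, provided positions contribute independently. A minimal Python sketch of that idea follows. The per-position formula used here, ln((n_max + 1)/(n_b + 1)) in units of a selection parameter lambda, is the form commonly attributed to this paper, but it does not appear in the abstract, so treat it as an assumption. The information calculation assumes equiprobable background bases and omits any small-sample correction. The function names are hypothetical and the six sites are made-up toy sequences, not the cro operator or promoter data.

```python
import math
from collections import Counter

BASES = "ACGT"


def count_matrix(sites):
    """Count occurrences of each base at each position of equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]


def energy_matrix(sites):
    """Per-position discrimination energies in units of lambda (assumed form).

    eps[l][b] = ln((n_max + 1) / (n_b + 1)); the +1 acts as a
    small-sample pseudocount so unseen bases get a finite penalty.
    """
    matrix = []
    for counts in count_matrix(sites):
        n_max = max(counts[b] for b in BASES)
        matrix.append({b: math.log((n_max + 1) / (counts[b] + 1)) for b in BASES})
    return matrix


def discrimination_energy(seq, eps):
    """Sum independent per-position contributions; the consensus scores 0."""
    return sum(eps[i][base] for i, base in enumerate(seq))


def information_content(sites):
    """Sequence information in bits, assuming equiprobable background bases
    and no small-sample correction (both are simplifying assumptions)."""
    n = len(sites)
    total = 0.0
    for counts in count_matrix(sites):
        entropy = -sum(
            (counts[b] / n) * math.log2(counts[b] / n)
            for b in BASES
            if counts[b] > 0
        )
        total += 2.0 - entropy
    return total


# Toy, made-up sites for illustration only (NOT real operator sequences).
TOY_SITES = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATAAT", "TATACT"]

if __name__ == "__main__":
    eps = energy_matrix(TOY_SITES)
    # Rank the distinct sites from predicted strongest (lowest energy) to weakest.
    for site in sorted(set(TOY_SITES), key=lambda s: discrimination_energy(s, eps)):
        print(site, round(discrimination_energy(site, eps), 3))
    print("information (bits):", round(information_content(TOY_SITES), 3))
```

With only six sites, a single extra observation changes the counts noticeably, which is the small-sample uncertainty the abstract says limits the cro result to order-of-magnitude comparisons.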
EcoliWiki Links
AlignACE
Category:Bioinformatics_Tools
Category:Databases
Category:DNA-binding_site_Databases
Category:E._coli_Databases