PMID:3612791

Citation	Berg, OG and von Hippel, PH (1987) Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193:723-50
Abstract	We present a statistical-mechanical selection theory for the sequence analysis of a set of specific DNA regulatory sites that makes it possible to predict the relationship between individual base-pair choices in the site and specific activity (affinity). The theory is based on the assumption that specific DNA sequences have been selected to conform to some requirement for protein binding (or activity), and that all sequences that can fulfil this requirement are equally likely to occur. In most cases, the number of specific DNA sequences that are known for a certain DNA-binding protein is very small, and we discuss in detail the small-sample uncertainties that this leads to. When applied to the binding sites for cro repressor in phage lambda, the theory can predict, from the sequence statistics alone, their rank order binding affinities in reasonable agreement with measured values. However, the statistical uncertainty generated by such a small sample (only 6 sites known) limits the result to order-of-magnitude comparisons. When applied to the much larger sample of Escherichia coli promoter sequences, the theory predicts the correlation between in vitro activity (k2KB values) and homology score (closeness to the consensus sequence) observed by Mulligan et al. (1984). The analysis of base-pair frequencies in the promoter sample is consistent with the assumption that base-pairs at different positions in the sites contribute independently to the specific activity, except in a few marginal cases that are discussed. When the promoter sites are ordered according to predicted activities, they seem to conform to the Gaussian distribution that results from a requirement for maximal sequence variability within the constraint of providing a certain average activity. The theory allows us to compare the number of specific sites with a certain activity to the number that would be expected from random occurrence in the genome. While strong promoters are "overspecified", in the sense that their probability of random occurrence is very low, random sequences with weak promoter-like properties are expected to occur in very large numbers. This leads to the conclusion that functional specificity is based on other properties in addition to primary sequence recognition; some possibilities are discussed. Finally, we show that the sequence information, as defined by Schneider et al. (1986), can be used directly (at least in the case of equilibrium binding sites) to estimate the number of protein molecules that are specifically bound at random "pseudosites" in the genome. (ABSTRACT TRUNCATED AT 400 WORDS)
Links	PubMed
Keywords	Base Composition; Base Sequence; Binding Sites; Biological Evolution; DNA; DNA-Binding Proteins; Models, Genetic; Operator Regions, Genetic; Promoter Regions, Genetic; Repressor Proteins; Statistics as Topic; Viral Proteins; Viral Regulatory and Accessory Proteins
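As a rough illustration of the kind of sequence-statistics calculation the abstract describes, the Python sketch below estimates a per-position discrimination energy from base counts in a set of aligned sites, then sums those energies over a candidate sequence. Summing per position relies on the assumption, stated in the abstract, that positions contribute independently. The formula ln((n_max + 1)/(n_B + 1)), the pseudocount of 1, the function names, and the four toy sites are all assumptions made for this example. None of them is taken from the abstract, and the toy sites are not the cro operators.

```python
import math
from collections import Counter

BASES = "ACGT"

# Hypothetical aligned sites, invented for illustration only.
TOY_SITES = ["ACGT", "ACGA", "ACCT", "TCGT"]


def count_matrix(sites):
    """Return per-position base counts for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]


def discrimination_energies(sites, pseudocount=1.0):
    """Estimate an energy for each base at each position.

    Energies are relative to the most frequent base at that position,
    so the consensus base always gets 0. The pseudocount (assumed to be 1)
    keeps unseen bases from producing an infinite energy.
    """
    energies = []
    for counts in count_matrix(sites):
        n_max = max(counts[b] for b in BASES)
        energies.append(
            {b: math.log((n_max + pseudocount) / (counts[b] + pseudocount))
             for b in BASES}
        )
    return energies


def score(seq, energies):
    """Sum the per-position energies of seq (independence assumption)."""
    if len(seq) != len(energies):
        raise ValueError("sequence length must match the site length")
    return sum(pos[b] for b, pos in zip(seq, energies))


energies = discrimination_energies(TOY_SITES)
for candidate in ["ACGT", "TCGT", "CCGT"]:
    print(candidate, round(score(candidate, energies), 3))
```

In this sketch a lower score means a sequence closer to the consensus of the sample. The scores are in arbitrary units, so relating them to measured affinities would require a scale factor that is not estimated here. With only a handful of sites, the estimates carry the small-sample uncertainty that the abstract emphasizes.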

Significance

You can help EcoliWiki by summarizing why this paper is useful.

Useful Materials and Methods

You can help EcoliWiki by describing the useful materials (strains, plasmids, antibodies, etc.) described in this paper.

Annotations

EcoliWiki Links

AlignACE
Category:Bioinformatics_Tools
Category:Databases
Category:DNA-binding_site_Databases
Category:E._coli_Databases

References

See Help:References for how to manage references in EcoliWiki.

PMID:3612791
