Talk:ATCC 8739

From EcoliWiki

When I started looking at the draft assembly of what was then being called Escherichia coli B (accession AAWW00000000) I had my doubts about its identification. When the GenBank entry was updated, stating: "Originally submitted as Escherichia coli B, but later determined to be Escherichia coli C (ATCC 8739)," I emailed Lonnie Ingram and Paul Richardson, outlining my concerns: In essence, ATCC 8739 is E. coli Crooks, not E. coli C. I did some digging, and found nothing to suggest that this is E. coli C as most of the world knows it.

(1) E. coli C was (and is) widely used in phage work.

The earliest citation I can find is G. Bertani & J. J. Weigle (1953) Host controlled variation in bacterial viruses. J Bacteriol 65, 113-121, where it is described as "E. coli, strain C (no. 122 of the National Collection of Type Cultures, London)". It is still in that collection: http://www.hpacultures.org.uk/products/bacteria/detail.jsp?refId=NCTC+122&collection=nctc

ATCC has it as ATCC 23461: http://www.atcc.org/ATCCAdvancedCatalogSearch/ProductDetails/tabid/452/Default.aspx?ATCCNum=23461&Template=bacteria (they actually have several deposits of E. coli C, all identified as bacteriophage hosts: ATCC 12141, ATCC 13706, and ATCC 23461).

(2) E. coli Crooks is one I hadn't heard of, but upon digging I find it in the literature with a number of variations of that name, including Crookes strain, strain Crookes, Crook's, Crookes', Crookes' strain, and "the well-known Crookes strain" ...

While ATCC calls it Crooks: http://www.atcc.org/ATCCAdvancedCatalogSearch/ProductDetails/tabid/452/Default.aspx?ATCCNum=8739&Template=bacteria NBRC in Japan calls it Crookes, and they are the same strain: http://www.nbrc.nite.go.jp/NBRC2/NBRCCatalogueDetailServlet?ID=NBRC&CAT=00003972

The earliest citation I can find is I. C. Gunsalus & D.B. Hand (1941) The use of bacteria in the chemical determination of total vitamin C. J Biol Chem 141, 853-858, where it is described as "A strain [of Bacterium coli] designated as 'Crookes,' originally presented to the authors through the kindness of Dr. Esselen". Later Gunsalus' wife referred to "Escherichia coli, strains B and Crooks's" in C. F. Gunsalus & J. Tonzetich (1952) Transaminases for pyridoxamine and purines. Nature 170, 162.

(3) Escherichia coli C strains can grow on ribitol and D-arabitol, and the genetic loci responsible for pentitol catabolism in E. coli C, designated rtl and atl, are in GenBank with accession numbers AY005817 and AF378082. These sequences do not seem to be present in AAWW00000000.
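A presence/absence check like the one in point (3) can be roughly approximated without BLAST by asking what fraction of a locus's k-mers occur anywhere in the assembly. The sketch below is only an illustration and not the method used here: the function names and the default k of 21 are assumptions, and exact k-mer matching is a much cruder test than a real alignment search.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T/N, either case)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query: str, target: str, k: int = 21) -> float:
    """Fraction of the query's k-mers found on either strand of the target.

    A value near 1.0 suggests the locus is present; a value near 0.0
    suggests it is absent (or too diverged for exact k-mer matches).
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        raise ValueError("query is shorter than k")
    # Index both strands so the orientation of the locus does not matter.
    target_kmers = kmers(target, k) | kmers(reverse_complement(target), k)
    return len(query_kmers & target_kmers) / len(query_kmers)
```

In practice the query would be a locus such as AY005817 or AF378082 and the target each contig of AAWW00000000, read from local FASTA files; a low fraction across all contigs would be consistent with the loci being absent, but a proper BLAST search remains the more reliable check.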

(4) The nucleotide sequence of the rpoD gene from Escherichia coli C was previously determined (GenBank Accession U23083), and it had a 30-bp deletion relative to that of E. coli K-12. AAWW00000000 lacks this deletion.

(5) In M. J. Buettner, E. Spitz, & H. V. Rickenberg (1973) Cyclic adenosine 3',5'-monophosphate in Escherichia coli. J Bacteriol 114, 1068-1073 it is stated that E. coli Crookes strain is deficient for cAMP phosphodiesterase. A check using the E. coli K-12 cpdA sequence as a query against the AAWW00000000 sequences did indicate a 3 bp deletion, although I don't know what the impact on activity would be:

Query = EG12187 cpdA (828 nt)

>WIS_EcoB_DRAFTv1_ctg121
Length = 427812

Score = 1550 bits (782), Expect = 0.0
Identities = 817/828 (98%), Gaps = 3/828 (0%)
Strand = Plus / Plus


Query: 1 ttggaaagcctgttaacccttcctctggctggtgaggccagagtcaggattttacaaatt 60

             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 156218 ttggaaagcctgttaacccttcctctggctggtgaggccagagtcaggattttacaaatt 156277


Query: 61 accgacactcacctgtttgcacaaaagcacgaagccctgttaggggtaaacacctgggag 120

             ||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 156278 accgatactcacctgtttgcacaaaagcacgaagccctgttaggggtaaacacctgggaa 156337


Query: 121 agttaccaggcggtgctggaggcgattcggccacaccagcacgaattcgacctgattgtc 180

             || |||||||||||||||||||||||||| ||||||||||||||||||||||||||||||

Sbjct: 156338 agctaccaggcggtgctggaggcgattcgtccacaccagcacgaattcgacctgattgtc 156397


Query: 181 gcgacaggtgatttagcgcaggatcaatcctctgcggcctatcagcatttcgctgaaggc 240

             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 156398 gcgacaggtgatttagcgcaggatcaatcctctgcggcctatcagcatttcgctgaaggc 156457


Query: 241 atcgcaagttttcgtgcgccctgcgtctggctgccgggcaaccacgatttccagcccgcg 300

             ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||

Sbjct: 156458 atcgcaagttttcgtgcgccctgcgtctggctgcctggcaaccacgatttccagcccgcg 156517


Query: 301 atgtacagcgcgttacaggatgcgggtatctccccggcgaagcgcgtgtttattggtgag 360

             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 156518 atgtacagcgcgttacaggatgcgggtatctccccggcgaagcgcgtgtttattggtgag 156577


Query: 361 caatggcaaatcctgttgctggatagccaggtgtttggcgtgccgcacggtgagctgagc 420

             |||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||

Sbjct: 156578 caatggcaaatcctgttgctggatagccaggtgtttggcgtgccgcacggtgagctaagc 156637


Query: 421 gagtttcagcttgagtggctggaacgtaaactggccgatgcgccagaacgccatacgtTG 480

             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 156638 gagtttcagcttgagtggctggaacgtaaactggccgatgcgccagaacgccatacgt-- 156695


Query: 481 Ctgctgctgcatcatcatccgctacctgcgggttgtagttggctcgatcaacacagtctg 540

              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 156696 -tgctgctgcatcatcatccgctacctgcgggttgtagttggctcgatcaacacagtctg 156754


Query: 541 cgtaacgcgggcgaactggataccgtgctggcgaagtttccgcacgtcaaatacttgctg 600

             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 156755 cgtaacgcgggcgaactggataccgtgctggcgaagtttccgcacgtcaaatacttgctg 156814


Query: 601 tgcggtcatattcatcaggagctggatctcgactggaatggtcgccgcctgctggcaacg 660

             |||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||

Sbjct: 156815 tgcggtcatattcatcaggagctggatctcgactggaacggtcgccgcctgctggcaacg 156874


Query: 661 ccgtcgacctgtgtgcagtttaagccgcactgttccaactttacgctggataccatcgcg 720

             |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||

Sbjct: 156875 ccgtcgacctgtgtgcagtttaagccgcactgttccaactttacgctggacaccatcgcg 156934


Query: 721 cccggctggcgtactctcgagttacatgctgatggcacgctgaccaccgaggtgcatcgc 780

             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 156935 cccggctggcgtactctcgagttacatgctgatggcacgctgaccaccgaggtgcatcgc 156994


Query: 781 ctggcggacacacgtttccaacctgataccgcttcagaaggctactga 828

             ||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 156995 ctggcggacacacgtttccaacctgataccgcttcagaaggctactga 157042
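The counts reported for an alignment like this one (identities, gaps) can be recomputed from the aligned strings, along with whether each gap is a multiple of three bases. The sketch below is a minimal illustration: the function names are assumptions, and the demo strings are only the short stretch around the gap at query positions 479-481, not the full alignment. An in-frame gap keeps the downstream reading frame intact, but that alone does not say whether the enzyme stays active.

```python
from itertools import groupby


def summarize_alignment(query_aln: str, sbjct_aln: str) -> dict:
    """Count identities, mismatches and gap columns in a pairwise alignment.

    Both strings must be the same length, with '-' marking gap positions.
    """
    if len(query_aln) != len(sbjct_aln):
        raise ValueError("aligned strings must have equal length")
    identities = mismatches = gaps = 0
    for q, s in zip(query_aln.upper(), sbjct_aln.upper()):
        if q == "-" or s == "-":
            gaps += 1
        elif q == s:
            identities += 1
        else:
            mismatches += 1
    return {
        "length": len(query_aln),
        "identities": identities,
        "mismatches": mismatches,
        "gaps": gaps,
    }


def gap_runs(aln: str) -> list:
    """Return the length of each contiguous run of '-' in an aligned string."""
    return [
        sum(1 for _ in group)
        for is_gap, group in groupby(aln, key=lambda c: c == "-")
        if is_gap
    ]


def gaps_preserve_frame(aln: str) -> bool:
    """True if every gap run is a multiple of 3 (no frameshift downstream)."""
    return all(n % 3 == 0 for n in gap_runs(aln))


if __name__ == "__main__":
    # Excerpt around the gap, copied from the alignment above.
    query = "ccatacgtTGCtgctgctg"
    sbjct = "ccatacgt---tgctgctg"
    print(summarize_alignment(query, sbjct))
    print("gap runs in subject:", gap_runs(sbjct))
    print("frame preserved:", gaps_preserve_frame(sbjct))
```

On this excerpt the single gap is three bases long, so the deletion removes one codon's worth of sequence rather than shifting the frame.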

======

A couple of weeks later, having not heard from Lonnie or Paul, I sent a copy of the above information to Bob Landick, since the GLBRC was interested. He replied with the following bon mot: "The saga continues for a strain variously identified as E coli B (Ohta et al., 1991), E. coli W (Jarboe & Ingram, 2007), and E. coli C (ATCC#8739) ... Who'd have thunk it so hard to identify a strain in the age of HT-sequencing."

======

A couple of months later the completed genome was released (GenBank CP000946), so I repeated the analyses I had made of the WGS draft assembly. All my prior conclusions still held, and I thought that the sequence was quite possibly Escherichia coli str. Crooks, but not Escherichia coli str. C. I contacted JGI via a web page form, and Cc'ed NCBI Genomes. That seemed to light a fire, and I heard (finally) from Lynne Goodwin at JGI as well as Lonnie. They confirmed that the initial identification of the sequenced strain as Escherichia coli B was an error, that any pointers to that identification (such as "Culture collection: ATCC 11303") should be ignored, and that the identification as ATCC 8739 is thought to be correct.

But this still raises the question of how "ATCC 8739" was correlated with "Escherichia coli C" when the evidence suggests otherwise. Nowhere other than GenBank entry CP000946 (and JGI's site) is that connection made -- it is not supported by ATCC, nor by anything I can find in the literature.

I pointed out that names like "Escherichia coli B" and "Escherichia coli C" and "Escherichia coli K-12" mean something to people in microbiology/genetics/molecular biology, and should not be applied to other strains of E. coli -- it results in confusion when one tries to correlate older data with newly sequenced genomes. Another example would be E. coli MRE600, which was used for a lot of RNA work -- it was not derived from E. coli K-12 (nor B, nor C, nor O157:H7, nor ...), and thus any extrapolation from a published E. coli sequence to MRE600 must be made with caution. One of the immediate take-home lessons from the wealth of E. coli sequence data is that the organism we call "Escherichia coli" comprises a diverse set of variations on a theme. Investigating that variation is confounded by misidentification of the sequenced representatives. I suggested that this strain would more correctly be called something along the lines of Escherichia coli str. Crooks (ATCC 8739), assuming that was what was actually sequenced.

Lonnie replied: "Thank you for your helpful comments and interest. We did not intend to mis name or create a new name. The safest solution will be to simply refer to this strain by ATCC number. This should minimize further confusion. I will certainly do this in further publications and add a sentence of clarification."

As of today, JGI (http://genome.jgi-psf.org/finished_microbes/escco/escco.info.html) is still calling it Escherichia coli C str. ATCC 8739, but NCBI is now calling it Escherichia coli ATCC 8739 uniformly:

GenBank: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=169752989
Genome Project: http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=18083
Taxonomy: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=481805
RefSeq: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&Cmd=ShowDetailView&TermToSearch=22063


Dr. Guy Plunkett III
Senior Scientist, Laboratory of Genetics & Genome Center
Senior Curator, ERIC (Enteropathogen Resource Integration Center)
ASAP Annotation Curator, E. coli Genome Project

Laboratory of Genetics
University of Wisconsin
425G Henry Mall Rm 5428
Madison, WI 53706-1580

voice: (608) 890-0189
fax: (608) 262-2976
email: guy@genome.wisc.edu
web: http://www.genome.wisc.edu/information/gplunkett.html

            "Life is too short, and DNA too long."
             -- Michael Crichton, Jurassic Park

ERIC: http://www.ericbrc.org/
ASAP: https://asap.ahabs.wisc.edu/annotation/php/logon.php
EcGP: http://www.genome.wisc.edu/