Journal of Bacteriology, March 2003, p. 2017-2021, Vol. 185, No. 6
0021-9193/03/$08.00+0 DOI: 10.1128/JB.185.6.2017-2021.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Timothy E. Thate, and Nancy L. Craig*
Howard Hughes Medical Institute, Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland
Received 5 July 2002/ Accepted 10 December 2002
The original E. coli K-12 strain was isolated in 1922 (3). A K-12 derivative called MG1655, which was cured of an endogenous lambda phage by UV induction and cured of its conjugal plasmid by growth in the presence of acridine orange, was sequenced and is the basis of a commercially available whole-genome array (7). The K-12 derivative MC4100 was constructed over 25 years ago, when it was used to isolate gene and protein fusions to the lacZ gene product, β-galactosidase, through the use of bacteriophage derivatives (8). MC4100 provided an important host in early gene expression work (reviewed in reference 5). We chose to use one of the newest tools designed for genome-wide gene expression analysis to analyze this strain used in some of the earliest expression analysis experiments. Well beyond providing a proof-in-principle for mapping deletion endpoints, the accurate description of MC4100 is important for the modern understanding of E. coli. For largely historical reasons, MC4100 has been the strain background of choice for many genetic experiments. The genetic nature of MC4100 will continue to be important in relating work with this strain to the sequenced MG1655 K-12 strain. Additionally, the genetic nature of MC4100 will help investigators to decide if MC4100 is an appropriate strain for a given experiment.
The original K-12 strain went through numerous alterations, including X-ray, UV, and chemical mutagenesis, as well as being a genetic recipient in multiple crosses with various E. coli K-12 and E. coli B derivatives to generate the strain MC4100. While the physical analysis of the chromosome by pulsed-field gel electrophoresis identified three deletions, the actual extent of these deletions has remained unclear (10, 14-16).
We used the E. coli whole-genome array to identify the deletion endpoints in a related strain derivative. The Sigma/Genosys Panorama E. coli array consists of PCR-amplified DNA products corresponding to the 4,290 open reading frames of strain MG1655 applied as duplicate 10-ng spots on a nylon membrane. Our strategy was to identify deletions in a non-MG1655 E. coli strain by isolating and radioactively labeling chromosomal DNA from MC4100 and probing the E. coli MG1655 gene array. As described below, most genes were found to be present based on a ratio of the intensities of 1 (MC4100 intensity/MG1655 intensity). Low ratios indicate putative deletions that could be confirmed by PCR-mediated sequence analysis of the region.
To study MC4100 [F- araD139 Δ(argF-lac)U169 rpsL150 relA1 flbB5301 fruA25 deoC1 ptsF25], we used a valine-resistant derivative called NLC28 (12). Given the near-total identity of the strains, the derivative is referred to as MC4100 throughout the text. The MC4100 strain was subsequently obtained from the E. coli genetic stock center, and PCR analysis indicated that the deletions were the same as those in NLC28.
Chromosomal DNA was isolated from strains grown in Luria broth, treated with RNase, subjected to phenol-chloroform treatment, and suspended in Tris-EDTA, pH 8.0 (2, 19). DNA was sheared to 1-kb fragments by sonication and radioactively labeled with [α-32P]dCTP by random DNA labeling (Roche), and unincorporated nucleotides were removed with Sephadex G-50 nick spin columns (Pharmacia). Panorama E. coli gene arrays (Sigma/Genosys) were probed overnight with 33 ng of DNA (approximately 25 million cpm) at 65°C. The array blots were washed and probed according to the manufacturer's recommendations in a hybridization volume of 15 ml. The arrays were visualized by phosphorimaging with the Molecular Dynamics Storm system, and the data were assembled with ArrayVision 6.0 software (Imaging Research Inc.).
Multiple open reading frame deletions can be identified in the K-12 derivative MC4100. Deletions were identified in MC4100 by calculating a ratio of the normalized spot intensity from MC4100 to MG1655, i.e., MC4100 normalized intensity/MG1655 normalized intensity. Spot intensity was normalized by dividing the background corrected intensity by the overall average intensity for the blot. If a gene is found in both MC4100 and MG1655, the value should equal 1. When the MC4100/MG1655 ratios were plotted in gene order, we found that most values were about 1 (Fig. 1). However, we found over 100 MG1655 open reading frames that appeared to be missing in MC4100 based on low intensity ratios. Examination of the ratios plotted in gene order pinpointed the three multiple open reading frame deletions already known in MC4100 and showed their extent, ykfD-b0350, b1137-mcrA, and fruB-yeiR (Fig. 1; Table 1). The array data also suggested a previously unknown single open reading frame deletion in the fim genes that was confirmed by PCR (see below).
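The normalization and ratio calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ArrayVision/Excel procedure used in the study: the function names, the single scalar `background` value, and the intensity numbers are all hypothetical.

```python
from statistics import mean

def normalize(intensities, background=0.0):
    """Background-correct each spot, then divide by the blot-wide average.

    Assumption: one scalar background value per blot (hypothetical).
    """
    corrected = [value - background for value in intensities]
    blot_mean = mean(corrected)
    return [value / blot_mean for value in corrected]

def intensity_ratios(tester, reference):
    """Per-ORF ratio of normalized tester (MC4100) to reference (MG1655) signal."""
    return [t / r for t, r in zip(normalize(tester), normalize(reference))]

# Made-up intensities for five ORFs in gene order; the third is absent in the tester.
mc4100 = [900.0, 1100.0, 50.0, 1000.0, 950.0]
mg1655 = [1800.0, 2200.0, 2000.0, 2000.0, 1900.0]
ratios = intensity_ratios(mc4100, mg1655)
```

In this five-spot toy set the genes that are present come out somewhat above 1, because the one missing spot lowers the tester's blot average; with 4,290 spots on a blot, a single missing open reading frame would shift the average far less.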
FIG. 1. Probing the MG1655 genome array with labeled chromosomal DNA from MC4100 provides a sensitive mechanism for detecting deletions. Deletions were identified in MC4100 by calculating a ratio of the change in spot intensity from results found with MG1655, i.e., MC4100 intensity/MG1655 intensity. Array data were exported from ArrayVision into MS Excel for analysis. Duplicate open reading frame values were averaged to give 4,290 data points. Data were normalized within each blot by dividing the intensity of each open reading frame by the average spot intensity for all 4,290 of the individual open reading frames on the blot. For analysis, intensity ratios were calculated and graphed; the intensity of each normalized spot from MC4100 was determined as a ratio to MG1655. Each strain was blotted twice from DNA from separate labeling reactions. Four separate ratios were calculated from the two trials of each strain, and these values were averaged and used to assign putative deletions. The MG1655 sequence version M52 and a list of all of the MG1655 open reading frames were downloaded from the E. coli Genome Project at the University of Wisconsin-Madison. The y axis shows the ratio of the intensity of the 4,290 MG1655 open reading frames listed in gene order for the 100-min chromosome on the x axis. If a gene is found in both MC4100 and MG1655, the value should equal 1. The reference lines indicate 1.5 standard deviations below the mean and the inverse of this value above the mean. PCR amplification and sequencing confirmed the ykfD-b0350, b1137-mcrA, fruB-yeiR, and fimB deletions. PCR amplification indicates that yaiE, yccE, and acs (shown in parentheses) are still present in MC4100 compared to the MG1655 genome sequence.
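The replicate handling described in the legend (duplicate spots averaged, then four ratios from two trials per strain averaged) might be sketched as follows. The legend does not say how the four ratios were paired, so this sketch assumes each MC4100 trial is compared with each MG1655 trial; the function names and values are hypothetical.

```python
from itertools import product
from statistics import mean

def average_duplicates(spots):
    """Average the duplicate spots for each ORF: [(a1, a2), ...] -> [a, ...]."""
    return [mean(pair) for pair in spots]

def mean_pairwise_ratio(tester_trials, reference_trials):
    """Average the ratios from every tester-trial/reference-trial pairing.

    Assumption: two trials per strain give 2 x 2 = 4 ratios per ORF.
    Inputs are lists of already-normalized per-ORF intensities, one list per trial.
    """
    n_orfs = len(tester_trials[0])
    averaged = []
    for i in range(n_orfs):
        ratios = [t[i] / r[i] for t, r in product(tester_trials, reference_trials)]
        averaged.append(mean(ratios))
    return averaged

# Made-up normalized values for two ORFs across two trials of each strain.
mc4100_trials = [[1.0, 0.1], [1.2, 0.2]]
mg1655_trials = [[1.0, 1.0], [1.25, 0.8]]
averaged_ratios = mean_pairwise_ratio(mc4100_trials, mg1655_trials)
```

Averaging over all pairings keeps a single noisy blot from dominating the value assigned to an open reading frame.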
TABLE 1. Deletions in MC4100
Interdigitated within the open reading frames that were shown to be missing by PCR were some open reading frames with ratios that were close to the value expected for genes that are present, e.g., within 1 standard deviation of the mean. These "false-positive" values are most easily explained by technical reasons such as cross-hybridization with similar genes in the genome, although we cannot rule out the unlikely possibility that certain genes moved to new positions in the chromosome. Homologous sequences could come from a variety of sources and allow cross-hybridization sufficient for the false positives identified within the deletions. The six IS1 elements found in MG1655 are all on the array, making the missing IS1 elements within the MC4100 deletions appear to be present. Paralogs could also allow cross-hybridization to give false-positive values within deletions: for example, the putative permease YagG (b0270), which appeared to be present within one of the deletions, could cross-hybridize with seven other known and putative permeases found on the MG1655 array, ranging from 25 to 45% identity across the genes. Because cross-hybridization could also stem from contributions from many different open reading frames found throughout the chromosome, a one-for-one documentation comparing each false positive with another open reading frame is not possible.
A deletion of the b1137-mcrA genes indicates that MC4100 lacks the e14 element found in MG1655. e14 is a genetic element that can move into and out of the chromosome and is found in a specific location in the E. coli K-12 genome. Multiple putative missing open reading frame values grouped in the region of the e14 element found in MG1655 (6) (b1137-mcrA in Fig. 1 and Table 1). Amplification and sequencing with PCR primers specific to genes flanking the region (icdA and b1160) confirmed that 15,203 bp including the e14 element were missing in MC4100 (Table 1). The loss of e14 is further supported by the observation that MC4100 does not possess the restriction and modification system normally ascribed to e14 (17; M. Sibley and L. Raleigh, personal communication).
It remains unclear how the e14 element was lost. The e14 element could have excised and been lost, or the intervening region could have been lost by host-mediated homologous recombination between the 166-bp near-perfect direct repeats that flanked the element in MG1655. DNA sequencing indicates that the near-perfect repeat in mcrA was maintained and not the repeat found in the icdA gene.
A deletion of the fruB-yeiR genes likely accounts for the fruA25 allele of MC4100. Multiple putative missing open reading frames fell in consecutive genes including a portion of the fruBKA operon encoding functions for fructose transport and catabolism (fruB-yeiR in Fig. 1 and Table 1). A deletion had previously been suggested to be associated with the fruA25 allele of MC4100 (15). PCR primers were designed for the open reading frames flanking the putative deletion, fruK and b2174, and sequencing of the resulting PCR product confirmed a 6,678-bp deletion by comparison to the MG1655 genome sequence. The deletion removes all of fruB and the first 29 amino acids of the coding region from the 312-amino-acid FruK protein. Therefore, the fruA25 allele actually leaves the fruA gene intact but removes the fruBKA promoter.
The genome array can detect a single open reading frame deletion. Inspection of the array data indicated a few single open reading frames that gave low ratio values, yaiE, yccE, acs, and fimB (Fig. 1). Using PCR, we confirmed that fimB, the open reading frame which gave the lowest ratio value, did indeed have a deletion: DNA sequencing indicated that a 1,018-bp deletion removed 533 bp of the fimB gene along with 5 bp of the adjacent fimE gene. The fimB-fimE deletion was associated with an IS1 insertion. IS1 insertion has previously been shown to sometimes be associated with the deletion of adjacent DNA sequence (21). Our ability to detect the fimB deletion indicates that using chromosomal DNA to probe whole-genome blots provides a sensitive tool for detecting deletions that are less than a kilobase in size that include a portion of an open reading frame. Of additional significance, we found that by detecting a deletion in fimB we were able to identify the insertion of foreign DNA sequences. This suggests that array techniques may also be utilized for the detection of heterologous sequences such as pathogenicity and fitness islands. Unlike techniques such as restriction fragment length polymorphism, this array technique could identify newly inserted DNA even if there was not a net change in the size of a given region. Because the 1,018-bp deletion of the MG1655 sequence was replaced with 767 bp of heterologous IS1 DNA sequence, a net physical drop of only 251 bp was realized; a 251-bp deletion would likely be missed with most restriction mapping techniques involving rare cutting enzymes.
There were some single open reading frames that gave high ratios of MC4100/MG1655 intensity (Fig. 1). There are multiple possible explanations for high ratio values. Overrepresented values could indicate that the strain in question contains more copies of these open reading frames than does MG1655. While it seems unlikely, we cannot formally rule out the chance that they result from amplification of single open reading frames. Amplification of certain genes can occur when a gene is under strong selection (1, 22). Additionally, it is formally possible that these genes were deleted in the isolate of MG1655 that we obtained from the American Type Culture Collection.
Experiments with MC4100 suggest the limits of the Panorama array for detecting deletions. PCR with primers flanking yaiE, yccE, and acs resulted in the same sizes of fragments for the strains MG1655 and MC4100, indicating that no deletion of these genes occurred. It is unclear why some open reading frames display low ratio values. Knowing the lowest MC4100/MG1655 ratios where no deletion really exists is important in extending this array technology to other non-MG1655 E. coli strains. Using the array technology to predict putative missing genes in unsequenced E. coli strains would require establishing an appropriate threshold for ratio values. The threshold limit for assigning putative deletions should err on the side of having some false negatives to minimize the chance of missing deletions. Based on results with MC4100, a threshold of 1.5 standard deviations below the mean seems appropriate, because this is just within the minimum value obtained with the three false-negative open reading frames yaiE, yccE, and acs (Fig. 1). This threshold would introduce an extra three open reading frames as potential false negatives.
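The thresholding idea can be illustrated with a short sketch: compute the mean and standard deviation of all ratios, place the cutoff 1.5 standard deviations below the mean, and flag every open reading frame that falls under it. The text does not state whether a population or sample standard deviation was used; the population form is assumed here, and the open reading frame names and ratio values are invented for the example.

```python
from statistics import mean, pstdev

def deletion_threshold(ratios, n_sd=1.5):
    """Cutoff placed n_sd standard deviations below the mean ratio.

    Assumption: population standard deviation (pstdev); the source does not specify.
    """
    return mean(ratios) - n_sd * pstdev(ratios)

def putative_deletions(ratios_by_orf, n_sd=1.5):
    """Names of ORFs whose ratio falls below the cutoff, in input order."""
    cutoff = deletion_threshold(list(ratios_by_orf.values()), n_sd)
    return [name for name, ratio in ratios_by_orf.items() if ratio < cutoff]

# Hypothetical ORF names and ratios: nine near 1 and one clearly low.
example = {
    "orf01": 1.00, "orf02": 0.95, "orf03": 1.05, "orf04": 1.10, "orf05": 0.90,
    "orf06": 1.00, "orf07": 1.02, "orf08": 0.98, "orf09": 1.00, "orf10": 0.10,
}
flagged = putative_deletions(example)
```

Candidates flagged this way are only putative; as in the work described here, each would still need confirmation by PCR and sequencing.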
Analysis of the deletion endpoints identified by PCR also suggests the portion of a gene that must be missing to register by the array technique (Table 2). Our results suggest that deletions that remove only a portion of a gene can be detected. However, very small deletions leaving 80% of an open reading frame intact will appear to be present. Because the results are calculated as a ratio of signals in comparing a tester strain to MG1655, the percentage of the open reading frame that is missing, and not the actual number of base pairs, is important. Therefore, a deletion of a given size may be detectable in a small open reading frame yet go undetected in a larger one.
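A small sketch makes the point about percentage versus base pairs concrete. It assumes, as a simplification that is not claimed in the text, that an open reading frame appears present whenever at least 80% of its length remains; the lengths used are hypothetical.

```python
def fraction_missing(orf_length_bp, deleted_bp):
    """Fraction of an ORF removed by a deletion (capped at the whole ORF)."""
    if orf_length_bp <= 0:
        raise ValueError("ORF length must be positive")
    return min(deleted_bp, orf_length_bp) / orf_length_bp

def appears_present(orf_length_bp, deleted_bp, intact_cutoff=0.80):
    """True if enough of the ORF remains that the array would likely score it present.

    Assumption: a simple cutoff on the intact fraction, taken from the 80% figure above.
    """
    return 1.0 - fraction_missing(orf_length_bp, deleted_bp) >= intact_cutoff

# The same hypothetical 300-bp deletion in a short and a long ORF.
short_orf_scored_present = appears_present(600, 300)   # half of the ORF is gone
long_orf_scored_present = appears_present(3000, 300)   # one-tenth of the ORF is gone
```

Here the 300-bp deletion would register in the 600-bp open reading frame but not in the 3,000-bp one, even though the same number of base pairs is missing.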
TABLE 2. How much of a gene must be missing to register as missing
TABLE 3. Genes deleted in MC4100 by functional class
We thank Mary Berlyn, Lise Raleigh, and Marion Sibley for providing strains and information. We thank the members of the Craig lab for comments on the manuscript.
Present address: Department of Microbiology, Cornell University, Ithaca, New York 14853.