Journal of Bacteriology, October 2006, p. 7297-7305, Vol. 188, No. 20
0021-9193/06/$08.00+0 doi:10.1128/JB.00664-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom (1); Infectious Disease Section and Research Service, Department of Medicine, Hines Veterans Affairs Hospital and Loyola University Stritch School of Medicine, Hines, Illinois 60141 (2); Department of Veterinary Science and Microbiology, University of Arizona, Tucson, Arizona 85721 (3); Centre for Food Safety, University College Dublin, Belfield, Dublin 4, Ireland (4); Anaerobe Reference Laboratory, NPHS Microbiology Cardiff, University Hospital of Wales, Cardiff CF14 4XW, United Kingdom (5); Bacterial Microarray Group, Medical Microbiology, Department of Cellular and Molecular Medicine, St. George's, University of London, Cranmer Terrace, London SW17 0RE, United Kingdom (6)
Received 10 May 2006/ Accepted 30 July 2006
C. difficile is known to produce a number of factors that contribute to its virulence, including two related exotoxins, toxin A (TcdA) and toxin B (TcdB), which are part of a 16-kb pathogenicity locus (PaLoc) where toxin production is negatively controlled by TcdC (30). A minority of strains produce a binary toxin (CdtA/CdtB), but its role in disease is unclear (10, 11). However, production of these toxins alone cannot explain C. difficile pathogenesis. In recent years, increasing numbers of strains have been reported from several countries with truncated versions of toxin A and/or toxin B (10, 31).
A plethora of techniques has been used to type C. difficile, many of which have confirmed the transmission of the organism in hospital environments (1). Commonly used methods are toxinotyping based upon variations in the PaLoc sequence (28), pulsed-field gel electrophoresis (PFGE) (9), PCR ribotyping (26) and restriction endonuclease analysis (REA) (4). These methods have generally been efficient at grouping strains and in particular have been used to distinguish the recently emerged hypervirulent strains as toxinotype III, North American PFGE type 1, REA group BI, or PCR ribotype 027 (generally referred to as BI/NAP1/027) (20, 24, 32). However, these methods have limited discriminatory potential to elucidate the phylogenetic relationships of all strains in a given study. For example, the discriminatory power of PCR ribotyping is not absolute; ribotype 001, the most commonly occurring ribotype in humans, can be subtyped by PFGE (8), and 20 distinct BI group types have been found by REA (24). Additionally, traditional typing systems do not provide information on the genes/genetic loci specific to strains from different sources.
Microarray technology, allied to complex mathematical analysis to determine phylogeny, has provided a sensitive and robust method to examine the genetic relatedness of bacterial populations (2, 6). The genetic relationships described by Bayesian phylogeny of a DNA-DNA microarray data set can then be correlated with the known phenotype and ecological behavior of each bacterial strain in the analysis; this is particularly useful in studying the epidemiology and host association of pathogens (6, 16). Comparative genomic DNA microarray analysis has been used to investigate several bacterial species in relation to pathogenesis and host specificity. Comparison of strains isolated from different hosts as well as virulent and avirulent strains can reveal predicted coding DNA sequences (CDSs) that may be important for virulence, pathogen-host interactions, and transmission (2, 6, 16). To date, microarray analysis of defined cohorts of strains to determine genetic relatedness has not been undertaken for C. difficile.
In this study, we carried out whole-genome analysis of 75 well-characterized isolates of C. difficile from humans with a range of disease outcomes and from several animal sources, using a whole-genome microarray based on the recently sequenced genome of C. difficile 630. Combining DNA microarray data with sensitive Bayesian-based algorithms has yielded new insights into the population structure of C. difficile, revealing information on the evolution and origin of the pathogen as well as several potential determinants of survival and virulence.
TABLE 1. C. difficile strains
60°C, an amplicon size range from 50 to 800 bp, and an optimum size of 600 bp. Selection was based on BLASTN analysis of the PCR products against genes; all 10 PCR products for each target sequence were compared to the sequence of each gene in the gene pool, and the longest product with the least similarity (or no similarity) to any other sequence in the gene pool was selected. This approach maximizes sensitivity and minimizes cross-hybridizations. Additionally, multiple reporters were designed to some genes, including eight for tcdA, seven for tcdB, three for cdtA, four for cdtB, and two for each gene involved in S-layer formation.
Amplification of microarray reporter elements. PCR primers were synthesized by MWG Biotech (Ebersberg, Germany) and supplied in a 96-well format to enable high-throughput amplification using a liquid handling and PCR amplification robot (RoboAmp 9600; MWG Biotech). PCRs were performed with 10 ng DNA template, 5 U HotStar Taq DNA polymerase (QIAGEN), 0.5 µM primers, 1.5 mM MgCl2, and 200 mM deoxynucleoside triphosphates. Thermocycling was performed using denaturation of 95°C for 15 min, 40 cycles of 95°C for 1 min, 52°C for 1 min, and 72°C for 1 min, followed by a final extension of 72°C for 5 min. Subsequent rounds of PCR amplification with modified conditions were performed until a single product of predicted size was obtained for all genes that were not amplified under standard conditions. Additional validation was undertaken by sequencing 5% of the amplified genes. Microarrays were constructed by robotic spotting of the PCR products in duplicate on UltraGAPS aminosilane-coated glass slides (Corning), using MicroGrid II (BioRobotics, United Kingdom) (14). The microarrays were postprint processed according to the slide manufacturer's instructions, using hydration and UV irradiation, and stored in a dark, dust-free environment.
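The reporter selection rule described above (among the candidate PCR products for a target, take the longest one with the least similarity to any other gene) can be sketched roughly as follows. This is an illustration only, not the authors' pipeline. The `Candidate` record, the `max_offtarget_identity` score (assumed to be the best BLASTN percent identity against any non-target gene, computed elsewhere), and the choice to rank similarity before length are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """One candidate PCR product for a target gene (hypothetical record)."""
    name: str
    length_bp: int
    # Highest BLASTN identity (%) to any other gene in the pool; 0.0 means no hit.
    max_offtarget_identity: float

def select_reporter(candidates, min_bp=50, max_bp=800):
    """Pick the candidate with the least off-target similarity, then the longest.

    Assumption: similarity is ranked before length; the text does not state
    how the two criteria are traded off.
    """
    in_range = [c for c in candidates if min_bp <= c.length_bp <= max_bp]
    if not in_range:
        return None
    return min(in_range, key=lambda c: (c.max_offtarget_identity, -c.length_bp))

# Toy candidates with made-up scores.
candidates = [
    Candidate("p1", 600, 35.0),
    Candidate("p2", 450, 0.0),
    Candidate("p3", 700, 0.0),
    Candidate("p4", 900, 0.0),  # outside the 50 to 800 bp window
]
best = select_reporter(candidates)
print(best.name)
```

Ranking similarity first reflects the stated goal of minimizing cross-hybridization; a pipeline that weighted length more heavily would need a different sort key.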
Hybridizations. Hybridizations were performed as previously described (7, 13, 16) with 2 to 3 µg of test genomic DNA labeled with Cy3-dCTP and 2 µg of C. difficile 630 genomic DNA labeled with Cy5-dCTP as a common reference for all hybridizations. Microarray slides were prehybridized in 3.5x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate), 0.1% sodium dodecyl sulfate (SDS), and 10 mg/ml bovine serum albumin at 65°C for 20 min before a wash in distilled water for 1 min and a subsequent wash for 1 min in isopropanol. Test strain-labeled DNA was mixed with reference strain-labeled DNA, purified using a MinElute kit (QIAGEN), denatured at 95°C, and mixed to achieve a final volume of 23 µl hybridization solution of 4x SSC and 0.3% SDS. Using a 22- by 22-mm LifterSlip (Eyrie Scientific), a microarray was hybridized overnight, sealed in a humidified hybridization chamber (Telechem International), and immersed in a water bath at 65°C for 16 to 20 h. Slides were washed once in 400 ml 1x SSC and 0.06% SDS at 65°C for 2 min and twice in 400 ml 0.06x SSC for 2 min. Microarrays were scanned using a 418 array scanner (Affymetrix), and fluorescence intensity data were acquired using BlueFuse (BlueGnome). Test strains were hybridized up to three times on microarrays that have duplicate sets of reporters representing the C. difficile genome.
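For readers converting the SSC strengths above into molar concentrations, the arithmetic follows directly from the 1x definition given in the text. The helper below is a small convenience sketch; the function and dictionary names are illustrative and not part of the published protocol.

```python
# 1x SSC as defined in the text: 0.15 M NaCl plus 0.015 M sodium citrate.
SSC_1X = {"NaCl_M": 0.15, "sodium_citrate_M": 0.015}

def ssc_molarity(fold):
    """Return the molar concentrations of an n-fold SSC solution."""
    return {salt: round(conc * fold, 6) for salt, conc in SSC_1X.items()}

# Strengths used for prehybridization, hybridization, and the two washes.
for fold in (3.5, 4, 1, 0.06):
    print(fold, ssc_molarity(fold))
```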
Microarray data analysis and comparative phylogenomics. Data were initially processed and normalized using GeneSpring 7.2 (Silicon Genetics). Values below 0.01 were set to 0.01. The measured intensity for each CDS was divided by its control channel value in each sample; if the control channel was below 0.01, then 0.01 was used instead. If both the control channel and the signal channel were below 0.01, then no data were reported. Data were divided by the 50th percentile of all genes that had a raw measurement above 0.01 and were not flagged as low confidence (P < 0.1). The designation of CDSs in each strain as present, divergent, or absent was determined by the use of GACK software (16). GACK calculated an estimated probability of presence (EPP) value for each gene. A gene was designated present if it had a calculated EPP of 100%, divergent if it had an EPP between 0% and 100%, and absent if it had an EPP of 0%. An EPP of 0% indicates a 0% chance of being falsely assigned as a divergent gene, and an EPP of 100% indicates a minimum assurance that a gene was present (19). The GACK output for all genes was used for phylogeny inference calculated using a Bayesian phylogenetic algorithm (MrBayes v3.1.1, http://mrbayes.CSIT.FSU.EDU). MrBayes requires binary data, so divergent genes were reclassified as present. The Bayesian model used four-chain Markov chain Monte Carlo and 16-category gamma distribution with 1 million iterations with a heat of 0.7 as described previously (2). Phylogenetic trees were sampled every 40th iteration, and tree structure convergence was statistically assessed across all potential phylogenies (except an initial 10,000 tree burn-in). The final (1,000,000th) trees produced by separate runs were statistically assessed for convergence. Phylogeny inference was based on a conservative estimation of gene loss.
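The flooring, ratio, and scaling steps, followed by the EPP-based calls and their binary recoding for phylogeny, can be sketched as below. This is a simplified illustration under stated assumptions, not a reimplementation of GeneSpring or GACK: the input dictionaries are hypothetical, the low-confidence flag filter is omitted, the 50th percentile is assumed to be taken over CDSs whose raw test signal exceeds the floor, and the EPP values are assumed to come from GACK itself.

```python
from statistics import median

FLOOR = 0.01  # intensities below this value are raised to it

def normalized_ratios(test_channel, control_channel):
    """Return median-scaled test/control ratios per CDS.

    Both arguments map CDS name -> raw intensity (hypothetical inputs).
    CDSs with both channels below FLOOR are dropped. The low-confidence
    flag filter mentioned in the text is not modeled here.
    """
    ratios = {}
    for cds, signal in test_channel.items():
        ref = control_channel[cds]
        if signal < FLOOR and ref < FLOOR:
            continue  # no data reported for this CDS
        ratios[cds] = max(signal, FLOOR) / max(ref, FLOOR)
    # Assumption: the 50th percentile is taken over CDSs with raw signal > FLOOR.
    usable = [r for cds, r in ratios.items() if test_channel[cds] > FLOOR]
    scale = median(usable)
    return {cds: r / scale for cds, r in ratios.items()}

def classify(epp):
    """Map an estimated probability of presence (0 to 100) to a call."""
    if epp >= 100:
        return "present"
    if epp <= 0:
        return "absent"
    return "divergent"

def to_binary(call):
    """Binary coding for phylogeny: divergent genes are counted as present."""
    return 0 if call == "absent" else 1

# Toy intensities; the values are made up for illustration.
test_channel = {"CD0001": 1200.0, "CD0002": 300.0, "CD0003": 0.005, "CD0004": 0.001}
control_channel = {"CD0001": 1000.0, "CD0002": 600.0, "CD0003": 500.0, "CD0004": 0.002}
scaled = normalized_ratios(test_channel, control_channel)
print(sorted(scaled))
print([to_binary(classify(epp)) for epp in (100, 40, 0)])
```

Recoding divergent genes as present is what makes the resulting phylogeny conservative with respect to gene loss: only genes with an EPP of 0% count as lost.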
Microarray data accession numbers. Fully annotated microarray data have been deposited in BµG@Sbase (accession number E-BUGS-41; http://bugs.sgul.ac.uk/E-BUGS-41) and also ArrayExpress (accession number E-BUGS-41).
FIG. 1. Phylogenetic relationship of strains associated with different clinical outcomes and animal sources represented as four major clades (HY, A−B+, HA1, and HA2). Strains are designated at the end of the branches and are colored according to the animal source from which the C. difficile strain was isolated. Black, human; blue, mouse; green, bovine; red, swine; light blue, equine. Branches with ** have a P value of 1.0 and represent 100% of all phylogenies showing a given topology. * indicates a P value of 0.98.
The genomes of strains in the hypervirulent clade characteristically had a number of deletions compared to those of strain 630, with the exception of BI-9, which appears in clade HA1. BI-9 does not have a characteristic apparent deletion at the end of tcdB, specific to the hypervirulent strains and previously unreported (Fig. 2). Substantial divergence in gene sequence can also result in loss of hybridization signal and therefore appear as a deletion on the microarray, so the microarray results may indicate a novel 3' end for tcdB in these strains. Interestingly, the hypervirulent strains have been described as high expressors of toxins A and B. This has been ascribed to a point mutation in tcdC (24). However, this would not be detected by the microarrays used in this study. Table 2 shows the genes absent from all hypervirulent strains (by GACK and MacClade analyses), with the exception of BI-14 (HY outlier) and BI-9 (HA1). Given the recent recognition that gene loss or "black holes" may contribute to increased virulence in pathogens (pathoadaptation) (5), these deletions may be significant in terms of the increased virulence of these strains and therefore are worthy of further investigation.
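A list of clade-specific deletions of the kind summarized in Table 2 can, in principle, be derived from the binary presence/absence calls by asking which genes are absent from every member of a clade while allowing named exceptions. The sketch below uses hypothetical strain and gene labels and a made-up call matrix; it illustrates the filtering logic only and is not the GACK or MacClade analysis used in the study.

```python
def absent_in_all(calls, strains, exceptions=()):
    """Return genes called absent (0) in every listed strain, skipping exceptions.

    `calls` maps strain -> {gene: 0 or 1}; all strains are assumed to share
    the same gene set.
    """
    members = [s for s in strains if s not in exceptions]
    if not members:
        return []
    genes = calls[members[0]].keys()
    return sorted(g for g in genes if all(calls[s][g] == 0 for s in members))

# Toy call matrix: three clade members plus one outlier (labels are hypothetical).
calls = {
    "hy_1":    {"geneA": 0, "geneB": 1, "geneC": 0},
    "hy_2":    {"geneA": 0, "geneB": 1, "geneC": 0},
    "hy_3":    {"geneA": 0, "geneB": 0, "geneC": 0},
    "outlier": {"geneA": 1, "geneB": 1, "geneC": 0},
}
clade = ["hy_1", "hy_2", "hy_3", "outlier"]
print(absent_in_all(calls, clade))
print(absent_in_all(calls, clade, exceptions=("outlier",)))
```

Excluding an outlier widens the list, which is why the exceptions need to be stated alongside any such table.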
FIG. 2. Selected gene map on toxin PaLoc (tcdD, tcdB, tcdE, tcdA, and tcdC). A horizontal bar indicates array competitive genomic hybridization of a single strain, and a vertical color bar represents the presence (yellow lines) or absence/high divergence (blue lines) of each gene from CD0659 (tcdD), on the left, to CD0664 (tcdC), on the right. In the clade blocks, dark blue represents strains in the A−B+ clade, light blue represents strains in the HY clade, yellow represents strains in the HA1 clade, and red represents strains in the HA2 clade.
TABLE 2. HY-specific deletions
Toxin-defective clade. All 14 A−B+ strains grouped in a tight subclade that was part of a larger clade that included seven other strains, with a subclade of A−B− strains (M3, M13, and M23) and four animal isolates that were more distantly related. The A−B+ strains were from outbreaks of CDAD in Ireland, the United Kingdom, and the United States, again confirming the wide geographical distribution of an epidemic C. difficile clone. Similar observations have been made when other collections of A−B+ strains have been examined by independent typing methods, such as MLST (22). The hypervirulent and A−B+ isolates cluster into two independent, highly homogeneous phylogenetic lineages. Taken together, these results suggest a low genetic diversity of the hypervirulent and A−B+ variant strains and a wide geographical spread of these lineages. Also, all 14 A−B+ strains have a version of CTn5 that lacks CD1864. Table 3 shows a list of genes absent from all A−B+ strains except strain CF5.
TABLE 3. A−B+-specific deletions (except strain CF5)
Genes/genomic islands that relate to niche adaptation and potential virulence. Whole-genome comparisons of all 75 strains revealed several loci that are deleted or highly divergent in several strains and that could be important in niche adaptation and potential virulence (Fig. 3). Among these are flagellin-related genes that are likely to be important in motility (Fig. 4). In the 630 genome, two loci encode potential flagellum-associated proteins (CD0226-CD0240 and CD0245-CD0271), between which lies a third interflagellar locus of four genes of unknown function (CD0241-CD0244). All A−B+ strains have retained the three flagellum-associated loci (excluding CD0252-CD0255). The full gene complement was retained in only 7 of the other 62 strains, including three murine (JGS355, JGS356, and JGS360), two equine (JGS692 and JGS6047), and two human (CD1 and T7) strains (Fig. 4). All other strains have lost the first locus (CD0226-CD0240) and interflagellar locus (CD0241-CD0244). All three loci relating to potential flagellin biosynthesis are absent in HA2 strains. These observations on the flagellin gene complements in C. difficile suggest that motility and chemotaxis are unlikely to be essential in the survival and virulence of the organism in the human host.
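The locus-level statements above (a locus retained, lost, or partly retained in a given strain) amount to summarizing per-gene calls over a range of CDS identifiers. A minimal sketch follows, using the three locus ranges named in the text. It assumes that CDS identifiers are numbered consecutively within each range, which may not hold exactly for the real annotation, and the toy strain is invented for illustration.

```python
def locus_genes(first, last):
    """List CDS identifiers CD<first>..CD<last>, assuming consecutive numbering."""
    return [f"CD{n:04d}" for n in range(first, last + 1)]

LOCI = {
    "F1": locus_genes(226, 240),  # flagellar locus
    "F2": locus_genes(241, 244),  # interflagellar locus
    "F3": locus_genes(245, 271),  # flagellar locus
}

def locus_status(calls, genes):
    """Summarize one strain's binary calls (1 present, 0 absent) over a locus."""
    observed = [calls[g] for g in genes if g in calls]
    if not observed:
        return "no data"
    if all(observed):
        return "retained"
    if not any(observed):
        return "lost"
    return "partial"

# Toy strain: F1 and F2 absent, F3 present (made-up calls).
toy = {g: 0 for g in LOCI["F1"] + LOCI["F2"]}
toy.update({g: 1 for g in LOCI["F3"]})
for name, genes in LOCI.items():
    print(name, locus_status(toy, genes))
```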
FIG. 3. Whole-genome analysis of all 75 strains. A vertical color bar indicates array competitive genomic hybridization of a single strain, and a horizontal line represents the presence (yellow lines) or absence/high divergence (blue lines) of each gene from CD0001 (top) to CD3679 (bottom). Selected genomic islands of interest are labeled at the sides. In the clade blocks, dark blue represents strains in the A−B+ clade, light blue represents strains in the HY clade, yellow represents strains in the HA1 clade, and red represents strains in the HA2 clade.
FIG. 4. Selected gene map on flagellin-associated genes. A vertical color bar indicates array competitive genomic hybridization of a single strain, and a horizontal line represents the presence (yellow lines) or absence/high divergence (blue lines) of each gene from CD0226 (top) to CD0271 (bottom). F1 indicates flagellar loci CD0226-CD0240, F2 indicates interflagellar loci CD0241-CD0244, and F3 indicates flagellar loci CD0245-CD0271. In the clade blocks, dark blue represents strains in the A−B+ clade, light blue represents strains in the HY clade, yellow represents strains in the HA1 clade, and red represents strains in the HA2 clade.
Antibiotic resistance-related genes. C. difficile 630 contains 36 potential drug resistance-associated genes, the majority of which are common to all strains tested. However, gene absence generally falls into specific clades. Lantibiotic resistance loci CD0643-CD0646 and CD1349-CD1352 were absent exclusively from all HA2 strains. The putative antibiotic resistance ABC transporter gene that encodes daunorubicin resistance (CD0456) was absent from all HA2 strains and the majority of HA1 strains (except B-one, K14, and JGS692). However, it was present in all hypervirulent strains, all A−B+ strains, and four A−B− strains (CD1, M3, M13, and M23). A candidate streptogramin A acetyltransferase (CD2226) that may confer streptogramin A resistance was present in all strains except BI-9, the outlier in the hypervirulent clade.
Toxin-related genes. Surprisingly, DNA from all A−B+ strains failed to hybridize with the first two tcdB reporters but did hybridize with all eight tcdA reporters (Fig. 2). Analysis of published tcdB sequence from CF2 (A−B+) (29) identifies numerous point mutations in the region of the first two tcdB reporters. Therefore, the tcdB apparent deletions may be due to sequence divergence beyond the specificity of the microarray. Interestingly, CF2 tcdB was virtually identical to tcdB from C. difficile strain 8864, which has been described as having a 5' end similar to that of the toxin gene of Clostridium sordellii (3). The explanation for why the A−B+ strains have apparent intact toxin A genes is unknown. However, all eight strains that were classified as A−B− clearly lacked evidence for toxin B (all reporters nonhybridizing) and toxin A (first five reporters nonhybridizing) (Fig. 2). A−B− strains are represented in three of the four clades, suggesting that the absence of toxins is not a feature of clonality and that the PaLoc can readily be lost.
Core gene set. Using only genes designated present (EPP of 100%), an unusually low core gene content of 19.7% was derived for all 75 strains. Table 4 gives estimates of the core gene set for all of the strains represented in the respective clades. These core genes encode mainly metabolic, biosynthetic, cellular, and regulatory processes. However, many potential virulence determinants are also conserved, indicating that they are indispensable for C. difficile to cause disease in humans. These included genes that are likely to encode a capsule (CD2769-CD2780), a type IV pilus (CD3294-CD3297 and CD3503-CD3513), and fibronectin binding proteins (CD1304 and CD2592).
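The core gene content reported above is, in essence, the fraction of reporters called present in every strain of a set. A minimal sketch of that calculation is given below. The call matrix is a toy example with hypothetical labels, and divergent genes are coded as 0 on the assumption that only genes with an EPP of 100% count toward the core, as stated in the text.

```python
def core_fraction(calls):
    """Return (fraction, core genes) for genes called present in every strain.

    `calls` maps strain -> {gene: 1 if EPP is 100%, else 0}; all strains are
    assumed to share the same gene set.
    """
    strains = list(calls)
    genes = list(calls[strains[0]])
    core = [g for g in genes if all(calls[s][g] == 1 for s in strains)]
    return len(core) / len(genes), core

# Toy matrix: 3 strains by 5 genes (made-up calls).
toy_calls = {
    "s1": {"g1": 1, "g2": 1, "g3": 0, "g4": 1, "g5": 1},
    "s2": {"g1": 1, "g2": 0, "g3": 0, "g4": 1, "g5": 1},
    "s3": {"g1": 1, "g2": 1, "g3": 1, "g4": 1, "g5": 0},
}
fraction, core = core_fraction(toy_calls)
print(fraction, core)
```

Because a single absent or divergent call in any strain removes a gene from the core, the estimate can only shrink as more strains are added, which may help explain a low value across 75 diverse isolates.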
TABLE 4. Conserved reporters
Given the emergence of hypervirulent strains, the continued use of broad-spectrum antibiotics (including fluoroquinolones), and the rising numbers of immunocompromised and elderly patients, the incidence of CDAD is unlikely to recede. This study is the first genomic microarray comparison of multiple C. difficile strains and, through Bayesian-based algorithms, was able to group strains into four independent clades. This method has also identified many genetic loci that contribute to the formation of each clade, thereby identifying several potential determinants that may help to explain niche adaptation and the differences in pathogenicity observed.
The U.S. Department of Veterans Affairs Research Service provided support for D.N.G. This work was supported by a Medical Research Council grant to B.W.W.