ABSTRACT
The Bacillus cereus group of bacteria is a group of closely related species that are of medical and economic relevance, including B. anthracis, B. cereus, and B. thuringiensis. Bacteria from the B. cereus group carry three large, highly conserved genes of unknown function (named crdA, crdB, and crdC) whose protein products are composed of 16 to 35 copies of a repeated domain of 132 amino acids. Bioinformatic analysis revealed that there is a phylogenetic bias in the genomic distribution of these genes and that strains harboring all three large genes mainly belong to cluster III of the B. cereus group phylogenetic tree. The evolutionary history of the three large genes involves gain, loss, duplication, internal deletion, and lateral transfer. Furthermore, we show that the transcription of previously identified antisense open reading frames in crdB is coregulated with that of its host gene throughout the life cycle in vitro, with the highest expression being at the onset of sporulation. In B. anthracis, different combinations of double- and triple-knockout mutants of the three large genes displayed slower and less efficient sporulation than the parental strain. Altogether, the functional studies suggest an involvement of these three large genes in the sporulation process.
INTRODUCTION
The Bacillus cereus group of bacteria comprises Bacillus cereus, B. anthracis, B. thuringiensis, B. weihenstephanensis, B. mycoides, and B. pseudomycoides. These species are ubiquitous in the environment and are of economic and medical importance. B. cereus is an opportunistic human pathogen that can cause food poisoning and diverse types of infections (3, 9, 29, 50). While B. anthracis, a mammalian pathogen, is the causative agent of anthrax (35), B. thuringiensis is pathogenic to insects and commonly used as a biopesticide, although it can also cause infections in humans (16, 23, 48). B. cereus sensu lato bacteria are genetically and genomically closely related and have been suggested to represent variants of the same species on the basis of genetic and genomic evidence (21, 44). While their chromosomes are similar and syntenic, the different pathogenic properties and host specificities are often determined by plasmids (28, 44, 45). Altogether, the B. cereus group population has been well studied and can be divided into seven major phylogenetic clusters that correlate with ecological and toxicological properties and in which strains from the various “species” are intermixed (18, 19, 54).
B. cereus sensu lato organisms are Gram-positive, rod-shaped bacteria that produce endospores under specific stresses or when nutrients become limiting. Spores are dormant structures that can resist harsh conditions, such as heat, desiccation, chemicals, or radiation, and that can survive in the environment for up to decades (41). Spores are highly organized structures made up of multiple protein components arranged in several layers that are assembled following a complex, highly regulated developmental process involving a large number of genes (22). For B. anthracis, the spore can enter the host via the skin or by inhalation or ingestion and then germinates inside the host into vegetative cells that produce the anthrax toxin components and the capsule (8, 35). Diarrheal disease caused by certain B. cereus strains may occur when spores germinate in the intestinal tract (50). In previous DNA microarray studies, five waves of gene expression have been defined by statistical clustering in the B. anthracis life cycle in vitro (2, 31). Three of the largest and most highly conserved genes in the B. cereus group, BT9727_1474, BT9727_3414, and BT9727_3305 (B. thuringiensis subsp. konkukian 97-27; B. anthracis orthologs BA1618, BA3721-5, and BA3601, respectively, hereby named conserved repeat domain proteins CrdA, CrdB, and CrdC), have been reported to be upregulated in the 4th (BA1618, BA3723) and/or 5th (BA3601, BA3721-BA3722, BA3724-BA3725) wave of gene expression (2). Genes upregulated in those waves are postulated to be involved in spore-associated processes. Furthermore, CrdC (named Crd by Read and colleagues [37]) was shown to react with anti-B. anthracis spore rabbit immune sera, indicative of its presence in the spores. Additionally, two putative antisense open reading frames (ORFs), ORFX and ORFY, that are located on the reverse strand of BCE3690-BCE3693, a group II intron-interrupted ortholog of crdB in B. cereus ATCC 10987, have previously been identified (55).
The aim of the present study was to investigate these three large genes at the transcriptional and functional levels with respect to their impact on sporulation. In addition, extensive bioinformatic and phylogenetic analyses were performed to gain further insight into their evolutionary history. Knockout mutants of the three genes were constructed in an attenuated B. anthracis strain and analyzed with regard to their influence on sporulation efficiency.
MATERIALS AND METHODS
Sequence homology searches and alignment. Orthologs of each of the three large genes crdA, crdB, and crdC of B. thuringiensis subsp. konkukian 97-27 in 85 available B. cereus group genomes were taken from the Integrated Microbial Genomes (IMG) database (33). In the IMG database, orthologs are defined by reciprocal best hits between the protein sequences in BLASTP (1) analyses. Orthologs missed because of frameshifts or other mutations were identified by TBLASTN searches (all default parameters were used, except that filtering for low-complexity regions was turned off [-F F option]). The repetitive domain structures of CrdA, CrdB, and CrdC and the domain boundaries were identified by comparing a given protein sequence against itself in a dot plot using the SEAVIEW4 program (17) and manually aligning the repeated regions. Domains of all three proteins from B. thuringiensis subsp. konkukian 97-27 were aligned using the MUSCLE (version 3.6) program (11), followed by manual adjustments. This multiple-sequence alignment (including a total of 70 domains and 151 amino acids [aa] in length) was then used to build a profile hidden Markov model (HMM) to search the UniProt database (60) for proteins containing homologous domains. The programs hmmbuild and hmmsearch from the HMMER3 software package (10) were used for building the HMM and searching the UniProt database with the HMM, respectively (using default parameters). The -A option of hmmsearch was used to generate an alignment of the matching domain sequences. A similar analysis was conducted using a multiple-sequence alignment of the N termini of CrdA, CrdB, and CrdC from all sequenced B. cereus group strains (including a total of 140 sequences and 354 aa in length) to search the UniProt database for proteins containing homologous termini.
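The reciprocal-best-hit criterion mentioned above can be illustrated with a minimal sketch. It is not the IMG implementation; the gene names and bit scores below are hypothetical, and the input is assumed to be a simple table of pairwise BLASTP scores.

```python
from typing import Dict, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> bit score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return, for each query, the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Dict[str, str]:
    """Keep only pairs that are each other's best hit in both directions."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical scores: reference proteins searched against a target genome
# (genes g1, g2) and the target genes searched back against the reference.
a_vs_b = {
    ("crdA_ref", "g1"): 900.0, ("crdA_ref", "g2"): 400.0,
    ("crdC_ref", "g2"): 850.0, ("crdC_ref", "g1"): 380.0,
}
b_vs_a = {
    ("g1", "crdA_ref"): 900.0, ("g1", "crdC_ref"): 380.0,
    ("g2", "crdC_ref"): 850.0, ("g2", "crdA_ref"): 400.0,
}
orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(orthologs)
```

Because the three proteins share the same repeated domain, a one-directional best hit could pair a query with a paralog; requiring reciprocity reduces that risk.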
Phylogenetic analyses. A total of 5,469 amino acid sequences showing full-length matches to the 132-aa repeated domain of CrdA, CrdB, and CrdC were selected from the multiple-sequence alignment generated by hmmsearch. Sequences covering at least 100 of the 132 residues were defined as full-length domains. This set of aligned sequences was then used for reconstructing a phylogenetic tree of all domain sequences. The tree was built in the SEAVIEW program using the BioNJ method (15) applied to a matrix of pairwise distances among all sequences. Distances were computed as the percentage of amino acid differences (observed distance), and positions with gaps (insertions and deletions) were removed specifically for each pairwise comparison. In addition, using the same method, trees were built for each specific 132-aa domain copy in CrdA, CrdB, and CrdC on the basis of a MUSCLE program alignment of only the sequences of the domains in the B. cereus group. Coloring of trees according to protein, domain, or phylogenetic cluster was done using the TreeDyn program (5).
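The observed distance with pairwise gap removal and the full-length criterion described above can be sketched as follows. This is an illustrative reimplementation, not the SEAVIEW code; the function names and the toy sequences are assumptions.

```python
GAP = "-"


def covers_full_length(aligned_seq: str, min_residues: int = 100) -> bool:
    """Full-length criterion from the text: at least 100 aligned residues."""
    return sum(ch != GAP for ch in aligned_seq) >= min_residues


def observed_distance(seq_a: str, seq_b: str) -> float:
    """Percentage of differing residues between two aligned sequences.

    Columns with a gap in either sequence are skipped for this pair only
    (pairwise gap removal).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns shared by the two sequences")
    return 100.0 * differences / compared


# Toy aligned sequences: 5 comparable columns, 1 difference.
print(observed_distance("ACDE-G", "ACDFHG"))  # 20.0
```

A matrix of such pairwise values is the kind of input a distance-based method such as BioNJ operates on.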
A whole-genome phylogeny of the 85 sequenced B. cereus group strains was reconstructed by phylogenomic analysis on the basis of the concatenation of chromosomal regions conserved in all strains (a total of 1,452,373 bp, gaps removed) as previously described (see the supplemental material in reference 53). Statistical support values for the groupings (internal branches) were assessed by 1,000 bootstrap replicates (13).
RNA isolation. B. cereus ATCC 10987 and B. thuringiensis subsp. konkukian 97-27 were grown on Luria-Bertani (LB) agar plates at 30°C. Single colonies were incubated overnight in 5 ml of brain heart infusion (BHI) medium (Difco) containing 5% glycerol and subsequently used to inoculate 500 ml modified growth medium (MGM) (27). Bacteria were then grown at 37°C with shaking at 300 rpm, and the optical density at 600 nm (OD600) was used to measure bacterial growth. Samples (5 to 20 ml) were taken from the bacterial culture in the mid-exponential (after 3 h), early stationary (5 h), and late stationary (8 h and 24 h) phases for RNA isolation. They were mixed with an equal volume of ice-cold methanol, harvested by centrifugation (3,600 rpm, 4°C, 5 min), shock frozen in liquid nitrogen, and stored at −80°C until further preparation. RNA extractions were conducted from three biological replicates (samples taken from independent cultures on different days).
Frozen samples were thawed on ice and resuspended in 1 ml RLT buffer (Qiagen) (with added β-mercaptoethanol), and the mixture was added to a Precellys tube containing 100 μl glass beads (VK01). The cells and spores were lysed twice at 5,200 rpm for 20 s each time using a bead beater (Precellys 24 instrument; Bertin Technologies). This type of technique has previously been used to lyse spores (26, 32, 36, 43). Following lysis, the mixture was centrifuged at 13,000 rpm for 1 min in a tabletop centrifuge (Biofuge pico; Heraeus Instruments) to remove beads and debris. After that the RNeasy minikit (Qiagen) procedure was followed. The RNA quality was assessed by measuring the ratio of the absorbance at 260 nm to the absorbance at 280 nm and by visualization on an RNA gel.
Real-time quantitative PCR (qPCR) and reverse transcription-PCR (RT-PCR). One microgram total RNA was used for the reverse transcriptase reaction (Superscript III; Invitrogen). The Superscript III protocol was followed and included a negative sample without added reverse transcriptase for each reaction. The first-strand cDNA synthesis with the gene-specific primers was carried out at 60°C. The gene-specific RT primers used are listed in Table S1 in the supplemental material. As the RT primer for the reference genes, 1 μl (50 μM) random hexamer (Applied Biosystems) was employed.
Antisense and sense primers (see Table S2 in the supplemental material) were designed using the primer3 software available at the Whitehead Institute for Biomedical Research website (http://frodo.wi.mit.edu/primer3/), and the specificity of those primers was further assessed by matching them against the complete genome sequences.
qPCRs were performed on a LightCycler 480 real-time PCR system (Roche) in a 96-well microtiter plate format and a final volume of 20 μl using 2 μl cDNA (diluted 1:4), 10 μl LightCycler 480 SYBR green I Master (Roche), 2 μl each gene-specific forward and reverse primer, and 4 μl RNase/DNase-free water. The cycling conditions were as follows: 5 min of polymerase activation at 95°C, followed by 40 cycles of denaturation at 95°C for 10 s, annealing at 60°C for 30 s, and elongation at 72°C for 8 s. Genes for a GatB/YqeY domain-containing protein (BT9727_4045/BCE_4389) and UDP-N-acetylglucosamine 2-epimerase (BT9727_4878/BCE_5307) were used as reference genes for B. thuringiensis subsp. konkukian 97-27 and B. cereus ATCC 10987 (46). All samples and negative controls for every time point were amplified in duplicate.
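As a consistency check on the reaction setup above, the following sketch sums the stated component volumes and computes the programmed hold time of the cycling protocol. The volumes and times are taken from the text; the function names, the master-mix helper, and the choice to ignore ramping time and pipetting excess are assumptions made for illustration.

```python
# Per-reaction volumes in microliters, as stated in the text.
REACTION_UL = {
    "cDNA (diluted 1:4)": 2.0,
    "SYBR green I Master": 10.0,
    "forward primer": 2.0,
    "reverse primer": 2.0,
    "RNase/DNase-free water": 4.0,
}

# Cycling program from the text, in seconds.
ACTIVATION_S = 5 * 60
CYCLE_STEPS_S = {"denaturation": 10, "annealing": 30, "elongation": 8}
N_CYCLES = 40


def total_volume_ul(components: dict) -> float:
    """Sum of all component volumes for one reaction."""
    return sum(components.values())


def master_mix_ul(components: dict, n_reactions: int,
                  template_key: str = "cDNA (diluted 1:4)") -> dict:
    """Scale every non-template component for n_reactions (no excess added)."""
    return {name: vol * n_reactions
            for name, vol in components.items() if name != template_key}


def programmed_hold_time_s() -> int:
    """Total programmed hold time, ignoring temperature ramping."""
    return ACTIVATION_S + N_CYCLES * sum(CYCLE_STEPS_S.values())


print(total_volume_ul(REACTION_UL))   # 20.0, matching the stated final volume
print(programmed_hold_time_s())       # 2220 s, i.e., 37 min of hold time
```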
For the RT-PCR, the RT reaction was carried out as described above. Then, 2 μl cDNA and the negative control were amplified by PCR with 1 μl each forward and reverse primer at 5 μM, 10 μM nucleotide mix (Invitrogen), and 1 U Dynazyme (Finnzymes) in a total reaction volume of 50 μl. PCR was run with a 2-min denaturation step at 94°C, followed by 40 cycles of a 30-s denaturation step at 94°C, a 30-s annealing step at 60°C, and a 30-s extension step at 72°C, followed by a final extension step of 7 min at 72°C (see Tables S1 and S2 in the supplemental material).
PCR screening of B. cereus group strains. A set of 61 strains from the Laboratory for Microbial Dynamics (LaMDa) collection at the University of Oslo was screened for the presence of crdA and crdC orthologs in the strains' genomes. The strains chosen are from clinical/patient sources or originate from soil or dairy samples and belong to various phylogenetic clusters of the B. cereus group population. Using all B. cereus group genomes, primers for the two genes were designed such that they gave only one hit each in most strains (with 3 mismatches or fewer for primers of 20 bp or 4 mismatches or fewer for the longer primers of 24 bp) and multiple hits in as few strains as possible (see Table S3 in the supplemental material). These primers were used to amplify genomic DNA from all the tested strains. DNA isolation was performed using a genomic DNA kit (Qiagen). PCR was carried out using a denaturation program of 5 min at 94°C, followed by 32 cycles of 30 s denaturation at 94°C, 40 s annealing at 59°C, and 40 s elongation at 72°C. The final extension step was for 7 min at 72°C.
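The primer hit criterion above (at most 3 mismatches for 20-bp primers, at most 4 for 24-bp primers) can be sketched as a simple sliding-window count. This is an illustration only, not the procedure actually used for primer design: it scans a single strand without indels, and the primer and target sequences are hypothetical.

```python
# Mismatch thresholds stated in the text, keyed by primer length in bp.
MAX_MISMATCHES = {20: 3, 24: 4}


def max_mismatches(primer_length: int) -> int:
    """Return the allowed mismatches; only the two stated lengths are known."""
    if primer_length not in MAX_MISMATCHES:
        raise ValueError("the text defines thresholds for 20- and 24-bp primers only")
    return MAX_MISMATCHES[primer_length]


def count_hits(primer: str, genome: str) -> int:
    """Count forward-strand windows that match the primer within the threshold."""
    limit = max_mismatches(len(primer))
    hits = 0
    for start in range(len(genome) - len(primer) + 1):
        window = genome[start:start + len(primer)]
        mismatches = sum(p != g for p, g in zip(primer, window))
        if mismatches <= limit:
            hits += 1
    return hits


# Hypothetical 20-bp primer and a toy target with one exact binding site;
# "N" is used as filler that never matches a primer base.
primer = "ATGCGTACCTGAAGCTTCAG"
genome = "N" * 30 + primer + "N" * 30
print(count_hits(primer, genome))  # 1
```

A primer would be retained under the stated design goal when this count is 1 in most genomes and greater than 1 in as few genomes as possible.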
Bacillus anthracis colony PCR and PCR amplification. A single colony was picked and resuspended in 50 μl Milli-Q water, heated for 15 min at 100°C, and then centrifuged at 13,000 × g for 15 min in a tabletop centrifuge. Five microliters of the supernatant was subsequently used per PCR mixture. Alternatively, the PCR mixture was directly inoculated with a minimum of bacteria harvested from a colony with a toothpick. The PCR was performed using either Taq polymerase or High Fidelity Taq polymerase and a denaturation program of 5 min at 94°C, followed by 30 cycles of 30 s of denaturation at 94°C, 30 s of annealing at 50°C, and 30 s/kb or 1 min/kb of elongation at 72°C with the Taq polymerase or the High Fidelity Taq polymerase, respectively. The final extension step was for 10 min at 72°C.
Conjugation (mating procedure). Escherichia coli HB101 harboring the helper (mobilizing) plasmid pRK24 was used in conjugal transfer experiments.
The recombinant shuttle plasmids were transferred from E. coli to B. anthracis RPG1 or single or double mutants by conjugation. The filter mating system was used as previously described (42). For the selection of the transconjugants, the cells were resuspended and plated on selective medium containing appropriate antibiotics, depending on the gene-replacing cassette (kanamycin [Kan], 40 μg/ml; erythromycin [Erm], 5 μg/ml; spectinomycin [Spc], 100 μg/ml).
Construction of B. anthracis knockout strains. The cassettes used for constructing the mutants with deletions are a spectinomycin cassette obtained by digesting pUC1318-Spec with SmaI, an erythromycin cassette obtained by digesting pUC1318-Erm with SmaI, and a kanamycin cassette obtained by digesting pAT21 with ClaI (4, 39, 59).
The genes or gene fragments were amplified using the following primer pairs: for BA_3725, 3725-am5′/3725-av3′; for BA_1618, 1618-am5′/1618-am3′ and 1618-av5′/1618-av3′; and for BA_3601, 3601-am5′/3601-am3′ and 3601-av5′/3601-av3′ (see Table S4 in the supplemental material).
The fragments were PCR amplified and inserted into pGEM-T Easy, giving rise to pABKL10, pGEM-1618am, pGEM-1618av, pGEM-3601am, and pGEM-3601av, respectively.
DNA fragments were extracted by digesting pGEM-1618av and pGEM-3601av with SmaI/SacI and SmaI/SphI, respectively, followed by gel electrophoresis and purification of the DNA band using a Qiagen kit. These fragments were then inserted into pGEM-1618am and pGEM-3601am digested with SmaI/SacI and SmaI/SphI, yielding pGEM-1618am-av and pGEM-3601am-av, respectively.
The Spc and Erm cassettes were inserted into pABKL10 (digested with HindIII/EcoRV and blunted) and pGEM-1618am-av (digested with SmaI), yielding pGEMΔ3725 and pGEMΔ1618E, respectively. The deleted alleles were extracted from pGEMΔ3725 and pGEMΔ1618E after digestion with SacI and BstXI/SphI, respectively, and then blunted and cloned into pAT113 and pBAK that had been digested with SacI and SacI/SphI, respectively, and blunted, giving rise to pAT113Δ3725 and pBAKΔ1618E (34, 57). pATΔS28-3601 was obtained by inserting the 3601am-av fragment, extracted from pGEM-3601am-av by an SphI-SacI digestion and blunted, into pATΔS28 digested with EcoRI and blunted. The Kan cassette was inserted into pATΔS28-3601 digested with SmaI, giving rise to p3601SK (40).
Each of the plasmids contains an upstream fragment and a downstream fragment of approximately 1 kb surrounding an antibiotic resistance cassette. Plasmid pAT113Δ3725 contains an upstream fragment ending 560 nucleotides (nt) after the stop codon of the ORF preceding crdB (the translation initiation codon of crdB is differentially defined by different annotators) and a downstream fragment starting 4,550 nt after the first GTG codon of the crdB ORF. Plasmid pBAKΔ1618E contains an upstream fragment ending 60 nt after the ATG translation initiation codon of crdA and a downstream fragment starting 14,980 nt after the ATG. Plasmid p3601SK contains an upstream fragment ending 35 nt after the stop codon of the ORF preceding crdC and a downstream fragment starting 140 nt after the stop codon of crdC.
The single mutant strains RabKL1 (from which crdB is deleted), RabKL2 (from which crdA is deleted), and RabKL3 (from which crdC is deleted) were obtained after conjugation of E. coli HB101(pRK24) harboring pAT113Δ3725, pBAKΔ1618E, and p3601SK, respectively, with the RPG1 strain and selection with the appropriate antibiotic (56, 58).
The double mutant strains RabKL12, RabKL13, and RabKL23 were obtained by conjugation between E. coli HB101(pRK24) harboring an inactivation plasmid and a single mutant strain: pBAKΔ1618E with strain RabKL1 (yielding RabKL12), p3601SK with strain RabKL1 (yielding RabKL13), and p3601SK with strain RabKL2 (yielding RabKL23).
The triple mutant RabKL123 was obtained by conjugation between E. coli HB101(pRK24) harboring the inactivation plasmid p3601SK and strain RabKL12.
PCR was applied to check for the correct insertion and the loss of the genes (see Table S4 in the supplemental material).
Determination of total bacterial and spore counts. Strains were streaked on BHI medium plates. After overnight growth at 37°C, 1 ml BHI medium was inoculated with one colony and incubated at 37°C for 20 min. For each strain, the OD600 was determined and 5 ml of BHI medium was inoculated at an OD of 0.2. Cultures were incubated at 37°C, and each day decimal dilutions, up to 10^−8, were made in water and 1 ml of the non-heat-treated and heat-treated (65°C for 20 min) samples of all dilutions was spotted on BHI agar plates in order to calculate the number of spores and the total bacterial count. The plates were incubated at 37°C for 16 to 24 h. Sporulation was also analyzed by phase-contrast light microscopy (magnification, ×40).
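The counts obtained from such dilution series are typically converted to CFU per milliliter, and the heat-resistant fraction then serves as a measure of sporulation efficiency. The sketch below illustrates that arithmetic; the formulas are the standard plate-count conversions rather than a calculation spelled out in the text, and the colony numbers are hypothetical.

```python
def cfu_per_ml(colonies: int, dilution_exponent: int, plated_volume_ml: float) -> float:
    """Convert a colony count from the 10^-dilution_exponent dilution to CFU/ml."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * 10 ** dilution_exponent / plated_volume_ml


def sporulation_fraction(heat_resistant_cfu: float, total_cfu: float) -> float:
    """Fraction of the total count that survives heat treatment (spores)."""
    if total_cfu <= 0:
        raise ValueError("total count must be positive")
    return heat_resistant_cfu / total_cfu


# Hypothetical example: counts from the 10^-5 dilution, 1 ml plated.
total = cfu_per_ml(42, 5, 1.0)    # non-heat-treated sample
spores = cfu_per_ml(21, 5, 1.0)   # heat-treated sample (65°C for 20 min)
print(total, spores, sporulation_fraction(spores, total))
```

Comparing this fraction day by day between a mutant and the parental strain is one way to express both the speed and the final efficiency of sporulation.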
RESULTS
crdA, crdB, and crdC represent some of the largest genes found in the B. cereus group of bacteria, being 15,054, 7,563, and 7,473 bp long, respectively, in B. thuringiensis subsp. konkukian 97-27. They are located on the chromosome in the same position relative to flanking genes, and their respective sequences are highly conserved between strains within the B. cereus group (>90% identity at the amino acid level), whereas the overall similarity between the three gene products is weaker (38 to 40% amino acid identity). These genes code for proteins that are annotated as hypothetical proteins, repeat domain proteins, or cell surface proteins. Additionally, significant hits matching the bacterial adhesion domain defined in the Interpro database (Interpro identification no. IPR008966; http://www.ebi.ac.uk/interpro/ [25]) were previously reported (55) in orthologs of CrdB and were also obtained for CrdA. Furthermore, scanning of B. cereus group genomes with SPAAN software (47), which is specifically designed to identify adhesins and adhesin-like proteins, revealed that the three proteins are among the four or five proteins that usually give significant predictions (>95% confidence) for a given genome.
Comprehensive bioinformatic analysis identified a number of features of crdA, crdB, and crdC, which will be detailed in the sections below. Overall, the features cover three main aspects: (i) the proteins encoded by these genes contain repetitive copies of a 132-aa motif that is widespread in large proteins from prokaryotes, (ii) there is a differential distribution of the three genes in B. cereus group genomes, and (iii) the three genes have had a complex evolutionary history leading to a phylogeny different from that of the host strains.
A diverse set of large proteins with a common 132-amino-acid domain in Bacteria and Archaea. Amino acid sequence comparison and multiple-sequence alignment of CrdA, CrdB, and CrdC showed that these large proteins are made up of 35, 19, and 16 copies of a 132-aa homologous domain, respectively (average sequence identity, 40 to 50% among the domains within and between proteins; Fig. 1; see Fig. S1 in the supplemental material). The DUF11 (domain of unknown function; Pfam identifier PF01345) and TIGR01451 domains (which are largely overlapping), reported in the Pfam (http://www.pfam.org/) (14) and TIGRfam ([49]; http://www.tigr.org/TIGRFAMs/) databases, respectively, are part of the 132-aa domain, but they correspond to only a 53-aa N-terminal part of it. A profile HMM search of the UniProt sequence database using all copies of the whole 132-aa domain from CrdA, CrdB, and CrdC as queries revealed that this motif is part of a large set of proteins from phylogenetically diverse Bacteria and Archaea (553 matching proteins) (see Table S5 in the supplemental material). These proteins have different sizes, i.e., different numbers of domains, but are generally long (>1,000 aa), and their annotations are mainly putative cell surface or cell wall proteins, conserved repeat domain proteins, or uncharacterized proteins. This set also includes additional proteins from the B. cereus group (Fig. 2; see Table S5 in the supplemental material). Noticeably, the BCAH187_A4121 protein (present in 10 B. cereus group strains) corresponds to a single, stand-alone copy of the 132-aa domain. The proteins containing the 132-aa domain are usually not homologous to CrdA, CrdB, and CrdC over their entire lengths, as they also include unrelated regions.
Fig. 1. Chromosomal organization and gene structure of crdA, crdB, and crdC. The sense genes crdA (A) (chromosomal location is shown with ×2 zoom), crdB (B), and crdC (C) are shown as blue arrows in the 5′ to 3′ direction. The sequences encoding the repeated domains (132 aa) are illustrated as black-outlined boxes. In crdA, discontinuity was introduced (12 domains were left out); crdA (BT9727_1474) is flanked by BT9727_1471 (hypothetical protein) and BT9727_1475 (acetyltransferase), crdB (BT9727_3414) is flanked by BT9727_3413 (hypothetical protein) and BT9727_3416 (cytochrome c defective protein), and crdC (BT9727_3305) is flanked by BT9727_3304 (acetyltransferase) and BT9727_3306 (short-chain dehydrogenase). The locations of the antisense ORFs X and Y in crdB are represented as red and green arrows, respectively. A size bar for orientation is included.
Fig. 2. Unrooted whole-genome phylogenetic tree of 85 sequenced B. cereus group strains. The names of the strains are color coded according to the presence of orthologs of the three large genes crdA, crdB, and crdC. In addition, for each isolate the presence of supplementary genes that carry one or several homologous copies of the common 132-aa domain is indicated by colored filled circles. The seven major clusters of the B. cereus group population are indicated by Roman numerals (I to VII). Although none of the 85 genomes are assigned to cluster II, the branching point of this cluster is indicated for reference with Table 1 (cluster II would emerge in the vicinity of the node circled with a dashed line, by extrapolation from the B. cereus group supertree available at the HyperCAT database [http://mlstoslo.uio.no]). The 19 B. anthracis isolates form a clonal complex and are represented here as a single lineage. The tree is based on the concatenation of the sequences conserved among the chromosomes of all strains (a total of 1,452,373 bp, with gaps removed). All nodes in the tree have a bootstrap support of >99%, on the basis of 1,000 replicates.
The domain organization of the CrdA, CrdB, and CrdC proteins is very compact, with usually 0 to 5 amino acid residues separating the domains. Interestingly, the N termini of CrdA and CrdC (i.e., the first ∼340 aa, or ∼1,000 nt) contain no repeated domains and are homologous to each other (48% amino acid sequence identity) (Fig. 1). In contrast, CrdB appears to have lost the N terminus during evolution (see below).
Biased distribution of crdA, crdB, and crdC in the sequenced B. cereus group genomes. A total of 85 genome sequences of the B. cereus group were available at the time of study. Interestingly, reconstruction of the whole-genome tree and examination of the presence of genes show that there is a phylogenetic bias in the distribution of the crdA, crdB, and crdC genes and their orthologs in the B. cereus group (Fig. 2). crdC and crdA are the most common genes out of the three, being found in 80 and 79 out of the 85 genomes, respectively. In contrast, crdB is present in only 46 genomes and is restricted (with two exceptions; see below) to strains belonging to cluster III of the B. cereus group population (18). This also means that the latter 46 genomes are the only ones that encode all three genes (Fig. 3). None of the three genes are found in the divergent B. pseudomycoides group (cluster I).
Fig. 3. Venn diagram of the distribution of crdA, crdB, and crdC in 85 sequenced B. cereus group strains.
The occurrence pattern of the three genes has also been extended and confirmed by PCR screening. Whereas screening for the presence of orthologs of crdB has been carried out previously (55), in this study crdA and crdC have been targeted in 61 diverse strains. Similar to the tendency shown by the sequenced strains, nearly all the strains belonging to clusters III, IV, and VI in the B. cereus group population harbored these two genes. When taking into account the results from the previous screening for crdB, we can confirm that strains harboring all three genes are mostly restricted to cluster III, as shown by the sequenced strains. In clusters II (to which no sequenced genome is currently assigned) and V, strains possessed either one or two genes (Table 1).
Table 1. PCR screening of B. cereus group strains for the presence of crdA and crdC
Complex phylogeny and evolutionary history of crdA, crdB, and crdC. Detailed bioinformatic analysis revealed that crdA, crdB, and crdC have had a complex evolutionary history, including incongruent phylogenies and several events of duplication, internal deletion, lateral transfer, and rearrangement, as will be described in the subsections below.
Phylogenetic analysis of the 132-aa domains. Phylogenetic analysis of all homologous copies of the 132-aa domain identified in UniProt revealed three main patterns. (i) For the orthologs of CrdA, CrdB, and CrdC from multiple B. cereus group strains, the domains of a given protein generally cluster together, i.e., are more closely related to each other than to domains of the other two proteins. However, there are a few exceptions, suggesting that there may have been exchanges of domains between the three large genes (see Fig. S2 in the supplemental material). (ii) Within a given protein, each specific domain copy forms its own cluster containing the corresponding copies from the various B. cereus group strains, and the clusters of domain copies are well separated by long branches in the tree (see Fig. S3 in the supplemental material). (iii) Comparison of the phylogeny of the 132-aa domains with that of the strains indicates that the relationships between the domain sequences do not follow the strain relatedness (see Fig. S4 to S6 in the supplemental material).
Duplication of crdA in particular B. cereus group strains. Sequence comparisons and homology searches revealed that the gene encoding the longest of the three proteins, CrdA, is actually present in two highly similar copies (93% amino acid sequence identity) in a number of sequenced B. cereus group strains. In the B. cereus type strain, ATCC 14579, the two paralogs (BC2639/BC1592) can be found on the chromosome, whereas in the emetic B. cereus strains (F4810/72, H3081.97), B. cereus Q1, and B. thuringiensis subsp. kurstaki BMB171, one copy is located on the largest plasmid in each strain. Other instances of duplicated crdA genes have been detected in nine unfinished genomes. The strains encoding two copies of this gene are distributed sporadically in the B. cereus group tree, indicating that the acquisition of a second copy has occurred independently in multiple strains (Fig. 2).
Events of lateral transfer and rearrangement of crdB. Genomic data also provide some insights into the evolutionary history of the crdB (BT9727_3414) locus. All 46 sequenced B. cereus group strains that encode this gene also encode BT9727_3413, and most of them, with the exclusion of B. anthracis strains and close relatives, also carry BT9727_3412, a paralog of BT9727_3413 (see Fig. S7A in the supplemental material). The latter two genes are located immediately upstream of crdB in the chromosome but are oriented in the opposite DNA strand, are not found in any other strains, and encode hypothetical proteins. Remarkably, BT9727_3412 and BT9727_3413 are homologous and of the same size as the N termini of CrdA and CrdC (35 to 38% amino acid sequence identity). Profile HMM searches of UniProt revealed that similar N termini are harbored by 178 of the 553 bacterial proteins that contain the common 132-aa domain and are distributed throughout the phylogenetic tree of these proteins, suggesting that this terminus was probably present in these proteins before their evolutionary separation (see Fig. S2 and Table S5 in the supplemental material). Therefore, the organization of the BT9727_3412-4 locus suggests that BT9727_3412 or BT9727_3413 corresponds to the sequence encoding the N terminus of crdB that has been flipped and recoded as a separate gene during evolution, and this gene was then subsequently duplicated into two paralogs. All but 2 of the 46 sequenced strains that encode BT9727_3412-4 belong to cluster III in the B. cereus group phylogeny (Fig. 2). The phylogenies of the BT9727_3412 and BT9727_3413 sequences as well as those encoding the N termini of CrdA and CrdC are generally incongruent with the B. cereus group genomic tree (see Fig. S7B to E in the supplemental material).
Taken together, this implies that the acquisition of the BT9727_3412-4 locus and these genomic rearrangement events must have occurred in the phylogenetic ancestor of cluster III strains and that BT9727_3412 was subsequently lost in the lineage leading to B. anthracis and its close relatives.
Two strains harboring BT9727_3412-4, B. cereus ATCC 10876 and B. thuringiensis subsp. pakistani T13001, are part of cluster IV (Fig. 2). Phylogenetic analysis indicates that the sequences of BT9727_3412, BT9727_3413, and crdB from strains ATCC 10876 and T13001 are closely related to those of cluster III and do not form a separate group (data not shown). This, together with the gene distribution data, suggests that these two strains have acquired the genomic region containing BT9727_3412-4 through a horizontal transfer from a cluster III strain. During the evolution of the locus, a subset of the strains also acquired a group II intron that has inserted in the reverse orientation within the crdB gene (55), and a 1-nt deletion has created a frameshift leading to a premature stop codon (at position 226) specific to the B. anthracis strains belonging to the A and B branches (55).
Internal deletion and frameshift mutation within crdC in B. anthracis strains. Another interesting observation is a deletion of 396 bp in the B. anthracis orthologs of crdC. This deletion, which is unique to the B. anthracis strains, corresponds exactly to the length of one 132-aa conserved repeated domain and therefore results in a protein whose reading frame is not disrupted but which lacks the equivalent of one domain. Remarkably, however, the deleted region does not coincide with the boundaries of any single repeated domain but overlaps domains 10 and 11. The region is flanked on both sides by an identical sequence motif (which is present in non-B. anthracis strains as well), which could suggest that the deletion occurred by a specific recombination event at that motif in B. anthracis.
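The frame-preservation argument above is simple arithmetic and can be sketched in a few lines of Python. The helper name `deletion_effect` is hypothetical and introduced only for illustration; the only values taken from the text are the 396-bp deletion in crdC, the 132-aa domain length, and the 1-nt deletion in crdB.

```python
CODON_LENGTH = 3  # nucleotides per codon


def deletion_effect(deleted_bp: int) -> tuple:
    """Return (in_frame, residues_removed) for a deletion inside a coding sequence.

    A deletion preserves the downstream reading frame only if its length
    is a multiple of the codon length.
    """
    in_frame = deleted_bp % CODON_LENGTH == 0
    residues_removed = deleted_bp // CODON_LENGTH
    return in_frame, residues_removed


# 396-bp deletion in the B. anthracis crdC orthologs
print(deletion_effect(396))  # (True, 132): in frame, one domain's worth of residues

# 1-nt deletion in crdB of B. anthracis A and B branch strains
print(deletion_effect(1))    # (False, 0): frameshift
```

The same check explains why the 1-nt deletion in crdB leads to a premature stop codon, whereas the 396-bp deletion in crdC leaves the downstream reading frame intact.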
Transcriptional regulation of crdB sense and antisense ORFs in modified growth sporulation medium. The presence of antisense ORFs in the ortholog of crdB that harbors a group II intron in B. cereus ATCC 10987 (55) and the previously published transcriptional analysis of the sense strand in the B. anthracis crdB ortholog (2, 31) prompted us to compare the transcriptional profiles of the sense and antisense transcripts in B. thuringiensis subsp. konkukian 97-27 and B. cereus ATCC 10987 throughout the life cycle, from vegetative cells to spores (see Fig. S8 in the supplemental material). We measured the expression levels of both sense and antisense transcripts of crdB during growth of B. thuringiensis subsp. konkukian 97-27 in MGM by real-time qPCR using strand-specific primers. A continuous and strong upregulation of transcription was observed between 3 h and 8 h (onset of sporulation) for both the sense (up to 3,000-fold) and antisense (up to 100-fold) transcripts (Fig. 4). At 24 h (nearly complete sporulation), sense and antisense mRNA levels were comparable to those in the samples at 5 h (Fig. 4). Antisense transcription levels were generally lower than sense transcription levels (Fig. 4C). Whereas the sense transcript level was about half of the antisense mRNA level after 3 h of incubation of B. thuringiensis subsp. konkukian 97-27 in MGM, the sense/antisense mRNA ratio increased to about 100 at 8 h. At 24 h, this ratio was down to about 2.
Fig. 4. Transcriptional regulation of crdB and its antisense ORFs in MGM during the complete life cycle of B. thuringiensis subsp. konkukian 97-27. The figure shows the transcriptional regulation of crdB (A) and its antisense transcript ORF Y (B) in MGM on the basis of real-time qPCR experiments after 3 h, 5 h, 8 h, and 24 h in culture, covering the bacterium's complete life cycle. Transcriptional regulation is reported as log2 fold change and related to the initial mRNA level at 3 h, which was set to 1. (C) Log2 relative ratio between the sense and antisense transcripts for each time point. Error bars indicate standard deviations.
Similar experiments were conducted with B. cereus ATCC 10987. Here, significant transcription was not detected either in the sense or in the antisense direction until 8 h of growth in MGM. At 8 h, however, similar amounts of sense and antisense mRNA were measured in B. cereus ATCC 10987 and B. thuringiensis subsp. konkukian 97-27.
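Because the qPCR results for B. thuringiensis subsp. konkukian 97-27 are quoted in the text as linear fold changes and ratios but plotted on a log2 scale, a short conversion may help in relating the two. The following Python sketch uses only the approximate values stated above (the fold changes are the quoted upper bounds); the dictionary and function names are hypothetical, and the sketch makes no assumption about how the underlying qPCR data were normalized.

```python
import math

# Fold changes at 8 h relative to the 3-h level (set to 1), as quoted in the text.
fold_change_8h = {"sense": 3000.0, "antisense": 100.0}

# Approximate sense/antisense mRNA ratios quoted in the text.
sense_to_antisense = {"3 h": 0.5, "8 h": 100.0, "24 h": 2.0}


def log2_value(x: float) -> float:
    """Convert a linear fold change or ratio to the log2 scale."""
    if x <= 0:
        raise ValueError("fold changes and ratios must be positive")
    return math.log2(x)


for transcript, fc in fold_change_8h.items():
    print(f"{transcript}: {fc:g}-fold -> log2 = {log2_value(fc):.2f}")

for time_point, ratio in sense_to_antisense.items():
    print(f"{time_point}: ratio {ratio:g} -> log2 = {log2_value(ratio):.2f}")
```

On this scale, the 3-h reference level corresponds to a log2 value of 0, and a sense/antisense ratio below 1 (as at 3 h) appears as a negative value.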
Functional studies: sporulation assay in B. anthracis. To study the function of the three large genes, different combinations of knockout mutants were constructed in B. anthracis strain RPG1 (Table 2). Particular attention was given to the sporulation process, as these genes are most highly expressed at the onset of sporulation, as shown by microarray and real-time qPCR experiments (data not shown) (2). A B. anthracis strain was chosen for the experiments because knockout mutants are difficult to generate in both B. thuringiensis subsp. konkukian 97-27 and B. cereus ATCC 10987. For each of the three genes, most of the coding sequence was deleted (see Materials and Methods).
Table 2. B. anthracis strains and constructed knockout mutants of the three large genes
Before we compared the kinetics of sporulation, growth curves of the parental and mutant strains were acquired in BHI medium for at least 24 h and showed no difference in growth between the parental and the mutant strains (see Fig. S9 in the supplemental material). Subsequently, sporulation kinetics and efficiencies for the parental and the mutant strains were studied by light microscopy for 10 days and by sporulation efficiency tests (at 26, 50, and 74 h after inoculation). For the parental and the single mutant strains, the number of spores was of the same order of magnitude as the total number of CFU at day 2 (50 h after inoculation), indicating nearly full sporulation (Table 3; see Fig. S11 in the supplemental material). In contrast to the wild-type and single mutant strains, spore counts and microscopy analysis indicated that neither the double mutants (except RabKL12) nor the triple mutant strain sporulated fully in the course of the study (Table 3; see Fig. S11 in the supplemental material). The sporulation yield of the double and triple mutants was 2 to 3 orders of magnitude lower than the total number of CFU, meaning that fewer than 1% of the viable cells at day 1 (26 h after inoculation) had sporulated (except for strain RabKL12). In addition to the lack of sporulation, we observed a dramatic decrease in the number of CFU from nonheated samples, which reflects the total bacterial count. These findings suggest a loss of viability of the bacteria that did not sporulate. Taken together, the multiple mutant strains showed a decrease in both sporulation kinetics and efficiency compared to the wild-type strain.
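The relationship between "orders of magnitude" and percent sporulation used above can be made explicit with a minimal Python sketch. It assumes that sporulation efficiency is expressed as the spore count divided by the total count from nonheated samples; the function name and the example counts are hypothetical and are not data from Table 3.

```python
import math


def sporulation_summary(spore_cfu_per_ml: float, total_cfu_per_ml: float) -> tuple:
    """Return (percent sporulated, orders of magnitude separating the two counts)."""
    if spore_cfu_per_ml <= 0 or total_cfu_per_ml <= 0:
        raise ValueError("counts must be positive")
    percent = 100.0 * spore_cfu_per_ml / total_cfu_per_ml
    orders_of_magnitude = math.log10(total_cfu_per_ml / spore_cfu_per_ml)
    return percent, orders_of_magnitude


# Hypothetical counts chosen only to illustrate the arithmetic (NOT measured values).
for spores, total in [(1e6, 1e8), (1e5, 1e8)]:
    percent, gap = sporulation_summary(spores, total)
    print(f"{gap:.0f} orders of magnitude lower -> {percent:g}% sporulation")
```

A spore count 2 orders of magnitude below the total count corresponds to 1% sporulation and a count 3 orders below to 0.1%, which is why a gap of 2 to 3 orders of magnitude translates to roughly 1% sporulation or less.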
Table 3. Total bacterial and spore counts of B. anthracis wild-type and knockout strains grown in BHI medium
DISCUSSION
Genomic distribution and evolutionary history of the three large genes crdA, crdB, and crdC. In this paper we have conducted bioinformatic and functional analyses of three of the largest genes found in the bacteria of the B. cereus group, crdA, crdB, and crdC. We have shown that these genes are made up of repeated copies of a common homologous domain of 132 amino acids at the protein level (Fig. 1; see Fig. S1 in the supplemental material) and that this domain is in fact found in hundreds of generally long proteins from diverse Bacteria and Archaea (see Table S5 in the supplemental material). Interestingly, as shown by phylogenetic analysis of the 132-aa-domain sequences, the domains from each of the CrdA, CrdB, and CrdC proteins in the B. cereus group generally cluster together and there is little intermixing of the domains in the tree (see Fig. S2 and S3 in the supplemental material). These phylogenetic results, together with the compact repetitive nature of the three proteins, indicate that each of these large genes has evolved by successive duplication, with little recombination between genes and no recombination between different domain copies within genes. This, added to the fact that for each protein the number of domain copies is the same in virtually all B. cereus group strains, suggests that the domain organization and phylogenetic separation of these genes occurred before the divergence of the B. cereus group strains that harbor them. Even though there are indications that some recombination events have occurred between genes, the clustering of the domain sequences also suggests that these events took place before the separation of the strains. Remarkably, whereas there has been no exchange of different domains within a given gene, the fact that for every domain copy, as well as for the N-terminal sequences, the phylogeny of the domain sequences does not follow that of the B. cereus group strains suggests that there have been extensive copy-specific recombination events and/or convergent evolution (i.e., events occur between copies x and between copies y of different strains, but not between copies x and y; see Fig. S4 to S7 in the supplemental material).
A remarkable feature of the three genes and their paralogs is that they are distributed in specific phylogenetic clusters within the B. cereus group population (Fig. 2). Even though the biological significance of this distribution bias is not known, it indicates that the genes were acquired at different times in evolution. In fact, the genomic distribution of the three genes suggests that their evolutionary history in the B. cereus group has been complex (Fig. 2). In addition to acquisition of genes in specific clusters, there have been events of gene loss in individual strains, lateral gene transfers, gene rearrangements, independent intragenomic duplications, and internal gene mutations or deletions.
Potential function of the three large genes. Little is known about the biological function of the three large genes so far, apart from the fact that during the life cycle of B. anthracis in vitro they are upregulated, with the highest expression levels being in waves of gene expression enriched in genes that are involved in sporulation (2, 31), and the fact that crdC is translated and its product reacts with anti-B. anthracis spore sera (37). Our real-time qPCR transcriptional experiments confirmed that, in B. thuringiensis subsp. konkukian 97-27 as well, the three genes are most highly expressed at the onset of sporulation in the late stationary phase of growth (8-h time point) (see results for crdB in Fig. 4; data not shown). Moreover, using the Virtual Footprint tool of the PRODORIC database (http://prodoric.tu-bs.de/) (38), we found matches to the SpoIIID binding boxes of B. subtilis (12, 20, 24, 30, 52, 62) in the upstream regions of all three genes (see Fig. S10 in the supplemental material). In B. subtilis and likely also in other endospore formers such as bacilli and clostridia, where it is highly conserved, SpoIIID is one of the three main transcriptional regulators that govern gene expression in the mother cell during sporulation (22). On the basis of these observations and to further study the involvement of the three genes in the sporulation process, we conducted knockout studies in the B. anthracis RPG1 strain.
In sporulation assays conducted in BHI medium, single-gene mutants showed a slight delay in sporulation (Table 3; see Fig. S11 in the supplemental material). In contrast, knocking out two genes (crdC together with either crdA or crdB) or all three genes resulted in considerably stronger negative effects on sporulation kinetics, with less than 1% sporulation, as shown in the sporulation efficiency test (Table 3). Furthermore, even 10 days after inoculation, the double mutant RabKL13 had hardly sporulated and the triple mutant RabKL123 lacked released spores, as visible in Fig. S11 in the supplemental material. Taken together, these results suggest that there is probably some functional redundancy between these genes, such that any one of them can be deleted with little effect, whereas the deletion of more than one impairs sporulation.
What, then, might be the function and the localization of these three gene products? When all three genes are knocked out, sporulation still takes place. This indicates that these genes are not absolutely essential to the sporulation process, which correlates well with our observation that strains from the divergent cluster I (B. pseudomycoides), which naturally lack the three genes crdA, crdB, and crdC, sporulate in LB medium (data not shown). Most proteins that have been shown to be directly involved in either assembly or production of a subset of spore coat components are small, like SpoIVA or CotE (6, 51, 63), whereas CrdA, CrdB, and CrdC are large repetitive proteins. Interestingly, two proteins (P29a and P29b) consisting of duplicated copies of the common 132-aa domain are part of the appendages at the surface of Clostridium taeniosporium spores (61). On the other hand, related proteins containing numerous copies of this domain can be found not only in Gram-positive spore-forming bacterial species like Clostridium or Bacillus but also in very diverse nonsporulating species like Methanothermobacter, Enterococcus, or Vibrio (see Table S5 in the supplemental material). This type of protein is thus not specific to sporulation and may be used for different purposes in different organisms. It has previously been suggested that such repeat domain proteins might be well suited for macrostructure assembly due to their ability to bind more molecules of surrounding proteins than would structural proteins without duplicated structures (61). It could be that these proteins are commonly used to build extracellular matrices or structures at the surface of the cell or the spore in various organisms.
The loci encoding crdA, crdB, and crdC are complex. As shown previously, the crdB locus harbors two ORFs that are positioned in the opposite orientation (55) and that were shown in this study to be transcribed and regulated simultaneously with the host crdB gene at the onset of sporulation in the late stationary phase (Fig. 4; see Fig. S8 in the supplemental material). Antisense ORFs were also identified in crdA and crdC, and their transcription was experimentally confirmed (data not shown; see Fig. S10 in the supplemental material). Furthermore, a SpoIIID binding box is present upstream of all antisense ORFs, as shown in Fig. S10 in the supplemental material. Further work is needed to determine the precise functions of crdA, crdB, and crdC and their antisense transcripts in the sporulation process and to understand whether and how the sense and antisense transcripts are functionally or mechanistically linked.
ACKNOWLEDGMENTS
We thank M. Moya and P. Sylvestre (Institut Pasteur) and Simen M. Kristoffersen (University of Oslo) for helpful discussions and A. Guichoux and J. Hugues (Institut Pasteur) and Ewa Jaroszewicz (University of Oslo) for technical help.
This work was supported by grants from the Norwegian Research Council (grant 146534 and Functional Genomics FUGE II). S.D. was supported by the DGA.
FOOTNOTES
- Received 17 May 2011.
- Accepted 25 July 2011.
- Accepted manuscript posted online 5 August 2011.
† Supplemental material for this article may be found at http://dx.doi.org/10.1128/JB.05309-11.
- Copyright © 2011, American Society for Microbiology. All Rights Reserved.