Olga Vinnere,
Alex Mira,
Dirk Repsilber,
Kristina Näslund, and
Siv G. E. Andersson*
Department of Molecular Evolution, Evolutionary Biology Center, Uppsala University, Uppsala 752 36, Sweden
Received 5 April 2006/Accepted 8 August 2006
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Species of the genus Bartonella are particularly attractive for studies of pathogen genome diversity, niche adaptation, and virulence because of their unique ecological characteristics: the approximately 20 known species have colonized a wide range of animal hosts and insect vectors (8, 11, 16, 68). More than 10 species and subspecies have been isolated from human clinical samples (14), but humans serve as the natural reservoir for only two of these: Bartonella quintana, the louse-borne agent of trench fever, and Bartonella bacilliformis, the agent of Carrion's disease complex. Bartonella henselae, which is closely related to B. quintana, naturally infects cats but may incidentally infect humans and cause a variety of symptoms including cat scratch disease (CSD) and bacillary angiomatosis. For both species, disease symptoms depend on the immune status of the host, and a variety of vasoproliferative disorders of the skin and the internal organs may develop following infections in immunocompromised individuals (37, 56).
A very low level of genotypic heterogeneity was found for a collection of B. quintana strains, in line with a recent emergence of this bacterium as a human pathogen (25). The B. henselae strains are divided into two 16S rRNA gene (rrs) genotypes (type I/Houston-1 and type II/Marseille) which correspond to two distinct human serotypes (19, 42). Both genotypes are present worldwide, but the Marseille genotype appears to be dominant in the European cat population, whereas the Houston-1 genotype is more common in Asia (7). However, the relative prevalence of the two genotypes in felines also shows regional variations (23, 31, 33).
Several studies have suggested that the Houston-1 genotype is overrepresented among human samples, as observed, for example, in Australia (17), Germany (59, 60), and The Netherlands (6). However, a study of isolates from CSD patients in France showed no overrepresentation of the Houston-1 strain (69). A more detailed characterization of Australian isolates by multilocus sequence typing (MLST) revealed at least seven distinct genotypes in the feline population, only a few of which were recovered among the human samples, suggesting that some sequence types may be predisposed to cause human infections (35). Hence, it was hypothesized that a few specific genotypes contribute disproportionately to the disease burden in humans (17, 35).
It has also been debated whether the Houston-1 genotype is more pathogenic for humans than the Marseille variant (59, 69). A recent comparison of individuals infected with the Marseille and Houston-1 isolates revealed no difference in clinical characteristics (69). However, another study of human immunodeficiency virus (HIV)-infected patients suggested that hepatosplenic vascular proliferative lesions are more common in patients with infections of the genotype I strains (12). Small sample sizes taken from geographically isolated regions may explain some of the discrepancies in the results. Another limitation is that the use of typing methods, such as MLST, provides no information about the variability of potential virulence traits in the B. henselae population.
The availability of genome sequence data for the human pathogen B. quintana and the human Houston-1 strain of B. henselae (2) now enables the genomic diversity for the species as a whole to be explored. As in other human pathogens, such as Rickettsia prowazekii (5, 53) and Bordetella pertussis (54), the emergence of B. quintana as a human pathogen has been associated with accelerated rates of sequence loss rather than with unique gene acquisitions (2). Thus, only remnants of a B. henselae prophage of 55 kb and genomic islands (GEIs) of 72, 34, and 9 kb could be identified in the B. quintana genome (2). The extent of sequence loss in the so-called chromosome II-like segment, which may have been integrated from an extra replicon and has a coding content of only 40 to 50%, was also more pronounced in B. quintana. A recent study of the gene content in Bartonella koehlerae, a feline-associated species with a phylogenetic placement near B. henselae, revealed a partial retention of GEIs, indicative of independent deletion events in divergent lineages (43).
The aim of this study was to examine the genomic variability of B. henselae strains and place it in the context of the population structure of the species. We describe gene content and genome structure variation for 38 B. henselae strains, including both feline and human isolates. We hypothesize that the observed genomic variation, which includes deletions and rearrangements in areas that encompass the GEIs, serves an important role in host colonization and immune evasion.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Analysis of MLST data. The sequences were assembled and edited with Phred, Phrap, and Consed (21, 22, 28) and compared to the known sequence types using BLAST (3). Alleles were named as described in reference 35, and polymorphic sites were numbered according to their position in the corresponding gene of the sequenced Houston-1 genome, since some of the nucleotide numberings in reference 35 did not agree with the GenBank entries to which they were supposed to refer. For each isolate, the combination of alleles at each of the loci examined was used to define the sequence type (ST), according to the methods in reference 35. For gltA, only one of the polymorphic sites (C1026T) reported in reference 35 for allele 2 was observed, the other (G648A) being located outside of the amplified region; the observed allele was nevertheless considered as allele 2. In addition, for nlpD, the polymorphism reported previously (35) (G1453A) was apparently mistyped, since 36 of the 37 strains had an A at the corresponding position (allele 1) while one isolate had a G (allele 2). For visualization of the MLST data, neighbor joining of the aligned concatenated allele sequences was performed using the PHYLIP programs dnadist, neighbor, seqboot, and consense (24) with 1,000 bootstrap replicates.
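The step of defining a sequence type from the combination of alleles can be sketched as a simple lookup. This is a minimal illustration in Python, not the procedure used in the study: the locus list is taken from the genes named in this article, the allele profiles and the labels `ST-A` and `ST-B` are hypothetical placeholders (not the published definitions from reference 35), and `assign_sequence_type` is a made-up helper name.

```python
# Loci examined, in a fixed order (gene names as given in this article).
LOCI = ("batR", "eno", "ftsZ", "gltA", "groEL", "nlpD", "ribC", "rpoB", "rrs")

# HYPOTHETICAL allele profiles: placeholders only, not the published ST definitions.
KNOWN_PROFILES = {
    (1, 1, 1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 1, 2, 1, 1, 1, 1, 2): "ST-B",
}


def assign_sequence_type(alleles, known=KNOWN_PROFILES):
    """Return the ST label for a dict mapping locus name -> allele number.

    An allele combination that is not in `known` is reported as "novel".
    """
    profile = tuple(alleles[locus] for locus in LOCI)
    return known.get(profile, "novel")
```

A combination that matches no known profile, such as an isolate carrying a previously unrecorded rpoB allele, would be returned as "novel" and could then be given a new ST number.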
Microarray comparative genome hybridization (CGH). Array probes (PCR products) were generated as reported previously (43) and spotted in six replicates (by Niclas Olsson at Uppsala University, Uppsala, Sweden) or three replicates (by Annelie Waldén at the Royal Institute of Technology, Stockholm, Sweden). Slides were UV cross-linked at 30 mJ/cm² (slides scanned with ScanArray) or 250 mJ/cm² (slides scanned with GenePix). DNA labeling was performed as previously reported (43), except that for the 5- and 10-day comparisons only 1 µg genomic DNA was used. Hybridization of labeled test strain and reference DNA was performed at 50°C overnight in hybridization solution containing 3× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate) (ScanArray slides) or 5× SSC (GenePix slides), 0.1% sodium dodecyl sulfate, and 0.1 mg/ml sonicated salmon sperm DNA. Two or three hybridizations were performed for each strain.
Genomic DNA from the Houston-1Ref subculture was used as the reference in the microarray hybridizations against genomic DNA from the 38 B. henselae strains as well as in hybridizations against DNA isolated from different subcultures of the Houston-1 strain. These hybridizations consistently gave low M values for a 10-kb region located between kb 1060 and 1070 of the sequenced genome. PCR and sequencing revealed a 10-kb tandem duplication flanked by a short repeated sequence in the Houston-1Ref strain. We also discovered a 10-kb deletion in the Houston-1Ref strain in a segment upstream of and within the badA gene, which codes for an adhesin required for the induction of vasculoproliferative disorders (57, 70). The microarray data further suggested that Houston-1980517 was identical in gene content to Houston-1Seq, whereas the Houston-1ATCC isolate appears to be an intermediate that contains the deletion upstream of badA but not the tandem duplication of the kb 1060 to 1070 region. Additionally, Houston-1Ref differed from Houston-1Seq by a recombination event across the tuf genes, as verified by PCR and sequence analysis. Finally, the pulsed-field gel electrophoresis (PFGE) analysis suggested translocations across HGIa and HGIb in the different subcultures of the Houston-1 strain.
For the comparative microarray hybridizations of genomic DNA isolated after 5 and 10 days of culture, genomic DNA from the Houston-1Ref strain was used as the reference in one set of experiments, whereas DNA from the Houston-1ATCC subculture isolated after 5 days of growth was used in a second hybridization. The median M values in the second experiment were corrected by adding the M values from a separate hybridization of genomic DNA from the Houston-1ATCC strain against Houston-1Ref subculture before computing the mean of the replicate hybridizations.
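Because M values are log2 ratios, this correction amounts to adding logarithms. As a sketch, writing I_X for the background-corrected signal of sample X (notation introduced here for illustration only) and ignoring dye and array effects:

```latex
M_{\text{test/Ref}}
  = \log_2\frac{I_{\text{test}}}{I_{\text{Ref}}}
  = \log_2\frac{I_{\text{test}}}{I_{\text{ATCC}}}
  + \log_2\frac{I_{\text{ATCC}}}{I_{\text{Ref}}}
  = M_{\text{test/ATCC}} + M_{\text{ATCC/Ref}}
```

Adding the Houston-1ATCC versus Houston-1Ref values therefore expresses both sets of experiments relative to the same Houston-1Ref reference.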
Analysis of CGH data. Scanning and image analysis were performed with a ScanArray 4000 scanner and the ScanArray Express software (Perkin-Elmer, Inc.) as reported previously or with a GenePix 4100A scanner and the GenePix Pro 5.1 software (Axon Instruments, Molecular Devices) at a resolution of 10 µm. The channel used for the reference strain is referred to as Ch1, and the test strain is referred to as Ch2. Photomultiplier tube voltage was manually adjusted to balance intensities in the two channels while avoiding a high number of saturated spots. Spots meeting any of the following criteria were considered bad and removed from analysis: spots flagged as bad or not found during quantification; Ch1 spot median below 5 (ScanArray slides) or 40 (GenePix slides) times the Ch1 background median; more than 10% saturated pixels in either channel; less than 95% of spot pixels having intensity higher than background intensity plus 1 standard deviation; less than 90% of spot pixels having an intensity higher than background intensity plus 2 standard deviations; background-corrected Ch1 spot median intensity below 0.2 times the overall slide median Ch1 intensity; or the PCR used to generate the probe had failed (no visible band on gel). Median background intensities were subtracted from the median spot intensities, and M values (log2 ratios) were computed as log2(Ch2/Ch1). Normalization was performed as described in reference 43, with the exception that lowess normalization was performed when plots of M versus log2Ch1 or M versus A [where A = 0.5 · (log2Ch1 + log2Ch2)] indicated intensity-dependent dye bias. Median M values from all replicate spots and arrays were computed for each strain and used to classify each probe as absent (M ≤ −2), uncertain (−2 < M ≤ −1), or present (M > −1). Status of genes and intergenic regions was similarly decided by median M values of all corresponding probes.
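The computation of M values and the three-way classification can be sketched as follows. This is an illustrative Python outline, not the software used in the study. The function names are hypothetical, normalization and spot filtering are omitted, and the cutoffs are assumed to be M ≤ −2 for absent and −2 < M ≤ −1 for uncertain, which is the reading consistent with M = log2(Ch2/Ch1) being strongly negative when a sequence is missing from the test strain.

```python
import math
from statistics import median

ABSENT_MAX = -2.0     # assumed cutoff: median M <= -2 -> absent
UNCERTAIN_MAX = -1.0  # assumed cutoff: -2 < median M <= -1 -> uncertain


def m_value(ch1_spot, ch1_bg, ch2_spot, ch2_bg):
    """Return log2(Ch2/Ch1) after background subtraction.

    Ch1 is the reference channel and Ch2 the test channel. Returns None
    when either background-corrected signal is not positive.
    """
    ref = ch1_spot - ch1_bg
    test = ch2_spot - ch2_bg
    if ref <= 0 or test <= 0:
        return None
    return math.log2(test / ref)


def classify_probe(m_values):
    """Classify one probe from the M values of all its replicate spots."""
    usable = [m for m in m_values if m is not None]
    if not usable:
        return "no data"
    m = median(usable)
    if m <= ABSENT_MAX:
        return "absent"
    if m <= UNCERTAIN_MAX:
        return "uncertain"
    return "present"
```

Using the median rather than the mean keeps a single aberrant replicate spot from changing the call for a probe.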
Probes were ordered by their position in the Houston-1 genome and sorted into regions of contiguous probes with the same pattern of absence/presence in the strains. Probes with uncertain status were considered compatible with both absence and presence, depending on the status of the surrounding probes, and if a region contained both uncertain probes and probes with known status for a certain strain, the status of the region was set to that of the known probes. For phylogenetic analysis, only regions comprising at least two probes, absent in at least one strain and for which results were retrieved for all strains, were used. Maximum parsimony analyses of the CGH data were performed using the PHYLIP programs pars, seqboot, and consense (24) with 1,000 bootstrap replicates and 100 random orderings of strains on a matrix containing absence (0), presence (1), or uncertain (?) status scores of the selected regions.
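The grouping of position-ordered probes into regions can be sketched as below. This Python outline rests on assumptions that go beyond the text: it uses a greedy left-to-right pass, so an uncertain probe lying between two different regions is attached to the region on its left, and it does not model the requirement that results be available for all strains. The function names are hypothetical.

```python
def compatible(consensus, pattern):
    """True if two per-strain status lists agree wherever both are known."""
    return all(a == "?" or b == "?" or a == b for a, b in zip(consensus, pattern))


def merge(consensus, pattern):
    """Fill uncertain entries of the consensus with known statuses."""
    return [b if a == "?" else a for a, b in zip(consensus, pattern)]


def group_into_regions(patterns):
    """Group position-ordered probes into regions of contiguous probes.

    patterns: one list per probe, holding one status per strain:
    "1" (present), "0" (absent) or "?" (uncertain).
    Returns a list of (probe_indices, consensus_status) tuples.
    """
    regions = []
    for i, pattern in enumerate(patterns):
        if regions and compatible(regions[-1][1], pattern):
            indices, consensus = regions[-1]
            regions[-1] = (indices + [i], merge(consensus, pattern))
        else:
            regions.append(([i], list(pattern)))
    return regions


def informative(regions, min_probes=2):
    """Keep regions with at least min_probes probes that are absent in some strain."""
    return [r for r in regions if len(r[0]) >= min_probes and "0" in r[1]]
```

The consensus lists of the retained regions correspond to the columns of the 0/1/? matrix that would be passed to a parsimony program.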
PFGE hybridization. PFGE-restriction fragment length polymorphism analysis was performed using NotI, AscI, and SgfI restriction endonucleases separately, as well as in a combination of NotI and AscI in a double digest. Experimental procedures for cultivation, digestion, and gel electrophoresis were as described previously (43). PFGE gels were depurinated in 0.2 N HCl for 10 min, and then blotting was performed using a Hybond N+ membrane (GE Healthcare) following the alkaline transfer protocol according to instructions of the manufacturer.
DNA fragments serving as probes were amplified using the primers listed in Table 1 of the supplementary information at the website http://www.egs.uu.se/molev/suppl.data/J.Bacteriol. PCR conditions were as follows: enzyme activation step of 95°C for 10 min with subsequent 34 cycles of denaturation at 95°C for 30 s, annealing at 52°C for 90 s, and extension at 72°C for 120 s, followed by the final extension at 72°C for 10 min and a holding step at 8°C. Obtained PCR products were first purified using a QIAquick PCR purification kit (Qiagen) and then labeled and hybridized to the membranes using the Gene Images AlkPhos Direct labeling and detection system with chemiluminescence detection with CDP-Star (GE Healthcare). Hybridizations were performed overnight at 60 to 70°C followed by washes at 60 to 72°C with the manufacturer's recommended buffer, which contains 50 mM NaH2PO4. The signal was detected by exposure using Hyperfilm ECL.
Analysis of PFGE data. The sizes of the PFGE bands were estimated manually by comparison to a low-range ladder and yeast chromosome PFGE markers (New England Biolabs). The observed PFGE blot pattern was manually edited for double bands, incomplete restriction, and cross-hybridization to multiple bands in individual digests. If probes in the tested strain hybridized to the same band as neighboring probes in the sequenced genome, the fragment was considered to be syntenic with that of the sequenced genome; if the hybridization pattern of neighboring probes differed, a translocation, inversion, or gain or loss of restriction site was inferred. When a single probe or a consecutive stretch of probes differed in hybridization pattern, a translocation was inferred, although mechanistically the same pattern would also be explained by two inversions. When two probes had switched location, two inversions were inferred.
To infer the putative genome structures and identify rearrangement events, an interactive program (XPulSee) was written in Perl/Tk. The program displays the structure (including positions of probes and restriction sites) of the sequenced genome, as well as the hypothetical structure of the tested strain. The observed PFGE blot patterns obtained with the different enzymes are shown alongside the predicted pattern for the sequenced Houston-1 strain and the hypothetical structure of the tested strain. By using the program to interactively generate different rearrangements (inversions, translocations, and gains or losses of restriction sites) of the hypothetical test strain structure and compare the expected PFGE profile to the observed one, we identified putative rearrangements that would explain the observed PFGE pattern. Although the data were not always sufficient to distinguish between slightly different hypotheses or determine the exact number or types of events, the approach helped identify regions of rearrangements.
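The core comparison, predicting which band each probe hybridizes to for a given genome structure, can be sketched as follows. This is a simplified Python illustration and not the XPulSee program itself, which was written in Perl/Tk: it assumes a circular chromosome, a single enzyme, coordinates in arbitrary units, and only one inversion. All function names and example coordinates are hypothetical.

```python
def digest(genome_len, sites):
    """Return (start, end, size) fragments of a circular genome cut at `sites`."""
    cuts = sorted(set(sites))
    if not cuts:
        return [(0, genome_len, genome_len)]
    fragments = []
    for i, start in enumerate(cuts):
        end = cuts[(i + 1) % len(cuts)]
        # A single cut site linearizes the circle into one full-length fragment.
        size = (end - start) % genome_len or genome_len
        fragments.append((start, end, size))
    return fragments


def band_for_probe(genome_len, sites, probe_pos):
    """Size of the fragment that a probe at `probe_pos` would hybridize to."""
    for start, _end, size in digest(genome_len, sites):
        if (probe_pos - start) % genome_len < size:
            return size
    raise ValueError("probe position not covered by any fragment")


def invert(pos, left, right):
    """Map a coordinate through an inversion of the segment [left, right]."""
    return left + right - pos if left <= pos <= right else pos


def predicted_bands(genome_len, sites, probes, inversion=None):
    """Predicted band size per probe, optionally after one inversion."""
    if inversion is not None:
        left, right = inversion
        sites = [invert(s, left, right) for s in sites]
        probes = {name: invert(p, left, right) for name, p in probes.items()}
    return {name: band_for_probe(genome_len, sites, p) for name, p in probes.items()}
```

For a hypothetical 1,000-unit genome with sites at 100, 400, and 900 and probes a, b, and c at 150, 500, and 950, the predicted bands are 300, 500, and 200 units. After an inversion of the segment from 300 to 600, probes a and b both fall on a 400-unit band. A predicted change of this kind is what would be compared against the observed blot pattern.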
Nucleotide sequence and microarray accession numbers. The sequence of the novel rpoB allele has been submitted to GenBank under the accession number AM295003, and the spacer sequences have been submitted under accession numbers AM295307 to AM295311. The microarray data have been deposited in the ArrayExpress database of the European Bioinformatics Institute under the accession number E-TABM-88.
| RESULTS |
|---|
|
|
|---|
Population structure inferred by MLST analysis. An analysis of the distribution of polymorphisms in nine selected core genes (batR, eno, ftsZ, gltA, groEL, nlpD, ribC, rpoB, and rrs) showed that the selected strains represent six (Tables 2 and 3) of the seven previously identified sequence types (35). Like the sequenced Houston-1 strain, 23 of the 38 isolates belonged to ST1, including 7 of the 11 human isolates. The rpoB allele of the Marseille URLLY8 strain displayed an extra single-nucleotide polymorphism (here called C1733T) not previously recorded and was classified into a novel type, ST8.
Microarray comparative genome hybridization. The variability in gene content within the B. henselae population was estimated by hybridizing genomic DNA from each isolate to a B. henselae microarray containing probes for 1,367 genes and 112 noncoding regions of the sequenced Houston-1 strain (Fig. 1). Neighboring probes with the same pattern of absence/presence across the strains were sorted into regions, each of which represents a putative insertion/deletion event. From the 1,269 genes (1,645 probes and 1,443 regions including pseudogenes and spacers) for which results were retrieved, a total of 105 different B. henselae genes were reported absent (145 probes and 111 regions including pseudogenes and spacers). The number of missing probes per strain ranged from 0 to 86 (Table 4). Up to 34 indel events per strain relative to the Houston-1 strain were estimated, of which 22 included at least two probes and gave results in all strains (Table 4).
Clustering by MLST and CGH data. To place the variability in gene content within the population structure of the species, we first compared the number of genes and regions inferred to be absent from strains of different sequence types. No or only minor differences were observed among the various ST1 and ST2 strains; only 1.9 and 2.5 probes, on average, are missing in these two clades relative to the Houston-1 isolate (Table 4). Interesting exceptions are the ST1 isolates IndoCat-2 and IndoCat-11, which have lost 19 and 14 probes, respectively, in up to 13 deletion events. Members of the other clades are even more divergent, with an inferred loss of 16 to 86 probes in approximately 13 to 34 deletions per strain relative to the Houston-1 isolate, or 5 to 22 indels if only counting regions covered by at least two probes and for which data are available for all strains.
Next, we performed a clustering analysis based on the MLST data using the neighbor-joining method (Fig. 2A). The result was broadly similar to a clustering based on the collected CGH data using the parsimony method (Fig. 2B), in which segments covering two or more neighboring probes with the same status were counted as the unit of mutation. Both analyses distinguished the ST1 and ST2 isolates from the ST4 to ST7 isolates with bootstrap support values above 88% (Fig. 2A and B), with the exception of the ST1 isolates IndoCat-2 and IndoCat-11, which clustered with the ST5 group in the gene content analysis (Fig. 2B). Another highly supported group in the gene content analysis (96% bootstrap) not seen in the MLST clustering was that of two feline isolates of ST4 (GreekCat-23) and ST5 (Cheetah) (Fig. 2B). Both isolates have lost the prophage region (Fig. 1).
DNA amplification. A few regions of the chromosome displayed an increased relative hybridization signal in several strains (Fig. 1). One of the amplified regions matched perfectly the prophage (BAP), and another matched the phage genes of HGIa. The amplifications at these two sites were located within or confined to the borders of the phage elements. We also observed a stronger hybridization signal of a segment in the vicinity of kb 1090 in GreekCat-9 and IndoCat-11 and to a lesser extent in GreekCat-1 and ZimCat-25. However, the most dramatic amplification was observed within the chromosome II-like region, which gave increased hybridization signals in half of the strains (Fig. 1). The amplified regions varied in size from 10 to 300 kb but were always centered at a position corresponding to kb 1660 in the sequenced Houston-1 genome. The amplitude showed a maximal M value of approximately 3.5 at the peak, which corresponds to a 10-fold-higher copy number relative to the reference strain, and gradually decreased in both directions.
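Since M is a log2 ratio of test to reference signal, the fold difference at the peak follows directly, under the assumption that hybridization signal scales linearly with copy number:

```latex
\frac{\text{Ch2}}{\text{Ch1}} = 2^{M} = 2^{3.5} = 8\sqrt{2} \approx 11.3
```

This is consistent with the roughly 10-fold figure given above.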
A comparison of the sizes of the amplified fragment following DNA isolation after growth on plates for 5 and 10 days showed that the extent of amplification was dependent on the time at which the cells were harvested (Fig. 3). In this comparison, we included strains that in the first study showed weak, intermediate, and very strong amplification (GreekCat-9, GreekCat-34, IndoCat-11, and UGA-7). In all cases, we observed a minor or weak amplification after 5 days of cultivation, which increased markedly in amplitude and size after 10 days. In effect, all strains showed a similar level of amplification after prolonged growth, as illustrated here with UGA-7 and GreekCat-34 (Fig. 3B and D). In comparison, only minor variations were observed for the different Houston-1 subcultures (Fig. 3E and F and data not shown). No correlation was observed between the extent of amplification and genotype; the amplifications were observed in strains of all sequence types. Nor was the amplification related to the host or geographic origin of isolation.
We observed no correlation between genome structures and sequence types. Rather, isolates of the same sequence type often presented radically different structures, and similar rearrangement scenarios were in several cases inferred for strains of different sequence types. For example, several genomes showed an overall genome structure similar to that of the Houston-1980517 strain, including isolates as diverse as the American ST1 strain UGA-23, the Indonesian ST1 strains IndoCat-2 and IndoCat-11, and the ST5 isolates Cheetah and GreekCat-23. Five isolates of three different sequence types contained a nonsymmetrical inversion event across the terminus of replication, with inversion breakpoints at the duplicated tuf genes, as verified by PCR analysis and sequencing.
| DISCUSSION |
|---|
|
|
|---|
Our results show that human infections have been induced by feline strains of diverse geographic origins and sequence types, in accordance with a high genotypic heterogeneity of B. henselae isolated from humans with disease symptoms (69). However, we failed to identify genes in the sequenced Houston-1 strain that were uniquely shared with the other human isolates, suggesting that the ability to infect humans is not simply due to the acquisition of novel virulence traits. Rather, strains of most sequence types appear capable of causing incidental human infections. In comparison, permanent host switches are rare, with B. quintana as the only known example. It is interesting that neither B. koehlerae (43) nor any of the B. henselae strains here examined are as reduced in size as B. quintana (2). This suggests that the reduction in size of the B. quintana genome was perhaps associated with the permanent switch to the human host population.
The CGH profiles of the B. henselae strains were broadly consistent with the identified MLST variants, although with a few notable exceptions in which B. henselae isolates of different STs clustered in the CGH analysis. One such anomaly was the clustering of the strain Cheetah (ST4) with GreekCat-23 (ST5). However, since the support for this clade was lost upon the removal of phage sequences from the clustering analysis, part of the explanation may be independent excision of the prophage regions in the two strains. Another possibility is that replication of phage genes in the reference strain makes it appear as if isolates with no or less replication of phage genes are missing the prophage. However, attempts to detect phage genes by PCR analyses using primers from various prophage genes consistently failed with Cheetah and GreekCat-23 (data not shown), which together with extremely low hybridization signals suggests that the prophage is truly missing from these strains.
Independent excision of prophages and genomic islands in unrelated strains or recombination between strains yield homoplasies in the CGH data, making it unreliable for phylogenetic analysis. Indeed, a microarray CGH analysis of Helicobacter pylori, a species associated with high recombination frequencies, also revealed similar gene losses in genotypically unrelated strains (30). In effect, the phylogenetic histories of H. pylori (30) and B. henselae (this study) can only partially be predicted from gene content data. Microarray CGH data are perhaps most useful for the classification of isolates from recently emerged clonal pathogens, such as Mycobacterium tuberculosis (39, 65), Yersinia pestis (71), and Staphylococcus aureus (44).
Nevertheless, CGH data are informative in that they reveal gene content variations among strains. The results of this study have shown that the difference between the two most frequently isolated genotypes, the Houston-1 and the Marseille strains, resides in the absence of approximately 15 to 20 kb from the central part of HGIb in the Marseille strain. This region encodes mostly Bartonella-specific genes and is also absent in B. quintana (2) and B. koehlerae (43) as well as from a clade in the ST5 group that encompasses both human and feline isolates. Additionally, we observed variability in the ST2 to ST8 isolates in two clusters of genes containing autotransporter and passenger domains characteristic of type V secretion (50), one of which has been annotated as a cohemolysin and putative cytotoxin (45). This is interesting, since it has been shown that vasculoproliferative lesions of the liver and spleen occurred in three out of four HIV patients infected with strains of the 16S genotype I but in none of eight patients infected with the other genotype (12). Genes in HGIb and in the clusters putatively coding for autotransporters are thus our current best candidates to explain differences in clinical manifestations among human isolates of different genotypes.
The amplification of the chromosome II-like region may potentially also contribute to the creation of novel sequence variants, since genes with higher copy numbers have a higher probability of being transmitted to other cells. The amplified region is up to 300 kb in size in some strains and covers most of the 282-kb-long chromosome II-like segment, which is thought to have been acquired from the integration of another replicon (2). The increase in size of the amplified fragment with time suggests that the replication process is gradually induced and perhaps coordinated with the transition to the stationary phase, during which phage induction and lysis have been observed (13). If so, the differences in the amplification patterns observed among the various strains in the initial analysis (Fig. 1) may depend on the stage at which the cells were harvested.
Like the prophage and the GEIs, sequences in this area of the chromosome show deviations in GC content, and there is a small peak in GC skew near the center of the amplified region (see Fig. 1 in reference 2), which is located within a segment of about 30 kb that may represent an extensively degraded prophage. This region is flanked by two tRNA genes and contains mostly noncoding DNA but also three putative phage genes (an exonuclease, a DNA helicase/primase, and a lysozyme) and a few hypothetical genes of unknown function. A phylogenetic analysis of the helicase shows that it clusters most closely with prophage homologs in Escherichia coli, Salmonella enterica serovar Typhimurium, and Legionella pneumophila and more distantly with helicases from the lytic T3 and T7 phages (data not shown). However, no structural phage genes were identified, providing no evidence for a functional prophage in this region, at least not in the Houston-1Seq strain. The corresponding region in B. quintana is shorter in size and contains the yopP gene, which codes for a secreted protein that modulates the host immune response and is located on a 70-kb virulence plasmid in Yersinia pestis (32, 40).
Escape replication of prophage regions has previously been observed in E. coli (27, 34) and Salmonella enterica (26) and may result from runoff replication due to the loss of excisionase and/or integrase genes (27). Likewise, the observed amplification in some B. henselae strains may be initiated from a replication initiation site derived from an element that was previously replicating autonomously, such as a bacteriophage, a plasmid, or an auxiliary replicon. It remains to be determined whether additional prophage genes and/or plasmid genes are located at the corresponding site in strains with strong amplification.
In contrast to previous estimates (51), our PFGE results indicate that the B. henselae genomes are relatively homogeneous in size, in the range of 1.9 to 2.0 Mb (data not shown). The data also indicated high rearrangement frequencies within and across regions that include the GEIs. Different rearrangement patterns were observed in strains of the same sequence type as well as similar patterns in strains of different sequence types. This makes classification systems based on PFGE or other structural data unsuitable for typing purposes.
The observed variability makes it tempting to speculate that there is selection for shuffling of sequences within and across the islands. Two hypotheses have previously been suggested to explain selection for locally high rearrangement frequencies. The adopt-adapt model suggests that the insertion of phages and/or GEIs leads to an unbalancing of the genome and thereby to fixation of chromosomal rearrangements, as observed in Salmonella enterica (46, 47). According to the adopt-adapt hypothesis, rearrangements occur until the chromosome is partitioned equally on both sides of ori and ter. Here, the expectation is that rearrangement frequencies are high in bacteria that experience high rates of sequence influx and that selection increases the proportion of cells with rapid DNA replication until a balanced genome is restored in the population. In support of this hypothesis is the observation that Salmonella enterica serovar Typhi strains with more balanced genomes and shorter generation times out-compete strains with less balanced genomes (46).
Another theory is that selection promotes recombination so as to generate antigenic variability. An example of this strategy is found in Streptococcus pyogenes, which contains two prophages that are integrated equidistant from the ter region. Superantigens are exchanged between the prophages by homologous recombination, thereby contributing to the diversity of virulence factors in S. pyogenes (52). The streptococcal rearrangements were estimated to have occurred in many strains with different serotypes in the past and to have varied in frequency over time in the clinical strains. Likewise, genomic inversions have been suggested to contribute to antigenic variation in Bacteroides fragilis (10) and Mycobacterium avium (67). Another example is the PPE family in Mycobacterium tuberculosis, members of which are highly variable across strains by multiple insertion-deletions and hypothesized to be involved in antigenic variation (1, 29). Additionally, recombination dominates over mutations in Porphyromonas gingivalis and may help mediate long-term survival in the human host (18). High frequencies of recombination, which is the basis for gene conversion, combinatorial gene shuffling, and genomic inversions, also play a role in antigenic variation in human pathogens such as Neisseria, Borrelia, and Treponema (61).
The difference between the two hypotheses is that the adopt-adapt model predicts that genome rearrangements will eventually come to a halt, whereas the model suggesting diversification by recombination predicts that structural alterations will occur even with a balanced genome. Since the sequenced Houston-1 strain of B. henselae already has a symmetric genome, the adopt-adapt model may not be applicable to this species. Also, rapid replication may not necessarily be a selective force in the evolution of B. henselae and other slowly growing pathogens that establish long-term infections in their hosts. In the natural population of such species, diversification and immune evasion by intrachromosomal recombination seem likely to be a more dominant selective force, which could help explain the extensive rearrangements observed in areas of the GEIs and the terminus of replication.
Interestingly, B. henselae isolates taken from naturally infected cats during peaks of bacteremia showed marked differences in their PFGE patterns, suggesting that rearrangements play a role in persistent infections (38), although it could not be ruled out that coinfections with strains of different genotypes also occurred. Additionally, PFGE profiles derived from human strains and their corresponding cat source isolates were concordant but differed in size for one or more fragments, indicative of genomic rearrangements possibly linked to antigenic variation (12). Furthermore, in vivo passage of B. henselae in immunocompetent and immunodeficient mice resulted in significant morphological changes, possibly due to different numbers and sizes of fimbriae and other adhesins (66), some of which may be encoded by the GEIs.
Indeed, flanking a long stretch of phage genes in the largest island, HGIa, in the sequenced Houston-1 strain are multiple copies of fhaB, which codes for filamentous hemagglutinin (FHA), and of fhaC/hecB, which codes for the corresponding transport protein. FHA is the dominant adhesin in Bordetella (36, 55), a genus of pathogens that cause respiratory tract infections in humans and animals. FHA in Bordetella pertussis is highly immunogenic in humans and is also a protective antigen in animal models (4, 9, 62). As in Bordetella, FHA in B. henselae may serve as a major antigenic determinant and play a role in host receptor recognition. Immediately downstream of fhaB is a short hypothetical gene putatively coding for a protein of 138 amino acids that is repeated 20-fold in the genome. In total, almost 50% of the sequences in the GEIs are members of repeat families. The repeated structure of the GEIs and the proximity of the FHA genes to prophage genes may facilitate recombination events and spread among cells by horizontal gene transfer, thereby further contributing to variability within the B. henselae population. Thus, as in S. pyogenes, homologous recombination across repeated sequences may contribute to the shuffling of sequence segments in antigenic determinants.
B. henselae infections are normally self-limiting, but the symptoms and severity of the disease depend on the immune status of the host. An interesting question is whether B. henselae is able to sense the immune status of the host and respond appropriately. Experimental studies similar to the present one, but performed on B. henselae grown in the natural reservoir, could help determine whether amplifications, deletions, and rearrangements occur during host colonization and what the implications of these alterations are for virulence and antigenic diversity.
In short, we suggest that recombination within and across GEIs in the B. henselae genome continuously generates subpopulations with different sets, sequences, and possibly even different expression patterns of major adhesins. This may alter host receptor recognition patterns and support evasion of the host immune response during long-term residence in the natural reservoir. High recombination frequencies among genes coding for outer membrane and secreted proteins could also offer an advantage to strains that have a wider host range. We propose that this provides a selective constraint on the preservation of GEIs in the B. henselae population, which leads to the testable hypothesis that the surface-exposed proteome differs for strains with different genomic architectures, despite otherwise similar gene pools.
| ACKNOWLEDGMENTS |
|---|
This research was supported by grants from the Foundation for Strategic Research in Sweden (S.S.F.), the Swedish Research Council (V.R.), the Göran Gustafsson Foundation, the Knut and Alice Wallenberg Foundation (K.A.W.), and the Network of Excellence EuroPathogenomics (LSHB-CT-2005-512061).
| FOOTNOTES |
|---|
Published ahead of print on 25 August 2006.
H.L. and O.V. contributed equally to this work.
Present address: División de Microbiología, Universidad Miguel Hernández, 03550 Alicante, Spain.
Present address: Institute for Biology and Biochemistry/Bioinformatics, University of Potsdam, Potsdam, Germany.
| REFERENCES |
|---|