Characterization and Comparative Analysis of the Staphylococcus aureus Genomic Island vSaβ: an In Silico Approach

With the rapid increase of available sequencing data on clinically relevant bacterial species such as S. aureus, the genomic basis of clinical phenotypes can be investigated in much more detail, allowing a much deeper understanding of the mechanisms involved in disease. We characterized in detail the S. aureus genomic island vSaβ and defined a superordinate system to categorize S. aureus strains based on their vSaβ type, providing information about the strains’ virulence-associated genes and clinical potential.

substantially smaller vSaγ. These genomic islands are extremely stable and have highly conserved genes. However, the gene composition of each island can vary substantially between some strains yet be identical between others (8,10).
In 2008, when Baba et al. (8) classified 12 sequenced S. aureus strains into 4 vSaα types and 3 vSaβ types, the comparison of the genomic islands was rather limited. To date, over 10,000 S. aureus whole genomes (including contigs and scaffolds) are available in the NCBI database. Such an enormous increase in sequencing data calls for a new, superordinate system to classify S. aureus strains based on genomic islands. The genomic island vSaβ is of particular interest, as it carries two genes belonging to the type I staphylococcal restriction-modification (RM) system (hsdM and hsdS) and harbors a number of virulence-associated genes, such as a hyaluronate lyase precursor gene (hysA), a lantibiotic gene cluster (bacteriocins of S. aureus [bsa]), two leukocidin genes (lukD and lukE), an enterotoxin gene cluster (EGC), and a cluster of serine protease genes (serine protease like [spl] genes).
First characterized in 2001, the spl cluster was described as an operon containing 6 genes (splA, splB, splC, splD, splE, and splF) with DNA sequence similarities ranging from 42% to 94% (11). For listing new spl genes, the existing alphabetical nomenclature is not ideal. Therefore, we establish an approach to unambiguously name the genes in the spl cluster, including new ones, based on their phylogenetic relationships.
To our knowledge, the findings of Baba et al. (8) have not been followed up. Hence, we extended their vSaβ listing to a total of 15 types by adding 12 new types obtained by analysis of the vSaβ islands of 103 clinical S. aureus strains. Within each type, the virulence-associated genes were strikingly conserved.
TABLE 1 (legend) For each vSaβ type, a reference (ref.) strain was chosen. The vSaβ genes were translated into protein sequences using the standard code and aligned to the corresponding protein of the reference strain. Shading indicates identity of ≥95%, and asterisks indicate truncated or fragmented genes. Where a gene was absent or truncated in the reference strain, the protein sequence of another strain of that type was used as a reference, indicated with R below the corresponding protein.
+, presence of the corresponding genes in the reference strains that served as a target for comparing the genes of the other strains of the group.
b vSaβ types X and XV have a second copy of SplB.
c vSaβ type XII has a second copy of SplD4.
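The per-gene comparison summarized in Table 1 (translation with the standard code, comparison to the reference protein, shading at ≥95% identity) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy sequences, and the assumption that the two proteins are already aligned to equal length are ours.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon.

    Codons containing non-ACGT characters are translated as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all columns of two pre-aligned sequences.

    Gap characters ('-') never count as matches (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)

def is_shaded(identity: float, threshold: float = 95.0) -> bool:
    """Mirror the Table 1 convention: shade cells with identity >= 95%."""
    return identity >= threshold

# Toy example (hypothetical sequences, not real vSaβ genes).
reference_protein = translate("ATGAAAGGTTAA")  # "MKG"
query_protein = translate("ATGAAAGCTTAA")      # "MKA"
identity = percent_identity(reference_protein, query_protein)
```

A premature stop codon, as reported for hsdM in several types, would show up here as a translated protein that is shorter than its reference.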
Most vSaβ types followed a similar structural design (with the exceptions of types V, VII, and XIII), carrying a distinct set of virulence genes (Fig. 1). They included hysA, the lipoprotein (lpn) genes, the RM genes hsdM and hsdS, a cluster of spl genes (splA, splB, splC, splD1, splD2, splD3, and splD4), a probable beta-lactamase gene (bla), a cluster of 9 bsa genes (bsaA1, bsaA2, bsaB, bsaC, bsaD, bsaE, bsaF, bsaG, and bsaP), the lukD and lukE genes, as well as an EGC consisting of the seg, sei, sem, sen, seo, and seu genes. In some cases, the seu gene was replaced by the two truncated genes ent1 and ent2. The compositions of these genes and gene clusters varied substantially between different vSaβ types but were highly conserved within the same vSaβ type (Fig. 1 and Table 1). In addition to these core vSaβ genes, a number of hp genes were found, some of which were assigned a FIG number by the RAST (Rapid Annotations using Subsystem Technology) pipeline. By typing the vSaβ islands and aligning all related sequences, a consensus structure for each vSaβ type could be defined.
vSaβ type I possessed the genes of the type I RM system, an spl cluster (splA, splB, splC, and two copies of splD2), lukD, lukE, and a complete EGC. The amino acid sequences were, with only very few exceptions, highly similar within this vSaβ type (>95%) (Table 1). The exceptions were truncated genes, the lack of a second splD2 gene, and a lower sequence identity of the SplD2 protein in three strains (Table 1). Furthermore, 3 sequences for mobile element proteins (mep), a bsaG fragment, bla, and remnants of phages were detected (Table 1; Fig. 1). When the structures of all 11 vSaβ type I genomic islands were compared, Mu50 matched the consensus and hence was used as the reference for this vSaβ type.
The consensus of vSaβ type II was best represented by S. aureus strain TW20.
It harbored the type I RM genes, an spl cluster (splA, splB, splC, splD1, and two copies of splD2), bla, a bsa cluster, and the lukD and lukE genes. Exceptions were some strains that harbored truncated hsdM, bsaE, and lukE genes. Moreover, vSaβ type II comprised sequences for two mep genes, a number of hp genes, and a partial phage (Table 1).
The consensus of vSaβ type III was best represented by strain MRSA252. It harbored hysA, hsdM, hsdS, an spl cluster (splB, splC, splD1, splD2, splD3, and splD4), and an EGC. Besides three of its spl genes (splB, splD2, and splD3) being pseudogenes, vSaβ type III lacked lukD and lukE but had additional genes in its 5′ region (a number of putative lpn genes, a hysA gene, and a lipase gene). vSaβ type III carried sequences for mep, hp, and remnants of a phage. In most of the type III strains, the core proteins were highly conserved, showing an identity of ≥99% when S. aureus strain MRSA252 was used as a reference. Deviations were found in the spl cluster and in hsdM (Table 1).
vSaβ type IV was best represented by strain M3783C and was, at over 85,000 bp, substantially longer than the other vSaβ types. It harbored a number of hp genes, a hysA gene, an spl cluster (splB, splC, splD2, splD4, and a truncated splA), a complete bsa cluster, an EGC (with truncated seg and sem), lukD, lukE, and the bla gene. hsdM and hsdS were either missing or had premature stop codons. The amino acid sequences of all other vSaβ key genes were highly conserved (≥99%) among strains, with only a small number of exceptions (Table 1). Furthermore, a 50.4-kb intact mosaic phage closest to Ipla88 (NCBI RefSeq accession no. NC_011614) was detected in 6 vSaβ type IV strains, coding for approximately 75 proteins, including those encoded by the EGC also found in other vSaβ types. In contrast, strains M2323C and Sa110 harbored only remnants of two phages; hence, the vSaβ islands of these two strains were shorter than the other type IV islands.
The structure of vSaβ type V differed substantially from the other types: the island was only 15,965 bp long and highly conserved, showing a nucleotide sequence identity of ≥99% when strain S0385 was used as a reference. Of the typical genes, vSaβ type V harbored only hysA. However, it carried a total of 4 transposons, as well as a number of hp genes, some of which were assigned FIG numbers and others of which were identified as lpn genes.
Within vSaβ type VI, strain K2R was a suitable representative. It comprised hysA, hsdS, hsdM, an spl cluster (splA, splB, splC, splD1, and two copies of splD2), bla, a bsaG fragment, lukD, lukE, a phage integrase, as well as a number of hp genes, some of which were assigned a FIG number and others of which were identified as lpn genes. In addition, a partial phage was detected encompassing the entire spl cluster. Four strains (M0443, Newbould, G07I, and Lodi4R) had a frameshift at the beginning of lukE, leading to a truncated protein. Additionally, G07I and Lodi4R lacked splD1 and the second splD2 gene. Except for these very few special cases, all genes of vSaβ type VI were highly conserved between strains, showing an amino acid identity of ≥98%. The low overall nucleotide identity of 76% in strain Lodi4R compared to the reference was due to the insertion of the transposon Tn554 within hysA. This strain was, therefore, considered a special case of vSaβ type VI. Tn554 carried 3 transposase genes, as well as a bla operon carrying blaI, blaR1, and blaZ. If Tn554 was disregarded, the nucleotide sequence identity increased to 93%.
vSaβ type VII, similar to type V, was short (about 17,460 bp) and lacked the typical vSaβ genes, except for hysA and an hsdS fragment.
Furthermore, all vSaβ type VII strains showed a partial phage with genes for proteins targeting the host's immune response (chemotaxis-inhibiting protein Cpn and extracellular complement-binding protein Scn), as well as genes for a peptidase, a phage lysin, an ATP-binding protein, and a transposase. The amino acid sequences of these phage-encoded proteins were identical, with the exception of strain M3386D, which lacked the part of the phage containing the genes coding for the ATP-binding protein and the transposase.
vSaβ type VIII included only the JKD6159 strain. It comprised hysA, hsdM, hsdS, an spl cluster (a truncated splA, splB, splC, splD1, and splD4), bla, a bsaG fragment, lukD, lukE, a partial phage, a number of hp and phage-related genes, and a number of lpn genes in the 5′ region.
For vSaβ type IX, ED133 served as the reference, harboring the vSaβ key genes hsdM and hsdS, an spl cluster (a truncated splA, splB, splC, and splD2), lukD, lukE (truncated), bla, and a complete bsa cluster, as well as a number of hp genes, some of which were identified as lpn genes and others of which were assigned a FIG number. In addition, vSaβ type IX carried 2 genes encoding ATP-binding cassette (ABC) transporter proteins and a gene encoding a phage integrase. Despite the isolates being from 3 different countries, they shared an overall nucleotide sequence identity of ≥95% in their vSaβ. The only small differences in the vSaβ key genes were found in hsdM and lukD, which showed lower similarities in some strains (Table 1).
vSaβ type X showed 93% nucleotide sequence identity. The vSaβ consisted of an spl cluster (splA, two copies of splB, splC, splD1, and splD2), bla, a bsaG fragment, lukD, lukE, a complete hsdS gene, and an hsdM gene with a premature stop codon. In addition to a number of hp and FIG-assigned genes, lpn, two genes encoding an ABC transporter, and a phage integrase gene could be found, similar to vSaβ type IX.
The amino acid identity in the key genes was ≥98%, with the exceptions of lukE being truncated in O11, splD1 lacking in strain O46, and a slightly lower identity in SplD2 (93% amino acid identity).
vSaβ type XI consisted of two strains with an amino acid identity of ≥99% in the vSaβ key genes and a nucleotide identity of 99% over the entire vSaβ region. Their vSaβ contained hsdM and hsdS, but hsdM had a premature stop codon. Furthermore, vSaβ type XI was characterized by an spl cluster (splA, splB, splC, splD2, splD3, and splD4), bla, a bsaG fragment, lukD, lukE, a complete EGC (with a truncated sen), several hp genes (some with assigned FIG numbers), and a short hysA fragment.
The most striking feature of vSaβ type XII was the enlarged spl cluster consisting of splA, splB, splC, splD1, splD2, splD3, and 2 copies of splD4. It further harbored both RM genes (hsdM with a premature stop codon), bla, a bsaG fragment, lukD, lukE, and a complete EGC, along with several hp genes, some with assigned FIG numbers.
The strains categorized as vSaβ type XIII had a typical vSaβ structure at the 3′ end, yet a clear beginning could not be defined for this vSaβ type, as it lacked the conserved 5′-end sav1803. Instead, there were three hp genes as well as two recently discovered enterotoxin sequences, sel26 and sel27. These were followed by several hp genes (some with an assigned FIG number) and genes typical for vSaβ, such as lpn, hsdM, hsdS, bla, the EGC followed by the tRNA cluster, and sav1831, typically marking the 3′ end.
vSaβ type XIV harbored hysA, both RM genes (hsdM and hsdS), an spl cluster (splA, splB, splC, splD1, splD2, and splD4), bla, a bsaG fragment, lukD, and lukE. When strain G08M was used as a reference, most proteins showed an amino acid identity of 100%, with the exceptions being three strains with truncated hsdM genes.
vSaβ type XV consisted of both RM genes (hsdM and hsdS), an spl cluster (splA, two copies of splB, splC, splD1, two copies of splD2, splD3, and splD4), bla, a bsaG fragment, lukD, and lukE. When strain Lodi11bM was used as a reference, the only differences were lukE being truncated and SplD4 having a lower amino acid identity in strain 4185.
In summary, all vSaβ types had the same basic structural design. The design was minimal for types V, VII, and XIII and was more complex for the other types, as they harbored a distinct set of virulence factors and gene clusters. Also common to all vSaβ types, except type XIII, was the flanking by the same conserved sequences with the locus tags SAV1803 and SAV1831. Variation among the vSaβ types was observed for the spl cluster in gene number and composition. Furthermore, some types were characterized by truncated RM genes and/or bsa or spl genes. Within each vSaβ type, the genes were highly conserved at both the structural and the protein levels.
HsdS. Inspection of the HsdS sequences with a phylogenetic approach (Fig. 2A) revealed that the HsdS sequence alone does not provide sufficient resolution for inference of gene content or virulence determinants of the entire vSaβ. Hence, categorization of vSaβ based on the nucleotide sequence identity of the entire genomic island can be considered the method of choice when the entire sequence of the genomic island is available.
Serine proteases. All vSaβ islands, with the exceptions of types V, VII, and XIII, carried an spl cluster, harboring between 4 and 9 spl genes. Phylogenetic analyses using the maximum-likelihood method confirmed the existing Spl family members SplA, SplB, SplC, and SplD, but with SplD having 4 different variants (SplD1, SplD2, SplD3, and SplD4) (Fig. 2B). SplD1 replaces the former SplE, and SplD2 replaces the former SplD and SplF. The SplD3 and SplD4 clades include Spl proteins that could not be assigned to any of the preexisting Spl variants. Furthermore, the presence of multiple splD variants per strain points to a relatively recent gene duplication.
Clonal complexes and spa types. spa typing showed that one vSaβ type can harbor a number of spa types, but each spa type is limited to one vSaβ type. Highly striking was the consistency of a vSaβ type with the strains' clonality. The predominance of a single clonal complex (CC) per vSaβ type underlined the concept that vSaβ acquisition and its diversification happened prior to or simultaneously with the clonal diversification of ancestral S. aureus strains.
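The one-directional relationship described above (a vSaβ type may contain several spa types, but each spa type maps to exactly one vSaβ type) is easy to verify on a typing table. The following is a minimal sketch; the record layout and the spa labels are hypothetical placeholders, not data from this study.

```python
from collections import defaultdict

def spa_to_vsab_types(records):
    """Collect, for every spa type, the set of vSaβ types it was observed in.

    `records` is an iterable of (spa_type, vsab_type) pairs, one per strain.
    """
    mapping = defaultdict(set)
    for spa_type, vsab_type in records:
        mapping[spa_type].add(vsab_type)
    return dict(mapping)

def find_violations(records):
    """Return the spa types that occur in more than one vSaβ type."""
    return {
        spa: types
        for spa, types in spa_to_vsab_types(records).items()
        if len(types) > 1
    }

# Hypothetical placeholder data: two spa types share vSaβ type I.
strain_records = [
    ("spa_a", "I"),
    ("spa_b", "I"),
    ("spa_c", "II"),
]
violations = find_violations(strain_records)  # empty dict: rule holds
```

An empty result means the rule holds for the data set; any entry returned would identify a spa type that crosses vSaβ types.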

DISCUSSION
With the rapid advancement of sequencing technologies and the decreasing costs thereof, the number of S. aureus sequences deposited in databases is growing exponentially. These databases offer a powerful information source for studying the variabilities and consistencies between different isolates and their potential ability to cause disease. As opposed to the slow accumulation of point mutations, acquisition of larger parts of DNA through horizontal gene transfer leads to rapid genetic change. This can be crucial for survival under certain selection pressures (antibiotics) or in novel niches (new host) (10,12,13). The mechanism by which S. aureus acquired its genomic islands is not fully understood, yet here, we elaborate why phage mediation is likely to have played a crucial role.
Viral origin of vSaβ. Throughout all vSaβ types, the strains belonging to one vSaβ type almost exclusively harbor the same virulence genes. In addition to the gene composition, an amino acid sequence identity of 95% or higher is another characteristic within each type, and in many cases, the identity is even 100% over all strains of a vSaβ type (Table 1).
Despite the range of hosts and geographical origins covered in this study, the vSaβ islands were always located between SAV1803 and SAV1831 at the 5′ and 3′ ends, respectively. This highly precise location, together with the tRNA cluster preceding SAV1831 as observed throughout all vSaβ types, strongly indicates that these sites may be key for the presence of the vSaβ islands in S. aureus. Indeed, tRNAs are known to harbor a conserved attachment site (14) for integrating prophages and other foreign DNA (15,16).
It is widely accepted that S. aureus genomic islands were acquired through horizontal gene transfer, but the exact mechanism and their current mobility status have been questioned (8,10,17). Our data show the presence of partial phages in almost all vSaβ types, with the exceptions of type V, where no phages were predicted, and type IV, where a complete prophage was predicted by PHASTER. In addition, vSaβ type IV also harbors all virulence-related key features of vSaβ, as follows: the type I RM system, the spl cluster, the bsa cluster, the leukocidin genes, and the EGC. Other vSaβ types possess the EGC only (types I, III, XI, XII, and XIII), the bsa cluster only (II and IX), or lack both. In this sense, vSaβ type IV can be considered the most complete of all vSaβ types regarding virulence-related gene content and phage. Moon et al. (14) showed that this very phage of strain RF122 (vSaβ type IV) can be mobilized in vitro, resulting in heterogeneous, yet overlapping particles that integrated sequentially through recombination into the host; in some cases, it even resulted in the transfer of the almost complete vSaβ (14).
FIG 2 (A and B) Phylogeny of the 20 HsdS (A) and 68 Spl (B) amino acid sequences of Staphylococcus aureus. For both trees, protein sequences that differed in at least one amino acid were aligned, and a phylogenetic tree was reconstructed using maximum likelihood. For each tree, a scale indicates the relative ...
Our results also demonstrate that all vSaβ islands containing the key virulence factors also harbored partial phages, suggesting that vSaβ originated in an ancestral S. aureus strain through a phage integration event. In the course of evolution, multiple recombination, integration, and excision events may have occurred, resulting in various combinations of the virulence genes that we now observe in the different vSaβ types. To our knowledge, vSaβ type IV is the only type to harbor a complete phage that has been shown to be mobilized, suggesting that the phages of the other vSaβ types are likely to have lost their mobility in the course of evolution.
The unequivocal location, the demonstrated mobility (14), and the ubiquitous presence of phage particles in vSaβ islands, as well as a conserved phage attachment site, strongly support the hypothesis that vSaβ was mediated by a phage, followed by diversification into the types observed today.
vSaβ and clonality. In total, vSaβ typing of 103 S. aureus strains revealed 15 different types. The types correlated strongly with the CCs, and each CC was always linked to one specific vSaβ type. Therefore, the CC of an S. aureus strain allows a link to be made to the virulence-associated vSaβ key genes. This link has been made previously (18), but by assessing and categorizing the 12 novel vSaβ types, we confirmed this observation to be a general principle. Indeed, all studied strains follow this pattern, indicating that the horizontal acquisition of vSaβ in an ancestral S. aureus strain happened before divergence into the clones occurred. It is evident that the primordial vSaβ underwent multiple genetic changes (e.g., recombination and duplications), likely just prior to or simultaneously with clone formation, resulting in the types observed today.
The evolutionary link between vSaβ types and CCs is in perfect agreement with the topology of the phylogenetic tree by Boss et al. (19). For that study, the authors concatenated 7 S. aureus-specific genes from 30 different strains and aligned these using the Needleman-Wunsch algorithm. This alignment was then used to construct a maximum parsimony phylogeny. This phylogeny showed the evolutionary relationship between some of the most common S. aureus CCs that all possess a vSaβ and a type-specific set of virulence genes. Therefore, it is evident that the common ancestor of these CCs already carried an ancestral vSaβ.
The discrepancy between the overall structures of vSaβ types V and VII and all other vSaβ types can be explained by an evolutionary loss of the vSaβ key genes in these types during two separate events. We can still find typical vSaβ elements, such as the conserved regions marking the 5′ and 3′ ends of vSaβ or the almost ubiquitous hp FIG01108826 in the 3′ region. Furthermore, type V harbors sequences that can be found in a number of other vSaβ types, such as 3 hp genes with an assigned FIG number located in the 5′ region (FIG01108398, FIG01108644, and FIG01108514), lpn, and hysA. Interestingly, type VII encodes different virulence factors that tackle the host's immune defenses. As vSaβ types V and VII have additional transposase genes, we suggest that in these two types, the vSaβ key genes were replaced by a transposon, indicating that vSaβ itself may be a hot spot for inserting mobile genetic elements and potentially accumulating virulence factors.
vSaβ type XIII has a deviating structure, as it lacks the typical 5′ region found in the other types. Together with types V and VII, it lacks the spl cluster and the luk genes. We cannot exactly locate the beginning of this vSaβ type on the genome, but interestingly, we found two very recently discovered phage-associated enterotoxin genes (sel26 and sel27) in that region (20). Furthermore, we found an hp gene (FIG01108398) that is adjacent to SAV1803 in some other types.
Virulence and vSaβ. The variable regions contribute to the fate of an S. aureus strain on a given host and its disease-causing potential, as these regions encode a number of virulence factors (6,10). They include the pore-forming LukD and LukE, which are present on most vSaβ types (except III, V, VII, and XIII). Both proteins are members of the leukocidin family (21) that form pores in the lipid bilayer of host cells, particularly in neutrophils, leading to cell death and, therefore, the promotion of immune evasion and progression of the infection (22,23).
Further virulence factors encoded on vSaβ are the Spl proteins, which are unique to S. aureus and are organized in an operon (11). vSaβ types V, VII, and XIII lack the spl cluster, whereas all other types have an spl cluster, some of them with truncated genes. Based on our phylogenetic approach, there are 4 different spl genes (splA, splB, splC, and splD) and 4 gene variants of splD (splD1, splD2, splD3, and splD4). SplA, SplB, and SplC form their own clades and are clearly separated from the SplD clade (Fig. 2B). The former SplD and SplF are now members of the SplD2 clade, which is unsurprising, as their amino acid identity can be as high as 94% (11,24). The SplD3 and SplD4 clades both consist of Spl sequences that have not been previously studied and did not match any of the predefined spl genes based on sequence similarity. We observed a high conservation of the spl operon within but not between vSaβ types. The function of spl genes in infection and disease is still largely unclear (24-27). Studies showed that spl genes are expressed and secreted during host infection (25,27) and have been linked to allergic reactions (28,29). Interestingly, vSaβ can harbor multiple copies of the very same spl gene. The plasticity of the spl cluster suggests that its importance varies among different vSaβ types.
Hyaluronic acid is a major component of the connective tissue, in particular, the extracellular matrix (30). HysA is considered a virulence factor, as it can depolymerize hyaluronic acid, thereby favoring the spread of infection (31,32). Many S. aureus genomes contain a chromosomal copy of hysA (33) outside of vSaβ. The additional copy present in the vSaβ region of types III to VIII and XIV may be linked to enhanced invasiveness of these clones.
Lantibiotics (or bacteriocins) are antimicrobial peptides produced by some Gram-positive bacteria against closely related species (34,35) and are thought to play a role in colonization by outcompeting other bacteria (10,24).
vSaβ typing. In contrast to the link between CC and vSaβ types, it has been suggested that the vSaβ type is dependent on the strain's HsdS sequence (8). These predictions were not very reliable, as they were based on only 3 vSaβ types and 12 sequenced S. aureus strains. Of a total of 15 vSaβ types, we defined 12 new types based on multiple hosts and geographical regions. Within the scope of our study, the method used for typing the vSaβ region proved to be very robust, as every strain could be allocated to exactly one vSaβ type, and cross-classification was never observed. Hence, we expect the specified principles to be robust enough to hold when additional S. aureus strains are analyzed.
In the scope of this study, we limited the data set to clinically relevant S. aureus strains from either human or animal infections. Strains that were not invasive (i.e., colonizers and strains isolated from foods) were excluded from the data set. Hence, our results are limited to these invasive strains, yet we did find that a few colonizers also fit into the system proposed here (data not shown). In the future, more vSaβ regions need to be analyzed, spanning more hosts and including noninvasive strains.
Conclusions. In general, the vSaβ islands harbor a number of virulence-associated and pathogenic genes with different scopes of action. While the exact functions of many of these genes are yet to be unraveled, it is clear that they have severe effects on the host's health and are likely to play key roles in S. aureus adaptation to the clinical microenvironment. Our data support a viral origin of the vSaβ region. As the vSaβ type is strongly linked to a strain's CC, acquisition of the vSaβ region happened in a very ancestral S. aureus strain, while the transformation to the distinct vSaβ subtypes occurred before or simultaneously with diversification into the different clones. The superordinate system suggested here to classify S. aureus strains based on their vSaβ region may be used in the future to assess the clinical potential of an S. aureus strain. To this end, more vSaβ genomic islands need to be analyzed and categorized into this superordinate system.

MATERIALS AND METHODS
Sequencing. The in-house S. aureus genome collection includes 23 strains that were previously sampled from bovine mastitis (36). All strains were kept in skim milk at -20°C and were recultured at 37°C for 24 h on blood agar (bioMérieux Suisse s.a., Geneva, Switzerland). Plates were sent to Microsynth AG (Balgach, Switzerland) for DNA extraction and subsequent whole-genome sequencing (WGS), initially by the 454 (Roche, Basel, Switzerland) and later by the Illumina (Illumina, Inc., San Diego, CA) technology, as the 454 method was no longer available. For de novo assembly of the reads to contigs, they used the Newbler v.2.6 assembler for the 454 (Roche) technology and SPAdes v.3.1 (37) for the Illumina technology (see File S1 in the supplemental material for details). In addition, contigs of 4 bovine S. aureus strains after Illumina WGS were provided by M. Luini (IZSLER, Lodi, Italy) and P. Cremonesi (CNR, Lodi, Italy).
Data collection. In an attempt to characterize the vSaβ islands of our 27 bovine mastitis S. aureus genomes as described by Baba et al. (8), it turned out that many of them did not match the 3 previously defined types, suggesting a much higher diversity of vSaβ; hence, an in-depth characterization based on more data was required. To do so, more sequences were collected by a BLAST search (36) of each known, presumptively novel, vSaβ against the NCBI nonredundant/nucleotide (nr/nt) database (https://www.ncbi.nlm.nih.gov/nucleotide/) using the default BLAST settings and limiting the search to S. aureus. To increase the diversity of the vSaβ islands, the NCBI databases were also evaluated for genomes or contigs of clinical S. aureus strains from hosts other than humans or cattle. This approach was selected, as invasive strains of S. aureus are host specific (19,38,39), possibly accounting for additional types of vSaβ islands. For each vSaβ type, we then attempted to retrieve ≥10 chromosomal sequences of clinical S. aureus. If this was impossible, the BLAST search was further extended to the NCBI whole genome shotgun contig database (https://www.ncbi.nlm.nih.gov/assembly/). All available chromosomal sequences were then retrieved, including all contigs containing a complete vSaβ region.
Using this approach, a total of 76 sequences were obtained, with 43 sequences from genomes and 33 sequences from contigs (see File S2).
Data analysis. (i) Data set. In total, 27 of our own and 76 publicly available sequences were included in the present study. Of these 103 analyzed S. aureus strains, 58 originated from human hosts, 33 were from bovine hosts, and 7 were from ovine hosts. For 5 strains, no host information was available. Of these 103 strains, 37 strains showed known vSaβ types (I to III), and 66 strains showed novel vSaβ types (IV to XV) (see File S2 for details).
(ii) vSaβ typing. The vSaβ regions were identified on genome sequences by aligning the sequences of conserved hp genes flanking vSaβ at the 5′ end (SAV1803) and 3′ end (SAV1831) (8) using the Clone Manager Professional 9 (CM9) software (Scientific & Educational Software, Denver, CO). Subtyping of the vSaβ islands was then based on the initial work by Baba et al. (8). Using the hsdS gene of 12 clinically relevant S. aureus strains, Baba et al. grouped them into three vSaβ types (vSaβ I, vSaβ II, and vSaβ III [8]). We then aligned the vSaβ sequences of these 12 strains in the CM9 software using the Needleman-Wunsch algorithm and found that the overall similarities within a vSaβ type were ≥90%. For each of these vSaβ types, a representative sequence was then selected and considered the type-specific reference sequence (Table 1). Disregarding included phages and transposons, vSaβ islands showing overall sequence similarities of <90% compared to all the existing reference sequences were then considered new vSaβ types. Through the iterative process of aligning nontyped vSaβ islands to each of the vSaβ type reference sequences, the 103 sequences of the data set were grouped into 15 vSaβ types (vSaβ I to XV, Table 1).
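The iterative typing rule described above can be sketched in a few lines. This is an illustration only and not the CM9 implementation: the scoring scheme (match +1, mismatch −1, gap −1), the definition of similarity as identical columns over alignment length, the function names, and the toy sequences are all assumptions, and phage and transposon sequences are assumed to have been removed beforehand. Pure-Python dynamic programming is quadratic in sequence length, so real islands of tens of kilobases would need an optimized aligner.

```python
def needleman_wunsch_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return percent identity over alignment columns."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 100.0

def assign_types(islands, references, threshold=90.0):
    """Assign each island to its best-matching type, or open a new type.

    An island with < threshold similarity to every reference becomes the
    reference of a new type (labelled 'new_<n>', a placeholder naming scheme).
    """
    references = dict(references)  # do not mutate the caller's dict
    assignment = {}
    for name, seq in islands.items():
        best_type, best_identity = None, -1.0
        for type_name, ref_seq in references.items():
            identity = needleman_wunsch_identity(seq, ref_seq)
            if identity > best_identity:
                best_type, best_identity = type_name, identity
        if best_identity >= threshold:
            assignment[name] = best_type
        else:
            new_type = f"new_{len(references) + 1}"
            references[new_type] = seq
            assignment[name] = new_type
    return assignment, references

# Toy sequences (hypothetical, far shorter than real islands).
toy_references = {"I": "ACGTACGTAC", "II": "TTTTGGGGCC"}
toy_islands = {"s1": "ACGTACGTAC", "s2": "TTTTGGGGCA", "s3": "GAGAGAGAGA"}
toy_assignment, toy_updated_refs = assign_types(toy_islands, toy_references)
```

In this toy run, s1 and s2 join the existing types, whereas s3 falls below 90% against both references and founds a new type.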
(iii) HsdS phylogenetics. To infer the phylogeny of the HsdS proteins (37,40), the corresponding nucleotide sequences were translated using the standard code. Proteins that differed in at least one amino acid were then used for a multiple-sequence alignment (MSA) in the CM9 software using the Needleman-Wunsch approach. The MSA was exported to the BioEdit software (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) for visual inspection and manual curation. Afterwards, the curated MSA was imported into the MEGA X software (41) to assess a maximum likelihood (ML) phylogeny of the HsdS proteins. To do so, models using 9 different substitution matrices were computed, including or leaving out modeling for invariant sites and for the evolutionary rate differences among sites by a discrete gamma distribution. The optimal model was then selected based on the lowest value of the Akaike information criterion, resulting in the JTT substitution matrix (37), with specific parameters for the gamma distribution. This model was then used to construct a phylogenetic ML tree (MEGA X software [41]). Initial trees were built by applying the maximum parsimony algorithm. The topology with the highest log-likelihood value was then selected (Fig. 2A). Strains lacking the hsdS gene were excluded.
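Model selection by the Akaike information criterion (AIC = 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood) can be illustrated as follows. The candidate model names, log-likelihoods, and parameter counts below are invented for illustration and are not the values obtained in MEGA X.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: lower values indicate a better trade-off
    between goodness of fit and model complexity."""
    return 2 * n_params - 2 * log_likelihood

def select_best_model(candidates: dict) -> str:
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps model name -> (log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical fits: +G adds a gamma shape parameter, +I an invariant-site
# proportion; the numbers are placeholders, not results from the study.
candidate_models = {
    "JTT": (-5230.4, 37),
    "JTT+G": (-5198.7, 38),
    "WAG+G": (-5205.2, 38),
    "JTT+G+I": (-5198.5, 39),
}
best_model = select_best_model(candidate_models)
```

With these placeholder numbers, the extra invariant-sites parameter improves the likelihood too little to offset its penalty, so the gamma-only JTT model has the lowest AIC.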