ABSTRACT
Helicobacter pylori is a genetically diverse pathogen that has coevolved with its human host. It inhabits the gastric niche and can cause a spectrum of gastric diseases in susceptible populations. We describe the genome sequence of H. pylori 908 (HP908), which was originally isolated from an African patient living in France who suffered from recrudescent duodenal ulcer disease. The strain is phylogenetically related to H. pylori J99, and comparative analysis revealed several strain-specific genome features as well as novel insertion-deletion and substitution events. Compared with other sequenced genomes available in the public domain, HP908 shows several strain-specific gene deletions and/or gains of genes found exclusively in this strain. Comparative and functional genomics of HP908 and its subclones will be important for understanding genomic plasticity and the capacity of H. pylori to colonize and persist in a changing host environment.
Helicobacter pylori is a highly recombining pathogen (1, 4, 5, 6, 10, 14, 18) that is widely known to have coevolved with its human host (3). In mixed H. pylori infections, recombinant strains with different allelic compositions emerge (6, 10). An important mechanism supporting its adaptive evolution (3) could be insertion-deletion and substitution polymorphisms affecting either individual gene loci, such as cagA or vacA (6, 16), or entire genomic islands (6, 14). In addition, genotyping of serial isolates obtained from the same patient has revealed similar fingerprints with minor differences (11, 14, 16), possibly arising from independent genomic alterations. This phenomenon has been described as “microevolution” (6, 10, 16). However, because DNA fingerprinting alone (12) is not sufficiently informative, whole-genome sequences are needed (2) to verify microevolution.
H. pylori strain 908, a close relative (6) of H. pylori J99, was isolated from an African patient living in France who suffered from duodenal ulcer disease (6, 16). We sequenced the HP908 genome on an Illumina Genome Analyzer IIx (GA2x; pipeline version 1.6). The sequence reads (∼60 Mb of 101-bp paired-end reads with a 300-bp insert size) were assembled with Velvet using a hash length of 21 (19). The assembled contigs were aligned to the published genome sequence of H. pylori J99 using BLAT (13).
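The read yield, read length, and hash length above determine the approximate sequencing depth. The sketch below is a back-of-envelope estimate only. It takes the approximate ∼60 Mb yield as exactly 60,000,000 bp and uses the final 1,549,666-bp chromosome length. It then applies the k-mer coverage relation given in the Velvet manual, Ck = C × (L − k + 1) / L. The constant and function names are illustrative and were not part of the original analysis.

```python
"""Back-of-envelope sequencing depth for the HP908 Illumina run (estimate only)."""

GENOME_LENGTH_BP = 1_549_666   # assembled HP908 chromosome length (from the text)
TOTAL_YIELD_BP = 60_000_000    # assumption: "~60 Mb" taken as exactly 60,000,000 bp
READ_LENGTH_BP = 101           # paired-end read length (from the text)
HASH_LENGTH = 21               # Velvet hash (k-mer) length (from the text)


def base_coverage(total_bp: int, genome_bp: int) -> float:
    """Mean nucleotide coverage C = total sequenced bases / genome length."""
    return total_bp / genome_bp


def kmer_coverage(c: float, read_len: int, k: int) -> float:
    """k-mer coverage as described in the Velvet manual: Ck = C * (L - k + 1) / L."""
    return c * (read_len - k + 1) / read_len


c = base_coverage(TOTAL_YIELD_BP, GENOME_LENGTH_BP)
ck = kmer_coverage(c, READ_LENGTH_BP, HASH_LENGTH)
# Number of individual 101-bp reads implied by the approximate yield.
n_reads = TOTAL_YIELD_BP // READ_LENGTH_BP
print(f"~{n_reads:,} reads, base coverage ~{c:.1f}x, k-mer coverage ~{ck:.1f}x")
```

With these inputs, the mean base coverage is roughly 39-fold and the k-mer coverage seen by Velvet at k = 21 is roughly 31-fold. The true values depend on the exact yield and on read filtering.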
Annotation was performed using RAST (7), Artemis (17), Glimmer (9), GeneMark (8), and EasyGene (15). The HP908 chromosome is 1,549,666 bp long, with a G+C content of 39.3% and a coding percentage of 91.5%. It contains 1,548 protein-coding sequences (CDSs) with an average length of 883 bp, 36 tRNA genes, and three rRNA loci.
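The summary statistics above can be derived from the assembled sequence and the CDS coordinates. The text does not state how the annotation tools computed them, so the sketch below uses one common pair of definitions as an assumption. G+C content is counted over unambiguous A/C/G/T bases only. Coding percentage is the fraction of genome positions covered by at least one CDS, with overlapping genes merged so that shared bases are counted once. The function names are hypothetical. The sketch assumes 1-based inclusive (start, end) coordinates with start ≤ end, and it ignores CDSs that wrap around the origin of the circular chromosome.

```python
from typing import Iterable, Optional, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G or C among unambiguous A/C/G/T bases (N and other codes are ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    gc = s.count("G") + s.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def coding_percentage(cds: Iterable[Tuple[int, int]], genome_len: int) -> float:
    """Percentage of genome positions covered by at least one CDS.

    Assumes 1-based inclusive (start, end) intervals with start <= end and no
    origin-spanning genes; overlapping or adjacent CDSs are merged first.
    """
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(cds):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged block before starting a new one.
            if cur_end is not None and cur_start is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent CDS: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None and cur_start is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / genome_len
```

For example, `coding_percentage([(1, 30), (25, 60), (80, 100)], 100)` merges the two overlapping genes into one 60-bp block. It then adds the 21-bp gene, which gives 81% coding. Merging intervals avoids counting overlapping ORFs twice, which is why coding percentage is not simply CDS count × mean CDS length / genome length.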
Our comparative genomic analysis, which is based on the draft assembly of HP908 and is therefore provisional, revealed that approximately 65, 178, 209, and 151 ORFs present in strains J99, G27, P12, and 26695, respectively, are absent from the HP908 genome; 21 of the ORFs absent relative to J99 belong to the J99 plasticity region. Most of the HP908-specific genes encode proteins of unknown function, although some correspond to ABC transporter(s), an ATP-binding protein, and transcription-repair coupling factor(s). Other strain-specific genes include those encoding outer membrane proteins and cag proteins, such as cag7 and cagY-like proteins. Strain 908 possesses a complete general secretion (Sec) machinery, which translocates proteins, including outer membrane proteins, across the inner membrane. It also contains an intact cag pathogenicity island (cag PAI) and the genomic island tfs3, and it carries virulence-associated alleles of vacA. The genome sequence further revealed putative virulence factors, such as ORFs JHP917-JHP918 of the dupA locus and an intact HP986 gene from the plasticity region cluster of H. pylori 26695.
Finally, we hope that this genome sequence will prove invaluable for comparative genomic studies of the intriguing aspects of H. pylori evolution and adaptation in different host populations.
Nucleotide sequence accession numbers.
The draft genome can be accessed via GenBank under accession number CP002184 and project ID 50869; GenBank IDs for the sequences of the cag PAI and tfs3 are EF195721 and EF195724, respectively.
ACKNOWLEDGMENTS
This genome program was funded by the University of Hyderabad through intramural startup grants to Niyaz Ahmed. The study was carried out under the wider umbrella of the European Helicobacter Study Group (EHSG), of which Niyaz Ahmed and Francis Megraud are fellows. Functional annotation of the HP908 genome is part of the planned studies under the Indo-German Research Training Group, Internationales Graduiertenkolleg (GRK1673), Functional Molecular Infection Epidemiology, an initiative of the German Research Foundation (DFG) and the University of Hyderabad (India), of which Niyaz Ahmed is a speaker. S. Haritha Devi received her postdoctoral fellowship under the UoH-DBT/CREBB program of the University of Hyderabad and the Indian Department of Biotechnology of the Ministry of Science and Technology.
We are thankful to Barry Marshall, Seyed E. Hasnain, Leonardo A. Sechi, and Ramy K. Aziz for helpful advice and suggestions. Our thanks are also due to Ayesha Alvi, S. Manjulata Devi, and Hervé Lamouliatte for help at various stages of this study. Niyaz Ahmed is an adjunct professor of molecular biosciences at the University of Malaya, Kuala Lumpur, Malaysia, and an adjunct professor of chemical biology at the Institute of Life Sciences, Hyderabad, India.
FOOTNOTES
- Received 17 September 2010.
- Accepted 5 October 2010.
- Copyright © 2010 American Society for Microbiology