Journal of Bacteriology, December 2005, p. 8370-8374, Vol. 187, No. 24
0021-9193/05/$08.00+0     doi:10.1128/JB.187.24.8370-8374.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.

A Relative-Entropy Algorithm for Genomic Fingerprinting Captures Host-Phage Similarities

Harlan Robins1,*,†, Michael Krasnitz1,*,†, Hagar Barak2, and Arnold J. Levine1

1 Institute for Advanced Study, Natural Sciences, Einstein Drive, Princeton, New Jersey 08540
2 Molecular Biology Department, Princeton University, Washington Road, Princeton, New Jersey 08544

Received 15 August 2005/Accepted 3 October 2005

The degeneracy of the genetic code allows many different DNA sequences to encode the same protein. Hidden within each organism's particular choice of sequence are more than 100 previously undiscovered, biologically significant short oligonucleotides (2 to 7 nucleotides long). We present an information-theoretic algorithm that finds these novel signals. Applying it to the 209 sequenced bacterial genomes in the NCBI database, we determine for each bacterium a set of oligonucleotides that uniquely characterizes the organism. Some of these signals have known biological functions, such as restriction enzyme binding sites, but most are new. We also introduce an accompanying scoring algorithm that assigns 100-kb sequences to their correct species, from among hundreds of candidates, with 92% accuracy. This algorithm also performs far better than previous methods at relating phage genomes to their bacterial hosts, suggesting that the oligonucleotide lists are "genomic fingerprints" that encode information about the effects of the cellular environment on DNA sequence. Our approach provides a new basis for phylogeny and may be ideally suited to classifying the short DNA fragments obtained by environmental shotgun sequencing. The methods developed here can readily be extended to other problems in bioinformatics.
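The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It measures how far observed k-mer frequencies p(w) depart from a background q(w) using the relative entropy D(P || Q) = Σ_w p(w) ln(p(w)/q(w)), keeps the k-mers that contribute most as a "fingerprint", and assigns a fragment to the genome whose fingerprint scores it highest. The zeroth-order (independent-base) background, the fixed k, the top-n cutoff, and the names `fingerprint`, `score`, and `classify` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

ALPHABET = set("ACGT")


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any window with a non-ACGT symbol."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= ALPHABET:
            counts[word] += 1
    return counts


def fingerprint(genome: str, k: int, top: int) -> dict:
    """Return {k-mer: log2(p/q)} for the `top` k-mers with the largest |p*ln(p/q)|.

    p is the observed k-mer frequency; q is the frequency expected if bases
    were independent (an assumed zeroth-order background). Summing p*ln(p/q)
    over all observed k-mers gives the relative entropy D(P || Q).
    """
    bases = kmer_counts(genome, 1)
    n_bases = sum(bases.values())
    words = kmer_counts(genome, k)
    n_words = sum(words.values())
    scored = []
    for word, count in words.items():
        p = count / n_words
        q = math.prod(bases[b] / n_bases for b in word)
        scored.append((abs(p * math.log(p / q)), word, math.log2(p / q)))
    scored.sort(reverse=True)  # largest contribution to D(P || Q) first
    return {word: ratio for _, word, ratio in scored[:top]}


def score(fragment: str, fp: dict, k: int) -> float:
    """Hypothetical score: average fingerprint weight per k-mer of the fragment."""
    counts = kmer_counts(fragment, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c * fp.get(w, 0.0) for w, c in counts.items()) / total


def classify(fragment: str, fingerprints: dict, k: int) -> str:
    """Assign the fragment to the genome whose fingerprint scores it highest."""
    return max(fingerprints, key=lambda name: score(fragment, fingerprints[name], k))
```

For example, after building `fp_a = fingerprint(genome_a, 4, 10)` and `fp_b = fingerprint(genome_b, 4, 10)`, calling `classify(fragment, {"A": fp_a, "B": fp_b}, 4)` returns whichever label scores the fragment higher. The independent-base background is the simplest choice; a higher-order Markov background would reduce the number of k-mers flagged merely because a shorter word inside them is already over-represented.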


* Corresponding author. Mailing address: Institute for Advanced Study, Natural Sciences, Einstein Drive, Princeton, NJ 08540. Phone: (609) 734-8318. Fax for H. Robins: (609) 951-4489. E-mail: hrobins@ias.edu. Fax for M. Krasnitz: (609) 951-4438. E-mail: krasnitz@ias.edu.

† H.R. and M.K. contributed equally to this work.


This article has been cited by other articles:

  • Mrazek, J. (2009). Phylogenetic Signals in DNA Composition: Limitations and Prospects. Mol. Biol. Evol. 26:1163-1169.
  • Robins, H., Krasnitz, M., Levine, A. J. (2008). The Computational Detection of Functional Nucleotide Sequence Motifs in the Coding Regions of Organisms. Exp. Biol. Med. 233:665-673.
  • Yu, Z., Li, Z., Jolicoeur, N., Zhang, L., Fortin, Y., Wang, E., Wu, M., Shen, S.-H. (2007). Aberrant allele frequencies of the SNPs located in microRNA target sites are potentially associated with human cancers. Nucleic Acids Res. 35:4535-4541.