Welcome to the PolyBayes Web Site!


PolyBayes is a computer program for automated single-nucleotide polymorphism (SNP) discovery in redundant DNA sequences. The primary motivation for its development was to provide a general, reliable tool for discovering genetic variation in the exponentially increasing volume of sequence data in public and private databases. The software integrates algorithmic solutions to three of the main challenges in sequence-based SNP discovery:

As its main output, the PolyBayes program produces a list of candidate polymorphic sites, each with an associated SNP probability score that has been shown to accurately predict the true positive rate in subsequent validation experiments. A selectable score threshold lets the user strike a balance between highly accurate predictions and the recovery of additional polymorphisms, such as rare SNPs or SNPs in low-quality sequence.
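As a rough illustration of how a score threshold trades prediction accuracy against recovery, the following Python sketch filters a list of candidate sites. The `CandidateSite` record, its field names, and the example scores are hypothetical and do not reflect the actual PolyBayes output format. The sketch also assumes the scores are well calibrated, so that the mean score of the selected sites approximates the fraction of them that would validate as true SNPs.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    """Hypothetical candidate-site record (not the actual PolyBayes output format)."""
    contig: str
    position: int
    snp_probability: float  # SNP probability score in [0, 1]


def select_candidates(sites, threshold):
    """Keep only the candidate sites whose SNP probability meets the threshold."""
    return [s for s in sites if s.snp_probability >= threshold]


def expected_true_positive_rate(selected):
    """Mean score of the selected sites.

    Assuming well-calibrated scores, this approximates the fraction of the
    selected sites expected to validate as true SNPs.
    """
    if not selected:
        return 0.0
    return sum(s.snp_probability for s in selected) / len(selected)


if __name__ == "__main__":
    # Hypothetical example scores, for illustration only.
    sites = [
        CandidateSite("contig1", 120, 0.98),
        CandidateSite("contig1", 455, 0.75),
        CandidateSite("contig2", 88, 0.40),
        CandidateSite("contig3", 301, 0.10),
    ]
    for threshold in (0.9, 0.5, 0.3):
        selected = select_candidates(sites, threshold)
        rate = expected_true_positive_rate(selected)
        print(f"threshold={threshold}: {len(selected)} sites, "
              f"expected true positive rate={rate:.2f}")
```

Lowering the threshold returns more sites, including lower-confidence calls, at the cost of a lower expected true positive rate among the selected sites.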

The software is easily integrated into the Phred/Phrap/Consed infrastructure developed at the University of Washington. Multiple alignments marked up with SNP information can be viewed directly with the Consed sequence viewer.

PolyBayes was developed at the Washington University Genome Sequencing Center. Since its inception, it has been used in various sequence-based SNP discovery projects.

Additional information about PolyBayes is available in a publication or on the slide show page. If you have questions, please feel free to contact the authors.

Software information

Programming language: Perl 5

Current software version: PolyBayes 3.0 (released 2001-11-01)


The licensing practice for the PolyBayes software is about to change. For academic and other not-for-profit use, you will be able to download the software from this site after agreeing to the licensing terms. Currently, however, all licensing is handled through the Washington University Office of Technology Management. Even so, there is no licensing fee for academic and other not-for-profit use. Until the simplified licensing takes effect, if you are interested in obtaining the software or in inquiring about commercial licensing terms, please email:

Tom Hagerty, Marketing and Operations Manager, Washington University Office of Technology Management (hagertyt@wustl.edu)

or contact the lead author at marth@bc.edu.


The PolyBayes software was first applied in a pilot project for SNP detection in expressed sequence tags (ESTs) anchored to genomic clone sequences. The results and a more detailed description of the algorithm were published as "A general approach to single-nucleotide polymorphism discovery" in the December 1999 issue of Nature Genetics (Medline reference: Nat Genet 1999 Dec;23(4):452-456). We have also included a slide show, an earlier, shorter version of which was presented at the 1999 Cold Spring Harbor Meeting on Genome Sequencing & Biology.

SNP mining projects

PolyBayes serves as the SNP detection engine in two large-scale, genome-wide SNP discovery projects:


Below is contact information for the authors and main contributors of the PolyBayes software.

Gabor Marth

Email: marth@bc.edu
WWW: http://www.bc.edu/schools/cas/biology/facadmin/marth
Affiliation: Boston College Department of Biology
Address: 140 Commonwealth Ave., Higgins Hall, Room 415, Chestnut Hill, MA 02467
Telephone: +1.617.552-3571

Ian Korf

Email: ik1@sanger.ac.uk
WWW: http://sapiens.wustl.edu/~ikorf
Affiliation: Sanger Institute

Mark Yandell

Email: myandell@fruitfly.org
WWW: http://sapiens.wustl.edu/~myandell
Affiliation: Berkeley Drosophila Genome Project

Raymond Yeh

Email: ryeh@sapiens.wustl.edu
WWW: http://sapiens.wustl.edu
Affiliation: Washington University Genome Sequencing Center
Address: 4444 Forest Park Blvd., St. Louis, MO 63130
Telephone: +1.314.286-1845

Nathan Stitziel

Email: nstitz1@uic.edu
WWW: http://genome.wustl.edu/gsc/Info/staff/nstitzie
Affiliation: University of Illinois - Chicago
Address: 2028 N. Winchester, Chicago, IL 60614
Telephone: +1.773.772-7994

LaDeana Hillier

Email: lhillier@watson.wustl.edu
Affiliation: Washington University Genome Sequencing Center
Address: 4444 Forest Park Blvd., St. Louis, MO 63130
Telephone: +1.314.286-1811

Pui-Yan Kwok

Email: kwok@itsa.ucsf.edu
Affiliation: University of California, San Francisco
Address: 505 Parnassus Ave, Box 0130, Long 1332A, San Francisco, CA 94143-0130
Telephone: +1.415.514-3802

Warren Gish

Email: gish@blast.wustl.edu
WWW: http://blast.wustl.edu
Affiliation: Washington University Genome Sequencing Center
Address: 4444 Forest Park Blvd., St. Louis, MO 63130
Telephone: +1.314.286-1836

Contact information

For technical information, please contact the lead author at: marth@bc.edu, or one of the contributing authors. Information about licensing can be obtained from Tom Hagerty at hagertyt@wustl.edu.

Comments: Gabor T. Marth, marth@bc.edu at the Biology Department, Boston College
Last modified: Tue Dec 16 23:58:14 2003