BI524
Computational Foundations of Bioinformatics
Tu Th 1:30--3:00 in Higgins 425.
Office hours: We 2:00--3:00, Th 10:30--11:30 in Higgins 577
Course description | Text | Grading policy | Academic Integrity Policy | Homework | Class Notes | Demos | Tests
Biology is increasingly dominated by high-throughput methods that
yield large data sets. Analyzing these data requires both existing
public-domain or commercial software and new algorithms that must be
implemented in a programming language. Bioinformatics is an
interdisciplinary area concerned with the application of
mathematics, statistics and programming to mainstream problems in
biology, such as the following.
-
According to the ENCODE Consortium article in Nature (June 14, 2007),
the human genome is "pervasively transcribed", much of it in RNA
transcripts of unknown function (i.e. not tRNA, mRNA, rRNA, miRNA, etc.).
What does all this biological "dark matter" really do?
-
According to an article of Aldman and Terzic in the
Journal of the American Medical Association (November 2005),
"the field of clinical oncology is poised for unprecedented innovation,
reflecting the confluence of breakthroughs in decoding disease pathobiology
in the context of high-throughput enabling technologies". A key issue in
molecular pathology is the development of statistical models and
computer programs to determine biologically significant gene expression
patterns for certain kinds of disease.
In this course, you will learn how to navigate through public databases
containing protein conformations, nucleotide and amino acid sequences,
etc. and how to use bioinformatics software (BLAST, ClustalW,
Vienna RNA Package, etc.). This is fun and easy, but it is not the main
focus of the course -- a more in-depth treatment of such issues is given
in the course BI420 Introduction to Bioinformatics.
The goal of the course, which assumes no prior experience in computer
programming, is to learn to use the UNIX operating system and to write
interpreted programs, called scripts, that parse biological files
(PDB, GenBank, etc.), implement some bioinformatics algorithms,
invoke executable code from within a program, etc.
We will focus principally on Python, a simple, elegant scripting
language, and towards the end of the course will also cover
aspects of Perl.
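As a rough illustration of the kind of script meant here, the following minimal sketch reads ATOM records from PDB-formatted text by column position. The names SAMPLE_PDB and parse_atoms and the two sample atom lines are invented for this example and are not course material; the column positions are those of the fixed-width PDB ATOM record.

```python
# Minimal sketch: extract ATOM records from PDB-formatted text.
# SAMPLE_PDB and parse_atoms are hypothetical names used only for illustration.

SAMPLE_PDB = "\n".join([
    "HEADER    ILLUSTRATIVE EXAMPLE",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "END",
])


def parse_atoms(pdb_text):
    """Return a list of dictionaries, one per ATOM record in pdb_text."""
    atoms = []
    for line in pdb_text.splitlines():
        # PDB is a fixed-width format: the record type is in columns 1-6.
        if line[0:6] != "ATOM  ":
            continue
        atoms.append({
            "serial": int(line[6:11]),          # columns 7-11
            "name": line[12:16].strip(),        # columns 13-16
            "residue": line[17:20].strip(),     # columns 18-20
            "chain": line[21],                  # column 22
            "residue_number": int(line[22:26]), # columns 23-26
            "x": float(line[30:38]),            # columns 31-38
            "y": float(line[38:46]),            # columns 39-46
            "z": float(line[46:54]),            # columns 47-54
        })
    return atoms


if __name__ == "__main__":
    for atom in parse_atoms(SAMPLE_PDB):
        print(atom["serial"], atom["name"], atom["residue"],
              atom["x"], atom["y"], atom["z"])
```

Slicing by column, rather than splitting on whitespace, is used because adjacent PDB fields can run together without a separating space.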
Required Texts
-
"Developing Bioinformatics Computer Skills",
by Cynthia Gibas and Per Jambeck,
O'Reilly & Associates, Inc. (2001),
ISBN 1-56592-664-1.
Perl run-time environment, documentation and tutorial:
http://www.perl.org/.
This text is a well-written introduction to biological
databases, tools and public domain software, UNIX, and Perl.
-
"Python", by Chris Fehily,
Visual Quickstart Guide, Peachpit Press (2002),
ISBN 0-201-74884-3.
Python run-time environment, documentation and tutorial:
http://www.python.org/.
This text presents the basics of the Python programming language.
Optional Texts
The following are good reference texts if you choose to go further.
(You do not need to purchase them.)
-
Beginning Perl for Bioinformatics:
An introduction to Perl for Biologists, by J. Tisdall, O'Reilly (2001).
-
Python course in Bioinformatics, by Katja Schuerer and
Catherine Letondal (Institut Pasteur).
A PDF version is also available to download and print.
-
"Python Essential Reference", Second Edition,
David M. Beazley,
New Riders Publishing (a Prentice-Hall company),
ISBN 0-7357-1091-0
Excellent reference work with good glossary for finding Python syntax.
See
http://islab.cs.uchicago.edu/python/.
If you'd really like to program efficiently in Python, then I've
found this book to be indispensable (i.e. strongly recommended).
-
"Bioinformatics: A practical guide to the analysis of
genes and proteins", edited by A.D. Baxevanis and B.F.F. Ouellette,
second edition, Wiley & Sons, Inc. (2001).
-
"Learning Python", by
Mark Lutz and David Ascher,
O'Reilly Publishing Co., (1999),
ISBN 1-56592-464-9.
Introductory text to Python programming with some
example programs and good overview of basic syntax and applications
of the language. I've found that the glossary is of limited use,
since many important terms are not listed there.
-
"Learning the UNIX Operating System", Fourth Edition,
by Jerry Peek, Grace Todino & John Strang,
O'Reilly & Associates, Inc.,
ISBN: 1-56592-390-1
Unix is the best platform for efficient work in bioinformatics,
and this tutorial will help you to learn it. Unix is not required
in this introductory course, but since I work on Unix, all class examples,
etc. will be demonstrated on a Linux platform rather than
Macintosh or Windows. We will not spend class time covering Unix;
however, if you plan to do research in computational biology, you'll need
to learn Unix on your own.
-
"A Primer of Genome Science",
by G. Gibson and S.V. Muse,
Sinauer Associates, Inc. (2002).
| Homework, class participation | 30% |
| Midterm | 30% |
| Final Exam | 40% |
The grading policy is subject to change. Any change will be
announced clearly and well in advance.
Academic integrity is central to the mission of higher education. Please
observe the highest standards of academic integrity in this course. Please
review the standards and procedures that are published in the university
catalog and on the web, at:
http://www.bc.edu/offices/stserv/academic/resources/policy/#integrity.
Make sure that the work you submit is in accordance with university
policies. If you have any questions, please consult with me. Violations
will be reported to the Deans' Office and reviewed by the College's
Committee on Academic Integrity. This could result in failure in the
course or even more severe sanctions.