BI524
Computational Foundations of Bioinformatics
Tu Th 1:30--3:00 in Higgins 425.
Office hours: Monday 5:00--6:00, Thursday 10:30--11:30 in Higgins 577,
or by appointment.
Course description | Text | Grading policy | Academic Integrity Policy | Homework | Class Notes | Demos | Tests
Biology is increasingly dominated by high-throughput methods that
yield large data sets. Analyzing these data requires both existing public
domain or commercial software and new algorithms implemented in a
programming language. Bioinformatics is an
interdisciplinary area concerned with the application of
mathematics, statistics and programming to solve mainstream problems in
biology, such as the following.
- According to an article of Aldman and Terzic in the
  Journal of the American Medical Association (November 2005),
  "the field of clinical oncology is poised for unprecedented innovation,
  reflecting the confluence of breakthroughs in decoding disease pathobiology
  in the context of high-throughput enabling technologies". A key issue in
  molecular pathology is the development of statistical models and
  computer programs to determine biologically significant gene expression
  patterns for certain kinds of disease. Another key issue concerns the
  application of machine learning techniques, such as
  support vector machines, to "learn" to distinguish
  the expression profile of healthy cells from tumor cells.
In this course, you will learn how to write programs in the Python
programming language in order to "parse" biological data, such as files of
protein and RNA 3-dimensional conformations and annotated genomic data from
NCBI (National Center for Biotechnology Information) and
EMBL (European Molecular Biology Laboratory). You will also learn how to run
compiled code, such as BLAST or the Vienna RNA Package, from within a script
and parse its output; how to build a bioinformatics web server; and
how to train and test a support vector machine for a bioinformatics
classification problem, such as determining RNA polymerase binding sites
within a genome.
The goal of the course, which assumes no prior experience in computer
programming, is to enable you to work on a UNIX platform,
the most important operating system for bioinformatics research,
and to write interpreted programs, called scripts,
for the problems listed in the previous paragraph. We will focus
principally on Python, a simple, elegant scripting language,
and towards the end of the course will also cover
aspects of Perl.
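As a taste of the kind of short script this course works toward, here is a minimal sketch in Python 3 that parses FASTA-formatted sequence text and reports the length and GC content of each record. The function names parse_fasta and gc_content, and the two example records, are made up for illustration; they are not taken from the course materials or from any NCBI or EMBL file.

```python
def parse_fasta(text):
    """Return a dict mapping each record identifier to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the identifier is the first word after '>'
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            # Sequence line: belongs to the most recent header
            records[name].append(line)
    # Join the collected lines of each record into one string
    return {n: "".join(parts) for n, parts in records.items()}


def gc_content(seq):
    """Fraction of G and C nucleotides in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical example input, not real genomic data
EXAMPLE = """>seq1 made-up example record
ACGTACGT
GGCC
>seq2 another made-up record
AAAA
"""

sequences = parse_fasta(EXAMPLE)
for name, seq in sequences.items():
    print(name, len(seq), round(gc_content(seq), 3))
```

In practice the text would be read from a file rather than an embedded string, and an established library would probably be preferable to a hand-written parser; the sketch only shows the flavor of a parsing script.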
Although this course has no prerequisites, you may find it helpful
to have already taken BI420 "Introduction to Bioinformatics", a non-programming
introduction to bioinformatics, some databases and public domain tools.
BI420 is by no means a requirement: you can certainly take BI524 without
first having learned about current biological databases and public domain
tools. However, any biology major who wants to work with biological
data beyond the limits imposed by public domain web servers will
want to learn the techniques of "scripting" taught in this course.
Required Texts
- "Starting Out With Python", by Tony Gaddis,
  Pearson/Addison-Wesley Publishing Company (2009),
  ISBN-13: 978-0-321-53711-9, ISBN-10: 0-321-53711-4.
- "Developing Bioinformatics Computer Skills",
  by Cynthia Gibas and Per Jambeck,
  O'Reilly & Associates, Inc. (2001),
  ISBN 1-56592-664-1.
- Perl run-time environment, documentation and tutorial:
  http://www.perl.org/.
- Python run-time environment, documentation and tutorial:
  http://www.python.org/.
Optional Texts
Reference list of good texts if you choose to go on in bioinformatics.
(Do NOT purchase. This list is provided for those who get really interested
in bioinformatics and would like some suggested texts for future reading.)
- "Beginning Perl for Bioinformatics:
  An introduction to Perl for Biologists", by J. Tisdall, O'Reilly (2001).
- "Python Essential Reference", Second Edition,
  David M. Beazley,
  New Riders Publishing (a Prentice-Hall company),
  ISBN 0-7357-1091-0.
  Excellent reference work with a good glossary for finding Python syntax.
  See
  http://islab.cs.uchicago.edu/python/.
  If you'd really like to program efficiently in Python, then I've
  found this book to be indispensable (i.e. strongly recommended).
- "Bioinformatics: A practical guide to the analysis of
  genes and proteins", edited by A.D. Baxevanis and B.F.F. Ouellette,
  second edition, Wiley & Sons, Inc. (2001).
- "Learning the UNIX Operating System", Fourth Edition,
  by Jerry Peek, Grace Todino & John Strang,
  O'Reilly & Associates, Inc.,
  ISBN 1-56592-390-1.
  Unix is the best platform for efficient work in bioinformatics,
  and this tutorial will help you to learn it. Knowledge of Unix is not
  required in this introductory course, but since I work on Unix, all class
  examples will be demonstrated on a Linux platform, rather than
  Macintosh or Windows. We will not spend class time covering Unix;
  however, if you plan to do research in computational biology, you'll need
  to learn Unix on your own.
| Homework, class participation | 30% |
| Midterm | 30% |
| Final Exam | 40% |
The grading policy is subject to change. Any change will be
clearly announced well in advance.
Academic integrity is central to the mission of higher education. Please
observe the highest standards of academic integrity in this course. Please
review the standards and procedures that are published in the university
catalog and on the web, at:
http://www.bc.edu/offices/stserv/academic/resources/policy/#integrity.
Make sure that the work you submit is in accordance with university
policies. If you have any questions, please consult with me. Violations
will be reported to the Deans' Office and reviewed by the College's
Committee on Academic Integrity. This could result in failure in the
course or even more severe sanctions.