03-510/42-434. Computational Biology, Spring 2009 (12 units)

Lectures: TuTh 3:00-4:20 Doherty Hall 2302
Recitation: F 1:30-2:20 Doherty Hall 2302    
Instructor: Robert F. Murphy
Offices: C120 Hammerschlag Hall and 409G Mellon Institute, x83480
Email: murphy@cmu.edu

TA: Ayshwarya Subramanian
Office: 654F Mellon Institute
Email: ayshwarya@cmu.edu

Office Hours: Wednesdays 6pm-7pm and Thursdays 11am-12pm


This course presents an overview of important applications of computers to problems in biology. It is intended for undergraduate students with good computer programming experience (graduate students should take the graduate version, 03-710/42-734). Major topics covered are computational molecular biology (analysis of protein and nucleic acid sequences), biological modeling and simulation (including computer models of population dynamics, biochemical kinetics, cell pathways, neuron behavior, and mutation), and biological imaging (digital image processing, morphological image analysis, and image classification). Course work consists of homework assignments making use of existing software for these applications, reading of scientific papers, and programming assignments in a language chosen by the student from C, C++, Java, Perl, or Matlab. Students may receive credit for only one of the following: 03-310, 03-311, 03-510, 03-710, 42-434, 42-534, or 42-734. Prerequisites: Modern Biology (03-121), Calculus II (21-112/21-122), and Advanced Programming (15-200) or Fundamental Data Structures and Algorithms (15-211), or permission of instructor.

Lecture Format

Lectures for 03-510/42-434/03-710/42-734 will be given in common on Tuesdays and Thursdays from 3:00 to 4:20.

Getting Help

(1) There will be a recitation from 1:30 to 2:20 on Fridays in DH 2302. The recitation session will include graded quizzes, review of material covered during the previous week, and answers to questions about homework assignments.

(2) You are encouraged to meet with the TA during office hours or to email the instructor to schedule a time to meet.


50% of the course grade will be derived from homework assignments and 5% from quizzes given during the recitation. 20% will be derived from the midterm exam (March 5th) and 25% from the final exam. Homework must be submitted by 3:00 PM on the due date to receive any credit. No credit will be given after that time unless an extension has been requested at least 24 hours PRIOR to the due date. Most homework assignments will include opportunities for extra credit. After grading, homework assignments may be corrected and resubmitted for regrading within one week of return. In this case, the grade for the assignment will be the average of the original grade and the grade after correction.
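As a rough illustration of the arithmetic described above, the sketch below computes a regraded homework score and a weighted course grade. The function names are hypothetical, scores are assumed to be percentages on a 0-100 scale, and extra credit is ignored; this is not an official grade calculator.

```python
# Weights taken from the grading paragraph above; they sum to 1.0.
WEIGHTS = {"homework": 0.50, "quizzes": 0.05, "midterm": 0.20, "final": 0.25}


def regraded_score(original, corrected):
    """Average of the original grade and the grade after correction."""
    return (original + corrected) / 2.0


def course_grade(scores):
    """Weighted course grade.

    `scores` maps each component name in WEIGHTS to a percentage (0-100).
    Assumption: extra credit is not modeled here.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # A homework graded 70, then 90 after correction, is recorded as 80.0.
    print(regraded_score(70, 90))
    # Hypothetical component percentages for one student.
    print(course_grade({"homework": 80, "quizzes": 100, "midterm": 70, "final": 90}))
```

Under these assumptions, correcting a homework recovers at most half of the points lost on the original submission.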

Students should read the university policy on cheating and plagiarism. Students may not copy any portion of a homework assignment from another student, nor may they jointly prepare all or part of an assignment. Examples of unacceptable collaboration are:

(1) jointly doing an analysis and then printing multiple copies of the results.

(2) using any portion of a spreadsheet or program written by someone other than the instructor.


Computational Molecular Biology: Analysis of Nucleic Acid and Protein Sequence (Jan. 13 through Mar. 3). The methods by which computers are used to manipulate and analyze sequences and structures will be covered. Students will become familiar with Internet-based information services relevant to biology (including Entrez and the BLAST server) and will gain understanding of the principles behind major sequence analysis methods. Topics will include:

-information retrieval with Entrez and Web browsers

-statistics of sequence patterns

-basics of machine learning for molecular biology

-pairwise sequence alignment

-comparison with sequence databases

-finding sequence motifs

-finding protein coding regions

-finding genes

-clustering genes by expression

-prediction of macromolecular properties

-retrieving and displaying macromolecular structures

Computational Cell Biology: Biological Modeling and Imaging (Mar. 17 through Apr. 30). A range of approaches used to model the behavior of biological systems will be covered. Underlying concepts covered include recursion relations, phase plane analysis, parameter identifiability, and description of systems using differential equations. Applications of digital image processing and analysis to biological images, especially from fluorescence microscopy, will be discussed, particularly in the context of building models from images. Spreadsheets (e.g., Excel), symbolic mathematics packages (e.g., Mathematica or Maple), and Matlab will be used. Specific topics include:

-recursion relations in population dynamics

-biochemical kinetics

-cellular pathways

-simulation of action potentials

-compartmental analysis

-acquiring and viewing digital images

-image processing: basic operations, automation

-pattern analysis: feature extraction, classification, clustering

-model building from images

Required Text

Neil C. Jones and Pavel A. Pevzner, An Introduction to Bioinformatics Algorithms, The MIT Press, 2004 (ISBN-13: 978-0262101066)

Recommended Additional Text

Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison, Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids, Cambridge University Press, 1998 (ISBN: 0521629713)

Additional Sources

Warren J. Ewens, Gregory R. Grant, Statistical Methods in Bioinformatics: An Introduction, Springer-Verlag, 2001 (ISBN: 0387952292)

Pavel A. Pevzner, Computational Molecular Biology: An Algorithmic Approach, MIT Press, 2000 (ISBN: 0262161974)

Peter Clote, Rolf Backofen, Computational Molecular Biology: An Introduction, John Wiley & Sons, Ltd., 2000 (ISBN: 0471872520)

Andreas D. Baxevanis, B.F. Francis Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Wiley-Interscience, 1998 (ISBN: 0471383910).

Michael Gribskov, John Devereux, Sequence Analysis Primer, (UWBC Biotechnical Resource Series) original issue, Stockton Press, 1990 (ISBN: 156159007X); reissue, Oxford University Press, 1994 (ISBN: 0195098749)

Michael S. Waterman (ed.), Mathematical Methods for DNA Sequences, CRC Press, Inc., Boca Raton, Florida, 1989 (ASIN: 084936664X)

D. Fasman, Prediction of protein structure and the principles of protein conformation, Plenum Press, New York, 1989 (ISBN: 0306431319)

Lee A. Segel, Modeling dynamic phenomena in molecular and cellular biology, Cambridge University Press, 1984 (ISBN: 052127477X)

John A. Jacquez, Compartmental Analysis in Biology and Medicine, Second Edition, The University of Michigan Press, Ann Arbor, 1985 (ASIN: 0472100637)

E.K. Yeargers, R.W. Shonkwiler, and J.V. Herod, An Introduction to the Mathematics of Biology (with Computer Algebra Models), Birkhauser, Boston, 1996