Computational Biology
Empowering bright young minds to unlock the power of computation to solve research problems at the frontier of modern biology.

| Program Length | Early Decision & International Applications Due | Scholarship* & Regular Decision Applications Due | Housing Options |
|---|---|---|---|
| Jun. 22 to Jul. 13, 2024 (3 weeks) | | | Resident or Commuter** |
Program Overview
The Pre-College program in Computational Biology provides extensive training in both the cutting-edge laboratory experiments that generate biological data and the computational analysis of those data.
Computer science has revolutionized biology and medicine. Tomorrow's life scientists need deep knowledge of not only the laboratory techniques used for generating experimental data but also the rigorous computational techniques necessary to analyze and model these data. Pre-College Computational Biology offers an unparalleled experience for high school students to explore this relationship in a university setting.
Our work in the program focuses on answering big-picture biological questions about the microbes living in Pittsburgh’s three rivers as well as the ongoing COVID-19 pandemic. After sampling water from one of the rivers, students will use modern laboratory techniques to isolate the bacterial DNA from the water and break the DNA strands into millions of tiny fragments that are then read. The question, then, is what to do with all this information? This is where computational biology flies to the rescue.
Our program is structured to allow students to appreciate the inherent synergy between experimentation and computational analysis in modern biology. We will spend approximately half of each day of the program following a hackathon model, in which students will work in small groups to write programs solving computational problems, with hands-on guidance from the instructor and teaching assistants. Students will spend the other half of each day in the laboratory, conducting experiments to generate large datasets to be analyzed with student code.
Carnegie Mellon University is a leader in automated science and, as part of the experimental side of the program, students will get the chance to work in our automation lab. They will use robots to run experiments while learning how machine learning can be used in the design and execution of experiments.
Final projects at the close of the program allow students to present their work to peers, parents, guardians, and other guests. Example student projects can be found at our program homepage.
Who's a good fit and is programming required?
We are looking for students who love biology, have demonstrated proficiency in mathematics, and want a program that will teach them how computational approaches are fundamental to a complete understanding of modern biology.
Programming experience is not required.
We provide preparatory materials that give students the foundation in programming they will need to be successful. (See “Programming Preparatory Materials” below.)
Curriculum
Experimental
- Bacterial colonization and genome sequencing
- DNA extraction
- Genome assembly
- Polymerase chain reaction
- Gel electrophoresis
- 16S ribosomal RNA gene sequencing
- Transformation (adding genes to E. coli)
Computational
- Genome assembly
- Downstream genome analysis, such as gene finding
- Sequence alignment and its applications to species identification, genome annotation, and gene comparison
- Evolutionary tree construction
- Metagenomics analysis
Module 1: Diversity Within Pittsburgh’s Three Rivers’ Microbiome
Sampling
- How do we design an experiment to learn about microbes in the environment?
- How are DNA sequence data generated?
- How can you isolate and identify individual colonies of bacteria?
Microbiome diversity
- How can we extract DNA from samples with a variety of organic material with different structures (viruses, plants, bacteria, other microorganisms)?
- How can we use our knowledge of evolution and molecular biology to focus our experiments on studying bacteria?
- How can we use sequence data to determine the diversity of microbes in the rivers?
- How can we measure the difference between two samples?
- How can we determine what drives microbial diversity in river water?
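To make the questions about diversity and sample differences concrete, here is a minimal sketch in Python. It assumes each sample has already been reduced to read counts per taxon, and it uses two common measures (the Shannon index and Bray-Curtis dissimilarity) as illustrations; these are not necessarily the measures used in the program. The genus counts are made-up numbers.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared taxa."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


# Hypothetical read counts per genus for two river samples (invented values).
sample_a = {"Pseudomonas": 50, "Flavobacterium": 30, "Acinetobacter": 20}
sample_b = {"Pseudomonas": 10, "Flavobacterium": 30, "Escherichia": 60}

print(round(shannon_diversity(sample_a), 3))      # diversity within one sample
print(round(bray_curtis(sample_a, sample_b), 3))  # difference between samples
```

A higher Shannon index means reads are spread more evenly over more taxa, while Bray-Curtis compares how much abundance the two samples share.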
Module 2: Mapping DNA to a Database
DNA comparison
- How can we quantitatively determine the difference between two DNA strands containing only A’s, C’s, T’s, and G’s?
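One way to put a number on the difference between two DNA strands is to count mismatches (Hamming distance) or, when the strands differ in length, to count the fewest single-letter edits needed to turn one into the other (edit distance). The sketch below shows both; the function names and example strings are our own illustrations, and the program may teach other scoring schemes.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strands differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strands of equal length")
    return sum(a != b for a, b in zip(s, t))


def edit_distance(s, t):
    """Fewest substitutions, insertions, and deletions that turn s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # match or substitute
        prev = curr
    return prev[-1]


print(hamming_distance("GATTACA", "GACTATA"))  # 2
print(edit_distance("GATTACA", "GATACA"))      # 1
```

The edit distance keeps only one row of the dynamic programming table at a time, so memory grows with the length of one strand rather than the product of both.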
Bacteria identification
- How can we isolate bacteria in the laboratory?
- From bacteria, how do we isolate DNA?
- How can we match a DNA sequence to a database of known bacteria?
- How can we use computational techniques to understand and characterize images of bacterial colonies?
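As a rough illustration of matching a DNA sequence to a database, the sketch below compares the short substrings (k-mers) of a query against those of each reference and picks the reference with the highest Jaccard similarity. This is a simplified stand-in, not the method used in the program or by any particular search tool, and the "database" holds invented sequences with placeholder names.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query, database, k=4):
    """Return (name, Jaccard similarity) of the reference closest to the query."""
    q = kmers(query, k)
    scores = {}
    for name, ref in database.items():
        r = kmers(ref, k)
        union = q | r
        scores[name] = len(q & r) / len(union) if union else 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy database: invented sequences, not real genomes.
database = {
    "reference_A": "ATGGCGTACGTTAGCCGATAGCTAGG",
    "reference_B": "ATGCCCTTTAAAGGGCCCTTTAAAGG",
}
query = "GCGTACGTTAGCCGATAG"

name, score = best_match(query, database)
print(name, round(score, 2))
```

Comparing k-mer sets avoids a full alignment against every reference, which is why ideas like this scale to large databases.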
SARS-CoV-2 application
- How can we compare the SARS-CoV-2 genome against related viruses? Does it differ more in some genes than others?
Module 3: Reconstructing a Genome
Sequencing
- How can we generate short fragments of DNA taken from an organism in a lab?
Genome reconstruction
- How do we assemble our short strands of DNA and reconstruct them into a complete SARS-CoV-2 or bacterial genome?
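To give a feel for genome reconstruction, here is a minimal greedy sketch: it repeatedly merges the two reads whose ends overlap the most. It assumes error-free reads from a single strand, and greedy merging can go wrong on repeated regions; real assemblers use more robust graph-based approaches. The reads are cut from a made-up sequence.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; keep the pieces as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Toy reads cut from the invented sequence "ATGGCGTACGTTAGC".
reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```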
Module 4: Gene Identification
- Given a complete coronavirus or bacterial genome, how can we determine where the genes are?
- Can we infer the function of a gene from only its sequence?
- What genes are present in the coronavirus genome and what do they do?
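A simple first pass at finding genes is to look for open reading frames: stretches that begin with the start codon ATG and run in the same reading frame until a stop codon. The sketch below scans only the forward strand and uses the three standard stop codons; practical gene finders also check the reverse complement and apply statistical models. The example sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Yield (start, end, frame) for ATG...stop stretches on the forward strand.

    Positions are 0-based; end is exclusive and includes the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon seen in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3, frame
                start = None  # look for the next start codon


# Invented sequence containing one open reading frame.
seq = "CCATGAAACCCGGGTAGTT"
print(list(find_orfs(seq)))  # [(2, 17, 2)]
```

The `min_codons` threshold (a hypothetical parameter name) filters out very short stretches that are likely to occur by chance.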
Module 5: Evolutionary Tree Construction
- What are the evolutionary relationships among bacteria in Pittsburgh’s rivers?
- Can we use evolutionary trees of viruses sampled from patients to determine the origin of SARS-CoV-2 in the U.S.?
- How can we visually compare multiple sequences to one another?
- How can we quickly determine where mutations in the coronavirus occurred and use this to identify variants?
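As one illustration of evolutionary tree construction, the sketch below applies UPGMA, a simple clustering method that repeatedly joins the two closest groups, using mismatch counts between already-aligned sequences as distances. UPGMA is only one of several tree-building methods and is not necessarily the one taught in the program; the sequences and sample names are invented.

```python
from itertools import combinations


def hamming(s, t):
    """Number of mismatched positions between two aligned sequences."""
    return sum(a != b for a, b in zip(s, t))


def upgma(seqs):
    """Build a tree (nested tuples) by repeatedly joining the closest clusters.

    seqs maps a name to an aligned sequence; all sequences have equal length.
    """
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]

    def dist(c1, c2):
        # Average pairwise distance between the members of two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(hamming(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


# Invented aligned sequences with placeholder names.
seqs = {
    "sample_1": "ACGTACGTAC",
    "sample_2": "ACGTACGTAA",
    "sample_3": "ACGAACCTAA",
    "sample_4": "TCGAACCTGA",
}
print(upgma(seqs))
```

Sequences that end up in the same inner tuple are inferred to share a more recent common ancestor than sequences joined further out.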
Programming Preparatory Materials
IMPORTANT: Admitted students will be required to complete some assignments taken from this project before starting the program.
Eligibility and Application Requirements
Eligibility Requirements
- Be at least 16 years old by the program start date.
- Be a current sophomore or junior in high school at the time of application submission. Please note: Talented sophomores are encouraged to apply; however, most of our admitted students will be juniors.
- Have an academic average of B (3.0/4.0) or better.
Application Requirements
- Completed online application
- Unofficial transcript
- Standardized test scores (optional)
- One letter of recommendation
- Responses to essay prompts
Application Essay Prompts
- What do you hope to gain from participating in Carnegie Mellon’s Pre-College Programs?
- Why are you interested in studying Computational Biology?
Frequently Asked Questions
What is computational biology?
Great question! The short answer is the application of high-powered computational approaches to analyze biological or medical datasets. For a lengthier explanation, check out the first 20 minutes of this video recorded by Professor Compeau.
Do I need to bring my own computer? What other supplies do I need?
Because our program is heavily dependent on coding, each student in our program will need to bring a laptop. We will provide all other resources needed.
Will I earn college credit from this program?
No, Pre-College Computational Biology students do not earn college credit.
Are international students allowed to participate?
Yes, our program is open to international students as long as they are able to enter the United States and come to Pittsburgh. They are not, however, eligible for scholarship consideration.

