Empowering bright young minds from diverse backgrounds to unlock the power of computation to solve research problems at the frontier of modern biology.
The Pre-College Program in Computational Biology provides extensive training in both the cutting-edge laboratory experiments that generate biological data and the computational analysis of those data.
Computer science has revolutionized biology and medicine. Tomorrow's life scientists need deep knowledge of not only the laboratory techniques for generating experimental data but also the rigorous computational techniques necessary to analyze and model these data. The Pre-College Computational Biology program offers an unparalleled experience for high school students to explore this relationship in a university setting.
Our work in the program focuses on answering big-picture biological questions about the microbes living in Pittsburgh’s three rivers as well as the ongoing COVID-19 pandemic. After sampling water from one of Pittsburgh’s three rivers, students will use modern laboratory techniques to isolate the bacterial DNA from the water and break the DNA strands into millions of tiny fragments that are then read. The question, then, is what to do with all this information. This is where computational biology flies to the rescue.
Our program is structured to allow students to appreciate the inherent synergy between experimentation and computational analysis in modern biology. We will spend approximately half of each day of the program following a hackathon model, in which students will work in small groups to write programs solving computational problems, with hands-on guidance from the instructor and teaching assistants. Students will spend the other half of each day in the laboratory, conducting experiments to generate large datasets to be analyzed with student code.
Carnegie Mellon University is a leader in automated science, and as part of the experimental side of the program, students will get the chance to work in our automation lab. They will use robots to run experiments while learning how machine learning can be used in the design and execution of experiments.
Final projects at the close of the program allow students to present their work to peers, parents, guardians, and other guests. Example student projects can be found at our program homepage.
The Computational Biology curriculum changes from year to year as we continue to hone our program and find fun activities to cover with students. For past and present curricula, please consult our program homepage.
*For specific program dates, see the home page.
**In order to be eligible as a commuter student, the parent or legal guardian must have a permanent residence within approximately 30 miles of campus or within Allegheny County. Families who relocate temporarily to the Pittsburgh area are not eligible for commuter status. There are no exceptions to this policy.
Students will explore the following topics:
- Bacterial colonization and genome sequencing
- DNA extraction
- Polymerase chain reaction
- 16S ribosomal RNA sequencing
- Genome assembly
- Downstream genome analysis, such as gene finding
- Sequence alignment and its applications to species identification, genome annotation, and gene comparison
- Evolutionary tree construction
- Metagenomics analysis
Module 1: Diversity Within Pittsburgh’s Three Rivers’ Microbiome
- How do we design an experiment to learn about microbes in the environment?
- How are DNA sequence data generated?
- How can you isolate and identify individual colonies of bacteria?
- How can we extract DNA from samples with a variety of organic material with different structures (viruses, plants, bacteria, other microorganisms)?
- How can we use our knowledge of evolution and molecular biology to focus our experiments on studying bacteria?
- How can we use sequence data to determine the diversity of microbes in the rivers?
- How can we measure the difference between two samples?
- How can we determine what drives microbial diversity in river water?
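As a small taste of the questions about diversity and sample comparison above, here is a minimal Python sketch. It assumes each sample has already been summarized as a table of counts per taxon. The Shannon index and Bray-Curtis dissimilarity are two common choices and are not necessarily the measures used in the program. The function names and the taxon counts are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over the taxa observed in one sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared taxa."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b))
    return 1 - 2 * shared / total


# Hypothetical counts for two river samples (illustration only).
upstream = {"Pseudomonas": 30, "Bacillus": 10}
downstream = {"Pseudomonas": 10, "Bacillus": 30}

print(shannon_diversity(upstream))
print(bray_curtis(upstream, downstream))
```

A higher Shannon index means reads are spread more evenly across more taxa, while Bray-Curtis compares two samples directly.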
Module 2: Mapping DNA to a Database
- How can we quantitatively determine the difference between two DNA strands containing only A’s, C’s, T’s, and G’s?
- How can we isolate bacteria in the laboratory?
- From bacteria, how do we isolate DNA?
- How can we match a DNA sequence to a database of known bacteria?
- How can we use computational techniques to understand and characterize images of bacterial colonies?
- How can we compare the SARS-CoV-2 genome against related viruses? Does it differ more in some genes than others?
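To illustrate the first question in this module, here is a minimal Python sketch of two standard ways to put a number on the difference between two DNA strings. It is offered as an example and is not necessarily the approach taught in the program. Hamming distance counts mismatched positions in equal-length strings. Edit distance also allows insertions and deletions. The function names are our own.

```python
def hamming_distance(s: str, t: str) -> int:
    """Count positions at which two equal-length DNA strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(a != b for a, b in zip(s, t))


def edit_distance(s: str, t: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning s into t."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,              # delete a from s
                curr[j - 1] + 1,          # insert b into s
                prev[j - 1] + (a != b),   # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


print(hamming_distance("GATTACA", "GACTATA"))
print(edit_distance("ACGT", "AGT"))
```

Edit distance is usually the more useful of the two for real sequencing data, since reads and genomes rarely line up position by position.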
Module 3: Reconstructing a Genome
- How can we generate short fragments of DNA taken from an organism in a lab?
- How do we assemble our short strands of DNA and reconstruct them into a complete SARS-CoV-2 or bacterial genome?
Module 4: Gene Identification
- Given a complete coronavirus or bacterial genome, how can we determine where the genes are?
- Can we infer the function of a gene from only its sequence?
- What genes are present in the coronavirus genome and what do they do?
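One simple starting point for the gene-finding question above is to scan a genome for open reading frames (ORFs): stretches that begin with the start codon ATG and run to the first in-frame stop codon. The Python sketch below is a simplified illustration. It checks only the forward strand and uses the standard stop codons, whereas practical gene finders also check the reverse strand and use statistical models. The function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(genome: str, min_codons: int = 3):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    Indices are 0-based, end is exclusive, and each ORF includes its stop
    codon. min_codons counts the start and stop codons as well.
    """
    genome = genome.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(genome) - 2, 3):
            codon = genome[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Made-up sequence containing a single short ORF.
example = "CCATGAAATTTTAGGG"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

On a real genome, min_codons would be set much higher so that short chance matches are not reported as candidate genes.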
Module 5: Evolutionary Tree Construction
- What are the evolutionary relationships among bacteria in Pittsburgh’s rivers?
- Can we use evolutionary trees of viruses sampled from patients to determine the origin of SARS-CoV-2 in the US?
- How can we visually compare multiple sequences to one another?
- How can we quickly determine where mutations in the coronavirus occurred and use this to identify variants?
Programming Preparatory Materials
IMPORTANT: Admitted students will be required to complete some assignments taken from this project before starting the program.
Eligibility and Application Requirements:
We are looking for students who love biology, have demonstrated proficiency in mathematics, and want a program that will teach them how computational approaches are fundamental to a complete understanding of modern biology.
Programming experience is not required.
We do not require that students have programming experience, as our preparatory materials give students the programming foundation they will need to be successful. (See “Programming Preparatory Materials”.)
- Be at least 16 years old by the program start date (to participate in the residential program).
- Be a current sophomore or junior in high school at the time of application submission.
- Please note: talented sophomores are encouraged to apply; however, most of our admitted students will be juniors.
- Have an academic average of B (3.0/4.0) or better.
- Completed online application
- Unofficial transcript
- Standardized test scores (optional)
- Standardized tests are not required. We assess applicants holistically and take into consideration many factors, including quantitative background and skill. This skill can be demonstrated through optional submission of PSAT, SAT, ACT, or SAT Subject Test scores and/or through mathematics coursework.
- One Letter of Recommendation
- Responses to essay prompts
Application Essay Prompts
- What do you hope to gain from participating in Carnegie Mellon’s Pre-College Programs?
- Why are you interested in studying Computational Biology?
Frequently Asked Questions
- Great question! The short answer is the application of high-powered computational approaches to analyze biological or medical datasets. For a lengthier explanation, check out the first 20 minutes of this video recorded by Professor Compeau.
- Because our program is heavily dependent on coding, each student in our program will need to bring a laptop. We will provide all other resources needed.
- Yes, our program is open to international students, as long as they are able to enter the United States and come to Pittsburgh. They are not, however, eligible for scholarship consideration.