Carnegie Mellon University

Because of ongoing considerations related to the COVID-19 pandemic, the Pre-College program will be offered only through remote instruction in the summer of 2021. Please see the Online Academic Information section below for the updated program overview, curriculum, and requirements for our online Pre-College Computational Biology program.

Online Academic Information:

The program structure will allow students to appreciate the inherent interplay between experimentation and computational analysis. We will spend approximately five hours each day (including a break) following a hackathon model, in which students will work in small groups to write programs solving computational problems, with hands-on guidance from the instructor and teaching assistants. These problems will center on answering big biological questions about the microbes living in Pittsburgh’s three rivers as well as the ongoing COVID-19 pandemic.

Students should be available on weekdays from 1 p.m. to 6 p.m. ET. Some days will include extra work to be completed before the next class session.

Final presentations at the close of the program will allow students to present their work to parents, guardians, and other guests.

Programming boot camp

Admitted students will receive preparatory materials from Professor Compeau’s Programming for Lovers open education project in advance of the program to build fundamental programming skills. Because these materials will be provided in advance, no prior programming experience is required to be successful.

REQUIREMENTS

  • High-speed internet access and access to a computer. Contact us if the computer requirement will prevent you from participating.
  • U.S. Location. The Program is available only to individuals in the United States. Students are required to be physically located in the United States at all times when accessing Program courses and/or activities.

Program Topics (these are taken from the 2020 program and are subject to change as we continue to improve the program):

Module 1: diversity within the three rivers’ microbiome

Sampling: How do we design an experiment to learn about microbes in the environment?  How were DNA sequence data generated?

Microbiome diversity: How can we use sequence data to determine the diversity of microbes in the rivers?  How can we measure the difference between two samples?   How can we determine what drives microbial diversity in river water?
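To give a flavor of how such questions can be turned into code, here is a minimal Python sketch of two common measures: the Shannon index for diversity within one sample and the Bray-Curtis dissimilarity between two samples. These particular measures, the function names, and the read counts are illustrative assumptions, not necessarily what the program covers.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared taxa."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total

# Hypothetical read counts per genus for two river samples (made-up numbers).
sample_a = {"Pseudomonas": 50, "Flavobacterium": 30, "Acinetobacter": 20}
sample_b = {"Pseudomonas": 10, "Flavobacterium": 30, "Limnohabitans": 60}

print(shannon_diversity(sample_a))
print(bray_curtis(sample_a, sample_b))
```

A higher Shannon index means the reads are spread more evenly across more taxa, while a Bray-Curtis value near 1 means the two samples share little of their composition.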

 

Module 2: mapping DNA to a database

DNA comparison: How can we quantitatively determine the difference between two DNA strands containing only A’s, C’s, T’s, and G’s?
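As one illustration of this question, the Python sketch below computes two standard string distances: the Hamming distance (number of mismatched positions in equal-length strands) and the edit distance (fewest single-letter insertions, deletions, or substitutions). The choice of these two measures and the function names are assumptions made for this example.

```python
def hamming_distance(s, t):
    """Count positions at which two equal-length strands differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length strands")
    return sum(a != b for a, b in zip(s, t))

def edit_distance(s, t):
    """Levenshtein distance by dynamic programming, one table row at a time."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete a from s
                            curr[j - 1] + 1,            # insert b into s
                            prev[j - 1] + (a != b)))    # match or substitute
        prev = curr
    return prev[-1]

print(hamming_distance("GATTACA", "GACTATA"))
print(edit_distance("ACGT", "AGT"))
```

Hamming distance only applies to strands of the same length, whereas edit distance also handles insertions and deletions, which is why it is the more common starting point for comparing real sequences.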

Bacteria Identification: How can we use computational techniques to understand and characterize images of bacterial colonies?

 

Module 3: reconstructing a genome

Sequencing: How can we generate sequence data from a whole genome in a lab?

Genome reconstruction: How do we take short strands of DNA and reconstruct them into a complete genome?

 

Module 4: gene identification

Given a complete coronavirus or bacterial genome, how can we determine where the genes are?

  • Can we infer the function of a gene from only its sequence?
  • What genes are present in the coronavirus genome and what do they do?
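A very rough first pass at locating genes is to scan for open reading frames: stretches that begin with a start codon (ATG) and run in-frame to a stop codon. The Python sketch below does this on the forward strand only, assuming the genome is given as a string of DNA letters. The function name and the `min_codons` threshold are hypothetical, and real gene finders also examine the reverse strand and use statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(genome, min_codons=3):
    """Return sorted (start, end) pairs of forward-strand open reading frames.

    end is exclusive and includes the stop codon; min_codons counts
    the start and stop codons as well.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(genome) - 2, 3):
            codon = genome[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and keep scanning this frame
    return sorted(orfs)

# Toy sequence, not a real genome.
print(find_orfs("CCATGAAATAGGG"))
```

Each returned pair can be sliced out of the genome and translated into a protein sequence, which is a natural next step toward asking what the gene might do.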

 

Module 5: evolutionary tree construction

  • How can we determine the evolutionary relationships between organisms or viruses?
  • How can we use evolutionary trees to determine where many New York cases of COVID-19 originated?
  • How can we visually compare multiple sequences to one another?
  • How can we quickly determine where mutations in the coronavirus occurred?

Overview

 

Pre-College Computational Biology provides extensive training in both the cutting-edge laboratory experiments that generate biological data and the computational analysis of those data.

Biological and medical research have become fully fledged computational disciplines. Tomorrow's life scientists will need deep knowledge of not only the laboratory techniques for generating experimental data but also the rigorous computational techniques necessary to analyze and model these data. The Pre-College Computational Biology program offers an unparalleled experience for high school students to explore this relationship in a university setting.

After sampling water from one of Pittsburgh’s three rivers, students will isolate the bacterial DNA from the water and break the DNA strands into millions of tiny fragments that are then sequenced. Students will then build algorithms to identify the species of microorganisms present in the water samples and construct an evolutionary tree showing how they have evolved.

Curriculum

The curriculum changes from year to year as we continue to hone the program and find engaging activities for students. For an updated curriculum, please consult our program homepage.

Overall Structure

We will begin the program by setting sail on Pittsburgh’s three rivers for a full day of collecting water samples and engaging in fun interactive educational activities.

The rest of the program allows students to appreciate the inherent interplay between experimentation and computational analysis. Our students will spend half of each day in a laboratory working together to perform biological experiments, many of which center on answering biological questions about the microbes living in the three rivers. The other half of the day will follow a hackathon model, in which students will work in small groups to write programs solving computational problems, with hands-on guidance from the instructor and teaching assistants. The code that students write in the hackathon will serve as the foundation for analyzing the biological data that they generate in their laboratory experiments.

Students will explore the following topics:

Experimental

  • Bacterial colonization and genome sequencing
  • DNA extraction
  • Genome assembly
  • Polymerase chain reaction
  • 16S ribosomal RNA sequencing

Computational

  • Genome assembly
  • Downstream genome analysis, such as gene finding
  • Sequence alignment and its applications to species identification, genome annotation, and gene comparison
  • Evolutionary tree construction
  • Metagenomics analysis

Because Carnegie Mellon University is a leader in automated science, students will also have the opportunity to work in our automation lab. They will use robots to run experiments while learning how machine learning can be used in the design and execution of experiments.

Final presentations at the close of the program will allow students to present their work to parents, guardians, and other guests.

Programming boot camp

Preparatory materials will be provided to admitted students in advance of the program to provide fundamental programming skills.  No prior programming experience is required to be successful.

Eligibility

We are looking for students who love biology, have demonstrated proficiency in mathematics, and want a program that will teach them how computational approaches are fundamental to a complete understanding of modern biology.

To be eligible for Pre-College Computational Biology, students must:

  • Be at least 16 years old by the program start date (to participate in the residential program).
  • Be a current sophomore or junior in high school.
    • Talented sophomores are encouraged to apply; however, most of our admitted students will be juniors.

  • Have an academic average of B (3.0/4.0) or better.

Programming experience is not required. 

We provide preparatory materials that give our students the foundation in programming that they will need to be successful.

Application Requirements

The complete application for the Pre-College Computational Biology program consists of the following:

  • Completed Online Application
  • Unofficial Transcript
  • Standardized Test Scores (optional)
    • Standardized tests are not required. We assess applicants holistically and take into consideration many factors, including quantitative background and skill. Applicants can demonstrate this skill through optional submission of PSAT, SAT, ACT, or SAT Subject Test scores, through mathematics coursework, or both.
  • One Letter of Recommendation
  • Responses to the following essay prompts (300-500 words):
    • What do you hope to gain from participating in the Carnegie Mellon Pre-College program?
    • Why are you interested in studying Computational Biology?

Frequently Asked Questions

In short, computational biology is the application of high-powered computational approaches to analyze biological or medical datasets. For a lengthier explanation, check out our post from Dr. Bob Murphy.

We will provide all supplies needed for the Pre-College program. The computational components of the program will be completed in University computer clusters, so students will not need to bring their own computers.