Carnegie Mellon University Receives NSF Grant to Develop New Computational Techniques for Unraveling Human Genetic History - Mellon College of Science - Carnegie Mellon University

Tuesday, August 1, 2006

PITTSBURGH—A team of researchers at Carnegie Mellon University has received a three-year, $646,000 grant from the National Science Foundation to develop computational methods that will quickly identify key regions of the human genome that can be traced to prehistoric times. These regions can then be used to reconstruct human genetic histories. Ultimately the new tools, which draw from the latest techniques in population genetics, theoretical computer science and operations research, will help researchers address basic questions about human evolution and identify regions of the genome involved with diseases like cancer, diabetes and mental illness.

Humans are 99.9 percent identical at the genetic level, and the key to understanding the diversity of the human species is buried in the 0.1 percent that makes us genetically different from one another. But sorting through the genome to identify and analyze these variations is a computational nightmare.

"Computer analysis of these genetic variations allows us to infer how human populations have evolved over thousands of years. Given our current computational tools, though, we could not complete this task in our lifetimes even if we had every computer in the world working on the problem," said Russell Schwartz, an assistant professor of biological sciences and principal investigator on the project. "We will instead tackle those portions of it that can be solved with confidence given current limitations, while simultaneously pushing the limits of established tools as far as possible through novel algorithm development."

The most common genetic variations occur as single nucleotide polymorphisms (SNPs), positions where a single one of the four chemical bases that make up DNA differs between individuals. Each human genome is made of more than three billion of these bases. Researchers have identified many of the predicted 10 million SNPs in the human genome, but understanding how these variations have accumulated over the course of human history and how they became distributed in human populations is a computational challenge.

Schwartz and co-principal investigators Computer Science Professor Guy Blelloch and R. Ravi, professor of operations research and computer science at the Tepper School of Business, are creating new computational techniques to identify patterns of SNPs that are common in human populations — patterns that indicate ancient relationships shared among humans today. According to the researchers, developing these tools is critical to finding genes that cause disorders like diabetes or heart disease.

To help develop these tools, Carnegie Mellon researchers will analyze data gathered by the International HapMap Project. This research consortium is mapping variations in the human genome to find genes that could help diagnose disease susceptibility and design targeted medicines in the future. The "Hap" is short for haplotypes, or sets of associated SNPs along a segment of the genome that have been conserved throughout human genetic history. Researchers created an initial HapMap — a map of shared blocks of SNPs — by analyzing DNA in blood samples collected from people in Nigeria, Japan, China and the United States (with ancestry from northern and western Europe).

Sorting through millions of SNPs to identify haplotypes is even more computationally challenging because of recombination, a shuffling of genetic material between chromosomes that occurs when sperm and egg cells are produced. Because recombination events accumulate over the course of many generations, they complicate efforts to identify shared ancestry between different people or different regions of the genome. Finding the haplotypes, which have undergone little or no recombination in the recent past, would help scientists identify and trace the ancestral lineages of specific genes across populations.
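To make the effect of recombination concrete, here is a minimal sketch of the classic "four-gamete test" from population genetics. It is a textbook check, not the method the Carnegie Mellon team is developing. It assumes each SNP has two alleles coded as "0" and "1" and that each site mutated only once in history; under those assumptions, seeing all four combinations 00, 01, 10 and 11 at a pair of SNPs implies that recombination occurred between them. The function names and the toy haplotypes are hypothetical.

```python
def has_four_gametes(haplotypes, i, j):
    """Return True if SNP columns i and j show all four allele pairs."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def recombination_free_blocks(haplotypes):
    """Greedily split SNP positions into contiguous blocks in which
    no pair of SNPs shows all four gametes.

    Assumes a non-empty list of equal-length strings of '0' and '1'.
    Returns a list of (first_index, last_index) tuples.
    """
    n_snps = len(haplotypes[0])
    blocks, start = [], 0
    for k in range(1, n_snps):
        # Close the current block if SNP k conflicts with any SNP in it.
        if any(has_four_gametes(haplotypes, i, k) for i in range(start, k)):
            blocks.append((start, k - 1))
            start = k
    blocks.append((start, n_snps - 1))
    return blocks


# Toy data (invented for illustration): six haplotypes, four SNPs.
haplotypes = ["0000", "1000", "1100", "0010", "1110", "1011"]
print(recombination_free_blocks(haplotypes))
```

On this toy input, SNPs 0 and 2 show all four combinations, so the scan places a block boundary between positions 1 and 2. Real data sets involve millions of SNPs, missing values and genotyping errors, which is part of why the problem is hard.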

Schwartz and his colleagues are attempting to find haplotypes with more precision than current techniques allow by using a new method for partitioning DNA into small segments they call "haplotype motifs." These motifs frequently occur across human populations. Already, their approach has identified ancient haplotype patterns consistent with current evidence about human evolution. For example, the team used its algorithms to analyze HapMap data and confirm evidence of ancient haplotype patterns predating the divergence of Chinese and Japanese populations, as well as some patterns predating European and Asian population divergence.

The team is also developing novel algorithms to infer phylogenies (family trees) for pieces of the human genome that have not been touched by recombination.

"We are applying new methods from theoretical computer science to create phylogenies that are guaranteed to be the best possible, given the SNP data available to us and our understanding of how the observed patterns of SNPs were created," Schwartz said.

At present, these phylogenies are generally inferred by approximate, or heuristic, methods that do not always make the best possible inferences from the available data, according to Schwartz. The team is developing optimal methods for this task and a related extension where the genome pieces may have limited mutation. These new methods draw from a variety of techniques ranging from graph theory to mathematical programming.
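The team's optimal methods are not described here. For orientation only, the sketch below shows the simplest exact case from the textbook literature, a "perfect phylogeny," which may differ substantially from what the researchers are building. It assumes no recombination, that allele "0" is ancestral, and that each SNP arose by a single mutation. Under those assumptions a tree exists exactly when the sets of haplotypes carrying each mutation are pairwise nested or disjoint. The function name and toy data are hypothetical.

```python
def perfect_phylogeny(haplotypes):
    """Return {snp: parent_snp or None} if a perfect phylogeny exists,
    otherwise None. A parent of None means the mutation hangs off the root.

    Assumes a non-empty list of equal-length strings of '0' and '1',
    with '0' as the ancestral allele.
    """
    n_snps = len(haplotypes[0])
    carriers = {}
    for j in range(n_snps):
        s = frozenset(i for i, h in enumerate(haplotypes) if h[j] == "1")
        if s:  # skip sites where no sampled haplotype carries the mutation
            carriers[j] = s

    # Visit mutations from most to least widely carried.
    order = sorted(carriers, key=lambda j: (-len(carriers[j]), j))
    parent = {}
    for pos, j in enumerate(order):
        parent[j] = None
        for k in order[:pos]:
            if carriers[j] <= carriers[k]:
                # Later k have smaller carrier sets, so the last match
                # is the tightest enclosing mutation.
                parent[j] = k
            elif carriers[j] & carriers[k]:
                return None  # overlapping but not nested: no such tree
    return parent


# Toy data (invented for illustration).
print(perfect_phylogeny(["000", "100", "110", "001"]))
print(perfect_phylogeny(["00", "10", "01", "11"]))
```

The first call yields a tree in which mutation 1 arose on a lineage already carrying mutation 0, while mutation 2 arose independently; the second input has no perfect phylogeny because the two mutations overlap without nesting. Data that fail this check are where approximate or more general optimal methods become necessary.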

"Both new analyses will together provide us with a partial history of the human genome and detailed information about specific genetic regions where such information can be inferred with confidence," he said.

The grant will also allow the team to develop new course material in the areas of algorithms and computational biology, and provide undergraduate and graduate student research opportunities at the boundaries of quantitative and biological research.

By: Amy Pavlak 412-268-8619