Carnegie Mellon University
Pioneering an Integrated Approach to Diagnosing, Combating Cancer

For too long, cancer has perplexed and frustrated scientists. Two patients diagnosed with the same stage of what looks like the same cancer can have vastly different outcomes, even after the same treatment. Equally disturbing, the chemotherapies used to treat a tumor in many cases harm healthy tissues, making some patients dread the treatment more than the disease.

Molecular medicine is now turning cancer diagnostics and treatment on their heads, offering provocative instances in which doctors use the molecular features of a tumor to diagnose subtypes of cancer, thereby establishing which tumors are most aggressive. Using this information, scientists have developed a handful of drugs targeted against specific molecules to foil cancer growth.

Although encouraging, progress to date has been piecemeal. Advancing cancer diagnosis and treatment requires a faster pace of discovery and a comprehensive cataloging of vast quantities of molecular data. The key to success, according to MCS scientists, lies in integrating state-of-the-art molecular biology with computational innovation. MCS has won a hefty $3.5 million grant from the Pennsylvania State Department of Health to pursue this strategy, one of only four multimillion-dollar grants awarded in 2002. That MCS won such a large award while lacking an extensive portfolio of other cancer-related grants may strike some as unusual, but not Richard McCullough, dean of MCS.

“This award validates our expertise at unifying biology, computer science and other fields in fundamentally new approaches to identify cancer subtypes and to improve tumor detection and treatment. This molecular classification will allow us to detect and diagnose cancers in radically different ways from just a few years ago.”

The multidisciplinary grant, “Integrated Protein Informatics for Cancer Research,” combines six projects, each based on original methods developed by the project leaders and their collaborators within MCS, as well as at the University’s School of Computer Science and Department of Statistics. The grant also provides funding to partner with Dickinson College and Gettysburg College faculty.

Central to the MCS proposal is understanding the production and activity of proteins. Proteins dictate when cells divide, how cells respond to their environment and how cells carry out life-sustaining processes. When proteins work inappropriately, cancer may emerge. To grapple with cancer effectively, scientists want to catalog all the proteins within healthy cells and understand how they interact with one another normally, as well as track how their interactions fail as healthy cells turn deadly. This goal is a tall order, given that a cell makes up to 300,000 proteins. Cataloging these proteins and keeping track of the resulting data, the work of an emerging field called proteomics, requires exceptional ingenuity and skill in laboratory and computational technologies.

While the challenge is daunting, the payoff is big: identifying the very proteins that trigger and sustain a cancer. Understanding protein miscreants and how they talk to one another will enable scientists to track the subtle molecular changes that occur over time as a cell becomes cancerous. This new understanding, in turn, should yield the unprecedented ability to detect and treat tumors at their earliest stages. It also should give scientists the molecular know-how to divide what appears to be one cancer into its many subtypes, some benign and some aggressive. It will enable doctors to differentiate the more than 100 known kinds of cancer into perhaps 1,000 varieties, each characterized by one or more cancer-associated molecular events. Comprehensive proteomics profiles of cancer should speed drug discovery considerably because the molecular flaws that give rise to a specific cancer will form the basis of that cancer’s therapy.

MCS Biological Sciences has recruited a remarkable set of talented investigators poised to take on these complex projects, according to department head Elizabeth Jones.

“MCS faculty expertise in molecular biology, biochemistry and computational biology has resulted in an impressive set of technological innovations and scientific discoveries within the last decade,” comments Jones.

MCS successfully vied for the state cancer research grant by leveraging the power of these recent innovations and the university’s unparalleled talent in computational science.

A Complex Problem

To gain an appreciation for how the team integrated its original methods in protein informatics to address cancer research, it helps to retrace the steps of how proteins are made. Recall that a gene is a stretch of DNA that is transcribed into messenger RNA (mRNA), which is then translated into protein. The vast number of proteins in a cell is possible because genes contain protein-coding regions (exons) interspersed with regions that do not contribute to a protein (introns). After transcription, RNA is processed to remove introns and splice together exons, thereby leaving an mRNA that is translated into a functional protein. By alternatively splicing together different exons, one gene may yield up to 38,000 proteins that have subtly different functions or vastly different purposes within a cell. Cells also control when proteins are made. These two features, protein variation and the timing of protein production, give a cell great flexibility to control its destiny and the destinies of cells around it.
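The arithmetic behind a figure like 38,000 can be sketched in a few lines of Python. The sketch assumes a hypothetical gene with four clusters of mutually exclusive exons, where exactly one exon from each cluster ends up in the finished mRNA. The cluster names and sizes are illustrative assumptions, not data from the MCS projects; they were chosen because they give a total close to the number cited above.

```python
from itertools import product
from math import prod

# Hypothetical gene: each cluster holds mutually exclusive exons, and
# exactly one exon per cluster is spliced into the mature mRNA.
# The cluster names and sizes are illustrative assumptions.
cluster_sizes = {"cluster_a": 12, "cluster_b": 48, "cluster_c": 33, "cluster_d": 2}


def count_isoforms(sizes):
    """Number of distinct mRNAs when one exon is chosen per cluster."""
    return prod(sizes.values())


def enumerate_isoforms(sizes):
    """Yield each isoform as a tuple of chosen exon indices, one per cluster."""
    return product(*(range(n) for n in sizes.values()))


print(count_isoforms(cluster_sizes))  # 12 * 48 * 33 * 2 = 38016
```

Because the choices multiply, even a few modest clusters of alternative exons are enough to reach tens of thousands of possible proteins from one gene.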

Sometimes, one or more proteins are not made appropriately, and a cell fails to correct the problem. Ultimately, this breakdown can transform a healthy cell into a diseased cell, such as a cancer that divides unceasingly. Worse still, such aberrant activity influences cells surrounding a tumor, allowing it to spread.

Redefining Proteomics

Understanding the changes that cause a cell to become cancerous requires multiple levels of investigation. To be sure, plenty of scientific teams outside Carnegie Mellon are looking at protein variations between healthy and cancer cells, but most of the methods developed by MCS scientists are so new and radically different that they are either in use only by a few outside academic institutions or not at all. And arguably no other group is combining these approaches to redefine cancer proteomics.

Traditional proteomics assesses protein variety and abundance in healthy versus cancer cells, but most laboratory methods set up to conduct this work are inefficient at capturing all the types and quantities of proteins made by healthy cells and accurately comparing them to those produced by cancer cells. At MCS, a team led by Jonathan Minden and William Eddy, professor of statistics at the College of Humanities and Social Sciences, is combining Carnegie Mellon-developed differential gel electrophoresis (DIGE) and innovative computational software to accomplish this task.

Developed in 1997 by Minden, associate professor of biological sciences, DIGE offers a rapid and sensitive visualization of protein differences between normal and cancerous cells. Conventional two-dimensional gel electrophoresis compares protein differences using at least two different gels. Because no two gels are identical, matching protein spots can be very difficult. But DIGE enables researchers to compare protein expression patterns for two or more biological samples on one gel. Computerized processing and analysis of DIGE gels should speed the discovery of subtle protein differences that discriminate healthy cells from cancer.
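To give a flavor of what computerized analysis of a single two-sample gel might involve, here is a minimal Python sketch. It is not the software the MCS team is building. It assumes that spot detection has already happened and that each spot comes with one intensity reading per sample; the input format, the function name and the two-fold threshold are all hypothetical, and normalizing by total signal is a deliberate simplification.

```python
from math import log2


def differential_spots(healthy, cancer, min_fold=2.0):
    """Return {spot_id: log2 ratio} for spots differing by >= min_fold.

    healthy, cancer: dicts mapping a spot id to its intensity in each
    sample's channel on the same gel (hypothetical input format).
    """
    total_h = sum(healthy.values())
    total_c = sum(cancer.values())
    threshold = log2(min_fold)
    flagged = {}
    for spot in healthy.keys() & cancer.keys():
        if healthy[spot] <= 0 or cancer[spot] <= 0:
            continue  # a log ratio is undefined without signal in both channels
        # Normalize each channel by its total signal so that unequal
        # sample loading does not masquerade as a protein difference.
        ratio = log2((cancer[spot] / total_c) / (healthy[spot] / total_h))
        if abs(ratio) >= threshold:
            flagged[spot] = ratio
    return flagged


healthy = {"a": 100, "b": 200, "c": 300, "d": 400}
cancer = {"a": 100, "b": 200, "c": 300, "d": 50}
print(sorted(differential_spots(healthy, cancer)))  # only spot "d" is flagged
```

Because both samples share one gel, the same spot id refers to the same position in both channels, which is what removes the gel-to-gel matching problem described above.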

“By integrating DIGE with computational software, we expect to develop a database for large-scale DIGE comparisons of healthy individuals and those with leukemia that could help us pinpoint proteins involved in the disease,” states Minden.

The pathway from a single gene to multiple proteins is another level of complexity that needs to be captured systematically to understand how a normal cell turns cancerous. This task would involve measuring all the alternative mRNA splicing events to detect patterns that could signal cancer, thus requiring a strong computational component. At MCS, a team of investigators led by Javier Lopez, associate professor of biological sciences, has developed technologies to understand and categorize the differences in mRNA splicing events between healthy and cancer cells, providing the basis for a state grant project.

Once proteins are made, they travel to specific parts of a cell to perform their jobs. Comparing the locations of proteins could reveal if a protein made in a healthy cell fails to appear where expected in a cancer cell. MCS associate professors of biological sciences Jonathan Jarvik and Peter Berget have developed CD tagging, which refers to labeling and tracking gene expression through the Central Dogma, from gene to RNA to protein. CD tagging reflects the real-time distribution and location of fluorescently labeled proteins within cells.

A project led by professor of biological sciences Robert Murphy applies machine learning methods to CD-tagged proteins to automatically analyze digital images from fluorescence microscopes. This new science — location proteomics — is superior to the eye in objectively locating proteins within different cell structures. Location proteomics will address how the aberrant expression of a specific cancer-causing gene derails the expression and configuration of proteins within tumor cells over space and time.
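The general idea of teaching a computer to recognize where a protein sits can be illustrated with a toy example in Python. This is not the method used in Murphy's project; a simple nearest-centroid classifier stands in for whatever machine learning methods the team applies. Each image is assumed to have been reduced to a short list of numeric features, and the feature values and location labels below are invented for illustration.

```python
from math import dist
from statistics import fmean


def train_centroids(examples):
    """examples: list of (feature_vector, location_label) pairs."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    # The centroid of a class is the per-feature mean of its examples.
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def classify(features, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Invented two-feature training data (hypothetical values).
training = [
    ((0.9, 0.1), "nucleus"),
    ((0.8, 0.2), "nucleus"),
    ((0.2, 0.9), "mitochondria"),
    ((0.1, 0.8), "mitochondria"),
]
centroids = train_centroids(training)
print(classify((0.7, 0.3), centroids))  # nucleus
```

Once trained on images with known locations, such a classifier applies the same numeric criteria to every new image, which is the sense in which an automated approach can be more objective than inspection by eye.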

“The proteome of every cell is comprised of thousands of proteins with different and distinct locations, abundances, interactions and biochemical activities. Clearly, the more completely we can describe the composition and dynamics of the proteome, the better we will understand the biology of the cell in health and disease,” states Murphy, who has appointments in the Carnegie Institute of Technology’s Department of Biomedical Engineering and the School of Computer Science.

Should researchers discover that a given protein differs between a healthy cell and a cancer cell, they may not know anything about the protein’s structure. Such information is vital to help scientists develop drugs that are targeted against a dysfunctional protein. In another project, Gordon Rule, associate professor of biological sciences, and Michael Erdmann, professor of computer science and robotics, are combining high-speed computational approaches with nuclear magnetic resonance spectroscopy to generate abbreviated structural “sketches” of uncharacterized or newly discovered proteins. To help reveal a new protein’s function, they will compare these sketches with those of proteins whose structural features and function are well understood.

This approach is truly remarkable because the scientists will not rely on knowledge of the genetic code for a protein, which is traditionally used as the basis first to string together the units (amino acids) that make up a protein and then fold them into a three-dimensional structure. The upshot is that well before a protein’s genetic code is determined, investigators may have a handle on what a suspect protein looks like and how it interacts with other molecules to cause cancer.

“The goal of this research is to develop efficient high-throughput computational tools to rapidly identify the biochemical function of proteins that appear to be involved in cancer,” notes Rule. “Our approach should reduce the need to perform costly, time-intensive studies to produce high-resolution studies of proteins.”

Integrating the Data for Discovery

“Everybody is generating data, but how do we extract knowledge from those data?” asks Dannie Durand, associate professor of biological sciences and computer science. “To elucidate the mechanisms of cancer at both the cellular and organismal levels, we need to capture and evaluate the complex interdependence among many variables, including clinical findings, subclasses of cancer, and gene expression profiles.”

Through one of the state projects, Durand is applying Carnegie Mellon’s world-class expertise in statistical machine learning methods to integrate the wealth of data generated by other projects. This work will develop computational methods to discover and predict subclasses of cancer, predict the sensitivity of given subclasses to specific drugs, and identify suspect proteins for further investigation. Durand and her collaborators are in the vanguard because their analysis looks at many different types of protein data at once. The ability to link simultaneous changes in many aspects of protein behavior promises to unlock a cancer’s secrets in ways that traditional, one-dimensional approaches cannot.

An ongoing challenge faced by the MCS proteomics team is the ever-burgeoning amount of published data that may impact a given investigator’s research activities. Normally, a team of scientists would be monitoring the literature. Tom Mitchell, research professor of computer science and director of the Center for Automated Learning and Discovery, is commandeering the computer to act as a research assistant in data mining. He hopes to help MCS scientists identify online published research findings that bear on their proteomics work.

Only a year into the grant, members of the MCS team are reporting success in designing the technologies and scaling them for high-throughput use. Ambitions remain high.

“We look forward to providing a large public dataset of information that will benefit cancer research not only here at Carnegie Mellon but across the world,” emphasizes McCullough.