Carnegie Mellon University
November 30, 2015

Big Data Researchers Receive Grant To Build Better Models For Predicting Cancer Outcomes

By Byron Spice / CMU / 412-268-9069
and Anita Srikameswaran / UPMC / 412-578-9193

Big data researchers have received a three-year, $5 million state grant through the Commonwealth Universal Research Enhancement (CURE) program to develop better methods for integrating, analyzing and modeling large volumes of diverse data on cancer patients. The goal is to produce more accurate predictions of patient outcomes and to enable clinicians to tailor care for each patient.

Greg Cooper, professor and vice chair of biomedical informatics and director of the Center for Causal Discovery at the University of Pittsburgh, and Ziv Bar-Joseph, professor of computational biology at Carnegie Mellon University, will lead the Big Data For Better Health (BD4BH) project, which also includes UPMC and the Pittsburgh Supercomputing Center.

“We will investigate breast and lung cancer as clinical domains to develop the methods and software tools; however, the methods will be generalizable to other diseases,” Cooper said. “The basic approach will be to process raw data, such as gene sequence and expression data, to derive highly informative biological patterns in the data that are then used to predict patient outcomes. We believe these biological patterns will predict outcomes significantly better than would using the raw data directly, and we plan to test this hypothesis.”

For example, rather than using the set of mutated genes in a cancerous tumor (a type of raw data) as predictors of cancer metastasis, the researchers will infer the cell signaling pathways (a type of biological pattern) that are likely to have a significant influence on tumor growth. Those aberrant pathways will then be used to predict clinical outcomes, such as tumor spread or metastasis. The ultimate goal is for such predictions to help inform clinical care.

“Carnegie Mellon’s unique expertise in analyzing and modeling large-scale data, combined with the cutting-edge clinical and biomedical work of UPMC and Pitt, can leverage the large amounts of data being collected on cancer,” Bar-Joseph said. “This will enable patients and clinicians to take full advantage of this data in ways not previously possible.”

Machine learning methods, for instance, can automatically analyze large datasets to discover patterns that people cannot discern. These automated discoveries can then help researchers identify relationships between individuals' DNA and the way they respond to treatment, allowing treatments to be tailored more closely to each patient.

“We hope that more accurate predictions of clinical outcomes will assist physicians in devising treatment plans and help patients in making health care decisions,” Cooper said. “The patterns contained in the models may also spur biological insights into the diseases being modeled.”

In collaboration with Lincoln University, the BD4BH program also will train underrepresented minority students to work with big data in both the biomedical and data science realms.

The project is funded by the Pennsylvania Department of Health. The department specifically disclaims responsibility for any analyses, interpretations or conclusions.