
Tool Helps Scientists Spot Source of Disease
Causarray uses statistics and data science to identify the genetic changes behind neurological conditions
Carnegie Mellon University researchers have developed a statistical tool that could help pinpoint the genetic changes that cause diseases like Alzheimer’s and schizophrenia. Scientists have long been able to identify genes associated with these conditions, but confirming which changes actually cause disease has remained a challenge. The tool, causarray, offers a way to close that gap.
CMU’s Kathryn Roeder, UPMC University Professor of Statistics and Life Sciences in the Statistics & Data Science and Computational Biology departments, said causarray has already proven effective at identifying significant genetic changes.
“Moving from statistical studies of association to studies of causation is one of the major accomplishments of the field in the last 10 years,” she said.
Roeder co-wrote the study with CMU’s Jin-Hong Du and Maya Shen, as well as Hansruedi Mathys, an assistant professor in the Department of Neurobiology at the University of Pittsburgh.
Unraveling complex causal relationships
Causarray relies on the concept of “unmeasured confounders” — subtle, often hidden factors that sway a cell’s fate. “You have a different life than I have. We have confounders,” said Roeder. “Well, cells have confounders, too.”
As one example of how causarray can be used, Roeder said the tool will be essential for analyzing data from CRISPR (clustered regularly interspaced short palindromic repeats) experiments. In a typical CRISPR study, researchers modify the DNA of cells, for example by knocking out a gene in some cells, and then watch what happens, inferring the effect of that treatment by comparing the modified cells with cells that were left untouched. However, such comparisons can’t account for unmeasured confounders, factors such as cell cycle stage or experiment temperature, that may also influence the path each cell takes, regardless of which genes were knocked out.
“What we do is say, well, let’s take this cell that got the treatment, and estimate what would have happened to that particular cell if it did not have treatment,” said Roeder. “This is what’s known as a counterfactual.”
At the same time, causarray draws on vast amounts of gene expression data to make the same kind of prediction for the control cells: what would have happened to them had they received the treatment.
“We are trying to look through the data for the common pattern found in multiple genes to identify those unmeasured confounders,” said Du, lead author of the study and recent graduate of the Ph.D. in Statistics & Machine Learning program. “And by correcting for those effects, we’re trying to move from association to causation.”
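To make the idea of correcting for hidden confounders concrete, here is a deliberately simplified sketch in Python. It is not causarray’s algorithm or API. It swaps in a basic stand-in: principal component analysis to pull out the pattern shared across many genes, then an ordinary least-squares regression per gene, then a comparison of each treated cell with its own predicted "untreated" expression. The function names and the simulated data are hypothetical. The sketch assumes a single hidden factor that both shifts many genes and makes a cell more likely to be treated, and that only a handful of genes truly respond to the knockout.

```python
# Hypothetical toy illustration of confounder adjustment; NOT causarray's implementation.
import numpy as np


def simulate(n_cells=2000, n_genes=200, n_affected=10, effect=2.0, seed=0):
    """Toy data: a hidden per-cell factor (think cell cycle stage) shifts many genes
    and also makes a cell more likely to be treated, so it confounds naive comparisons."""
    rng = np.random.default_rng(seed)
    hidden = rng.normal(size=n_cells)                        # unmeasured confounder
    treated = rng.random(n_cells) < 1 / (1 + np.exp(-2 * hidden))
    loadings = rng.normal(size=n_genes)                      # how strongly each gene tracks it
    true_effect = np.zeros(n_genes)
    true_effect[:n_affected] = effect                        # only a few genes respond
    expr = (treated[:, None] * true_effect
            + hidden[:, None] * loadings
            + rng.normal(size=(n_cells, n_genes)))
    return expr, treated.astype(float), true_effect


def naive_effect(expr, treated):
    """Association only: mean expression of treated cells minus control cells."""
    t = treated.astype(bool)
    return expr[t].mean(axis=0) - expr[~t].mean(axis=0)


def adjusted_effect(expr, treated, n_factors=1):
    """Estimate the confounder from the pattern shared across genes (top principal
    components), fit one regression per gene, then compare each treated cell with
    its predicted counterfactual expression had it not been treated."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    factors = u[:, :n_factors] * s[:n_factors]               # per-cell confounder estimates
    design = np.column_stack([np.ones(len(treated)), treated, factors])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)     # shape: (2 + n_factors, n_genes)
    t = treated.astype(bool)
    counterfactual = design[t].copy()
    counterfactual[:, 1] = 0.0                               # same cells, treatment switched off
    expr_if_untreated = counterfactual @ coef
    return (expr[t] - expr_if_untreated).mean(axis=0)


expr, treated, truth = simulate()
naive = naive_effect(expr, treated)
adjusted = adjusted_effect(expr, treated)
print("naive mean abs error:   ", np.abs(naive - truth).mean())
print("adjusted mean abs error:", np.abs(adjusted - truth).mean())
```

Because the hidden factor also drives treatment, the naive comparison reports effects in genes the knockout never touched, while the adjusted estimate should land much closer to the simulated truth. Real single-cell data, with count-valued expression, many confounders and formal significance testing, need considerably more than this toy.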
To be clear, Roeder and Du said they did not invent the counterfactual approach. Rather, they are among the first to apply it to genomics, using Du’s elegantly coded causarray software.
“You can actually look at the features of the data, and the data will pick up that signal because of an implicit correlation across genes,” said Roeder. “Recent advances, like CRISPR, hold the promise to lead to real breakthroughs in our understanding of brain disorders, but we will only achieve these advances if they are paired with powerful statistical tools.
“This is the magic of it.”