The new technique, published online in the Journal of Machine Learning Research, promises improved accuracy in the analysis of microscopic images produced by today's biological screening methods — such as the ones used in drug discovery.
This could mean reducing both the cost and the time necessary for these screening methods. It could also make possible new types of experiments that require fewer resources and perhaps uncover subtle anomalies that otherwise would go undetected, the researchers said.
Robert Murphy, the director of the Lane Center and one of the authors, explained that upwards of 100,000 cells and relationships between those cells need to be analyzed in the screening process.
Without the sort of speedups achieved in the new study, this kind of analysis would be impossible, said Murphy, who is the Ray and Stephanie Lane Professor of Computational Biology and a professor in the departments of Biological Sciences, Biomedical Engineering and Machine Learning at Carnegie Mellon.
The technique will be applicable in fields beyond biology because it improves the efficiency of what is known as the "belief propagation algorithm," a widely used method for drawing conclusions about interconnected networks.
"Current automated screening systems for examining cell cultures look at individual cells and do not fully consider the relationships between neighboring cells," said Geoffrey Gordon, a professor in the School of Computer Science's Machine Learning Department. "This is in large part because simultaneously examining many cells with existing methods requires impractical amounts of computational time."
In many cases, computer vision systems have been shown to distinguish patterns that are difficult for humans to detect, he added. However, even automated systems may confuse two similar patterns — and the confusion may be resolvable by considering neighboring cells.
Murphy, Gordon and their fellow author, biomedical engineering student Shann-Ching "Sam" Chen, were able to expand their focus from single cells to multiple cells by increasing the efficiency of the algorithm.
In the case of biological specimens, for instance, the algorithm can be used to infer which parts of the image are individual cells or to determine whether the distributions of particular proteins within each cell are abnormal.
But as the number of variables increases, the belief propagation algorithm can grow unwieldy and require an impractical amount of computing time to solve these problems.
The belief propagation algorithm assumes that neighbors — whether they are cells or bits of text — have effects on each other. So the algorithm represents each piece of evidence used to make inferences as a node in an interconnected network, and exchanges messages between nodes.
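To make the message-passing idea concrete, here is a minimal sketch of textbook sum-product belief propagation in Python. It is not the researchers' implementation and contains none of their speedups. The three-cell layout, the two pattern labels, and every number in it are hypothetical values chosen only for illustration.

```python
# Toy example (all values hypothetical): three neighboring cells in a row,
# each showing one of two patterns, labeled 0 and 1.

def normalize(vec):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(vec)
    return [v / total for v in vec]

def belief_propagation(evidence, edges, compat, iterations=10):
    """Plain sum-product message passing on a pairwise network.

    evidence[i][x] : how strongly node i's own data supports state x
    edges          : list of (i, j) pairs naming neighboring nodes
    compat[x][y]   : how compatible states x and y are for two neighbors
                     (assumed symmetric in this sketch)
    """
    n_states = len(compat)
    neighbors = {i: [] for i in range(len(evidence))}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # messages[(i, j)] is what node i currently tells node j about j's state.
    messages = {(i, j): [1.0] * n_states
                for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        updated = {}
        for (i, j) in messages:
            msg = []
            for xj in range(n_states):
                total = 0.0
                for xi in range(n_states):
                    # Node i's own evidence, combined with what every
                    # neighbor other than j has told it.
                    support = evidence[i][xi]
                    for k in neighbors[i]:
                        if k != j:
                            support *= messages[(k, i)][xi]
                    total += support * compat[xi][xj]
                msg.append(total)
            updated[(i, j)] = normalize(msg)
        messages = updated

    # A node's belief is its own evidence times all incoming messages.
    beliefs = []
    for i in neighbors:
        b = list(evidence[i])
        for k in neighbors[i]:
            b = [b[x] * messages[(k, i)][x] for x in range(n_states)]
        beliefs.append(normalize(b))
    return beliefs

# The middle cell is ambiguous on its own; its neighbors lean toward pattern 0.
evidence = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
edges = [(0, 1), (1, 2)]
compat = [[0.8, 0.2], [0.2, 0.8]]  # neighbors tend to share a pattern

beliefs = belief_propagation(evidence, edges, compat)
for cell, belief in enumerate(beliefs):
    print(cell, [round(p, 3) for p in belief])
```

In this toy setup the middle cell, whose own evidence is split evenly, ends up favoring pattern 0 once its neighbors' messages are taken into account. Each round of updates touches every edge and every pair of states, which gives a sense of why the cost grows quickly as nodes are added.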
The Carnegie Mellon researchers found shortcuts for generating these messages, which significantly sped up inference across the entire network.
The Ray and Stephanie Lane Center for Computational Biology was established in 2007 with a focus on bringing machine learning methods to bear on complex biological problems, especially cancer diagnosis and treatment.