Carnegie Mellon University

February 25, 2019

Faculty Spotlight: New Research Explores Machine Learning to Simplify Drug Development

A new model developed at the Tepper School of Business applies machine learning to create better methods for the laborious and expensive process of drug discovery.

The paper, titled “Efficient Nonmyopic Batch Active Search,” was coauthored by Benjamin Moseley, Carnegie Bosch Assistant Professor of Operations Research. It develops a theoretical basis for testing properties of substances, such as drugs, and uses these findings to improve the state of the art for discovering new drugs in practice.

Typically, drug development involves a quest to find chemical compounds that bind to a target protein associated with a particular disease.

“So much of this testing is done manually, and it is very expensive,” says Moseley.

A scientist may know the structure of the compounds and what they have in common, but not whether they have the underlying properties needed for the drug to be effective. Testing each one manually takes a great deal of time and money, so the scientist can typically run only a limited number of tests. As a result, a compound that would be a good candidate for developing into a drug for that disease might be overlooked.

Moseley applied a Bayesian approach to the problem, meaning the estimated probability that a compound will work is updated as more information becomes available. By assuming that the chemical compounds in a group are related to one another, Moseley’s algorithm posits a hidden distribution of the property needed to create the drug and then searches for that property (or multiple properties) across an entire group of similar compounds at once. The model is nonmyopic: rather than considering only the next round of tests, it looks ahead to what might happen later and uses that information to guide the exploration of the compounds.
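The Bayesian updating described above can be illustrated with a minimal Python sketch. It is not the method from the paper: it assumes a simple Beta-Bernoulli belief about the hit rate in each group of similar compounds, and it picks a batch greedily rather than looking ahead. The names `ClusterBelief` and `pick_batch`, the compound labels, and the test results are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterBelief:
    """Beta(alpha, beta) belief about the hit rate in one group of similar compounds."""
    alpha: float = 1.0  # assumed prior pseudo-count of active compounds
    beta: float = 1.0   # assumed prior pseudo-count of inactive compounds

    def mean(self) -> float:
        # Posterior mean probability that a compound in this group is active.
        return self.alpha / (self.alpha + self.beta)

    def update(self, active: bool) -> None:
        # Bayesian update after one test result.
        if active:
            self.alpha += 1
        else:
            self.beta += 1


def pick_batch(beliefs, untested, batch_size):
    """Greedy (myopic) choice: test the compounds whose group looks most promising.

    `untested` is a list of (compound_id, group) pairs. The sort is stable,
    so ties keep their original order.
    """
    ranked = sorted(untested, key=lambda c: beliefs[c[1]].mean(), reverse=True)
    return ranked[:batch_size]


# Hypothetical results so far: two hits in group A, two misses in group B.
beliefs = {"A": ClusterBelief(), "B": ClusterBelief()}
for hit in (True, True):
    beliefs["A"].update(hit)
for hit in (False, False):
    beliefs["B"].update(hit)

untested = [("b3", "B"), ("a3", "A"), ("a4", "A"), ("b4", "B")]
batch = pick_batch(beliefs, untested, batch_size=2)
print(batch)
```

With these made-up results, the belief for group A rises and the belief for group B falls, so the next batch is drawn from group A. A nonmyopic policy of the kind the article describes would also weigh how each batch affects what can be found with the remaining testing budget, which this sketch does not attempt.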

Moseley estimates the model represents a 20 to 30 percent improvement over standard procedures. The paper was recognized with a spotlight presentation at the Conference on Neural Information Processing Systems (NeurIPS), the leading machine learning publication venue, in December 2018. The honor went to 168 of 4,856 submissions, or about 3.5 percent.

“It can alleviate one of the bottlenecks of drug discovery,” says Moseley, who notes that testing chemical compounds for a specific property is among the costliest parts of the process. “It could improve one of the most expensive steps of the drug discovery process.”

Moseley says the same model can be applied to help solve other problems, such as finding new mixtures for making metals in materials science.

“I’m always looking for really cool problems that have real motivation, but that mathematically we don’t understand,” he says. “This paper offers a foundational understanding of the problem.”