Carnegie Mellon University
December 21, 2023

Chemistry Ph.D. Students Awarded Tata Consultancy Services Presidential Fellowships

By Amy Pavlak Laird

Jocelyn Duffy
  • Associate Dean for Communications, MCS
  • 412-268-9982

The Tata Consultancy Services (TCS) Presidential Fellowships celebrate two graduate students at Carnegie Mellon University for their research on developing machine-learning tools and automated experimental methods to predict the outcomes of chemical reactions and advance drug design. Chemistry Ph.D. students Polina Avdiunina and Zhen "Jack" Liu earned the fellowships in 2022 and 2023, respectively.

"The TCS Fellowship provides a unique opportunity for graduate students. Both Polina and Jack could really focus on exciting science that is enabled by the Carnegie Mellon University Cloud Lab and explore how machine learning accelerates chemical experiments," said Associate Chemistry Professor Olexandr Isayev, who advises both Avdiunina and Liu.

The Presidential Fellowship provides important financial support to recruit and retain outstanding graduate students. The fellowship was made possible through a generous gift in 2015 from Tata Consultancy Services (TCS), a leading global IT services and consultation organization, to support undergraduate scholarships and graduate-level fellowships across the university.

"We congratulate Tata Consultancy Services Presidential Fellowship awardees Polina Avdiunina and Zhen Liu for their work on AI-led materials design and development. The work demonstrates that in-silico experimentation and digital bio-twins are ready to transform new product development across a wide range of industries. This aligns with our belief that digital technologies such as AI are ready to accelerate sciences and will unleash tremendous innovation across all sectors," said Harrick Vin, Chief Technology Officer, TCS.

For over a decade, through its Co-Innovation Network (COIN™) Program, TCS has supported hundreds of inventors worldwide to make a difference in their respective fields. Their inventions will advance the state of the art in several industries, thereby transforming businesses and the world we live in.

Polina Avdiunina

Polina Avdiunina came to Carnegie Mellon to pursue her doctorate and to be involved with cutting-edge research. And cutting-edge it is. She designs and conducts wet lab biochemistry experiments and remotely controls them from her computer. 

"It's like a high-tech future reality that doesn't even feel like a reality yet," said Avdiunina, a second-year Chemistry Ph.D. student.

She's working with Emerald Cloud Lab (ECL), a remotely operated research facility that handles all aspects of daily lab work without the user ever setting foot in the lab. ECL was cofounded by Carnegie Mellon alumni Brian Frezza and DJ Kleinbaum, who are working with the university to build the Carnegie Mellon Cloud Lab.

"I get to work with biological molecules and drug-design projects, but I'm not pipetting; I'm not doing any wet experiments," she said. "I program all the experimental work on the computer using ECL software for orchestrating experiments in the facility."

Avdiunina studies protein kinases, proteins that regulate nearly all aspects of cell life. Malfunctioning kinases cause cancer and diabetes as well as metabolic and neurological disorders, which makes them ideal drug targets. There are currently more than 70 drugs that inhibit specific kinases, and Avdiunina is working on finding more.

When she was a student studying bioengineering and bioinformatics at Lomonosov Moscow State University in Russia, she spent a summer working with Associate Chemistry Professor Olexandr Isayev, then at the University of North Carolina, on building machine learning models to predict how well certain small molecules bind to kinases. Such small-molecule inhibitors could potentially be used as drugs.

"Potentially discovering a new molecule that could become a new drug for people who need it is a main motivation for me," she said.

Avdiunina returned to Russia to complete her degree after her summer research experience but joined Isayev's lab at Carnegie Mellon for her Ph.D. work. The project she had worked on years earlier had reached its experimental stage. The machine learning tool has predicted hundreds of potential kinase inhibitors, and Avdiunina is now designing high-throughput screening experiments to test their effectiveness in real time. The experiments are being run at the ECL, which recently moved from San Francisco to Austin. Once the Carnegie Mellon Cloud Lab opens in 2024, the research will transfer to Pittsburgh.

"The automated experiments are a fast and effective way to experimentally check our predictions," Avdiunina said.

Avdiunina's current experiments focus on one specific kinase. Still, the goal is to standardize and validate the laboratory protocol, which would allow the team "to test multiple kinases against hundreds of small molecules and do it automatically, so you don't need to recreate the protocol every time," she said.

As an undergraduate student, Avdiunina gained experience in conducting lab experiments but was more interested in working on computational projects. All of that experience is coming in handy as she collaborates with scientists who are experts at manually running the kinase assays in the lab. They work together on the many details needed for automated processes to run successfully, efficiently, and on a bigger scale.

"It's not easy," Avdiunina said. "There are a lot of things involved in that experiment, so there are many things that could go wrong. I'm calling them often, and I'm lucky to have them guiding and helping me."

Her experience with programming has been key in helping her learn ECL's systems. "I'm also really grateful for the ECL team. They understand that their cloud facilities are the first out there and that it does take time for people to learn how to use them. They have been investing in teaching others how to use it, and they've been very helpful with that."

"The project is a perfect combination of everything for me," Avdiunina said.

Zhen "Jack" Liu

Zhen "Jack" Liu arrived at Carnegie Mellon as a first-year graduate student with plans to study chemical biology. A rotation in a computational chemistry lab changed everything.

The idea that people can use machine learning and artificial intelligence to predict what's going to happen in the lab without running a single experiment dazzled him, he said. He was inspired to switch his focus to computational chemistry. The problem? Liu didn't know anything about programming.

"The first time I learned Python was when I came to CMU," Liu said. "I knew nothing. Absolutely nothing." 

Now, five years later, he's added an entirely new field of expertise to his resume.

"There is a huge gap between experimental chemistry and computational chemistry, so I needed to fill that gap," said Liu, whose undergraduate degree is in organic chemistry. "It took me a lot of effort to learn the relevant knowledge and keep up the pace with my peers."

Liu's work focuses on developing machine learning tools to assist chemists in molecule design and reaction prediction. Since a chemical reaction can be highly complex, it takes a lot of time and effort to conduct experiments to determine the best reactants, catalysts, and conditions needed to run a highly effective reaction that delivers the highest yield of product while avoiding unintended by-products. This is where machine learning and artificial intelligence can have a big impact, especially when it comes to predicting the reaction yield.

"Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis," Liu said. "If you have a model that is accurate in predicting theoretical yield, then a chemist in the lab does not need to run that many experiments. They would just need to run experiments that are recommended by the model. This saves time for more creative things."

Despite its importance, predicting the theoretical yield remains challenging because the yield depends on many observable and unobservable factors throughout the reaction process, including the interactions between molecules, environmental conditions, and the experimental techniques used by a chemist at the bench.

The goal is to design an accurate and generally applicable yield prediction model. But models trained on large data sets are notoriously bad at accurately predicting yields. Liu set out to figure out why.

With the guidance of Isayev, Liu investigated the yield prediction task. Through a systematic benchmark study on amide coupling reactions, they discovered that reactivity cliffs and yield uncertainty are key factors that degrade model accuracy. To overcome this challenge, Liu designed four sets of descriptors with Auto3D, each of which describes a different aspect of the reactions.

"Incorporating multimodal information and stacking techniques worked out, and we achieved an R² for yield prediction on a large reaction dataset at about 0.45," Liu said. "Though still far from satisfying, this is a significant improvement compared to previous efforts."
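
For readers unfamiliar with the metric Liu cites, the coefficient of determination (often written R²) measures how much of the variation in measured values a model's predictions account for: 1.0 is a perfect fit, and 0.0 is no better than always guessing the average. The following is a minimal sketch of the standard formula in Python. The function name and the yield numbers are illustrative assumptions and are not taken from Liu's study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_observed = sum(observed) / len(observed)
    # Squared error between each measured value and the model's prediction.
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared spread of the measured values around their own mean.
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical measured yields (percent) for three reactions.
measured = [10.0, 20.0, 30.0]

perfect_model = r_squared(measured, [10.0, 20.0, 30.0])   # predictions match exactly
average_guess = r_squared(measured, [20.0, 20.0, 20.0])   # always predicts the mean
```

On this scale, a value of about 0.45 means the model accounts for a bit less than half of the variation in measured yields.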

Liu's discovery, published in the journal Chemical Science, highlights that yield prediction models must be sensitive to reactivity cliffs, which are dramatic drop-offs in reactivity that occur with a minor change in a molecule's substructure. The models must also be robust to the uncertainty associated with yield measurements.
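
One way to picture a reactivity cliff is as a pair of reactions whose molecules look nearly identical but whose yields differ sharply. The sketch below illustrates that idea in Python and is not the method used in the published study. It assumes each reaction is described by a hypothetical set of substructure labels, compares pairs with Tanimoto (Jaccard) similarity, and flags pairs that are similar yet far apart in yield. All names, thresholds, and data are invented for illustration.

```python
from itertools import combinations


def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of substructure labels."""
    union = features_a | features_b
    return len(features_a & features_b) / len(union) if union else 1.0


def find_reactivity_cliffs(reactions, min_similarity=0.6, min_yield_gap=50.0):
    """Return pairs of reaction names that are structurally similar but differ widely in yield.

    `reactions` is a list of (name, feature_set, yield_percent) tuples.
    Both thresholds are arbitrary illustrative choices.
    """
    cliffs = []
    for (name_a, feats_a, yield_a), (name_b, feats_b, yield_b) in combinations(reactions, 2):
        similar = tanimoto(feats_a, feats_b) >= min_similarity
        large_gap = abs(yield_a - yield_b) >= min_yield_gap
        if similar and large_gap:
            cliffs.append((name_a, name_b))
    return cliffs


# Hypothetical reactions: rxn_1 and rxn_2 differ by a single substructure label.
reactions = [
    ("rxn_1", {"amine", "aryl", "carboxylic_acid", "para_F", "ortho_H"}, 85.0),
    ("rxn_2", {"amine", "aryl", "carboxylic_acid", "para_F", "ortho_Me"}, 10.0),
    ("rxn_3", {"alkyl", "thiol"}, 80.0),
]

cliffs = find_reactivity_cliffs(reactions)
```

Here only the rxn_1 and rxn_2 pair is flagged: the two share four of six distinct labels, yet their yields differ by 75 percentage points.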

With this new insight, Liu is working on improving his yield prediction model. He is looking forward to running experiments through the Carnegie Mellon Cloud Lab, which Liu expects will generate very high-quality data.

"In the CMU Cloud Lab, everything is measured by machine, so many of the variables, like human operations, will be automated and standardized. We think this will be really important for yield-prediction-model building," Liu said.

In the future, Liu will continue to develop a machine learning package and models for accelerating drug discovery.

"I would feel a sense of achievement if I could do a job that could impact others," he said.
