Carnegie Mellon University
October 11, 2021

Student Works To Automate Scrubbing Physics Data for Better Results

By Kim Lyons

Jocelyn H. Duffy

Particle physics research isn't always full of dramatic eureka moments. But each researcher on a project has a part to play.

For Harrison Wolf, a junior in Carnegie Mellon University's Department of Physics, that means helping to clean data.

He says working with messy data is like trying to run a marathon with a rock in your shoe: "You can continue the race without stopping to get rid of the rock, but those 26 miles will be a lot harder and more arduous."

If a large data set isn't "clean," meaning it contains abnormalities, the bad or incorrect data eventually has to be dealt with so it doesn't corrupt research results.

Even better is automating the data-cleaning process so that bad data is detected more efficiently, which is what Wolf is working on through his Summer Undergraduate Research Fellowship from the Office of Undergraduate Research.

Wolf is a member of Professor Roy Briere's team, which is among some 1,000 researchers around the world collaborating on the Belle II particle physics experiment. Belle II's goal is to study the properties of B mesons, heavy particles that contain a bottom quark.

"You hear about these huge physics experiments doing groundbreaking, world-changing research," Wolf said. "But under the hood of an experiment like Belle II there are hundreds of people that need to make sure that they are getting good data."

"I do a lot of computer work, a lot of statistical analysis — the past few weeks I've been looking at the same 14,000 histograms and trying to write a computer program that says 'OK here's what's wrong with this data, here's how we can turn off what's not working until we can fix it, so that people higher up can get the good data they need to make those groundbreaking discoveries.'"

Briere said that Wolf is essentially working on more efficient ways to automatically scan the data for abnormalities or misbehaving electronics. He added that there are more than 14,000 different signal paths, or "channels," each of which can act up from time to time.

"We already have some software that does this, we're just trying to make it better," Briere said.

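As a rough illustration of the kind of automated check Briere describes, the Python sketch below flags channels whose activity sits far from the typical value. It is not the Belle II software or Wolf's program: the function name, the per-channel counts, the cutoff of 5 and the median-based outlier score are all assumptions chosen for the example.

```python
from statistics import median


def flag_abnormal_channels(counts, threshold=5.0):
    """Return the indices of channels whose count is a robust outlier.

    counts: per-channel totals (hypothetical input, e.g. histogram integrals).
    threshold: cutoff on the modified z-score (an assumed, tunable value).
    """
    med = median(counts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # No measurable spread: any channel that differs from the median stands out.
        return [i for i, c in enumerate(counts) if c != med]
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, c in enumerate(counts)
        if abs(0.6745 * (c - med) / mad) > threshold
    ]


# Made-up counts for ten channels: one dead (0) and one noisy (950).
counts = [100, 98, 103, 101, 0, 99, 102, 950, 97, 100]
print(flag_abnormal_channels(counts))  # [4, 7]
```

With these made-up counts, the dead channel (index 4) and the noisy one (index 7) are flagged, while ordinary fluctuations are left alone. A median-based score is used here because a few badly misbehaving channels would distort an ordinary average.
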
Wolf, who is pursuing a minor in drama, said he chose Carnegie Mellon because he knew it would allow him to pursue both his scientific and artistic interests in a meaningful way.

"I really liked that CMU is strong in both areas: science and technical fields but also artistic fields as well," he said. "I'm getting to study both of the things that I love."

Briere said he had Wolf in a class before Wolf signed on for the summer work and thought he would be well suited for it.

Wolf said that while it would have been nice to be with other people in his group, having to work remotely because of the pandemic didn't present any significant obstacles. He said he hopes to continue to work with Briere, and that what he has learned so far has helped him fine-tune skills he will need in the job market.

"These types of data analyses are a big part of what's being done in the field," Wolf said.