Hands-On Data Analytics Program Builds Scientific Skills
By Kirsten Heuring
The better the tools, the more data researchers have at their fingertips. Carnegie Mellon University's Mellon College of Science is preparing students to lead the way through its hands-on Master's in Data Analytics for Science (MS-DAS) program.
"I want to work in health care, and what I liked about this program is that a lot of material in our classes would be applicable," said Sophia Kurz, a recent MS-DAS graduate.
Launched in 2021, the MS-DAS program equips students from varied academic backgrounds with the skills to analyze scientific data, especially large datasets requiring large-scale computing. The one-year program is a partnership between the Mellon College of Science and the Pittsburgh Supercomputing Center.
A cornerstone of the program is a capstone project where students work with corporate partners on industry datasets to address real-world data challenges. Kurz, along with Ananya Agrawal, Ananya Chembai, Aaditya Nair and Yaning Wu, worked with Reddit to streamline the social media platform's backend systems.
"There were jobs in the system that had been running for six months, and all that data was huge," Chembai said. "We were looking for bottlenecks to see how we could make their system more efficient."
The project had three phases. First, the group analyzed what kinds of jobs took the longest to run. Second, they developed a reporting process to flag and categorize failed jobs. Finally, the team created an algorithm to predict resource usage and identify which jobs might fail.
"I definitely learned about data structures and how to make sense of this massive amount of data," Wu said. "I really enjoyed working with my groupmates, talking about what we were finding and trying different methods."
Paul Raff, Reddit's head of analytics engineering and a triple Carnegie Mellon alumnus in mathematical sciences and computer science, mentored the students throughout the project. He said their work made a meaningful contribution to improving Reddit's backend.
"At Reddit, we operate tens of thousands of data workloads daily to power all of our machine learning, AI, and data-driven experiences that delight our users," Raff said. "The MS-DAS capstone team did a great job providing a lot of great insight into the goings-on, interdependencies and areas of improvement in these workloads to allow us to make this critical part of Reddit's infrastructure more reliable and efficient."
Manfred Paulini, professor of physics and associate dean for research in the Mellon College of Science, who also serves as executive director of the MS-DAS program, said the capstone project is vital to the program.
"The capstone projects with corporate partners in our MS-DAS program make true connections from the classroom to working in the industry by students applying course material to real-world datasets and projects under the guidance of our corporate partners," Paulini said. "I particularly appreciate the projects we had with Reddit as it allowed me to get to know Paul Raff who is a great supporter of science at CMU."
Students said the MS-DAS program's emphasis on practical experience gave them confidence as they prepare for careers in data analytics in the sciences and beyond.
"This project has helped me understand a complicated set of data and break it all down," Kurz said. "I've grown a technical mindset of how to find something when it isn't explicitly given to me in the data."