Carnegie Mellon University

How Can We Better Extract Answers From Big Data?

Ben Moseley, Tepper School Assistant Professor of Operations Research, speaks about his research on new ways to solve problems mathematically with big data.

Video Transcript

A lot of my work is theoretical, so I am by trade a mathematician. What I'm doing is discovering new ways to solve problems mathematically and proving that they work better than the methods currently used in practice. Then, once we have these methods, I work with practitioners to get them implemented and running on real datasets, and to show empirically that they beat the current methods in practice.

My paper, titled "Scalable K-Means++," is on scaling a machine learning method to the big data setting so you can process large datasets fast. We designed a genuinely new algorithm that fits a theoretical model of data-center computation, so it can be deployed in a large data center. So how did we do it? We used theoretical models to drive the design of an algorithm that runs in a data center. With so many people using the internet and social networking sites, we're able to gather more and more information. But this information is only useful if we can actually learn from it to make better decisions.
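The transcript does not describe the algorithm itself. As a rough illustration of the method named in the paper title, here is a minimal sketch of the standard k-means++ seeding step, which picks each new starting center with probability proportional to its squared distance from the centers chosen so far. This is the sequential baseline, not the scalable variant from the paper; the function names `sq_dist` and `kmeanspp_seed` and the toy data are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k starting centers by sequential D^2 sampling (k-means++ seeding).

    Illustrative sketch only: this is the one-center-at-a-time baseline,
    not the data-center algorithm discussed in the talk.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        # Update nearest-center distances using only the newest center.
        d2 = [min(d, sq_dist(p, centers[-1])) for d, p in zip(d2, points)]

    return centers


# Toy data: three tight, well-separated groups.
pts = [(0, 0)] * 5 + [(100, 100)] * 5 + [(-100, 50)] * 5
print(kmeanspp_seed(pts, 3, random.Random(0)))
```

Each pass of the loop needs the distances produced by the previous pass, so selecting k centers takes k sequential passes over the data. That dependency is what makes this baseline a poor fit for very large datasets and is the kind of bottleneck the talk is concerned with.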

My background is in computer science, and what we do is use ideas from computer science to improve our business methods. In particular, we improve the decision-making process: everything from how you place an advertisement on a website, to how trucks should be routed at FedEx, to which airline routes you should actually have. Right now I work on bioinformatics, and there, what we're doing is finding faster ways to compare large biological sequences. That way, we can tell the similarity and dissimilarity in evolutionary relationships between genes, but at a larger scale than we've been able to in the past.

My dream problem is to develop a meta-method that, say, takes all of the ways we have solved problems in the past and automatically scales them to handle large datasets. So the vision is to create a kind of meta-technique that we can apply to solve any problem on large datasets.