Carnegie Mellon University

August 14, 2018

Carnegie Mellon Joins Meltwater To Advance Data Science

By Byron Spice

Byron Spice
  • School of Computer Science
  • 412-268-9068

Students and faculty at Carnegie Mellon University's School of Computer Science are collaborating with the digital media intelligence firm Meltwater to advance the state of the art in artificial intelligence education and research using the company's AI platform.

Meltwater, which has the world's most diverse collection of open and licensed data, has opened its underlying AI platform to Carnegie Mellon and other select universities. The platform allows students and faculty to create, connect and organize web-scale information to generate real-time analytics that support decision-making from online data.

CMU will use the platform in graduate AI courses and as a resource for the university's data science and AI research community.

"Sharing access to real-world data helps students, researchers and data scientists solve real-world problems more rapidly," said Eric Nyberg, director of CMU's Master of Computational Data Science program and a professor in the Language Technologies Institute. "In addition to realistic real-time data sources, the platform also includes AI modeling and integrated cloud computing to greatly simplify the process of building and optimizing new web-scale analytics."

This past January, Nyberg joined Majd Sakr, a teaching professor in CMU's Computer Science Department, to launch the Accelerated Cloud for Artificial Intelligence Project with support from Meltwater. A team of MCDS capstone students — Shihui Li, Ganesh Palanikumar and Sida Wang — began developing a set of realistic benchmark challenges for natural language processing tasks.

The ACAI project's initial focus is named entity recognition, an element of information extraction that classifies objects with proper names into categories, such as people, organizations and locations. In the coming semester, the team plans to create an open, web-scale named entity recognition challenge and benchmark possible solutions using the platform's resources.

An international authority on the design of computer systems for answering questions, Nyberg worked with his students and members of IBM Research from 2007 to 2011 to develop Watson for the Jeopardy! Challenge. As a member of the Scientific Advisory Board, Nyberg helped to shape the development of the platform. He said he is focused on using the platform to develop new methods for rapid, cost-effective development of specialized question-answering systems for specific information domains.

"Eric Nyberg has played an instrumental role during the development of the platform that helped us build the right interface and toolkits for data scientists," said Aditya Jami, CTO of Meltwater. "CMU is well known for pushing the boundaries in the field of AI and this collaboration will undoubtedly foster a new wave of open innovation."

"The ACAI project will crystallize important advancements in the engineering of cost-effective AI systems," Nyberg said. "The biggest challenge for current students is how to explore the large space of data, features and models available for developing a particular analytic in order to find an optimal or acceptable solution before they run out of time or computing resources," he continued. "The ACAI framework will allow students to explore this large solution space by providing a systematic approach that teaches cost-effective use of cloud resources to build AI systems."

"ACAI's research outcomes are already strengthening [the platform]'s machine learning and text analytics capabilities across the board," said Giorgio Orsi, principal scientist and director of natural language processing at Meltwater. "They are enabling rapid model-building and customization of our text analytics to the needs of our customers."

Nyberg said he hopes to use the platform as part of his regular graduate course, Design & Engineering of Intelligent Information Systems.

"Building a state-of-the-art AI system requires us to store, preprocess and annotate text collections with a variety of feature extractors, while simultaneously exploring the space of possible models that can be built from the data as they emerge," Nyberg said. "By providing a principled framework for storage, metadata and model training, along with a massive collection of open web data and metadata, Meltwater will make it possible for students to build advanced analytics in a classroom setting."