Carnegie Mellon University
June 28, 2019

CMU Conference Convenes International Crowd to Discuss AI and Data

By Ben Panko

Last month, 150 researchers, librarians, scientists, computer scientists and industry professionals from 10 countries and dozens of organizations convened at Carnegie Mellon University for the 2019 Artificial Intelligence and Data Reuse (AIDR) Conference. The event was organized by the Pittsburgh Supercomputing Center (PSC), a joint center between Carnegie Mellon and the University of Pittsburgh, and Carnegie Mellon’s University Libraries, and was supported by the National Science Foundation (NSF) and the Association for Computing Machinery (ACM).

“When NSF solicited ideas for advancing long-term reuse of scientific data, we saw an opportunity to convene a highly interdisciplinary meeting to discuss approaches for discovering and reusing the valuable data that is currently siloed in project- and region-specific repositories,” said Nick Nystrom, PSC chief scientist and principal investigator for the NSF award that supported the conference. “It’s a challenging problem for which artificial intelligence approaches are likely to be valuable.”

During the three-day conference, attendees heard from numerous speakers about their experiences using artificial intelligence to tackle a range of topics related to big data. Sessions addressed diverse challenges facing researchers and included a focus area on life sciences; panels on challenges and opportunities, with an emphasis on ethics and smart cities; and presentations on topics ranging from handwriting recognition to analysis of honeybee behavior. Student awards recognized outstanding presentations and posters.

“The challenge of data discovery and reusability has a critical human factor that can be addressed with a coordinated effort led by funding agencies, academic institutions, entities involved in the dissemination of knowledge and researchers,” said Paola Buitrago, PSC Director of AI and Big Data, conference co-chair and coordinator of the student awards. “A focus on incentives is mandatory. Addressing the human aspect of the challenge would increasingly open the door to AI-powered tools, enabling data reuse and discovery at unprecedented levels.”

"With the recent advances in machine learning and AI, it is possible to train computers to find optimal solutions to a problem, such as integrating different datasets and extracting metadata," Carnegie Mellon librarian Huajin Wang, who served as conference chair, said in a statement.

Carnegie Mellon trustee and Mellon College of Science alumnus Glen de Vries, founder of Medidata Solutions Inc., described his company's work developing AI-based software for physicians and researchers to use for performing clinical trials and sharing data. "In a world where we have better access to data and better techniques to look at data, we should be able to evolve this significantly," de Vries said in his keynote speech.

"The topic of this meeting is one of the most important for the future of empirical science," E. Fredkin University Professor Tom Mitchell, founder of Carnegie Mellon's Machine Learning Department in the School of Computer Science, said in his keynote address.

Mitchell presented his ongoing research into how the brain processes natural language, including how he's trained AI programs to decode which word a person heard based on images of their brain processing that word. He also touched on the failure decades ago to establish a national data repository for fMRI images, which he believes was driven by reluctance among neuroscientists to share their work. Going forward, he said, pooling the data generated by individual researchers so it can be analyzed at large scales will be vital.

"I think we are in the midst of a very significant set of changes in scholarly practice broadly, and they're being driven by data sharing and open science," Cliff Lynch, executive director of the Coalition for Networked Information, noted in one of several panel discussions during the conference.

Another AIDR conference is being planned for next year, and anyone interested in artificial intelligence and data reuse can sign up for the AIDR mailing list.