June 06, 2019

Libraries Convene Community of Scholars to Tackle Data Challenges

By Shannon Riffe sriffe(through)andrew.cmu.edu

Media Inquiries

Shannon Riffe

University Libraries
sriffe(through)andrew.cmu.edu
412-268-7260

Carnegie Mellon University Libraries recently hosted a conversation on harnessing the power of artificial intelligence for scientific data discovery.

The AIDR (Artificial Intelligence for Data Discovery and Reuse) 2019 conference took place May 13-15 and brought 150 researchers, computer scientists, librarians and industry representatives from 10 countries and 65 institutions and organizations to CMU.

Supported by the National Science Foundation's (NSF) public access initiative, organized by the Carnegie Mellon University Libraries with the assistance of the Pittsburgh Supercomputing Center, and held in cooperation with the Association for Computing Machinery (ACM), AIDR 2019 focused on innovative solutions that would enable scientists and researchers to extract more value from large, complex datasets.

"With the recent advances in machine learning and AI, it is possible to train computers to find optimal solutions to a problem, such as integrating different datasets and extracting metadata," said Huajin Wang, a CMU librarian and conference chair. "We created AIDR 2019 because it's about time that people working in a variety of disciplines come together to benefit from diverse expertise, and address these mutual challenges together, using the power of AI."

Attendees heard from speakers including Tom Mitchell, the E. Fredkin University Professor of Machine Learning and Computer Science and interim dean of the School of Computer Science; Glen de Vries, a 1994 graduate of the Mellon College of Science and president and co-founder of Medidata Solutions; and Natasha Noy, staff scientist at Google AI and team lead for Google Dataset Search. Discipline-specific presentations and panel discussions rounded out the agenda.

Beth Plale, professor of informatics and computing at Indiana University Bloomington and a science advisor with the National Science Foundation, addressed attendees at the Artificial Intelligence for Data Discovery and Reuse (AIDR) 2019 conference at Carnegie Mellon in May.

Rema Padman, Professor of Management Science and Healthcare Informatics in the Heinz College of Information Systems and Public Policy, works on data-driven decision making in the IT-enabled healthcare context, particularly to support complex clinical and consumer-focused decisions. Her work involves the analysis of large amounts of structured data as well as video data to better understand challenges such as how patients can become more informed about their health conditions to improve self-care.

"The AIDR meeting with its focus on addressing the challenges of data quality, reproducibility and reuse is directly relevant to data driven decision making in healthcare and many other domains," Padman said. "I was particularly struck by the range of topics presented at the conference, including astronomy, archeology, brain science and my own work — all examples of data driven decision making with different types of data, tools and methods, and motivated by exciting research questions."

Convening a diverse set of speakers and attendees for this inaugural event was a priority for the conference organizers. The explosion in the volume of scientific data has made it increasingly challenging to find data scattered across platforms, while greater data complexity and a lack of consistent data standards across disciplines present new hurdles to evaluating data quality, reproducing results and reusing data for new discoveries.

"Difficulty in scientific data reuse has been an important issue that impedes rapid progress in many disciplines, yet it is a problem that cannot be easily solved by any single discipline alone," Wang said. "University Libraries have played an essential role in connecting the campus community, providing digital tools and services for open science and open data, and fostering collaborations across disciplines, so it is only fitting that we take a leading role in this initiative."

Last year's Open Science Symposium, organized by the Libraries and held Oct. 18-19 at the Mellon Institute Library, assembled a diverse audience from departments at CMU and the University of Pittsburgh to discuss the growing open science movement, which aims to make all research products, including data, code and publications, freely available.

The Libraries will continue to create venues for cross-disciplinary opportunities for CMU scholars with the second Open Science Symposium on Nov. 7 and a second AIDR event in 2020. A newly created AIDR mailing list is available to anyone interested in the topic of AI and data reuse and is not limited to conference attendees. Sign up for the mailing list at https://lists.andrew.cmu.edu/mailman/listinfo/aidr-all.