Unlocking American Research Dominance: Opportunities and Chokepoints in AI for Science

By: Aaron Bartnick

AI could revolutionize scientific research and power a second century of American leadership, but only if we can overcome gaps in interdisciplinary talent, data accessibility, algorithm development, and computational resources.

Why it matters: Recent advances in AI could rapidly accelerate innovation in fields from agriculture and energy to materials science and cell biology. It could become possible for AI to read and digest every publication in almost any field, capture that knowledge in a single system, discover patterns across complex datasets, then make predictions with unrivaled visibility into the field and test them through automated experimentation that improves repeatability. This could revolutionize the pace and scale of scientific discovery, ensuring America continues to lead the world in advanced technologies.

The challenge ahead: To realize the promise of AI for Science, research from Carnegie Mellon University (CMU) shows we need to address four key chokepoints:

Interdisciplinary expertise at the intersection of AI and a given scientific field.
Access to machine-readable publications and underlying experimental data.
Algorithmic advances to move from identifying correlations in this data to discovering causal relationships.
Computational and energy resources necessary to run such algorithms.

Of these, the two most important bottlenecks specifically for scientific research are limited interdisciplinary expertise and limited access to publicly available, machine-readable data.

Interdisciplinary expertise: Scientists spend years developing expertise in a particular field or method, and often prefer to leverage that expertise rather than pursue additional training in AI. Funding interdisciplinary PhD programs in areas with specific industry or national security applications — particularly those located in industrial hubs that might otherwise struggle to attract and retain top talent — could help build this new generation of scientific leadership.
Accessible data: Many scientific datasets are either (a) far smaller and more complex than those used to train chatbots and deep learning models, or (b) in formats that are not machine-readable and would take untold hours to make usable. All scientific disciplines also suffer from barriers that limit data sharing across researchers and institutions, including paywalls that limit AI’s ability to access journal articles and institutional restrictions on access to proprietary datasets. Working with Congress and leading corporations to establish data accessibility standards and offset the costs of opening scientific journals to AI (and the public) could help unlock these valuable resources.

Algorithmic development and access to compute, energy, and capital are also important constraints on applying AI to scientific research. But we have thus far found these bottlenecks are generally no worse for scientific applications than for broader AI use cases.

What we’re doing: Carnegie Mellon is joining with federal agencies and partners across national labs, academia, and industry to spearhead an initiative to build a national network of AI-enabled autonomous experimentation laboratories. The university has begun convening scholars from across the country and is working in partnership with a major U.S. technology company to quantify the scale of these bottlenecks, identify cost-effective solutions, and provide more detailed recommendations to leaders across government, industry, and academia.

The bottom line: If we can address several key chokepoints — most importantly, developing an interdisciplinary workforce and making data more accessible — AI can help ensure America’s continued global dominance in scientific discovery and innovation.