Carnegie Mellon University
AI Measurement Science & Engineering (AIMSEC)

CMU-NIST Collaborative Research Center

Core Evaluation Phases - Shifting Left

System Scoping

Define project goals, data needs, analyses, model requirements, and context considerations to ensure the AI is aligned with the use case.

Comprehensive Model Testing

Assess the AI model’s performance using offline, observational data to determine if it warrants further testing.

Red Teaming

Stress-test the model’s vulnerabilities and assumptions with technical and domain experts to evaluate robustness.

System Field Testing

Evaluate the AI system’s real-world impact on users and communities through iterative pilots and operational trials.

Improving Metrology and Practices for Test and Evaluation

The rapid pace of innovation poses an additional challenge in the form of ongoing pressure to update algorithms, computing infrastructure, corpora of training data, and other technical elements of AI capabilities. This context suggests a "shift left" approach to test and evaluation, in which development stakeholders are engaged earlier in the process timeline. This shift enables test-enabled system designs, along with engineering processes and tools configured to produce not only deployable models but also associated bodies of evidence that can support an ongoing process of affordable and confident test and evaluation.

Research Focus

AIMSEC research focuses on advancing measurement science for modern AI systems, including Machine Learning (ML) and Generative AI (GenAI) systems, such as large language models (LLMs).

Drawing on our expertise in a wide range of application domains, including human services, finance, education, and energy, we are working with stakeholders across industries to contextualize and test our approaches. This application focus lends concreteness to the research and benefits the specific domains involved. The technical results regarding risk management, however, are intended to have broad benefits across diverse applications.

Building Community Around AI Measurement Science & Evaluation

AIMSEC draws on CMU's extensive network of partnerships, both within the university and with colleagues in industry, government, and academia, to convene workshops, webinars, and technology demonstration showcases. Our work is intended to integrate technical and application considerations through extensive collaboration across institutional boundaries, and to create a framework for collaboration with technical team members at NIST and at other institutions.