Our Approach
Integrating research and application to advance capabilities within the AI Risk Management Framework (RMF)
Core Evaluation Phases - Shifting Left
System Scoping
Define project goals, data needs, analyses, model requirements, and context considerations to ensure the AI is aligned with the use case.
Comprehensive Model Testing
Assess the AI model’s performance using offline, observational data to determine whether it warrants further testing.
Red Teaming
Stress-test the model’s vulnerabilities and assumptions with technical and domain experts to evaluate robustness.
System Field Testing
Evaluate the AI system’s real-world impact on users and communities through iterative pilots and operational trials.
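The four phases above form a gated sequence: each phase produces evidence, and that evidence determines whether the system advances to the next phase. The sketch below illustrates this gating logic in Python; all names, thresholds, and gate criteria are illustrative assumptions, not part of the AIMSEC methodology.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Phase:
    """One evaluation phase with a gate that inspects accumulated evidence."""
    name: str
    gate: Callable[[Dict], bool]  # True -> proceed to the next phase

def run_pipeline(phases: List[Phase], evidence: Dict) -> List[str]:
    """Run phases in order, stopping at the first gate that fails.

    Returns the names of the phases that passed.
    """
    passed = []
    for phase in phases:
        if not phase.gate(evidence):
            break
        passed.append(phase.name)
    return passed

# Illustrative gates only: real criteria would be defined per use case.
phases = [
    Phase("System Scoping",
          lambda e: e.get("goals_defined", False)),
    Phase("Comprehensive Model Testing",
          lambda e: e.get("offline_accuracy", 0.0) >= 0.9),
    Phase("Red Teaming",
          lambda e: e.get("critical_vulns", 1) == 0),
    Phase("System Field Testing",
          lambda e: e.get("pilot_approved", False)),
]

evidence = {
    "goals_defined": True,
    "offline_accuracy": 0.93,
    "critical_vulns": 0,
    "pilot_approved": True,
}
print(run_pipeline(phases, evidence))
```

A system that fails an early gate (for example, weak offline accuracy) never reaches red teaming or field testing, which mirrors the intent of catching problems before deployment-stage evaluation.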
Improving Metrology and Practices for Test and Evaluation
The rapid pace of innovation poses an additional challenge: ongoing pressure to update algorithms, computing infrastructure, corpora of training data, and other technical elements of AI capabilities. This context suggests a “move to the left” approach for test and evaluation, in which development stakeholders are engaged earlier in the process timeline. Engaging earlier enables test-enabled system designs, along with engineering processes and tools configured to produce not only deployable models but also associated bodies of evidence that can support an ongoing process of affordable, confident test and evaluation.
Research Focus
AIMSEC research focuses on advancing measurement science for modern AI systems, including Machine Learning (ML) and Generative AI (GenAI) systems, such as large language models (LLMs).
Drawing on our expertise in a wide range of application domains, including human services, finance, education, and energy, we are working with stakeholders across industries to contextualize and test our approaches. This application focus lends concreteness to the research and benefits the specific domains involved. The technical results regarding risk management, however, are intended to have broad benefits across diverse applications.
Building Community Around AI Measurement Science & Evaluation
AIMSEC taps into CMU’s extensive network of partnerships, both internal and with industry, government, and academic colleagues elsewhere, to convene workshops, webinars, and technology demonstration showcases. Our work is intended to integrate technical and application considerations through extensive collaboration across boundaries within the institution, and to create a framework for collaboration with technical team members at NIST and other institutions.