
Clinical AI Evaluation Methods

Developing rigorous methodologies to assess AI-assisted clinical decision support systems, from algorithm performance to real-world clinical impact.

Funding: NIH R01, AHRQ
Partners: UNC School of Medicine, Duke Clinical Research Institute

Overview

Clinical AI systems are rapidly being deployed in healthcare settings, but methods for evaluating their real-world effectiveness lag behind. This research program develops rigorous approaches to assessing AI-assisted clinical decision support across algorithm performance, clinician behavior, and patient outcomes.

Key Research Questions

  1. How do we measure the true clinical impact of AI diagnostic tools? Traditional performance metrics (sensitivity, specificity, AUC) don’t capture how AI changes clinician behavior or patient outcomes; a short sketch of these baseline metrics follows this list.

  2. What role does “soft ground truth” play in AI evaluation? When even expert labels are uncertain, how should we calibrate and assess AI predictions?

  3. How do implementation factors affect AI effectiveness? The same algorithm may perform differently depending on workflow integration, clinician training, and patient population.
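
As a concrete reference point for question 1, the sketch below computes sensitivity, specificity, and AUC with scikit-learn. The labels, scores, and the 0.5 operating threshold are synthetic illustrations, not study data.

```python
# Minimal sketch of the "traditional" metrics named in question 1,
# computed on synthetic data. Nothing here reflects real study values.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                                     # binary reference labels
y_score = np.clip(y_true * 0.6 + rng.normal(0.3, 0.25, size=200), 0, 1)   # model risk scores
y_pred = (y_score >= 0.5).astype(int)                                     # assumed operating threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)    # true positive rate at the chosen threshold
specificity = tn / (tn + fp)    # true negative rate at the chosen threshold
auc = roc_auc_score(y_true, y_score)   # threshold-free discrimination

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} AUC={auc:.2f}")
```

These numbers describe the algorithm in isolation; they say nothing about whether clinicians act on its output or whether patients fare better, which is exactly the gap question 1 raises.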

Current Projects

Evaluating AI-Assisted Radiology

A multi-site study examining how AI diagnostic aids affect radiologist performance and patient outcomes in breast cancer screening.
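
One way to make a multi-site, assisted-versus-unassisted comparison concrete is to stratify standard operating-point metrics by site and reading condition. The sketch below is a hypothetical illustration of that stratification; the toy table and its column names (site, assisted, label, recall) are invented for this example and are not drawn from the study protocol.

```python
# Hypothetical site-stratified comparison of AI-assisted vs. unassisted reads.
# All values and field names are placeholders, not study data.
import pandas as pd

reads = pd.DataFrame({
    "site":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "assisted": [0, 0, 1, 1, 0, 0, 1, 1],   # 1 = AI-assisted read
    "label":    [1, 0, 1, 0, 1, 0, 1, 0],   # reference standard (e.g., biopsy-confirmed)
    "recall":   [1, 1, 1, 0, 0, 1, 1, 0],   # radiologist recall decision
})

def sens_spec(g: pd.DataFrame) -> pd.Series:
    pos, neg = g[g.label == 1], g[g.label == 0]
    return pd.Series({
        "sensitivity": (pos.recall == 1).mean(),
        "specificity": (neg.recall == 0).mean(),
    })

# Per-site, per-arm operating points make site-to-site variation visible.
print(reads.groupby(["site", "assisted"])[["label", "recall"]].apply(sens_spec))
```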

Soft Ground Truth Methods

Developing statistical approaches for evaluating AI systems when gold-standard labels are unavailable or uncertain.
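
A minimal sketch of what evaluation against "soft" labels can look like, assuming consensus labels are expressed as expert agreement probabilities rather than hard 0/1 classes. The soft-label Brier score and expected accuracy below are illustrative choices, not the program's published methods, and the numbers are made up.

```python
# Evaluating model probabilities against probabilistic (soft) expert labels.
# Values are invented for illustration only.
import numpy as np

soft_labels = np.array([0.9, 0.6, 0.2, 0.5, 0.8])   # e.g., fraction of experts calling the case positive
model_probs = np.array([0.85, 0.4, 0.1, 0.7, 0.9])  # AI-predicted probabilities

# Brier score against soft labels: penalizes disagreement with the expert
# consensus distribution instead of a single forced gold-standard label.
soft_brier = np.mean((model_probs - soft_labels) ** 2)

# Expected accuracy: probability that the thresholded model call matches a
# label drawn at random from the expert consensus.
hard_calls = (model_probs >= 0.5).astype(float)
expected_acc = np.mean(hard_calls * soft_labels + (1 - hard_calls) * (1 - soft_labels))

print(f"soft-label Brier={soft_brier:.3f}  expected accuracy={expected_acc:.3f}")
```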

Implementation Science for Clinical AI

Qualitative and quantitative research on barriers and facilitators to effective AI implementation in clinical settings.