AI tools vary in their predictability depending on their design, application, and underlying technology. While some AI systems produce consistent outputs within well-defined parameters, others show greater variability because of their probabilistic nature, the complexity of their tasks, or frequent updates to their underlying models or knowledge bases. A single assessment of an AI tool’s efficacy and safety therefore does not provide sufficient safeguards against errors, biases, and potential harm to patients or learners. Ongoing evaluation, grounded in evidence-based practice frameworks, is essential to ensure these tools remain effective and safe in their intended contexts.
Additionally, fostering research initiatives and promoting collaboration ensure that findings inform ongoing practice, policy development, and advances in the field. By integrating these evidence-based practice frameworks, medical education programs can promote the safe, effective, and equitable use of AI, supporting the development of future-ready health care professionals.
From Principle to Practice
Apply this principle to your practice using the following strategies:
- Implement programmatic assessment for AI tools. Convene interdisciplinary committees comprising medical educators, clinicians, AI developers, ethicists, and learners to guide the evaluation process. These committees should develop frameworks that combine quantitative metrics (e.g., accuracy, reliability, user engagement) with qualitative assessments (e.g., impact on clinical reasoning, educational value, patient safety) and should conduct regular review cycles to assess performance trends, identify unintended consequences, and recommend ongoing improvements. They should also establish user-friendly mechanisms for educators, staff, and learners to report concerns, errors, or unexpected outputs from AI tools.
- Support research in AI evaluation. Foster research initiatives that evaluate AI’s impact on medical education. Begin by defining clear, hypothesis-driven questions that assess an AI tool’s utility, risks, and benefits within its intended context. Establish funding mechanisms and encourage multicenter collaborations to conduct large-scale studies, and engage educators, staff, and learners in systematic research through protected time and resources, enabling robust data collection on AI’s educational impact. Disseminate findings through peer-reviewed publications, conferences, and educational repositories to build an evidence base for best practices in AI integration, and ensure research priorities address stakeholder needs and emerging challenges in AI-enhanced education.
- Gather and appraise diverse sources of evidence. Drive evaluation and oversight with a variety of evidence sources: published literature, including peer-reviewed research and perspective pieces; firsthand observation, akin to bedside teaching, to assess real-world AI performance; and data from existing or newly implemented monitoring systems. Together, these sources provide a comprehensive understanding of an AI tool’s utility, risks, and outcomes in practice. Critically appraise this evidence for reliability, validity, and applicability to ensure robust conclusions and actionable insights.