    AI in medical education: 5 ways schools are employing new tools

    Artificial intelligence is being tapped to create quizzes, simulate patients, pinpoint student struggles, and write assessments. Human oversight is critical.

    In Ohio, a medical school uses artificial intelligence (AI) to generate questions and answers to help students prepare for tests.

    In Florida, an AI tool helps a medical school create evaluations for students applying for residency.

    In Texas and Minnesota, medical students conduct clinical visits with AI-simulated patients on computers.

    These are a few of the ways that medical schools are slowly integrating AI into courses and clinical training, largely through small pilot efforts. AI tools are showing that they can make some educational processes “more effective and efficient,” says Claudio Violato, PhD, assistant dean for assessment, evaluation, and research at the University of Minnesota Medical School (UMN Medical School).

    “We are just beginning to see how AI will fundamentally upend some of the ways that we educate and train our nation’s future doctors,” says Alison Whelan, MD, chief academic officer of the AAMC. “The cautious experimentation and implementation of AI in medical school offers great promise for faculty and students — but it has to be carried out with great care and clear-eyed assessments.”

    Below are some examples of cutting-edge AI integrations under way at medical schools.

    Building clinical skills

    Challenge: Faculty seek ways to give students more opportunities to practice and hone their communication and clinical-reasoning skills through interactions with standardized (that is, simulated) patients.

    Typically, a student meets with a standardized patient in a controlled learning setting, such as an exam room in a clinical lab. The patient has been trained to express specific symptoms, conditions, and medical histories. The student asks questions, might perform an exam, and develops a diagnosis and treatment plan. Faculty members evaluate the student’s performance based on firsthand observations during the visit and on the student’s notes for the medical record.

    The process of training standardized patients (who are typically paid), running through the clinical visits with faculty observers, and evaluating the student interactions and notes requires a lot of time, money, and staff, Violato says. Another complication is variability among human observers, regardless of how much they try to score students on objective criteria, notes Ronald Rodriguez, MD, PhD, professor of urologic science at the University of Texas Health Science Center at San Antonio (UT Health San Antonio).

    AI role: Several schools are starting to use AI to assess students’ clinical skills. One way is for students to interact with AI-generated standardized patients on a computer, with the AI tool responding to student questions and comments. The tool evaluates the visit according to standards on which it has been trained, such as the widely adopted Objective Structured Clinical Examination assessment tool. (This supplements student interactions with human standardized patients.)

    For example, UT Health San Antonio fed an AI program scores of simulated cases covering a wide range of conditions, symptoms, histories, and demographics, Rodriguez says. Plus, staff provided this important instruction: “Be honest with your responses. Don’t offer any more than what they ask, and only answer if you know the correct answer. Do not make up information.”
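
    Neither the school nor the article publishes the underlying code, but as a rough sketch of the approach: the case details and the staff instruction become the system prompt for a chat model, and the transcript is kept for later scoring. Everything below, including the case details and the call_llm helper, is hypothetical.

        # Hypothetical sketch of an AI standardized patient; call_llm is a
        # placeholder for whatever chat-completion API a school actually uses.

        CASE = {
            "presentation": "52-year-old with intermittent chest pain on exertion",
            "history": "hypertension, longtime smoker, father had an MI at 60",
        }

        SYSTEM_PROMPT = (
            f"You are a standardized patient: {CASE['presentation']}. "
            f"Known history: {CASE['history']}. "
            "Be honest with your responses. Don't offer any more than what the "
            "student asks, and only answer if you know the correct answer. "
            "Do not make up information."
        )

        def run_visit(call_llm):
            """Chat loop for one simulated clinical visit; returns the transcript."""
            messages = [{"role": "system", "content": SYSTEM_PROMPT}]
            while True:
                question = input("Student: ")
                if question.strip().lower() == "done":
                    return messages              # transcript kept for evaluation
                messages.append({"role": "user", "content": question})
                reply = call_llm(messages)       # model answers in character
                messages.append({"role": "assistant", "content": reply})
                print(f"Patient: {reply}")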

    At UMN Medical School, students also conduct clinical visits on a computer with AI-generated standardized patients, Violato says. But the school uses AI during visits with human standardized patients as well, recording the visits and then using an AI tool to evaluate the interaction and the students’ notes for the medical record.

    What are faculty learning from these interactions? At the University of Cincinnati College of Medicine (UC College of Medicine), transcripts of the student interactions with AI standardized patients have yielded insights about students who correctly diagnose cases versus those who do not, says Laurah Turner, PhD, MS, associate dean for artificial intelligence and educational informatics.

    One example: “Students who get them right tend to ask questions throughout the entire encounter, whereas students who get them incorrect tend to ask questions at the end. They tend to jump to ordering labs or tests or doing physical exams before working through questions.”

    Evaluating students for residency applications

    Challenge: The Medical Student Performance Evaluation (MSPE) is one of the most important documents for medical students when they apply for residency. As described by the AAMC, the MSPE is created by the school “to provide residency program directors an honest and objective summary of a student’s salient experiences, attributes, and academic performance.”

    The evaluation includes basic factual information about academics and clerkships, performance assessments, and subjective information, such as notable achievements and characteristics. The MSPE is typically put together with contributions from a student’s teachers and clerkship director, under the guidance of an administrator who oversees the process. Students also often contribute to the content and can review the final document.

    That takes significant staff time, says Latha Chandran, MD, MPH, executive dean for education and policy at the University of Miami Miller School of Medicine (Miller SOM). A student might have narrative assessments from eight or more faculty members, which someone (a professor, for instance) has to read to create a summary for the MSPE.

    AI role: The Miller SOM trained an AI tool to read the narrative assessments and create a summary (within minutes), which the professors overseeing the process then use to create the formal evaluation. That not only saves time, Chandran says, but the AI summaries “are more polished and synthesized” than what is typically produced by the “cut and paste” tactic of a person picking out text from numerous narrative assessments.
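
    The Miller SOM has not published its pipeline; the core summarization step might look something like the sketch below, with summarize standing in for a language-model call (all names are hypothetical). The output is only a draft, consistent with the human review described next.

        # Illustrative only: condense many narrative assessments into one draft
        # summary for the MSPE; summarize() is a stand-in for an LLM call.

        def draft_mspe_summary(assessments: list[str], summarize) -> str:
            prompt = (
                "Synthesize the following faculty narratives into a concise, "
                "balanced summary of the student's performance. Use only "
                "statements supported by the narratives; do not add new claims.\n\n"
                + "\n---\n".join(assessments)
            )
            return summarize(prompt)  # a human editor reviews and revises this draft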

    Chandran stresses that the MSPE is still created by people and reviewed by each student, who can make corrections and suggest changes in consultation with the professor overseeing the process: “AI is facilitating the creation of it, but the final decision is with the team.”

    Evaluating faculty

    Challenge: Do professors actually read all the student evaluations of their courses and make adjustments accordingly? That’s a common question throughout higher education. The Miller SOM believes AI will make it more likely that the answer is yes.

    Chandran illustrates the challenge for someone who teaches classes and directs clerkships: “Two hundred student evaluations come in. It’s too much” to thoroughly read them all, especially with so much written in narrative form, and to determine the most common or useful comments.

    AI role: The Miller SOM uses AI to read the evaluations and produce the most common observations (perhaps 10) about what to improve and what worked well.

    “So, they did not like my TBL [team-based learning] session. I need to change that,” Chandran says hypothetically. “But they loved this [other aspect of the course]; I’m going to keep doing that.

    “We can communicate to the teachers what the students are telling us.” Professors “can use actionable feedback.”
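
    The school has not described its implementation. As a hedged illustration of how a tool like this might cope with volume: a couple hundred free-text evaluations can exceed a model’s context window, so one common pattern is to summarize in batches and then merge the partial summaries. Everything in the sketch below is hypothetical.

        # Sketch of a batch-then-merge summarization pattern; summarize() is a
        # hypothetical LLM helper, and the prompts are illustrative.

        def top_observations(evaluations: list[str], summarize, n: int = 10,
                             batch_size: int = 50) -> str:
            partials = []
            for i in range(0, len(evaluations), batch_size):
                batch = "\n---\n".join(evaluations[i:i + batch_size])
                partials.append(summarize(
                    "List the recurring praise and criticism in these course "
                    "evaluations, with rough counts:\n\n" + batch))
            # Second pass: merge the partial summaries into one actionable list.
            return summarize(
                f"Merge these partial summaries into the {n} most common "
                "observations, split into what worked well and what to "
                "improve:\n\n" + "\n---\n".join(partials))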

    Creating questions for test prep

    Challenge: Professors want to create questions to help students study for tests, including the all-important United States Medical Licensing Examination (USMLE), which is administered in steps both during and after medical school.

    “Generating questions is challenging, because humans have to be trained in question generation, and that is expensive, not only in terms of time but also resources,” says Turner, of the UC College of Medicine. Many students use outside test-preparation services, but “test prep is very expensive. That creates economic barriers” for students to get access to those materials.

    AI role: UC College of Medicine created an AI tool to analyze content from a specific course, then generate USMLE-style study questions and answers (along with explanations for the correct answers) based on that content. Furthermore, the school is training the system to produce questions and answers “that adhere to USMLE standards” for its exams, Turner says.

    The school conducted a pilot test last academic year in a course about the blood system. The course director determined that 85% of the questions and answers met USMLE criteria. After the human review, almost three-quarters of the AI-generated material was provided to the students as study material, Turner adds.

    As for the content that did not meet the standards, she says the human feedback was uploaded into the AI system so that it can learn to improve its generated questions and responses.
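
    The article describes a generate-review-feedback loop rather than an implementation. A minimal sketch of that loop, with generate and review as hypothetical stand-ins for the LLM and the faculty reviewer, might look like this:

        # Sketch of the pipeline described above; generate() and review() are
        # hypothetical stand-ins for the LLM and the faculty reviewer.

        def build_question_bank(course_text: str, generate, review):
            approved, feedback_log = [], []
            items = generate(
                "From the following course content, write USMLE-style "
                "multiple-choice questions, each with the correct answer and "
                "an explanation:\n\n" + course_text)
            for item in items:
                verdict, notes = review(item)    # checked against USMLE criteria
                if verdict == "approved":
                    approved.append(item)        # released as study material
                else:
                    feedback_log.append((item, notes))  # fed back to the system
            return approved, feedback_log

    Keeping the rejected items alongside the reviewer’s notes is what lets the system learn from the human feedback, as Turner describes.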

    Not only will such a system help students prepare for tests, Turner says, but it should “lower the economic barrier” for students to get study materials, “because we can generate unlimited questions.”

    Helping students improve

    Challenge: It can be difficult to analyze which subjects a student is struggling with and to create content that helps them improve.

    AI role: Faculty administrators say some professors are using AI to draft course quizzes, which the professors review and refine before assigning them to students. That opens a possibility for targeted help. Rodriguez is testing this in some of his classes at UT Health San Antonio: He created an AI program that generates quizzes, grades them, and then generates new quizzes for particular students that focus on the material giving them trouble.

    The program goes through several iterations of quizzes, assessments, and new questions for the student based on the ongoing results, “until you master that particular weakness,” Rodriguez says.
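
    Rodriguez’s program itself is not published; the loop he describes (quiz, grade, re-quiz on what the student missed) reduces to something like this sketch, where every helper name is hypothetical:

        # Sketch of an adaptive quiz loop: keep re-quizzing on weak topics
        # until the student clears a mastery bar. All helpers are hypothetical.

        def adaptive_quiz(student, topics, generate_quiz, grade, mastery=0.9):
            weak = list(topics)                  # start with all course topics
            while weak:
                quiz = generate_quiz(weak)       # new questions on weak topics only
                scores = grade(student, quiz)    # dict: topic -> fraction correct
                weak = [t for t in weak if scores.get(t, 0.0) < mastery]
            # loop ends once every topic is at or above the mastery threshold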

    UC College of Medicine is taking a similar approach. It builds on an AI-generated curriculum mapping program, which seeks to align all curricular content — such as program objectives, course content, and assessment items — with learning outcomes. Among other things, the UC College of Medicine program analyzes student quiz results, so professors can see “students who are consistently getting questions wrong in a certain area,” such as dermatology or diabetes care, Turner says. “That’s an opportunity to say, ‘We need to make an intervention.’”

    Here’s an example of one intervention: “We’re going to pump diabetes questions to you, using our [AI] question generator, so that you can be exposed to [that subject] more and learn.”

    In the past, Turner says, “There’s no way we could have done that. I could tell you the student got an 87 on a test, but identifying any pattern in the types of questions that a student is getting wrong would have been impossible due to time and resources.”
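
    The pattern-spotting Turner describes is, at bottom, an aggregation over topic-tagged quiz results. A small sketch under assumed data shapes (one row per answered question, with the topic tag presumably supplied by the curriculum map):

        from collections import defaultdict

        # Sketch: flag (student, topic) pairs with consistently poor results.
        # Each row is assumed to be (student_id, topic, answered_correctly).

        def flag_weak_areas(rows, min_attempts=5, threshold=0.6):
            stats = defaultdict(lambda: [0, 0])  # (student, topic) -> [right, total]
            for student, topic, correct in rows:
                stats[(student, topic)][0] += int(correct)
                stats[(student, topic)][1] += 1
            return [key for key, (right, total) in stats.items()
                    if total >= min_attempts and right / total < threshold]

    Flagged pairs could then be routed to a question generator for targeted practice, along the lines of the diabetes example above.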

    The future

    More such AI efforts are on the way. The Josiah Macy Jr. Foundation is funding demonstration projects to explore the uses of AI in medical education. Last year Harvard Medical School awarded 11 grants of up to $100,000 each for AI innovation projects in medical education.

    But implementation will continue to move slowly, largely through pilots here and there, Turner says.

    “Current applications of AI in medical education primarily exist as small-scale pilots rather than mature, scaled implementations,” she notes. “The fundamental challenge isn’t technological, nor is it innovation. It is [building] the infrastructure to translate these promising pilots into routine educational practice.”

    As the use of AI does pick up, faculty stress that the new tools supplement, rather than replace, the “human in the loop.”

    “The human is the final arbiter of the ‘ground truth’” in assessing AI-generated work with and for students, Rodriguez says. “The human part of the equation is such an important component. We must protect it with the highest level of scrutiny.”