When people aspiring to become doctors submit their applications to the Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, they hope their academic achievements, volunteer work, and motivations will impress the first person who reads about them. But the first reader of those applications is not a person — it’s an artificial intelligence (AI) system.
The AI system acts as an initial screener for the approximately 5,000 applications the New York school receives each year, recommending who should be invited for interviews, who does not meet the baseline criteria, and who belongs in the largest group: the maybes flagged for further review. Screeners on the admissions committee then review the applications designated “yes” or “further review” to decide which 800 or so students get interview offers.
This process helps the reviewers better assess the applications, says Rona Woldenberg, MD, the school’s associate dean for admissions. “We took the pool of 5,000 applications and shrunk it down to 1,500 to 2,000” for committee review, she says. “That enabled us to be more efficient and to focus our work on where it really needed to be” — carefully considering the most qualified applicants.
Just as important, Woldenberg says, is that using one AI tool to initially screen every application can remove much of the personal bias and variability that inevitably creep in among groups of human reviewers, no matter how they try to avoid it.
The Zucker School of Medicine is among a handful of medical schools that are using or exploring the use of AI tools in admissions. The NYU Grossman School of Medicine in New York City uses an AI tool for initial screening as well. The University of Cincinnati College of Medicine (UC College of Medicine) and The George Washington University School of Medicine and Health Sciences (GW SMHS) are developing AI platforms they hope to pilot within a year or two. Others, like the University of California, San Diego, School of Medicine, are having discussions about how they might do the same.
Admissions administrators at these schools believe AI technology offers the potential to more efficiently and equitably process a volume of applications that far exceeds the available slots for students.
“Last year we had 5,000 applicants, and we have to get that down to about 180” accepted students, notes Laurah Turner, PhD, MS, associate dean for artificial intelligence and educational informatics at UC College of Medicine.
That takes enormous staff resources. Kevin Nies, MEd, assistant dean of admissions at GW SMHS, estimates that from July through February, each member of the review team spends 20 to 25 percent of their time evaluating their share — about 2,500 — of the 13,000 applications. A study of the AI program at the NYU Grossman School of Medicine estimated that the manual screening process of applications “involves more than 6,000 hours of faculty time yearly.”
Admissions leaders discussed how AI might improve parts of the assessment process, how their schools are developing AI programs to do that, the results so far, and what lies ahead.
AI advantages
Medical schools provide admissions screeners with extensive training about how to assess applications, taking into account academic achievement, clinical experience, extracurricular activities, personal attributes, and the school’s mission. Ideally, each reviewer assesses every application the same way as the other reviewers. In reality, that’s impossible.
“When you have human reviewers, there’s going to be a lot of variation” among them in their perception of applicants, Turner says.
One common variation is the value each committee reviewer places, even subconsciously, on student experiences. Ioannis Koutroulis, MD, PhD, associate dean of MD admissions at GW SMHS, notes that one reviewer might give more weight to an applicant whose courses and career ambitions lie in research, while another favors someone who is dedicated to community service, and yet another perks up over applicants from Harvard.
“It’s inevitable that different reviewers will value one experience over another based on their own background,” he says.
Even within an individual, shifts occur in perception and judgment as admissions screeners wade through the applications. Turner rhetorically asks: “Is their processing of the first essay the same as the 500th essay?”
Imagine, then, a reviewer who reads every application, each within seconds rather than half an hour, around the clock, and applies the exact same standards to each one. That could be AI.
“AI can provide an approach that gives every application the same consistent review, potentially reducing the variation or subjectivity inherent in using groups of human screeners,” says Marc Triola, MD, associate dean for educational informatics and a professor of medicine at NYU Langone Health in New York City. Triola oversaw the AI pilot for the NYU Grossman School of Medicine.
Graham Keir, MD, a neuroradiologist in New York and a consultant on AI for the Zucker School of Medicine, explains how: “If you give AI a certain set of inputs [information about the applicants], it’s always going to give you the same outputs, no matter what time of day, no matter how it’s feeling. It didn’t have a high clinical workload that day. It’s not tired.”
“What we’re trying to do is to be more efficient and more equitable.”
That efficiency and equity depend in large part on how the AI systems are taught to evaluate applicants.
Teaching intelligence
How do you teach an AI system to know what a medical school values most in prospective students? Admissions departments have partnered with engineering and technology divisions within their university systems to build programs based on whom the school has accepted in the past and what it looks for in applications.
The baseline criteria are easy for AI to sort, such as minimum GPAs and MCAT scores. More nuanced is identifying the experiences and statements that indicate which students align most with the school’s priorities: Serving disadvantaged populations? Interest in primary care? Driving medical research?
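As a rough illustration of how easily a machine can sort on baseline criteria, the three-way triage the schools describe (yes, no, further review) might be sketched as follows. The thresholds and field names here are invented for illustration, not any school’s actual cutoffs:

```python
# Toy sketch of a first-pass screen on baseline criteria.
# Thresholds and field names are hypothetical, not any school's real cutoffs.

MIN_GPA = 3.0   # hypothetical minimum GPA
MIN_MCAT = 500  # hypothetical minimum MCAT score

def baseline_screen(app: dict) -> str:
    """Return 'no' if baseline criteria are unmet, 'yes' if the
    application clearly exceeds them, else 'further review'."""
    if app["gpa"] < MIN_GPA or app["mcat"] < MIN_MCAT:
        return "no"
    # A clear margin above both baselines earns an immediate "yes";
    # everything in between lands in the (largest) further-review pile.
    if app["gpa"] >= 3.7 and app["mcat"] >= 515:
        return "yes"
    return "further review"

applicants = [
    {"id": 1, "gpa": 3.9, "mcat": 519},
    {"id": 2, "gpa": 2.8, "mcat": 512},
    {"id": 3, "gpa": 3.4, "mcat": 508},
]
results = {a["id"]: baseline_screen(a) for a in applicants}
```

In practice the hard part is not this sort but the nuanced judgments that follow it, which is where the trained models come in.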
“You download thousands of applications and train the system on what you’re looking for,” says Koutroulis at GW SMHS. “If I tell the system, ‘I want you to look for service, volunteer, work, public health,’ the system needs to learn how to identify those in the experiences section of the application. It takes [inputting] years of applications for the system to learn how to identify what you’re looking for.”
Identifying the experiences and characteristics that align with a school’s mission goes beyond searching for obvious key words, such as “service.” At UC College of Medicine, Turner is working to develop AI systems that could detect language patterns or latent traits. For example, an application with public service references might signal qualities like conscientiousness, even though the applicant did not write “I am conscientious.”
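A minimal sketch of the difference between searching for one obvious keyword and detecting broader language patterns; the phrase list is invented for illustration, not anything UC College of Medicine actually uses:

```python
import re

# Hypothetical phrase patterns associated with service-oriented experiences.
SERVICE_PATTERNS = [
    r"\bvolunteer(ed|ing)?\b",
    r"\bfree clinic\b",
    r"\bunderserved\b",
    r"\bfood bank\b",
    r"\bpublic health\b",
]

def keyword_flag(text: str) -> bool:
    # Naive approach: only the literal word "service" counts.
    return "service" in text.lower()

def pattern_score(text: str) -> int:
    # Broader approach: count how many distinct service-related patterns
    # appear, so an essay can signal the trait without the word "service".
    return sum(bool(re.search(p, text.lower())) for p in SERVICE_PATTERNS)

essay = "I volunteered weekly at a free clinic serving an underserved neighborhood."
keyword_flag(essay)   # False: the literal keyword never appears
pattern_score(essay)  # 3: volunteered, free clinic, underserved
```

Real systems would learn such patterns from years of applications rather than from a hand-written list, but the principle is the same: the signal lives in the language, not in a single word.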
The training process at the NYU Grossman School of Medicine illustrates a typical approach: It used a sample of more than 14,000 applications from 2013 to 2017, along with the outcomes of the human screening for those applications, so the system could learn which elements were most likely to yield interview invitations. The researchers then ran the system on applications from a subsequent admissions cycle to see how its recommendations compared with the results of the human reviews for those students. (The AI assessments did not affect admissions decisions during the trial.)
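The validation step in such a trial, comparing the AI’s recommendations with the human decisions for the same applicants, reduces to a simple agreement computation. A toy sketch with invented labels:

```python
from collections import Counter

# Invented example data: for each applicant in a trial cycle, the human
# committee's decision and the AI system's recommendation.
human = ["invite", "reject", "invite", "further", "reject", "invite"]
ai    = ["invite", "reject", "further", "further", "reject", "invite"]

# Fraction of applicants where the AI matched the human decision.
agreement = sum(h == a for h, a in zip(human, ai)) / len(human)

# A per-pair breakdown shows exactly where the two diverge.
confusion = Counter(zip(human, ai))
```

In this toy run the AI agrees with humans on five of six applicants; the one disagreement (human “invite,” AI “further”) is the kind of case a pilot would examine before trusting the system with initial screening.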
The NYU Grossman School of Medicine and the Zucker School of Medicine report that their pilot tests showed that the AI recommendations repeatedly replicated the recommendations of the human reviewers. Confident that the AI systems were correctly assessing applications according to the standards of the school, they moved to using AI to conduct initial screenings.
Those two schools use their AI tools to assess the structured parts of the application, where applicants provide facts about such things as their academic credentials and extracurricular experiences. The UC College of Medicine plans to start by assessing just the essays. The tool under development at GW SMHS will cover the entire application. These differences reflect what admissions staff believe will provide the most useful initial results for their priorities as they take their first steps into AI assessments; they might evolve to use AI for other parts of the application.
The schools are following guidelines developed by a committee convened by the Association of American Medical Colleges for the use of AI in admissions, which include policies on protection against biases, alignment with the medical school’s objectives, and data privacy.
The future
The results of AI screening so far point to an ironic measure of success: producing the same results that humans have produced. That saves enormous staff time at the first stage of review, but relying on data drawn from prior decisions risks perpetuating previous biases.
Admissions officers and tech engineers don’t contend that at this point AI removes bias from the selection process. The results highlight a common challenge in adopting AI to perform tasks once assigned to humans: If an AI system is built on past human results, it will reflect human biases that produced those results.
“There’s going to be some amount of bias when you train an AI model,” Keir says. “How can we mitigate that bias?”
For example, the system that Keir tested for the Zucker School of Medicine seeks to reduce bias by not providing such information as the applicant’s name, place of birth, and photo. Admissions and engineering staff hope they can use AI to analyze decisions on a micro level and see where long-standing biases have moved assessments toward or away from certain applicants.
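Blinding the model to identifying information can be as simple as stripping those fields before the application ever reaches the AI. A hedged sketch with hypothetical field names:

```python
# Sketch of blinding an application before AI review. Field names are
# hypothetical, not the Zucker system's actual schema.

IDENTIFYING_FIELDS = {"name", "place_of_birth", "photo"}

def redact(application: dict) -> dict:
    """Return a copy of the application with identifying fields removed,
    so the screening model never sees them."""
    return {k: v for k, v in application.items() if k not in IDENTIFYING_FIELDS}

app = {
    "name": "Jane Doe",
    "place_of_birth": "Springfield",
    "photo": "photo.jpg",
    "gpa": 3.8,
    "experiences": "volunteer EMT; research assistant",
}
blinded = redact(app)  # keeps only gpa and experiences
```

Withholding these fields does not remove all bias, since proxies for identity can survive in the remaining text, which is why the schools also want to audit decisions at a micro level.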
At GW SMHS, Nies hopes that AI will eventually help to identify characteristics of applicants who were most likely to matriculate at the school and to succeed academically. “Maybe there are patterns and combinations that we never thought of before,” he says.
For all aspiring physicians, the implementation of AI might induce concern that their applications will get accepted or rejected right off the bat by a machine rather than by a person. Koutroulis, at GW SMHS, notes that for many years, computer programs in many medical schools have sorted applications by basic criteria, mostly academic scores, that send some applications to an effective “no” pile and leave others to be reviewed by the admissions committee.
Admissions leaders stress that the AI tools produce recommendations, not decisions; staff review those recommendations to confirm that the admissions criteria have been met and to decide whom to invite for interviews.
“This is not something that will replace human review,” Koutroulis says.