
Healing a broken clerkship grading system

Justin Bullock, MD, MPH
Karen E. Hauer, MD, PhD
February 20, 2020

The way medical students are assessed in clerkships discourages learning, increases stress, and offers little useful feedback, say the authors. It’s also particularly problematic for students from underrepresented backgrounds. Here’s how to reform it.


Editor’s note: The opinions expressed by the authors do not necessarily reflect the views of the AAMC or its members.

Imagine starting a new job in a new setting with unfamiliar coworkers. You receive little feedback on your performance, and your supervisors don't observe you doing some of the job’s key tasks. After two months, you receive an evaluation form with ratings and a grade, and you suspect you need to make changes. However, the evaluation provides no real sense of how you might improve in the future. This is what the experience is like for many medical students during their first year of core clinical clerkships.

What’s more, medical students’ grades matter a great deal. Whether students match into a desired residency program often depends on clerkship grades that may not reflect the depth of their knowledge, skills, potential, or professional growth.

Grades also take an emotional toll on learners: Research reveals that about half of medical students experience burnout, and grading constitutes a major source of stress. Strikingly, only 44% of medical students believe that clerkship grading is fair.

Ideally, the assessment of students’ performance should serve two main purposes: (1) to provide feedback to help students improve, and (2) to ensure that students have achieved expected competency. The clerkship grading system in place in most medical schools is both flawed and unjust, missing the mark on both purposes of assessment.

Here's a quick primer on how the current grading system usually works:

A school’s clerkship director or grading committee is tasked with assigning students’ grades. They do this by combining written evaluations from supervising residents and attending physicians (which often contain numerical ratings and lengthier comments) with scores on written and clinical skills exams (typically numerical scores and occasionally comments). Clerkship directors attempt to discern whether a half-point difference in scores — or a descriptor that one student was "excellent" while a peer was "superb" in a specific skill — signifies a meaningful difference. Students then receive a grade — often “honors,” “high pass,” “pass,” or “fail” (though few actually fail).

Although well-intentioned, this approach leads to many worrisome problems.

For one, grades are based on imperfect data. Assessment of performance in the clinical workplace is inherently less objective than performance on a written test of facts in the classroom. In addition, clerkship supervisors work with students for varying lengths of time before completing evaluations. Sometimes, supervisors write assessments weeks after observing students. Some supervisors, juggling their roles as clinicians, educators, and assessors, never complete their students’ evaluations at all. Imagine students’ frustration at receiving an evaluation from a supervisor who worked with them for two days but not from one who supervised them for two weeks. And imagine their dismay at being rated on competencies that supervisors never actually saw them perform — which happens when an evaluator assesses some skills by extrapolating from observing others.

In our 2019 study of more than 600 students at six medical schools, respondents shared their perceptions of the various drivers of their final clerkship grades. What did they consider the two most important factors? Being liked and which doctors students worked with — not clinical reasoning or medical knowledge. Least important in determining final grades? Improvement. So, many students believe that if they are not perfect on day one, they cannot earn top grades — and that hard work, learning, and improving are not rewarded as much as the good luck to work with team members who like you.

Also worrisome is assessment’s impact on students’ learning.

Educational theory categorizes learners as mastery-oriented and performance-oriented. Mastery-oriented learners adopt a growth mindset, ask questions, and learn for the sake of learning. Performance-oriented learners gravitate towards tasks that make them look good and avoid tasks that make them look bad. When improvement is not rewarded in clerkship grading, learners deprioritize a growth mindset and adopt a performance orientation. In practice, this means that rather than focusing on mastering material and being as prepared as possible for clinical practice, students often focus instead on pleasing their supervisors. However, students can switch to the desirable mastery orientation in the right learning environment with the right approach to assessment and grading.

Meanwhile, the current approach to clerkship grading is even more problematic when it comes to students from backgrounds underrepresented in medicine (URM).

At our institution, for example, we found that African American, Latinx/Hispanic, and Native American/Pacific Islander students consistently received slightly lower average scores (about one-tenth of a point) in all clerkships — a small difference that was magnified when translated into final grades, with these students receiving half as many top grades. Similarly, both URM and non-URM (such as Asian) minority students at another institution received lower grades than white students in most clerkships, even after adjusting for confounding variables, suggesting that implicit racial bias likely played a role. Discrimination is not uncommon in medical school: More than 40% of graduates report experiencing bias based on their race, gender, or another personal trait, according to AAMC data.

Supervisors’ comments on students’ performance provide clues into the thought processes that may contribute to these differences in scores. An analysis of more than 87,000 written evaluations showed that, although there were no differences by race, gender, or ethnicity in the 10 words supervisors used most often, other important words did show such differences. Men and non-URM students were more often described based on their competence, with words like “scientific” and “knowledgeable,” while women and URM students were more often described by their personality, with words such as “pleasant” and “lovely.”

Educators increasingly recognize that a different approach to assessing students in clerkships can address these concerns for URM students and all medical students. Instead of grades, it would be far better to provide low-stakes assessments individualized to students’ needs. Supervising faculty could then focus less on judging students and more on coaching and teaching.

When a supervising physician describes what went well and what to do differently next time, this feedback guides and supports trainees in becoming competent medical providers. That, after all, is the goal of medical school.

Ways to improve grading

Institutions should consider large, structural changes to address the many concerns associated with the current grading system:

  • Know the extent of the problem: A first step that all institutions should take is examining their own grading data. Are there any group differences based on race or gender? This analysis helps reveal whether any learner groups are disadvantaged by current assessment systems, and it invites exploration of the cause of any disparities. Releasing this assessment data to the local educational community — or beyond — improves transparency in grading and promotes institutional accountability.
     
  • Consider pass/fail grading: In 2019, our institution joined more than 10 other schools as we switched to pass/fail grading during core clerkships. We also increased our focus on frequent feedback to students. In our new system, faculty coaches who do not participate in high-stakes assessment help students interpret feedback from residents and attendings and set learning goals in the context of trusting relationships. Since implementation, we have noticed a visible decrease in the stress of our medical students around assessment. Not all institutions will want to implement pass/fail grading because they worry that it will make the residency selection process more difficult, yet the benefits of doing so can be worth the potential downsides involved.
     
  • Address broader issues of bias: For one, faculty should represent the demographic diversity of the student population and patients they serve. Lack of representation may exacerbate students’ risk of “stereotype threat” — an awareness of negative stereotypes others may hold about one’s group and the fear of fulfilling them. This phenomenon taxes the mental resources and impedes the performance of vulnerable students. In addition, all evaluators should be made aware of the unique challenges of URM students that can affect their learning experience, such as daily microaggressions, and ways to help mitigate them.
     
  • Train assessors in student evaluation: Instructors should receive formal training in how to evaluate students. Assessors often receive little or no guidance in filling out rating forms or writing useful descriptive comments about students’ performance. Of course, given the inherent variability in instructors, patients, and clinical services, ratings will always bear some subjectivity, but we can certainly work to improve educators' assessment skills. In addition, training should include completion of an Implicit Association Test (IAT) to increase awareness of implicit biases we all hold and how these biases may impact assessments of students.
     
  • Give better-quality feedback: Institutions should commit to providing students with actionable, real-time feedback. Our institution is piloting a mobile app in which each student has their own scannable QR code so that supervisors can record brief, immediate feedback. To ensure culture change toward more low-stakes feedback, students are expected to get this feedback twice each week. More timely feedback is more accurate and helps students learn better.

The time has come for educators to confront the flaws in the current clerkship grading system that interfere with learning, promote inequities, and threaten student wellbeing. By reviewing our own institution’s data with an open mind, we have worked to improve transparency and fairness. Our ultimate goal is to ensure that we are preparing students to become lifelong learners and to provide quality care for diverse patients throughout their careers.

Justin Bullock, MD, MPH, is a first-year resident in internal medicine at the University of California, San Francisco, School of Medicine, and a medical education researcher interested in equity and assessment in undergraduate medical education.

Karen E. Hauer, MD, PhD, is associate dean for assessment and professor of medicine at the University of California, San Francisco, School of Medicine. She directs the school’s medical student coaching program and has conducted extensive research on medical education.
