Harasym, P.H.; Woloschuk, W.; Mandin, H.; Brundin-Mather, R. Reliability
and Validity of Interviewers' Judgments of Medical School Candidates.
Academic Medicine, 71;S40-S42, 1996.
PURPOSE: To evaluate the reliability and validity of the
medical school admissions candidate interview; to determine the
extent to which interviewers are consistent in their ratings of
the same candidate (inter-interviewer reliability); and to determine
the extent to which interviewers' ratings correspond to known characteristics
of candidates (validity).
METHODS: In order to collect the required data, six actors
portrayed the scripted roles of poor, average, and good candidates
(referred to as simulated candidates, or SCs) and were included in
the actual candidate interview pool of 200 applicants for the 1995
admissions process at the University of Calgary. SCs were used to
determine whether interviewers can reliably and validly measure
desired noncognitive qualities of medical school candidates. Applicants
were invited for interviews on the basis of a two-person evaluation
of six preadmission areas: academic results, MCAT scores, letters
of reference, extracurricular activities, employment history, and
a written essay. Twenty-five interviewers assessed fourteen attributes
of the interviewees: degree of broad education, knowledge of medicine,
knowledge of University of Calgary Medical School, general motivation,
problem-solving ability, willingness to accept responsibility, ability
to relate to others, leadership potential, self-appraisal, open-mindedness,
maturity, honesty/integrity, sense of humor, and communication skills.
A measure of the interviewers' accuracy in their overall ratings
of the SCs was calculated by a frequency count of "correct"
and "incorrect" ratings. Multivariate analysis of variance
(MANOVA) was used to determine statistically significant shifts
in mean attribute scores by gender of interviewer, gender of candidate,
and performance level of candidate (poor, average, good). Scheffe
post hoc multiple comparisons (p<.05) were used to provide a
more detailed analysis of group differences. A cross-tabulation
and a chi-square test were used to compare the degree of accuracy
between experienced and nonexperienced interviewers.
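
The frequency count and the chi-square comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names, the example ratings, and the 2x2 counts are all hypothetical and are not data from the study. It assumes a plain Pearson chi-square without continuity correction on a table of experienced versus nonexperienced interviewers by correct versus incorrect ratings.

```python
import math


def accuracy(pairs):
    """Fraction of (portrayed_level, overall_rating) pairs that agree."""
    correct = sum(1 for portrayed, rated in pairs if portrayed == rated)
    return correct / len(pairs)


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical ratings, NOT the study's data: (level portrayed, overall rating given).
HYPOTHETICAL_PAIRS = [
    ("good", "good"),
    ("poor", "average"),
    ("average", "average"),
    ("poor", "poor"),
]

# Hypothetical counts, NOT the study's data.
# Rows: experienced, nonexperienced. Columns: correct, incorrect.
HYPOTHETICAL_TABLE = [[14, 4], [6, 12]]

if __name__ == "__main__":
    print(accuracy(HYPOTHETICAL_PAIRS))
    print(chi_square_2x2(HYPOTHETICAL_TABLE))
```

With these made-up counts the statistic is 7.2, which exceeds the 3.84 critical value for one degree of freedom at p = .05, so the two groups would be judged to differ in accuracy.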
RESULTS: The authors found that in 56% of the 36 interviews
conducted with the SCs, the overall rating matched the performance
level that the actor portrayed. Generalizability analysis indicated
that a relatively moderate portion of variance (9%) was attributed
to differences among the candidates in mean interview ratings. There
was substantial variance (45%) in candidate rating from one interviewer
to the next. The MANOVA indicated a significant second-order interaction
among interviewer gender, candidate gender, and candidate performance
level in relation to how broadly educated the candidate was rated.
The Scheffe post hoc comparisons revealed differences among 5 of
the 14 noncognitive attributes. The cross-tabulation showed that experienced
interviewers rated the performance levels correctly significantly more often
(chi-square test, p<.05) than the nonexperienced interviewers.
CONCLUSIONS: The results of this study demonstrated significant
variability among interviewers' SC ratings (reliability = .51) and
moderate validity in interviewers' ratings of an SC's true level
of performance (56% accuracy). The authors recognized that the high
degree of variability may be caused by the differing conclusions
arrived at by the interviewers. In addition, interviewers may have
assigned different levels of importance to each of the 14 attributes.
The overall ratings of experienced interviewers were significantly
more accurate than the ratings of novice interviewers. The authors
suggested that this study revealed the difficulties in the common
medical interviewing process. They suggested, however, that the differences
among interviewers can be attenuated by more formal interviewer
training.