Mitchell, K. J. & Anderson, J. A. (1996). Reliability of Holistic
Scoring for the MCAT Essay. Educational and Psychological Measurement,
771-775.
PURPOSE: This study evaluates the inter-rater reliability of
holistic scores assigned to MCAT essays.
METHOD: The pilot essay was included in the MCAT for the first
time in the Spring 1985 administration. For this study, a sample of
3,117 essays was selected to represent the academic and demographic
characteristics of the Spring examinee population. Twenty experienced
scorers rated the papers on a 6-point scale. Analyses of variance
were computed for table, batch, reader, subjects, and replications.
RESULTS: The data indicated that 66% of the variation in
scores was due to level differences between essays. The inter-rater
reliability index was estimated at .81. The remaining 34% of the
variation in scores was attributable to batch, table, and readers
nested within table (in rank order).
CONCLUSION: The data suggested a need to revise the scoring
process for future administrations. The authors recommend frequent
calibration exercises, rotating table leaders across tables, and
grouping essays in smaller batches for future scoring.