Agreement and reliability of global rating versus checklist scores in a high-stakes undergraduate OSCE in Rwanda
Journal
BMC Medical Education
ISSN
1472-6920
Date Issued
2026-02-14
Author(s)
Olayinka Rasheed Ibrahim
Natalie McCall
Abebe Bekele
Biniam Ewnte Zelelew
Oluwaseun Ojomo
Anteneh Gadisa Belachew
Equlinet Misganaw Amare
Zelalem Mengistu Gashaw
Birhanu Abera Ayana
Ariane Nina Ndayikeje
DOI
10.1186/s12909-026-08809-4
Abstract
Background: Although the objective structured clinical examination (OSCE) is widely used to assess clinical competency, the optimal scoring system remains debated, and data from sub-Saharan Africa are limited. This study compared the performance, reliability, and agreement of global rating scales (GRS) and checklist scores in a comprehensive exit examination for undergraduate medical students in Rwanda.
Methods: This cross-sectional descriptive study was conducted during the final ‘Exit Exams’ of undergraduate medical students at the University of Global Health Equity, Rwanda. The OSCE comprised 15 stations spanning the major clinical specialties and subspecialties. Each station was scored with a checklist totalling 20 marks and a three-level GRS (failed, borderline, or passed).
Results: A total of 36 students took the OSCE. The mean (standard deviation) checklist score was 84.3% (5.3%), which was lower than the mean GRS score of 94.3% (6.9%), p < 0.001. All students achieved overall scores above the pass mark of 64.4%, set by the modified Angoff method, on both scoring systems. While no student failed any station on checklist scores, the GRS identified failures in stations 5 and 6 (one student each) and stations 7 and 13 (two students each). Overall internal consistency (Cronbach's alpha) across all stations was 0.760 [range: -0.216 (station 15) to 0.746 (station 9)]. Pearson correlation showed a very strong positive correlation between checklist and GRS scores (r = 0.924, p < 0.001). The Bland-Altman plot showed a mean (standard deviation) difference of 10.01 (2.8) in favor of GRS, with lower and upper limits of agreement of 4.44 and 15.58, respectively.
Conclusion: Checklist and GRS scores in the OSCEs demonstrated strong positive correlation, but GRS showed a higher discriminatory ability, identifying
performance differences that checklists did not capture. Incorporating GRS alongside checklists may enhance the robustness of high-stakes clinical
assessment.
Key words: Checklist, Global rating scale, High-stake examinations, Undergraduates, Sub-Saharan, Rwanda
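The Bland-Altman figures in the Results can be illustrated with a minimal sketch. It assumes the conventional limits of agreement, mean difference ± 1.96 × SD of the differences; the abstract does not state the multiplier, so 1.96 is an assumption here. The function name `bland_altman_limits` and the paired scores are hypothetical and are not the study data. The last lines back-calculate the SD implied by the reported limits to show that it agrees with the reported 2.8 after rounding.

```python
import statistics


def bland_altman_limits(a, b, z=1.96):
    """Return (mean difference, lower limit, upper limit) for paired scores a - b.

    z = 1.96 is the conventional multiplier; it is an assumption, not a value
    taken from the abstract.
    """
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return mean_diff, mean_diff - z * sd_diff, mean_diff + z * sd_diff


# Hypothetical paired percentage scores for four students, NOT the study data.
grs = [95.0, 90.0, 100.0, 92.0]
checklist = [85.0, 82.0, 88.0, 83.0]
mean_diff, lower, upper = bland_altman_limits(grs, checklist)
print(f"mean difference {mean_diff:.2f}, limits {lower:.2f} to {upper:.2f}")

# Consistency check on the reported summary values (4.44 and 15.58):
# the midpoint of the limits is the mean difference, and the half-width
# divided by 1.96 is the SD those limits imply.
implied_mean = (4.44 + 15.58) / 2
implied_sd = (15.58 - 4.44) / (2 * 1.96)
print(f"implied mean {implied_mean:.2f}, implied SD {implied_sd:.2f}")
```

Under this assumption the reported limits imply a mean difference of 10.01 and an SD of about 2.84, which matches the reported 10.01 (2.8) once the SD is rounded to one decimal place.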
File(s)
Name
s12909-026-08809-4_reference.pdf
Size
950.84 KB
Format
Adobe PDF
Checksum
(MD5):e1dfffdf8ae120ae47327243eb3e0a07