Agreement and reliability of global rating versus checklist scores in a high-stakes undergraduate OSCE in Rwanda

Olayinka Rasheed Ibrahim; Natalie McCall; Abebe Bekele; Biniam Ewnte Zelelew; Oluwaseun Ojomo; Anteneh Gadisa Belachew; Equlinet Misganaw Amare; Zelalem Mengistu Gashaw; Birhanu Abera Ayana; Ariane Nina Ndayikeje

doi:10.1186/s12909-026-08809-4

Journal

BMC Medical Education

ISSN

1472-6920

Date Issued

2026-02-14

Author(s)

Olayinka Rasheed Ibrahim

Natalie McCall

Abebe Bekele

Biniam Ewnte Zelelew

Oluwaseun Ojomo

Anteneh Gadisa Belachew

Equlinet Misganaw Amare

Zelalem Mengistu Gashaw

Birhanu Abera Ayana

Ariane Nina Ndayikeje

DOI

10.1186/s12909-026-08809-4

Abstract

Background: Although the objective structured clinical examination (OSCE) is widely used to assess clinical competency, the optimal scoring system remains debated, and data from sub-Saharan Africa are limited. This study compared the performance, reliability, and agreement of global rating scales (GRS) and checklist scores in a comprehensive exit examination for undergraduate medical students in Rwanda.

Methods: This cross-sectional descriptive study was conducted during the final ‘Exit Exams’ of undergraduate medical students at the University of Global Health Equity, Rwanda. The OSCE comprised 15 stations spread across major clinical specialties and subspecialties. Each station had a checklist with a total score of 20 marks and a three-level GRS (failed, borderline, or passed).

Results: A total of 36 students took the OSCE. The mean (standard deviation) checklist score was 84.3 (5.3)%, lower than the mean GRS score of 94.3 (6.9)% (p < 0.001). All students achieved overall scores above the pass mark of 64.4%, set using the modified Angoff method, on both scoring systems. While no student failed any station on checklist scores, the GRS identified failures in stations 5 and 6 (one student each) and stations 7 and 13 (two students each). Overall internal consistency (Cronbach's alpha) across all stations was 0.760 [range: -0.216 (station 15) to 0.746 (station 9)]. Pearson correlation showed a very strong positive correlation between checklist and GRS scores (r = 0.924, p < 0.001). The Bland-Altman plot showed a mean (standard deviation) difference of 10.01 (2.8) in favor of GRS, with lower and upper limits of agreement of 4.44 and 15.58, respectively.
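The Bland-Altman limits of agreement reported in the Results are conventionally computed as the mean difference plus or minus 1.96 standard deviations of the paired differences. The sketch below illustrates that calculation under stated assumptions: the function names `bland_altman` and `limits_from_summary` and the five paired scores are hypothetical and are not taken from the study, and the 1.96 multiplier is the usual convention rather than something the abstract states. Applying the formula to the rounded summary values in the abstract (10.01 and 2.8) gives limits of about 4.52 and 15.50, close to the reported 4.44 and 15.58; the small gap would be consistent with the standard deviation having been rounded for reporting.

```python
from statistics import mean, stdev


def bland_altman(grs, checklist, z=1.96):
    """Return (bias, lower limit, upper limit) for paired percentage scores.

    The bias is the mean of the GRS-minus-checklist differences; the limits
    are bias -/+ z times the sample standard deviation of those differences.
    """
    if len(grs) != len(checklist):
        raise ValueError("score lists must be paired and of equal length")
    diffs = [g - c for g, c in zip(grs, checklist)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


def limits_from_summary(bias, sd, z=1.96):
    """Limits of agreement from an already-reported mean difference and SD."""
    return bias - z * sd, bias + z * sd


# Hypothetical paired scores (percent) for illustration only, not study data.
grs = [95.0, 90.0, 100.0, 92.5, 97.5]
checklist = [85.0, 82.0, 88.0, 84.5, 86.5]
bias, lower, upper = bland_altman(grs, checklist)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")

# Rounded summary values taken from the abstract: mean difference 10.01, SD 2.8.
lo, hi = limits_from_summary(10.01, 2.8)
print(f"limits from rounded summary: ({lo:.2f}, {hi:.2f})")
```

Because both limits computed from the summary values lie above zero, the differences are expected to favor GRS across nearly all students, which matches the direction of difference described in the abstract.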

Conclusion: Checklist and GRS scores in the OSCE demonstrated a strong positive correlation, but GRS showed higher discriminatory ability, identifying performance differences that checklists did not capture. Incorporating GRS alongside checklists may enhance the robustness of high-stakes clinical assessment.

Keywords: Checklist, Global rating scale, High-stakes examinations, Undergraduates, Sub-Saharan, Rwanda

Subjects

Global rating scale

Undergraduates

Sub-Saharan

Rwanda

OSCE

Examinations

Clinical competency

Assessment

Performance

Medical education

File(s)

Name

s12909-026-08809-4_reference.pdf

Size

950.84 KB

Format

Adobe PDF

Checksum

(MD5):e1dfffdf8ae120ae47327243eb3e0a07
