Disparities in medical recommendations from AI-based chatbots across different countries/regions

Khanisyah E. Gumilar; Birama R. Indraprasta; Yu-Cheng Hsu; Zih-Ying Yu; Hong Chen; Budi Irawan; Zulkarnain Tambunan; Bagus M. Wibowo; Hari Nugroho; Brahmana A. Tjokroprawiro; Erry G. Dachlan; Pungky Mulawardhana; Eccita Rahestyningtyas; Herlangga Pramuditya; Very Great E. Putra; Setyo T. Waluyo; Nathan R. Tan; Royhaan Folarin; Ibrahim H. Ibrahim; Cheng-Han Lin

doi:10.1038/s41598-024-67689-0

Journal

Scientific Reports

ISSN

2045-2322

Date Issued

2024-07-24

Author(s)

Khanisyah E. Gumilar

Birama R. Indraprasta

Yu-Cheng Hsu

Zih-Ying Yu

Hong Chen

Budi Irawan

Zulkarnain Tambunan

Bagus M. Wibowo

Hari Nugroho

Brahmana A. Tjokroprawiro

Erry G. Dachlan

Pungky Mulawardhana

Eccita Rahestyningtyas

Herlangga Pramuditya

Very Great E. Putra

Setyo T. Waluyo

Nathan R. Tan

Royhaan Folarin

Ibrahim H. Ibrahim

Cheng-Han Lin

Tai-Yu Hung

Ting-Fang Lu

Yen-Fu Chen

Yu-Hsiang Shih

Shao-Jing Wang

Jingshan Huang

Clayton C. Yates

Chien-Hsing Lu

Li-Na Liao

Ming Tan

DOI

10.1038/s41598-024-67689-0

Abstract

This study explores disparities and opportunities in healthcare information provided by AI chatbots. We focused on recommendations for adjuvant therapy in endometrial cancer, analyzing responses across four regions (Indonesia, Nigeria, Taiwan, USA) and three platforms (Bard, Bing, ChatGPT-3.5). Utilizing previously published cases, we asked identical questions to chatbots from each location within a 24-h window. Responses were evaluated in a double-blinded manner on relevance, clarity, depth, focus, and coherence by ten experts in endometrial cancer. Our analysis revealed significant variations across different countries/regions (p < 0.001). Interestingly, Bing’s responses in Nigeria consistently outperformed others (p < 0.05), excelling in all evaluation criteria (p < 0.001). Bard also performed better in Nigeria compared to other regions (p < 0.05), consistently surpassing them across all categories (p < 0.001, with relevance reaching p < 0.01). Notably, Bard’s overall scores were significantly higher than those of ChatGPT-3.5 and Bing in all locations (p < 0.001). These findings highlight disparities and opportunities in the quality of AI-powered healthcare information based on user location and platform. This emphasizes the necessity for more research and development to guarantee equal access to trustworthy medical information through AI technologies.

Keywords: Artificial intelligence, Endometrial cancer, Bing, Bard, ChatGPT, Disparity

Subjects

Artificial intelligence

Endometrial cancer

Bing

Bard

ChatGPT

Disparity

File(s)

Name

s41598-024-67689-0.pdf

Size

1.79 MB

Format

Adobe PDF

Checksum

(MD5):b5ee0b10dcd0d9bd897cd2c44619d94d
