Evaluating artificial intelligence chatbots for patient education in oral and maxillofacial radiology

Bibliographic Details
Main Author: Helvacioglu-Yigit, Dilek (author)
Other Authors: Demirturk, Husniye (author), Ali, Kamran (author), Tamimi, Dania (author), Koenig, Lisa (author), Almashraqi, Abeer (author)
Format: article
Published: 2025
Online Access: http://dx.doi.org/10.1016/j.oooo.2025.01.001
https://www.sciencedirect.com/science/article/pii/S2212440325000033
http://hdl.handle.net/10576/64060
Description
Abstract:
Objective: This study aimed to compare the quality and readability of the responses generated by 3 publicly available artificial intelligence (AI) chatbots in answering frequently asked questions (FAQs) related to Oral and Maxillofacial Radiology (OMR), in order to assess their suitability for patient education.
Study Design: Fifteen OMR-related questions were selected from professional patient information websites. These questions were posed to ChatGPT-3.5 by OpenAI, Gemini 1.5 Pro by Google, and Copilot by Microsoft to generate responses. Three board-certified OMR specialists evaluated the responses for scientific adequacy, ease of understanding, and overall reader satisfaction. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores. The Wilcoxon signed-rank test was conducted to compare the scores assigned by the evaluators to the responses from the chatbots and the professional websites. Interevaluator agreement was examined by calculating the Fleiss kappa coefficient.
Results: There were no significant differences between groups in terms of scientific adequacy. In terms of readability, the chatbots had overall mean FKGL and FRE scores of 12.97 and 34.11, respectively. The interevaluator agreement level was generally high.
Conclusions: Although chatbots are relatively good at responding to FAQs, validating AI-generated information with input from healthcare professionals can enhance patient care and safety. The readability scores indicate that the text content of both the chatbots and the websites demands a high reading level.
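For reference, the two readability indices named in the abstract are standard published formulas driven by average sentence length and average syllables per word. Below is a minimal Python sketch of how such scores can be computed; the heuristic syllable counter is an assumption for illustration and is not the tool used in the study.

    import re

    def count_syllables(word: str) -> int:
        # Rough heuristic: count vowel groups, subtract a trailing silent 'e'.
        word = word.lower()
        count = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and count > 1:
            count -= 1
        return max(count, 1)

    def readability_scores(text: str) -> tuple[float, float]:
        # Words per sentence (wps) and syllables per word (spw) drive both indices.
        sentences = max(len(re.findall(r"[.!?]+", text)), 1)
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(len(words), 1)
        syllables = sum(count_syllables(w) for w in words)
        wps = n_words / sentences
        spw = syllables / n_words
        # Standard published coefficients for the two indices.
        fkgl = 0.39 * wps + 11.8 * spw - 15.59    # Flesch-Kincaid Grade Level
        fre = 206.835 - 1.015 * wps - 84.6 * spw  # Flesch Reading Ease
        return fkgl, fre

    print(readability_scores("Dental X-rays use a small dose of radiation."))

On the FKGL scale, a score near 13 corresponds to college-level text, which is consistent with the conclusion that the evaluated materials demand a high reading level. For the paired comparisons described in the study design, scipy.stats.wilcoxon implements the Wilcoxon signed-rank test.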