Technical Metrics Used to Evaluate Health Care Chatbots: Scoping Review


Bibliographic Details
Main Author:	Alaa Abd-Alrazaq (17430900) (author)
Other Authors:	Zeineb Safi (18281719) (author), Mohannad Alajlani (9392676) (author), Jim Warren (9507905) (author), Mowafa Househ (9154124) (author), Kerstin Denecke (11534035) (author)
Published:	2020
Subjects:	Health sciences; Health services and systems; Information and computing sciences; Human-centred computing; chatbots; conversational agents; health care; evaluation; metrics

_version_	1864513510863536128
dc.date.none.fl_str_mv	2020-06-05T03:00:00Z
dc.identifier.none.fl_str_mv	10.2196/18301
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/Technical_Metrics_Used_to_Evaluate_Health_Care_Chatbots_Scoping_Review/26299558
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	<h3>Background</h3><p dir="ltr">Dialog agents (chatbots) have a long history of application in health care, where they have been used for tasks such as supporting patient self-management and providing counseling. Their use is expected to grow with increasing demands on health systems and improving artificial intelligence (AI) capability. Approaches to the evaluation of health care chatbots, however, appear to be diverse and haphazard, resulting in a potential barrier to the advancement of the field.</p><h3>Objective</h3><p dir="ltr">This study aims to identify the technical (nonclinical) metrics used by previous studies to evaluate health care chatbots.</p><h3>Methods</h3><p dir="ltr">Studies were identified by searching 7 bibliographic databases (eg, MEDLINE and PsycINFO) in addition to conducting backward and forward reference list checking of the included studies and relevant reviews. The studies were independently selected by two reviewers who then extracted data from the included studies. Extracted data were synthesized narratively by grouping the identified metrics into categories based on the aspect of chatbots that the metrics evaluated.</p><h3>Results</h3><p dir="ltr">Of the 1498 citations retrieved, 65 studies were included in this review. Chatbots were evaluated using 27 technical metrics, which were related to chatbots as a whole (eg, usability, classifier performance, speed), response generation (eg, comprehensibility, realism, repetitiveness), response understanding (eg, chatbot understanding as assessed by users, word error rate, concept error rate), and esthetics (eg, appearance of the virtual agent, background color, and content).</p><h3>Conclusions</h3><p dir="ltr">The technical metrics of health chatbot studies were diverse, with survey designs and global usability metrics dominating. 
The lack of standardization and paucity of objective measures make it difficult to compare the performance of health chatbots and could inhibit advancement of the field. We suggest that researchers more frequently include metrics computed from conversation logs. In addition, we recommend the development of a framework of technical metrics with recommendations for specific circumstances for their inclusion in chatbot studies.</p><h2>Other Information</h2><p dir="ltr">Published in: Journal of Medical Internet Research<br>License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.2196/18301" target="_blank">https://dx.doi.org/10.2196/18301</a></p>
eu_rights_str_mv	openAccess
id	Manara2_8d423c88655c138e0d993019bf32509f
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/26299558
status_str	publishedVersion