Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study

Bibliographic Details
Main Author:	Alaa Abd-alrazaq (17058018) (author)
Other Authors:	Abdulqadir J Nashwan (17991280) (author), Zubair Shah (231886) (author), Ahmad Abujaber (9100064) (author), Dari Alhuwail (6497858) (author), Jens Schneider (16885948) (author), Rawan AlSaad (14159019) (author), Hazrat Ali (421019) (author), Waleed Alomoush (19325926) (author), Arfan Ahmed (17541309) (author), Sarah Aziz (17541312) (author)
Published:	2024
Subjects:	Health sciences; Epidemiology; Public health; Information and computing sciences; Machine learning; research gap; research topic; research topics; scientific literature; literature review; machine learning; COVID-19; BERTopic; topic clustering; text analysis; BERT; NLP; natural language processing; review methods; review methodology; SARS-CoV-2; coronavirus

_version_	1864513509679693824
author	Alaa Abd-alrazaq (17058018)
author2	Abdulqadir J Nashwan (17991280) Zubair Shah (231886) Ahmad Abujaber (9100064) Dari Alhuwail (6497858) Jens Schneider (16885948) Rawan AlSaad (14159019) Hazrat Ali (421019) Waleed Alomoush (19325926) Arfan Ahmed (17541309) Sarah Aziz (17541312)
author2_role	author author author author author author author author author author
author_facet	Alaa Abd-alrazaq (17058018) Abdulqadir J Nashwan (17991280) Zubair Shah (231886) Ahmad Abujaber (9100064) Dari Alhuwail (6497858) Jens Schneider (16885948) Rawan AlSaad (14159019) Hazrat Ali (421019) Waleed Alomoush (19325926) Arfan Ahmed (17541309) Sarah Aziz (17541312)
author_role	author
dc.creator.none.fl_str_mv	Alaa Abd-alrazaq (17058018) Abdulqadir J Nashwan (17991280) Zubair Shah (231886) Ahmad Abujaber (9100064) Dari Alhuwail (6497858) Jens Schneider (16885948) Rawan AlSaad (14159019) Hazrat Ali (421019) Waleed Alomoush (19325926) Arfan Ahmed (17541309) Sarah Aziz (17541312)
dc.date.none.fl_str_mv	2024-03-05T03:00:00Z
dc.identifier.none.fl_str_mv	10.2196/49411
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/Machine_Learning_Based_Approach_for_Identifying_Research_Gaps_COVID-19_as_a_Case_Study/26491117
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Health sciences Epidemiology Public health Information and computing sciences Machine learning research gap research topic research topics scientific literature literature review machine learning COVID-19 BERTopic topic clustering text analysis BERT NLP natural language processing review methods review methodology SARS-CoV-2 coronavirus
dc.title.none.fl_str_mv	Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	<h3>Background</h3><p dir="ltr">Research gaps refer to unanswered questions in the existing body of knowledge, either due to a lack of studies or inconclusive results. Research gaps are essential starting points and motivation in scientific research. Traditional methods for identifying research gaps, such as literature reviews and expert opinions, can be time consuming, labor intensive, and prone to bias. They may also fall short when dealing with rapidly evolving or time-sensitive subjects. Thus, innovative scalable approaches are needed to identify research gaps, systematically assess the literature, and prioritize areas for further study in the topic of interest.</p><h3>Objective</h3><p dir="ltr">In this paper, we propose a machine learning–based approach for identifying research gaps through the analysis of scientific literature. We used the COVID-19 pandemic as a case study.</p><h3>Methods</h3><p dir="ltr">We conducted an analysis to identify research gaps in COVID-19 literature using the COVID-19 Open Research (CORD-19) data set, which comprises 1,121,433 papers related to the COVID-19 pandemic. Our approach is based on the BERTopic topic modeling technique, which leverages transformers and class-based term frequency-inverse document frequency to create dense clusters allowing for easily interpretable topics. Our BERTopic-based approach involves 3 stages: embedding documents, clustering documents (dimension reduction and clustering), and representing topics (generating candidates and maximizing candidate relevance).</p><h3>Results</h3><p dir="ltr">After applying the study selection criteria, we included 33,206 abstracts in the analysis of this study. The final list of research gaps identified 21 different areas, which were grouped into 6 principal topics. 
These topics were “virus of COVID-19,” “risk factors of COVID-19,” “prevention of COVID-19,” “treatment of COVID-19,” “health care delivery during COVID-19,” and “impact of COVID-19.” The most prominent topic, observed in over half of the analyzed studies, was “impact of COVID-19.”</p><h3>Conclusions</h3><p dir="ltr">The proposed machine learning–based approach has the potential to identify research gaps in scientific literature. This study is not intended to replace individual literature research within a selected topic. Instead, it can serve as a guide to formulate precise literature search queries in specific areas associated with research questions that previous publications have earmarked for future exploration. Future research should leverage an up-to-date list of studies that are retrieved from the most common databases in the target area. When feasible, full texts or, at minimum, discussion sections should be analyzed rather than limiting the analysis to abstracts. Furthermore, future studies could evaluate more efficient modeling algorithms, especially those combining topic modeling with statistical uncertainty quantification, such as conformal prediction.</p><h2>Other Information</h2><p dir="ltr">Published in: JMIR Formative Research<br>License: <a href="https://creativecommons.org/licenses/by/4.0" target="_blank">https://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.2196/49411" target="_blank">https://dx.doi.org/10.2196/49411</a></p>
eu_rights_str_mv	openAccess
id	Manara2_8571162ddab0140383ea1078a50bd669
identifier_str_mv	10.2196/49411
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/26491117
publishDate	2024
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv	CC BY 4.0
spellingShingle	Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study Alaa Abd-alrazaq (17058018) Health sciences Epidemiology Public health Information and computing sciences Machine learning research gap research topic research topics scientific literature literature review machine learning COVID-19 BERTopic topic clustering text analysis BERT NLP natural language processing review methods review methodology SARS-CoV-2 coronavirus
status_str	publishedVersion
title	Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study
title_full	Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study
title_fullStr	Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study
title_full_unstemmed	Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study
title_short	Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study
title_sort	Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study
topic	Health sciences Epidemiology Public health Information and computing sciences Machine learning research gap research topic research topics scientific literature literature review machine learning COVID-19 BERTopic topic clustering text analysis BERT NLP natural language processing review methods review methodology SARS-CoV-2 coronavirus
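
The Methods text in the description field names class-based term frequency-inverse document frequency as the step that turns document clusters into interpretable topics. The sketch below illustrates that one step only, in plain Python, and is not the authors' code. It assumes the embedding and clustering stages have already assigned each document to a cluster, it assumes the weighting commonly documented for BERTopic (term frequency within a cluster, normalized by cluster length, multiplied by log(1 + A / f_t), where A is the average number of words per cluster and f_t is the term's frequency across all clusters), and it uses whitespace tokenization as a simplification. The function names `c_tf_idf` and `top_terms` and the toy clusters are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Score terms per cluster with a class-based TF-IDF weighting.

    clusters: dict mapping a cluster label to a list of document strings.
    Assumed formula (not taken from the record): tf(t, c) * log(1 + A / f_t).
    """
    # Treat each cluster as one concatenated "class document".
    class_counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in clusters.items()
    }

    # Frequency of each term across all clusters.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # A: average number of words per cluster.
    avg_words = sum(total_counts.values()) / len(class_counts)

    scores = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[label] = {
            term: (freq / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, freq in counts.items()
        }
    return scores


def top_terms(scores, k=2):
    """Return the k highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for label, s in scores.items()
    }


# Hypothetical toy clusters; real input would be abstracts grouped by the clustering stage.
toy_clusters = {
    "treatment": ["covid antiviral drug trial", "covid drug dosage trial"],
    "prevention": ["covid vaccine uptake survey", "covid mask use vaccine uptake"],
}

toy_scores = c_tf_idf(toy_clusters)
print(top_terms(toy_scores))
```

In this toy input the word "covid" appears in every cluster, so its weight is pushed below cluster-specific words such as "drug" or "vaccine"; that down-weighting of shared vocabulary is what makes the resulting topic labels distinguishable.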
