New world of big data—new challenges for evidence synthesis: impact of data duplication on estimates generated by meta-analyses and the development of a framework for its identification and management

Objectives: The aim of this study was to highlight the effects of entering duplicated or overlapping data from published studies using the same data registries into a meta-analysis, including its identification and management using a novel structured framework. Study Design and Setting: Secondary analys...

Bibliographic Details
Main Author:	Lock, Merilyn (author)
Other Authors:	El Ansari, Walid (author)
Format:	article
Published:	2024
Subjects:	Meta-analysis Systematic review Duplicate data Big data Registries Metabolic and bariatric surgery
Online Access:	http://dx.doi.org/10.1016/j.jclinepi.2024.111641 https://www.sciencedirect.com/science/article/pii/S0895435624003974 http://hdl.handle.net/10576/64042

_version_	1857415083775229952
author	Merilyn, Lock
author2	El Ansari, Walid
author2_role	author
author_facet	Merilyn, Lock El Ansari, Walid
author_role	author
dc.creator.none.fl_str_mv	Merilyn, Lock El Ansari, Walid
dc.date.none.fl_str_mv	2024-12-16 2025-03-30T08:11:31Z
dc.format.none.fl_str_mv	application/pdf
dc.identifier.none.fl_str_mv	http://dx.doi.org/10.1016/j.jclinepi.2024.111641 Lock, M., & El Ansari, W. (2025). New world of big data—new challenges for evidence synthesis: impact of data duplication on estimates generated by meta-analyses and the development of a framework for its identification and management. Journal of Clinical Epidemiology, 179, 111641. 0895-4356 https://www.sciencedirect.com/science/article/pii/S0895435624003974 http://hdl.handle.net/10576/64042 179 1878-5921
dc.language.none.fl_str_mv	en
dc.publisher.none.fl_str_mv	Elsevier
dc.rights.none.fl_str_mv	http://creativecommons.org/licenses/by/4.0/ info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Meta-analysis Systematic review Duplicate data Big data Registries Metabolic and bariatric surgery
dc.title.none.fl_str_mv	New world of big data—new challenges for evidence synthesis: impact of data duplication on estimates generated by meta-analyses and the development of a framework for its identification and management
dc.type.none.fl_str_mv	Article info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article
description	Objectives: The aim of this study was to highlight the effects of entering duplicated or overlapping data from published studies using the same data registries into a meta-analysis, including its identification and management using a novel structured framework. Study Design and Setting: Secondary analysis of data from a proportional meta-analysis of 30-day cumulative incidence of venous thromboembolic events (VTE) after metabolic and bariatric surgery was performed. Sensitivity analysis was conducted a) including all studies regardless of duplication (uncorrected sample) and b) comparing it to a corrected sample of studies. We developed a decision tree framework to identify duplicated data from prospective studies and data registries. Results: We demonstrated that biasing from duplicated data, primarily from data registries, underestimated the incidence of VTE in the literature by 0.15% of the patient population (an erroneous difference equivalent to 22.06% of total VTE). This error persisted at 8.16% of total VTE when limiting to studies using a primarily laparoscopic approach. The decision tree framework used a comparison of the data source (country and hospital or registry), sampling time frame (dates/years of included data) and inclusion characteristics (included procedures/diagnoses or inclusion criteria) to identify potentially duplicated data. Inter-rater reliability was excellent (κ = 1.00, P < .001), although only 17.86% of studies coded as containing data duplication were verified by the authors while the remaining studies could not be verified. Lastly, we identified a strong lack of diversity in the geographical origins of the data from the included studies. Conclusion: We demonstrated that inadvertently including duplicated data in a meta-analysis can result in substantially inaccurate pooled estimates. We outlined a comprehensive decision tree framework that future researchers can apply to assist with decision making when identifying and managing duplicated data, including that from prospective trials and data registries or other publicly accessible datasets.
Plain Language Summary: We explored the effects of entering duplicated or overlapping data from published studies using the same data registries into a meta-analysis; and developed a decision tree framework to identify such duplicated data from prospective studies and data registries. We analyzed data of 30-day incidence of venous thromboembolic events after metabolic and bariatric surgery. We demonstrated that including duplicated data, mainly from data registries, in a meta-analysis can result in substantially inaccurate pooled estimates, underestimating the incidence of total venous thromboembolic events by 22.06%. We also found a lack of diversity in the geographical origins of the data. The decision tree compared data source (country and hospital/registry), sampling time frame (dates/years of included data) and inclusion characteristics (inclusion criteria/procedures/diagnoses) to identify potentially duplicated data. Future researchers can apply the framework to make decisions when identifying and managing duplicated data from data registries or other publicly accessible datasets.
eu_rights_str_mv	openAccess
format	article
id	qu_ef8ebf2c67270638df52f959fd0621db
identifier_str_mv	Lock, M., & El Ansari, W. (2025). New world of big data—new challenges for evidence synthesis: impact of data duplication on estimates generated by meta-analyses and the development of a framework for its identification and management. Journal of Clinical Epidemiology, 179, 111641. 0895-4356 179 1878-5921
language_invalid_str_mv	en
network_acronym_str	qu
network_name_str	Qatar University repository
oai_identifier_str	oai:qspace.qu.edu.qa:10576/64042
publishDate	2024
publisher.none.fl_str_mv	Elsevier
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv	http://creativecommons.org/licenses/by/4.0/
status_str	publishedVersion
title	New world of big data—new challenges for evidence synthesis: impact of data duplication on estimates generated by meta-analyses and the development of a framework for its identification and management
topic	Meta-analysis Systematic review Duplicate data Big data Registries Metabolic and bariatric surgery
url	http://dx.doi.org/10.1016/j.jclinepi.2024.111641 https://www.sciencedirect.com/science/article/pii/S0895435624003974 http://hdl.handle.net/10576/64042
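
Illustrative sketch (not part of the record): the description above says the decision tree compares data source, sampling time frame and inclusion characteristics to identify potentially duplicated data. The Python below is a minimal sketch of that idea only. The `Study` fields, the rule that all three comparisons must match, and the example studies are assumptions made for illustration; the published framework may order or combine the checks differently.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Study:
    """One published study. All field names here are hypothetical."""
    label: str
    country: str
    source: str                 # hospital or registry name
    start_year: int             # first year of included data
    end_year: int               # last year of included data
    procedures: FrozenSet[str]  # included procedures or diagnoses


def same_source(a: Study, b: Study) -> bool:
    """Comparison 1: same country and same hospital or registry."""
    return a.country == b.country and a.source == b.source


def periods_overlap(a: Study, b: Study) -> bool:
    """Comparison 2: the sampling time frames share at least one year."""
    return a.start_year <= b.end_year and b.start_year <= a.end_year


def inclusion_overlap(a: Study, b: Study) -> bool:
    """Comparison 3: at least one included procedure or diagnosis in common."""
    return bool(a.procedures & b.procedures)


def potentially_duplicated(a: Study, b: Study) -> bool:
    """Assumed rule: flag a pair only when all three comparisons match."""
    return same_source(a, b) and periods_overlap(a, b) and inclusion_overlap(a, b)


def flag_pairs(studies: List[Study]) -> List[Tuple[str, str]]:
    """Return the labels of every pair flagged as potentially duplicated."""
    return [
        (a.label, b.label)
        for a, b in combinations(studies, 2)
        if potentially_duplicated(a, b)
    ]


# Hypothetical studies, invented purely to exercise the checks.
EXAMPLE_STUDIES = [
    Study("A", "US", "Registry X", 2010, 2014, frozenset({"sleeve", "bypass"})),
    Study("B", "US", "Registry X", 2013, 2016, frozenset({"sleeve"})),
    Study("C", "US", "Registry X", 2017, 2018, frozenset({"sleeve"})),
    Study("D", "Sweden", "Registry Y", 2012, 2015, frozenset({"bypass"})),
]

if __name__ == "__main__":
    print(flag_pairs(EXAMPLE_STUDIES))
```

In this sketch only studies A and B are flagged: they share a registry, their time frames overlap in 2013 and 2014, and both include the same procedure. A flagged pair is a candidate for manual review rather than confirmed duplication; the description notes that only 17.86% of studies coded as containing duplication could be verified by their authors.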
