Data_Sheet_1_Variational Bayes latent class analysis for EHR-based phenotyping with large real-world data.CSV

Bibliographic Details
Main Author: Brian Buckley (137185) (author)
Other Authors: Adrian O'Hagan (19779114) (author), Marie Galligan (4402525) (author)
Published: 2024
_version_ 1852026235531558912
dc.creator.none.fl_str_mv Brian Buckley (137185)
Adrian O'Hagan (19779114)
Marie Galligan (4402525)
dc.date.none.fl_str_mv 2024-10-02T04:22:10Z
dc.identifier.none.fl_str_mv 10.3389/fams.2024.1302825.s001
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/Data_Sheet_1_Variational_Bayes_latent_class_analysis_for_EHR-based_phenotyping_with_large_real-world_data_CSV/27148170
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Statistics
Computation Theory and Mathematics
Ordinary Differential Equations, Difference Equations and Dynamical Systems
Financial Mathematics
Applied Mathematics not elsewhere classified
Optimisation
Numerical and Computational Mathematics not elsewhere classified
Applied Statistics
Mathematical Physics not elsewhere classified
Numerical Computation
Computation Theory and Mathematics not elsewhere classified
variational Bayes
latent class analysis
patient phenotyping
real-world evidence
electronic health records
dc.title.none.fl_str_mv Data_Sheet_1_Variational Bayes latent class analysis for EHR-based phenotyping with large real-world data.CSV
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description Introduction: Bayesian approaches to patient phenotyping in clinical observational studies have been limited by the computational challenges associated with applying the Markov Chain Monte Carlo (MCMC) approach to real-world data. Approximate Bayesian inference via optimization of the variational evidence lower bound, variational Bayes (VB), has been successfully demonstrated for other applications.
Methods: We investigate the performance and characteristics of currently available VB and MCMC software to explore the practicability of available approaches and provide guidance for clinical practitioners. Two case studies are used to fully explore the methods covering a variety of real-world data. First, we use the publicly available Pima Indian diabetes data to comprehensively compare VB implementations of logistic regression. Second, a large real-world data set, Optum™ EHR with approximately one million diabetes patients extended the analysis to large, highly unbalanced data containing discrete and continuous variables. A Bayesian patient phenotyping composite model incorporating latent class analysis (LCA) and regression was implemented with the second case study.
Results: We find that several data characteristics common in clinical data, such as sparsity, significantly affect the posterior accuracy of automatic VB methods compared with conditionally conjugate mean-field methods. We find that for both models, automatic VB approaches require more effort and technical knowledge to set up for accurate posterior estimation and are very sensitive to stopping time compared with closed-form VB methods.
Discussion: Our results indicate that the patient phenotyping composite Bayes model is more easily usable for real-world studies if Monte Carlo is replaced with VB. It can potentially become a uniquely useful tool for decision support, especially for rare diseases where gold-standard biomarker data are sparse but prior knowledge can be used to assist model diagnosis and may suggest when biomarker tests are warranted.
eu_rights_str_mv openAccess
id Manara_a5ea95df3698de5272fa774cb7e4cede
identifier_str_mv 10.3389/fams.2024.1302825.s001
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/27148170
publishDate 2024
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion