Comparison of variable overlap of selected proxies across different methods used to evaluate the association between obesity and diabetes from the National Health and Nutrition Examination Survey (NHANES) for the years 2013–2018. Diagonal entries show the total number of proxies selected by each method, and off-diagonal entries represent the count of shared variables between method pairs. Most methods share a moderate number of proxies (typically 50-60 percent of the smaller set), indicating partial agreement in variable selection. Higher overlap is observed between closely related methods (e.g., LASSO and Elastic Net, or Hybrid with Bross/LASSO), while methods like XGBoost and Genetic Algorithm show lower overlap with others, reflecting divergent selection behavior in high-dimensional settings.
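As an illustration of how an overlap table of this kind can be built, the following is a minimal sketch, assuming each method's selected proxies are available as a set of variable names. The proxy names (`p01`, `p02`, ...) and the three sets are hypothetical placeholders, not values from the dataset, and the helper names `overlap_matrix` and `overlap_fraction` are introduced here only for illustration.

```python
def overlap_matrix(selected):
    """Return (methods, matrix) where matrix[i][j] is the number of proxies
    shared by methods i and j. The diagonal is each method's own total."""
    methods = list(selected)
    matrix = [
        [len(selected[a] & selected[b]) for b in methods]
        for a in methods
    ]
    return methods, matrix


def overlap_fraction(selected, a, b):
    """Shared proxies as a fraction of the smaller of the two selected sets."""
    smaller = min(len(selected[a]), len(selected[b]))
    if smaller == 0:
        return 0.0
    return len(selected[a] & selected[b]) / smaller


# Hypothetical selections, for illustration only (not taken from the dataset).
selected = {
    "LASSO": {"p01", "p02", "p03", "p04", "p05"},
    "Elastic Net": {"p01", "p02", "p03", "p04", "p06", "p07"},
    "XGBoost": {"p02", "p08", "p09", "p10"},
}

methods, matrix = overlap_matrix(selected)
for name, row in zip(methods, matrix):
    print(f"{name:12s}", row)

print(overlap_fraction(selected, "LASSO", "Elastic Net"))
print(overlap_fraction(selected, "LASSO", "XGBoost"))
```

Dividing by the size of the smaller set matches the "percent of the smaller set" wording in the description. A different denominator, such as the size of the union, would give a different and generally lower figure.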

Bibliographic Details
Main Author: Mohammad Ehsanul Karim (3234213) (author)
Other Authors: Yang Lei (136316) (author)
Published: 2025
_version_ 1852019947741380608
author Mohammad Ehsanul Karim (3234213)
author2 Yang Lei (136316)
author2_role author
author_facet Mohammad Ehsanul Karim (3234213)
Yang Lei (136316)
author_role author
dc.creator.none.fl_str_mv Mohammad Ehsanul Karim (3234213)
Yang Lei (136316)
dc.date.none.fl_str_mv 2025-05-28T17:49:40Z
dc.identifier.none.fl_str_mv 10.1371/journal.pone.0324639.t002
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/Comparison_of_variable_overlap_of_selected_proxies_across_different_methods_used_to_evaluate_the_association_between_obesity_and_diabetes_from_the_National_Health_and_Nutrition_Examination_Survey_NHANES_for_the_years_2013_2018_Diagonal_entr/29172715
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Medicine
Biotechnology
Sociology
Space Science
Environmental Sciences not elsewhere classified
Biological Sciences not elsewhere classified
Mathematical Sciences not elsewhere classified
viable alternatives due
using multivariate statistical
traditional statistical approaches
nutrition examination survey
kitchen sink model
dimensional propensity score
traditional statistical methods
machine learning methods
compared methods including
simpler approaches may
fewer computational demands
conclusion
scenarios prioritizing precision
methods
results
least reliable method
exhibited higher bias
outcome prevalence scenarios
rare exposure scenarios
consistently high bias
simpler methods
higher bias
results highlight
outcome prevalence
computational efficiency
rare outcome
rare exposure
enhance precision
better precision
minimizing bias
low bias
bias reduction
various exposure
systematically evaluate
study aimed
standard error
specific characteristics
se
random forest
national health
less suited
key metrics
genetic algorithm
ga
frequent outcome
frequent exposure
forward selection
elastic net
competitive advantage
balanced approach
dc.title.none.fl_str_mv Comparison of variable overlap of selected proxies across different methods used to evaluate the association between obesity and diabetes from the National Health and Nutrition Examination Survey (NHANES) for the years 2013–2018. Diagonal entries show the total number of proxies selected by each method, and off-diagonal entries represent the count of shared variables between method pairs. Most methods share a moderate number of proxies (typically 50-60 percent of the smaller set), indicating partial agreement in variable selection. Higher overlap is observed between closely related methods (e.g., LASSO and Elastic Net, or Hybrid with Bross/LASSO), while methods like XGBoost and Genetic Algorithm show lower overlap with others, reflecting divergent selection behavior in high-dimensional settings.
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description <p>Comparison of variable overlap of selected proxies across different methods used to evaluate the association between obesity and diabetes from the National Health and Nutrition Examination Survey (NHANES) for the years 2013–2018. Diagonal entries show the total number of proxies selected by each method, and off-diagonal entries represent the count of shared variables between method pairs. Most methods share a moderate number of proxies (typically 50-60 percent of the smaller set), indicating partial agreement in variable selection. Higher overlap is observed between closely related methods (e.g., LASSO and Elastic Net, or Hybrid with Bross/LASSO), while methods like XGBoost and Genetic Algorithm show lower overlap with others, reflecting divergent selection behavior in high-dimensional settings.</p>
eu_rights_str_mv openAccess
id Manara_7253d6864f2d4f2ab17cb3607eefbbdd
identifier_str_mv 10.1371/journal.pone.0324639.t002
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/29172715
publishDate 2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Comparison of variable overlap of selected proxies across different methods used to evaluate the association between obesity and diabetes from the National Health and Nutrition Examination Survey (NHANES) for the years 2013–2018. Diagonal entries show the total number of proxies selected by each method, and off-diagonal entries represent the count of shared variables between method pairs. Most methods share a moderate number of proxies (typically 50-60 percent of the smaller set), indicating partial agreement in variable selection. Higher overlap is observed between closely related methods (e.g., LASSO and Elastic Net, or Hybrid with Bross/LASSO), while methods like XGBoost and Genetic Algorithm show lower overlap with others, reflecting divergent selection behavior in high-dimensional settings.