Comparison of variable overlap of selected proxies across different methods used to evaluate the association between obesity and diabetes from the National Health and Nutrition Examination Survey (NHANES) for the years 2013–2018. Diagonal entries show the total number of proxies selected by each method, and off-diagonal entries represent the count of shared variables between method pairs. Most methods share a moderate number of proxies (typically 50-60 percent of the smaller set), indicating partial agreement in variable selection. Higher overlap is observed between closely related methods (e.g., LASSO and Elastic Net, or Hybrid with Bross/LASSO), while methods like XGBoost and Genetic Algorithm show lower overlap with others, reflecting divergent selection behavior in high-dimensional settings.
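The matrix described above can be illustrated with a minimal sketch. It assumes each method's selected proxies are available as a set of variable names. The function names, method labels, and proxy names below are hypothetical placeholders chosen for illustration, not the authors' code or the NHANES results.

```python
from itertools import product


def overlap_matrix(selected):
    """Map each (method_a, method_b) pair to the number of shared proxies.

    Diagonal pairs give the total number of proxies a method selected;
    off-diagonal pairs give the count shared by the two methods.
    """
    return {
        (a, b): len(selected[a] & selected[b])
        for a, b in product(selected, repeat=2)
    }


def share_of_smaller_set(selected, a, b):
    """Shared proxies as a fraction of the smaller of the two selected sets."""
    smaller = min(len(selected[a]), len(selected[b]))
    return len(selected[a] & selected[b]) / smaller if smaller else 0.0


# Hypothetical toy selections (placeholder names, not real study output).
selected = {
    "LASSO": {"p1", "p2", "p3", "p4"},
    "ElasticNet": {"p2", "p3", "p4", "p5", "p6"},
    "XGBoost": {"p4", "p7", "p8"},
}

matrix = overlap_matrix(selected)
```

Because set intersection is symmetric, the resulting matrix is symmetric as well, so only the diagonal and one triangle need to be reported. Dividing by the smaller set, rather than by the union, keeps the percentage interpretable when two methods select very different numbers of proxies.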

Bibliographic Details
Main Author:	Mohammad Ehsanul Karim (3234213) (author)
Other Authors:	Yang Lei (136316) (author)
Published:	2025
Subjects:	Medicine; Biotechnology; Sociology; Space Science; Environmental Sciences not elsewhere classified; Biological Sciences not elsewhere classified; Mathematical Sciences not elsewhere classified; viable alternatives due; using multivariate statistical; traditional statistical approaches; nutrition examination survey; kitchen sink model; dimensional propensity score; traditional statistical methods; machine learning methods; compared methods including; simpler approaches may; fewer computational demands; scenarios prioritizing precision; least reliable method; exhibited higher bias; outcome prevalence scenarios; rare exposure scenarios; consistently high bias; simpler methods; higher bias; results highlight; outcome prevalence; computational efficiency; rare outcome; rare exposure; enhance precision; better precision; minimizing bias; low bias; bias reduction; various exposure; systematically evaluate; study aimed; standard error; specific characteristics; random forest; national health; less suited; key metrics; genetic algorithm; frequent outcome; frequent exposure; forward selection; elastic net; competitive advantage; balanced approach

_version_	1852019947741380608
author	Mohammad Ehsanul Karim (3234213)
author2	Yang Lei (136316)
author2_role	author
author_facet	Mohammad Ehsanul Karim (3234213) Yang Lei (136316)
author_role	author
dc.creator.none.fl_str_mv	Mohammad Ehsanul Karim (3234213) Yang Lei (136316)
dc.date.none.fl_str_mv	2025-05-28T17:49:40Z
dc.identifier.none.fl_str_mv	10.1371/journal.pone.0324639.t002
dc.relation.none.fl_str_mv	https://figshare.com/articles/dataset/Comparison_of_variable_overlap_of_selected_proxies_across_different_methods_used_to_evaluate_the_association_between_obesity_and_diabetes_from_the_National_Health_and_Nutrition_Examination_Survey_NHANES_for_the_years_2013_2018_Diagonal_entr/29172715
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Medicine; Biotechnology; Sociology; Space Science; Environmental Sciences not elsewhere classified; Biological Sciences not elsewhere classified; Mathematical Sciences not elsewhere classified; viable alternatives due; using multivariate statistical; traditional statistical approaches; nutrition examination survey; kitchen sink model; dimensional propensity score; traditional statistical methods; machine learning methods; compared methods including; simpler approaches may; fewer computational demands; scenarios prioritizing precision; least reliable method; exhibited higher bias; outcome prevalence scenarios; rare exposure scenarios; consistently high bias; simpler methods; higher bias; results highlight; outcome prevalence; computational efficiency; rare outcome; rare exposure; enhance precision; better precision; minimizing bias; low bias; bias reduction; various exposure; systematically evaluate; study aimed; standard error; specific characteristics; random forest; national health; less suited; key metrics; genetic algorithm; frequent outcome; frequent exposure; forward selection; elastic net; competitive advantage; balanced approach
dc.title.none.fl_str_mv	Comparison of variable overlap of selected proxies across different methods used to evaluate the association between obesity and diabetes from the National Health and Nutrition Examination Survey (NHANES) for the years 2013–2018. Diagonal entries show the total number of proxies selected by each method, and off-diagonal entries represent the count of shared variables between method pairs. Most methods share a moderate number of proxies (typically 50-60 percent of the smaller set), indicating partial agreement in variable selection. Higher overlap is observed between closely related methods (e.g., LASSO and Elastic Net, or Hybrid with Bross/LASSO), while methods like XGBoost and Genetic Algorithm show lower overlap with others, reflecting divergent selection behavior in high-dimensional settings.
dc.type.none.fl_str_mv	Dataset info:eu-repo/semantics/publishedVersion dataset
eu_rights_str_mv	openAccess
id	Manara_7253d6864f2d4f2ab17cb3607eefbbdd
identifier_str_mv	10.1371/journal.pone.0324639.t002
network_acronym_str	Manara
network_name_str	ManaraRepo
oai_identifier_str	oai:figshare.com:article/29172715
publishDate	2025
rights_invalid_str_mv	CC BY 4.0
status_str	publishedVersion

Similar Items