Comparison of the overlap in proxy variables selected by different methods used to evaluate the association between obesity and diabetes, using data from the National Health and Nutrition Examination Survey (NHANES), 2013–2018. Diagonal entries give the total number of proxies selected by each method; off-diagonal entries give the number of variables shared by each pair of methods. Most method pairs share a moderate number of proxies (typically 50–60% of the smaller set), indicating partial agreement in variable selection. Overlap is higher between closely related methods (e.g., LASSO and Elastic Net, or the Hybrid method with Bross or LASSO), whereas XGBoost and the Genetic Algorithm overlap less with the other methods, reflecting divergent selection behavior in high-dimensional settings.
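An overlap matrix of this kind can be built from any collection of per-method selections. The sketch below is a minimal illustration, assuming each method's output is available as a set of variable names; the method labels, the variables `x1` to `x9`, and the helpers `overlap_matrix`, `overlap_coefficient`, and `print_matrix` are hypothetical toy constructs, not the NHANES proxies or code from the study. The "percent of the smaller set" figure corresponds to the overlap coefficient |A ∩ B| / min(|A|, |B|).

```python
from __future__ import annotations

# Hypothetical toy selections: method name -> set of selected proxy names.
# These are NOT the NHANES proxies from the study; they only illustrate the layout.
EXAMPLE_SELECTIONS: dict[str, set[str]] = {
    "LASSO": {"x1", "x2", "x3", "x4", "x5"},
    "ElasticNet": {"x1", "x2", "x3", "x4", "x6"},
    "XGBoost": {"x2", "x7", "x8", "x9"},
}


def overlap_matrix(selections: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Return counts of shared proxies for every ordered pair of methods.

    The diagonal entry (a, a) equals len(selections[a]), i.e. the total number
    of proxies selected by method a; off-diagonal entries are intersection sizes.
    """
    return {
        (a, b): len(selections[a] & selections[b])
        for a in selections
        for b in selections
    }


def overlap_coefficient(a: set[str], b: set[str]) -> float:
    """Shared proxies as a fraction of the smaller set: |A & B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def print_matrix(selections: dict[str, set[str]]) -> None:
    """Print the square overlap matrix with method names as row/column labels."""
    methods = list(selections)
    matrix = overlap_matrix(selections)
    width = max(len(m) for m in methods) + 2
    print("".ljust(width) + "".join(m.rjust(width) for m in methods))
    for a in methods:
        row = "".join(str(matrix[(a, b)]).rjust(width) for b in methods)
        print(a.ljust(width) + row)


if __name__ == "__main__":
    print_matrix(EXAMPLE_SELECTIONS)
    lasso, enet = EXAMPLE_SELECTIONS["LASSO"], EXAMPLE_SELECTIONS["ElasticNet"]
    print(f"LASSO vs ElasticNet: {overlap_coefficient(lasso, enet):.0%} of the smaller set")
```

With this toy input, LASSO and ElasticNet share 4 of 5 proxies (80% of the smaller set), while LASSO and XGBoost share only `x2` (25%). Dividing by the smaller set, rather than by the union as the Jaccard index would, keeps a small selection that is fully contained in a larger one from looking like poor agreement.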
| Main Author: | Mohammad Ehsanul Karim |
|---|---|
| Other Authors: | Yang Lei |
| Published: | 2025 |
| _version_ | 1852019947741380608 |
|---|---|
| author | Mohammad Ehsanul Karim (3234213) |
| author2 | Yang Lei (136316) |
| author2_role | author |
| author_role | author |
| dc.creator.none.fl_str_mv | Mohammad Ehsanul Karim (3234213) Yang Lei (136316) |
| dc.date.none.fl_str_mv | 2025-05-28T17:49:40Z |
| dc.identifier.none.fl_str_mv | 10.1371/journal.pone.0324639.t002 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/dataset/Comparison_of_variable_overlap_of_selected_proxies_across_different_methods_used_to_evaluate_the_association_between_obesity_and_diabetes_from_the_National_Health_and_Nutrition_Examination_Survey_NHANES_for_the_years_2013_2018_Diagonal_entr/29172715 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Medicine; Biotechnology; Sociology; Space Science; Environmental Sciences not elsewhere classified; Biological Sciences not elsewhere classified; Mathematical Sciences not elsewhere classified; viable alternatives due, using multivariate statistical, traditional statistical approaches, nutrition examination survey, kitchen sink model, dimensional propensity score, traditional statistical methods, machine learning methods, compared methods including, simpler approaches may, fewer computational demands, scenarios prioritizing precision, least reliable method, exhibited higher bias, outcome prevalence scenarios, rare exposure scenarios, consistently high bias, simpler methods, higher bias, results highlight, outcome prevalence, computational efficiency, rare outcome, rare exposure, enhance precision, better precision, minimizing bias, low bias, bias reduction, various exposure, systematically evaluate, study aimed, standard error (SE), specific characteristics, random forest, national health, less suited, key metrics, genetic algorithm (GA), frequent outcome, frequent exposure, forward selection, elastic net, competitive advantage, balanced approach |
| dc.type.none.fl_str_mv | Dataset info:eu-repo/semantics/publishedVersion dataset |
| eu_rights_str_mv | openAccess |
| id | Manara_7253d6864f2d4f2ab17cb3607eefbbdd |
| identifier_str_mv | 10.1371/journal.pone.0324639.t002 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/29172715 |
| publishDate | 2025 |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |