List of datasets in AqSolDB.

Bibliographic Details
Main Author: Bin Pan (742525) (author)
Other Authors: Xiaoyu Hou (1571344) (author), Mingxin Zhang (4898947) (author), Jingxian Yu (1332696) (author), Conghui Zhang (3456536) (author), Yunhui Zhang (65111) (author), Xiaolong Su (2015140) (author), Shuangcai Li (16458999) (author)
Published: 2025
_version_ 1852017156584112128
author Bin Pan (742525)
author2 Xiaoyu Hou (1571344)
Mingxin Zhang (4898947)
Jingxian Yu (1332696)
Conghui Zhang (3456536)
Yunhui Zhang (65111)
Xiaolong Su (2015140)
Shuangcai Li (16458999)
author2_role author
author
author
author
author
author
author
author_facet Bin Pan (742525)
Xiaoyu Hou (1571344)
Mingxin Zhang (4898947)
Jingxian Yu (1332696)
Conghui Zhang (3456536)
Yunhui Zhang (65111)
Xiaolong Su (2015140)
Shuangcai Li (16458999)
author_role author
dc.creator.none.fl_str_mv Bin Pan (742525)
Xiaoyu Hou (1571344)
Mingxin Zhang (4898947)
Jingxian Yu (1332696)
Conghui Zhang (3456536)
Yunhui Zhang (65111)
Xiaolong Su (2015140)
Shuangcai Li (16458999)
dc.date.none.fl_str_mv 2025-08-29T17:56:10Z
dc.identifier.none.fl_str_mv 10.1371/journal.pone.0330598.t001
dc.relation.none.fl_str_mv https://figshare.com/articles/dataset/List_of_datasets_in_AqSolDB_/30014345
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Genetics
Environmental Sciences not elsewhere classified
Biological Sciences not elsewhere classified
Mathematical Sciences not elsewhere classified
Chemical Sciences not elsewhere classified
Information Systems not elsewhere classified
prediction results indicate
mean absolute error
essential physical property
generalizable prediction model
stackboost model excels
stackboost model decreases
successfully identified compounds
predicting aqueous solubility
stackboost model
aqueous solubility
xgboost
water solubility
transfer learning
throughput screening
systematically compares
significantly outperforming
scale datasets
rf
random forest
organic compounds
material resources
high potential
generalization ability
five well
different datasets
determination
conducted high
comparative models
adaptive boosting
dc.title.none.fl_str_mv List of datasets in AqSolDB.
dc.type.none.fl_str_mv Dataset
info:eu-repo/semantics/publishedVersion
dataset
description Aqueous solubility, an essential physical property of compounds, has significant applications across various fields. However, verifying the solubility of compounds experimentally often requires substantial human and material resources. To address this issue, this study introduces the StackBoost model for predicting the solubility of organic compounds and systematically compares it with five well-known ensemble learning algorithms: Adaptive Boosting (AdaBoost), Gradient Boosted Regression Trees (GBRT), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF). The prediction results indicate that the StackBoost model excels in predicting aqueous solubility, achieving a coefficient of determination (R²) of 0.90, a root mean square error (RMSE) of 0.29, and a mean absolute error (MAE) of 0.22, significantly outperforming the other comparative models. The study also conducted high-throughput screening on large-scale datasets and identified compounds with high potential for water solubility. Additionally, the model’s generalization ability is verified through transfer learning. Although the performance of the StackBoost model decreases when applied to different datasets, it still shows considerable transferability, making it a more generalizable prediction model for aqueous solubility.
eu_rights_str_mv openAccess
id Manara_7bf4c2b4cff3d1b663232da2bc549a39
identifier_str_mv 10.1371/journal.pone.0330598.t001
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/30014345
publishDate 2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title List of datasets in AqSolDB.
title_full List of datasets in AqSolDB.
title_fullStr List of datasets in AqSolDB.
title_full_unstemmed List of datasets in AqSolDB.
title_short List of datasets in AqSolDB.
title_sort List of datasets in AqSolDB.