Output datasets from ML–assisted bibliometric workflow in African phytochemical metabolomics research
| Main Author: | |
|---|---|
| Other Authors: | , , , |
| Published: | 2025 |
| Subjects: | |
| Summary: | <p dir="ltr">This collection contains supplementary datasets generated during the machine learning–assisted bibliometric workflow for metabolomics and phytochemical research. The datasets are sequential outputs of integrating and harmonising bibliographic metadata from <b>Scopus</b>, <b>Web of Science (WoS)</b>, and <b>Dimensions</b>, processed in R and Python.</p><p dir="ltr">Each dataset corresponds to a distinct workflow stage:</p><ul><li><b>Dataset 1A (merged_dataset2.xlsx):</b> Consolidated metadata produced in R by merging the raw bibliographic exports from Scopus, WoS, and Dimensions.</li><li><b>Dataset 1B (sampled_data.xlsx):</b> A stratified random sample drawn in Python for pretraining and manual annotation.</li><li><b>Dataset 1C (sample_data_pretrained.xlsx):</b> The annotated sample, manually screened against the inclusion and exclusion criteria.</li><li><b>Dataset 1D (highlighted_full_data_with_predictions.xlsx):</b> The complete harmonised dataset, automatically classified by the trained XGBoost model.</li><li><b>Dataset 1E (absolute_metabolomics_data.xlsx):</b> The final curated dataset of relevant records extracted from the ML-filtered corpus.</li></ul><p dir="ltr">Note that the <b>file names of the datasets</b> presented here <b>differ from their original Google Drive file paths</b> (as referenced in the Python Google Colab scripts): the files were renamed to give them <b>sequential, descriptive, and logically ordered names</b>, so anyone rerunning those scripts should update the paths accordingly. The renaming is intended to improve clarity, reproducibility, and cross-referencing across all linked repositories.</p> |
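Dataset 1A is described as merged in R from the three database exports. As a rough illustration of what such a merge involves, the Python sketch below deduplicates records across exports. It is not the authors' R script: the field names (`doi`, `title`, `source`), the DOI normalisation rule, and the DOI-then-title matching key are all assumptions made for this example, and the toy records are hypothetical.

```python
from __future__ import annotations

import re
from typing import Iterable


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes (assumed rule)."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def normalise_title(title: str) -> str:
    """Collapse case, punctuation and whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_records(*exports: Iterable[dict]) -> list[dict]:
    """Merge exports in the order given, keeping the first copy of each work.

    A work is keyed by its normalised DOI when one is present, otherwise by
    its normalised title. The added 'sources' field lists every export in
    which the work appeared.
    """
    merged: dict[str, dict] = {}
    for export in exports:
        for rec in export:
            doi = normalise_doi(rec.get("doi") or "")
            if doi:
                key = "doi:" + doi
            else:
                key = "title:" + normalise_title(rec.get("title") or "")
            if key in merged:
                merged[key]["sources"].append(rec["source"])
            else:
                merged[key] = {**rec, "sources": [rec["source"]]}
    return list(merged.values())


# Toy records (hypothetical) showing a DOI match across differently formatted exports.
scopus = [{"source": "Scopus", "doi": "10.1000/XYZ1", "title": "Metabolomics of A"}]
wos = [{"source": "WoS", "doi": "https://doi.org/10.1000/xyz1", "title": "Metabolomics of A."}]
dimensions = [{"source": "Dimensions", "doi": None, "title": "Phytochemicals in B"}]

merged = merge_records(scopus, wos, dimensions)
print(len(merged))  # 2
print(merged[0]["sources"])  # ['Scopus', 'WoS']
```

Because the key falls back to the title only when a DOI is absent, a work that carries a DOI in one export but not in another will not be matched here; a fuller merge would likely add a second pass on normalised titles.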
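Dataset 1B is a stratified random sample drawn in Python. The summary does not say which variable defined the strata or what sampling fraction was used, so the sketch below treats both as parameters; the `source` stratum, the 10% fraction, the seed, and the toy corpus in the example are hypothetical, and `stratified_sample` is an illustrative helper rather than the authors' code.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, fraction, seed=42):
    """Draw roughly the same fraction of records from every stratum.

    records: list of dicts; stratum_key: field that defines the strata;
    fraction: share of each stratum to draw, in (0, 1].
    Every non-empty stratum contributes at least one record.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    strata = defaultdict(list)
    for rec in records:
        strata[rec[stratum_key]].append(rec)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    sample = []
    for key in sorted(strata):  # sorted keys fix the order in which strata are drawn
        group = strata[key]
        k = min(len(group), max(1, round(len(group) * fraction)))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical corpus: 60 Scopus, 30 WoS and 10 Dimensions records.
records = [
    {"id": i, "source": "Scopus" if i < 60 else "WoS" if i < 90 else "Dimensions"}
    for i in range(100)
]
sample = stratified_sample(records, "source", fraction=0.1)
print(len(sample))  # 10 (6 Scopus, 3 WoS, 1 Dimensions)
```

Drawing at least one record per stratum keeps small strata represented in the annotation set, at the cost of slightly over-sampling them. Sorting the strata keys assumes the stratum values are mutually comparable (for example, all strings).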
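The step from Dataset 1D to Dataset 1E keeps the records the classifier marked as relevant. Below is a minimal sketch of that filtering, assuming each record carries a predicted relevance probability. The field name `pred_prob`, the 0.5 cut-off, the helper `extract_relevant`, and the sample predictions are hypothetical. Loading the `.xlsx` files (for example with `pandas.read_excel`) is left out to keep the example dependency-free.

```python
def extract_relevant(records, prob_field="pred_prob", threshold=0.5):
    """Keep records whose predicted relevance probability meets the threshold.

    prob_field and threshold are illustrative assumptions, not values taken
    from the original workflow. The result is ordered from most to least
    confident, so borderline records end up last for spot checks.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    kept = [rec for rec in records if rec[prob_field] >= threshold]
    return sorted(kept, key=lambda rec: rec[prob_field], reverse=True)


# Hypothetical classifier output for three records.
predictions = [
    {"id": 1, "pred_prob": 0.91},
    {"id": 2, "pred_prob": 0.12},
    {"id": 3, "pred_prob": 0.55},
]
print([rec["id"] for rec in extract_relevant(predictions)])  # [1, 3]
```

Since the summary describes Dataset 1E as curated, the actual step may have involved manual review beyond a simple probability threshold.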