Self-Supervised Learning Powered by Synthetic Data From Diffusion Models: Application to X-Ray Images

Synthetic data offers a compelling solution to the challenges associated with acquiring high-quality medical data, which is often constrained by privacy concerns and limited accessibility. This study explores the efficacy of synthetic data generated using diffusion model...

Bibliographic Details
Main Author: Abdullah Hosseini (22466602) (author)
Other Authors: Ahmed Serag (2945643) (author)
Published: 2025
Subjects:
_version_ 1864513534589665280
author Abdullah Hosseini (22466602)
author2 Ahmed Serag (2945643)
author2_role author
author_facet Abdullah Hosseini (22466602)
Ahmed Serag (2945643)
author_role author
dc.creator.none.fl_str_mv Abdullah Hosseini (22466602)
Ahmed Serag (2945643)
dc.date.none.fl_str_mv 2025-04-09T06:00:00Z
dc.identifier.none.fl_str_mv 10.1109/access.2025.3555619
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Self-Supervised_Learning_Powered_by_Synthetic_Data_From_Diffusion_Models_Application_to_X-Ray_Images/30405526
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Health sciences
Health services and systems
Information and computing sciences
Artificial intelligence
Cybersecurity and privacy
Machine learning
Artificial intelligence
biomedical imaging
deep learning
diffusion probabilistic
self-supervised learning
synthetic data
Synthetic data
Data models
Training
Biological system modeling
Image segmentation
X-ray imaging
Diffusion processes
Diffusion models
Biomarkers
Medical diagnostic imaging
dc.title.none.fl_str_mv Self-Supervised Learning Powered by Synthetic Data From Diffusion Models: Application to X-Ray Images
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p dir="ltr">Synthetic data offers a compelling solution to the challenges associated with acquiring high-quality medical data, which is often constrained by privacy concerns and limited accessibility. This study explores the efficacy of synthetic data generated using diffusion models for training deep learning models within a self-supervised learning framework. The primary objective is to evaluate whether synthetic data can effectively preserve critical medical biomarkers and support reliable downstream tasks such as classification and segmentation. Using chest X-ray images as a case study, the results reveal that models pretrained on synthetic data achieve performance comparable to or surpassing that of models pretrained on real data. Specifically, in the pneumonia classification task, the model trained on synthetic data outperformed established benchmarks, achieving an Area Under the Curve of 99.1 and an F1-score of 96.1%. Similarly, for segmentation tasks, the model trained on synthetic data demonstrated robust performance, attaining a Dice score of 0.85. These findings underscore a significant advancement in the generation of synthetic medical images, providing a viable approach to creating realistic, biomarker-preserving datasets that ensure patient confidentiality and enable diverse applications in medical imaging.</p><h2>Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/deed.en" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2025.3555619" target="_blank">https://dx.doi.org/10.1109/access.2025.3555619</a></p>
eu_rights_str_mv openAccess
id Manara2_5c2996fd6334e966da5d3c4365c429bf
identifier_str_mv 10.1109/access.2025.3555619
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/30405526
publishDate 2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Self-Supervised Learning Powered by Synthetic Data From Diffusion Models: Application to X-Ray Images