The effects of data augmentation on NN age prediction performance.
Data augmentation introduces random noise to each sample of the training data in each batch of training. This method can help NN algorithms with a large number of parameters generalize better, by preventing the memorization of the samples in the training set. When augmentation levels grow v...
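| Main Author: | John Kruper |
|---|---|
| Other Authors: | Adam Richie-Halford, Joanna Qiao, Asa Gilmore, Kelly Chang, Mareike Grotheer, Ethan Roy, Sendy Caffarra, Teresa Gomez, Sam Chou, Matthew Cieslak, Serge Koudoro, Eleftherios Garyfallidis, Theodore D. Satterthwaite, Jason D. Yeatman, Ariel Rokem |
| Published: | 2025 |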
| _version_ | 1852017578909630464 |
|---|---|
| author | John Kruper (18809386) |
| author2 | Adam Richie-Halford (7874510) Joanna Qiao (22057760) Asa Gilmore (18809395) Kelly Chang (22057763) Mareike Grotheer (22057766) Ethan Roy (22057769) Sendy Caffarra (3720577) Teresa Gomez (14486220) Sam Chou (22057772) Matthew Cieslak (3112521) Serge Koudoro (11837609) Eleftherios Garyfallidis (3158781) Theodore D. Satterthwaite (11006319) Jason D. Yeatman (7304606) Ariel Rokem (1369482) |
| author2_role | author author author author author author author author author author author author author author author |
| author_facet | John Kruper (18809386) Adam Richie-Halford (7874510) Joanna Qiao (22057760) Asa Gilmore (18809395) Kelly Chang (22057763) Mareike Grotheer (22057766) Ethan Roy (22057769) Sendy Caffarra (3720577) Teresa Gomez (14486220) Sam Chou (22057772) Matthew Cieslak (3112521) Serge Koudoro (11837609) Eleftherios Garyfallidis (3158781) Theodore D. Satterthwaite (11006319) Jason D. Yeatman (7304606) Ariel Rokem (1369482) |
| author_role | author |
| dc.creator.none.fl_str_mv | John Kruper (18809386) Adam Richie-Halford (7874510) Joanna Qiao (22057760) Asa Gilmore (18809395) Kelly Chang (22057763) Mareike Grotheer (22057766) Ethan Roy (22057769) Sendy Caffarra (3720577) Teresa Gomez (14486220) Sam Chou (22057772) Matthew Cieslak (3112521) Serge Koudoro (11837609) Eleftherios Garyfallidis (3158781) Theodore D. Satterthwaite (11006319) Jason D. Yeatman (7304606) Ariel Rokem (1369482) |
| dc.date.none.fl_str_mv | 2025-08-14T22:22:32Z |
| dc.identifier.none.fl_str_mv | 10.1371/journal.pcbi.1013323.s002 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/figure/The_effects_of_data_augmentation_on_NN_age_prediction_performance_/29916245 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Biochemistry; Biological Sciences not elsewhere classified; Information Systems not elsewhere classified; tissue properties within; including hypothesis testing; computational advances implemented; assess physical properties; tractometry uses diffusion; statistical analysis methods; software offer orders; insight; ecosystem also provides; brain tractometry processing; brain connections; also demonstrate; software ecosystem; integrative ecosystem; transformative environment; taken together; subject age; predictive analysis; org; group differences; extract insights; extensible tools; different datasets; characteristic structure; based data; analysis tasks; provide |
| dc.title.none.fl_str_mv | The effects of data augmentation on NN age prediction performance. |
| dc.type.none.fl_str_mv | Image Figure info:eu-repo/semantics/publishedVersion image |
| description | <p>Data augmentation introduces random noise to each sample of the training data in each batch of training. This method can help NN algorithms with a large number of parameters generalize better by preventing memorization of the samples in the training set. When augmentation levels grow very large, however, the signal in the data is overwhelmed by the noise added in augmentation, and the algorithm can no longer learn. In our data, we found that augmentation can have dramatic effects on algorithm performance in the brain age prediction task. For example, the resnet NN algorithm, which has poor <i>R</i><sup>2</sup> in the augmentation-free condition, reaches parity with the baseline model at relatively high augmentation levels (standard error of the mean (SEM), red curves). The lstmfcn NN, which also performs poorly with no augmentation, reaches even higher <i>R</i><sup>2</sup> than the baseline model with high levels of augmentation (SEM, pink curves). However, at these higher levels of augmentation, the data requirements of these two models also increase. Algorithms that perform similarly to the baseline in the absence of augmentation improve slightly with the introduction of small amounts of augmentation. For example, the highest <i>R</i><sup>2</sup> reached by any model in these experiments is reached by the blstm1 model at a low value of augmentation (SEM, gray curves). The relatively simple mlp4 model architecture, which does not perform very well in the absence of augmentation, only becomes worse with the introduction of augmentation (blue curves). Further quantification of these trends is laid out in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013323#pcbi.1013323.s003" target="_blank">S3 Fig</a>, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013323#pcbi.1013323.s004" target="_blank">S4 Fig</a>, and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013323#pcbi.1013323.s005" target="_blank">S5 Fig</a>.</p> <p>(PNG)</p> |
| eu_rights_str_mv | openAccess |
| id | Manara_24beba211877fc3956c8a67950f0671e |
| identifier_str_mv | 10.1371/journal.pcbi.1013323.s002 |
| network_acronym_str | Manara |
| network_name_str | ManaraRepo |
| oai_identifier_str | oai:figshare.com:article/29916245 |
| publishDate | 2025 |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | The effects of data augmentation on NN age prediction performance. |
| topic | Biochemistry; Biological Sciences not elsewhere classified; Information Systems not elsewhere classified; tissue properties within; including hypothesis testing; computational advances implemented; assess physical properties; tractometry uses diffusion; statistical analysis methods; software offer orders; insight; ecosystem also provides; brain tractometry processing; brain connections; also demonstrate; software ecosystem; integrative ecosystem; transformative environment; taken together; subject age; predictive analysis; org; group differences; extract insights; extensible tools; different datasets; characteristic structure; based data; analysis tasks; provide |
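
The description field says that augmentation adds random noise to each sample in each training batch, and that very high noise levels swamp the signal. A minimal sketch of that idea follows. It is not the authors' implementation: the function names, the `noise_level` parameter (noise standard deviation as a multiple of each feature's spread within the batch), the choice of Gaussian noise, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def augment_batch(batch, noise_level, rng):
    """Return a copy of `batch` with fresh Gaussian noise added to every sample.

    `noise_level` is a hypothetical knob: the noise standard deviation
    expressed as a multiple of each feature's standard deviation in the batch.
    """
    batch = np.asarray(batch, dtype=float)
    feature_scale = batch.std(axis=0, keepdims=True)
    noise = rng.normal(0.0, 1.0, size=batch.shape) * noise_level * feature_scale
    return batch + noise


def iterate_augmented_batches(X, y, batch_size, noise_level, rng):
    """Yield shuffled (X_batch, y_batch) pairs, drawing new noise for each batch."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield augment_batch(X[idx], noise_level, rng), y[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 10))      # synthetic features (assumption)
    y = rng.uniform(5, 80, size=64)    # synthetic ages (assumption)

    # As the noise level grows, the augmented data resemble the clean data less.
    for level in (0.0, 0.5, 5.0):
        X_aug = augment_batch(X, level, rng)
        r = np.corrcoef(X.ravel(), X_aug.ravel())[0, 1]
        print(f"noise_level={level}: correlation with clean data = {r:.2f}")
```

Because the noise is redrawn on every call, a model never sees exactly the same sample twice, which is the mechanism the description credits with discouraging memorization. The targets `y` are left untouched in this sketch.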