The effects of data augmentation on NN age prediction performance.

<p>Data augmentation introduces random noise to each sample of the training data in each batch of training. This method can help NN algorithms with a large number of parameters generalize better, by preventing the memorization of the samples in the training set. When augmentation levels grow v...

Bibliographic Details
Main Author: John Kruper (18809386) (author)
Other Authors: Adam Richie-Halford (7874510) (author), Joanna Qiao (22057760) (author), Asa Gilmore (18809395) (author), Kelly Chang (22057763) (author), Mareike Grotheer (22057766) (author), Ethan Roy (22057769) (author), Sendy Caffarra (3720577) (author), Teresa Gomez (14486220) (author), Sam Chou (22057772) (author), Matthew Cieslak (3112521) (author), Serge Koudoro (11837609) (author), Eleftherios Garyfallidis (3158781) (author), Theodore D. Satterthwaite (11006319) (author), Jason D. Yeatman (7304606) (author), Ariel Rokem (1369482) (author)
Published: 2025
_version_ 1852017578909630464
dc.date.none.fl_str_mv 2025-08-14T22:22:32Z
dc.identifier.none.fl_str_mv 10.1371/journal.pcbi.1013323.s002
dc.relation.none.fl_str_mv https://figshare.com/articles/figure/The_effects_of_data_augmentation_on_NN_age_prediction_performance_/29916245
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Biochemistry
Biological Sciences not elsewhere classified
Information Systems not elsewhere classified
tissue properties within
including hypothesis testing
computational advances implemented
assess physical properties
tractometry uses diffusion
statistical analysis methods
software offer orders
ecosystem also provides
brain tractometry processing
brain connections
also demonstrate
software ecosystem
integrative ecosystem
transformative environment
taken together
subject age
predictive analysis
group differences
extract insights
extensible tools
different datasets
characteristic structure
based data
analysis tasks
dc.title.none.fl_str_mv The effects of data augmentation on NN age prediction performance.
dc.type.none.fl_str_mv Image
Figure
info:eu-repo/semantics/publishedVersion
image
description <p>Data augmentation introduces random noise to each sample of the training data in each batch of training. This method can help NN algorithms with a large number of parameters generalize better by preventing memorization of the samples in the training set. When augmentation levels grow very large, however, the signal in the data is overwhelmed by the noise added in augmentation, and the algorithm can no longer learn. In our data, we found that augmentation can have dramatic effects on algorithm performance in the brain age prediction task. For example, the resnet NN algorithm, which has poor <i>R</i><sup>2</sup> in the augmentation-free condition, reaches parity with the baseline model at relatively high augmentation levels (± standard error of the mean (SEM), red curves). The lstmfcn NN, which also performs poorly with no augmentation, reaches even higher <i>R</i><sup>2</sup> than the baseline model at high levels of augmentation (± SEM, pink curves). However, at these higher levels of augmentation, the data requirements of these two models also increase. Algorithms that perform similarly to the baseline in the absence of augmentation improve slightly with the introduction of small amounts of augmentation. For example, the highest <i>R</i><sup>2</sup> of any model in these experiments is achieved by the blstm1 model at a low level of augmentation (± SEM, gray curves). The relatively simple mlp4 model architecture, which does not perform very well in the absence of augmentation, only becomes worse with the introduction of augmentation (blue curves).
Further quantification of these trends is laid out in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013323#pcbi.1013323.s003" target="_blank">S3 Fig</a>, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013323#pcbi.1013323.s004" target="_blank">S4 Fig</a>, and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013323#pcbi.1013323.s005" target="_blank">S5 Fig</a>.</p> <p>(PNG)</p>
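The per-batch noise injection described in the caption can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state the noise distribution, so zero-mean Gaussian noise is assumed here, and the names `augment_batch`, `iterate_augmented_batches`, and `noise_scale` are hypothetical, chosen only for illustration.

```python
import random


def augment_batch(batch, noise_scale, rng):
    """Return a noisy copy of `batch` (a list of feature lists).

    Assumption: zero-mean Gaussian noise with standard deviation
    `noise_scale`; the source does not specify the distribution.
    The input batch is left unmodified.
    """
    return [
        [value + rng.gauss(0.0, noise_scale) for value in sample]
        for sample in batch
    ]


def iterate_augmented_batches(data, batch_size, noise_scale, seed=0):
    """Yield (indices, noisy_batch) pairs covering one epoch of `data`.

    Noise is drawn when each batch is built, so a sample that is seen
    again in a later epoch is perturbed differently each time.
    """
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        indices = order[start:start + batch_size]
        clean = [data[i] for i in indices]
        yield indices, augment_batch(clean, noise_scale, rng)
```

Under these assumptions, setting `noise_scale` to 0 corresponds to the augmentation-free condition, while very large values swamp the signal in each sample, which is the regime where the caption reports that the algorithm can no longer learn.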
eu_rights_str_mv openAccess
id Manara_24beba211877fc3956c8a67950f0671e
identifier_str_mv 10.1371/journal.pcbi.1013323.s002
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/29916245
publishDate 2025
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion