Image 11_A hybrid deep learning-based approach for optimal genotype by environment selection.jpeg


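Bibliographic details
Main author: Zahra Khalilzadeh (8877869) (author)
Other authors: Motahareh Kashanian (20413949) (author), Saeed Khaki (8355738) (author), Lizhi Wang (804431) (author)
Published: 2024
Subjects: Knowledge Representation and Machine Learning; convolutional neural network; genotype selection; crop yield prediction; Generalized Ensemble Method; genotype-environment interaction; feature importance analysis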
author Zahra Khalilzadeh (8877869)
author2 Motahareh Kashanian (20413949)
Saeed Khaki (8355738)
Lizhi Wang (804431)
author2_role author
author
author
author_role author
dc.creator.none.fl_str_mv Zahra Khalilzadeh (8877869)
Motahareh Kashanian (20413949)
Saeed Khaki (8355738)
Lizhi Wang (804431)
dc.date.none.fl_str_mv 2024-12-11T06:44:39Z
dc.identifier.none.fl_str_mv 10.3389/frai.2024.1312115.s011
dc.relation.none.fl_str_mv https://figshare.com/articles/figure/Image_11_A_hybrid_deep_learning-based_approach_for_optimal_genotype_by_environment_selection_jpeg/28005761
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Knowledge Representation and Machine Learning
convolutional neural network
genotype selection
crop yield prediction
Generalized Ensemble Method
genotype-environment interaction
feature importance analysis
dc.title.none.fl_str_mv Image 11_A hybrid deep learning-based approach for optimal genotype by environment selection.jpeg
dc.type.none.fl_str_mv Image
Figure
info:eu-repo/semantics/publishedVersion
image
description <p>The ability to accurately predict the yields of different crop genotypes in response to weather variability is crucial for developing climate-resilient crop cultivars. Genotype-environment interactions introduce large variations in crop-climate responses and are difficult to account for in breeding programs. Data-driven approaches, particularly those based on machine learning, can help guide breeding efforts by factoring genotype-environment interactions into yield predictions. Using a new yield dataset containing 93,028 records of soybean hybrids across 159 locations, 28 states, and 13 years, with 5,838 distinct genotypes and daily weather data over a 214-day growing season, we developed two convolutional neural network (CNN) models: one that combines a CNN with fully connected layers (CNN model), and another that adds a long short-term memory (LSTM) layer after the CNN component (CNN-LSTM model). Using the Generalized Ensemble Method (GEM), we combined the two CNN-based models and optimized their weights to improve overall predictive performance. Because the dataset provides unique genotype information for the seeds, it enabled an investigation into the potential of planting different genotypes depending on weather variables. We used the proposed GEM model to identify the best-performing genotypes across locations and weather conditions by predicting yields for every candidate genotype in each setting. To assess the performance of the GEM model, we evaluated it on unseen genotype-location combinations, simulating real-world scenarios in which new genotypes are introduced. The GEM ensemble achieved substantially better prediction accuracy than the CNN-LSTM model alone and slightly better accuracy than the CNN model, as measured by both root mean square error (RMSE) and mean absolute error (MAE) on the validation and test sets.
The proposed data-driven approach can be valuable for genotype selection when only a few testing years are available. We also explored the impact of adding state-level soil data to the weather, location, genotype, and year variables. Because the data lacked latitude and longitude, we assigned the same soil variables to all locations within a state, which limited the spatial resolution of the soil information to the state level. Our results suggest that adding these state-level soil variables did not substantially improve the models' predictive performance. Finally, we performed a feature importance analysis based on the change in RMSE to identify the most important predictors. Location showed the largest RMSE change, followed by genotype and year. Among the weather variables, maximum direct normal irradiance (MDNI) and average precipitation (AP) showed the largest RMSE changes, indicating their importance.</p>
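The GEM step in the abstract combines the base-model predictions with optimized weights. This record does not state the exact objective, so the following is a minimal sketch assuming two base models (CNN and CNN-LSTM), weights that are nonnegative and sum to one, and a mean-squared-error objective fitted on held-out predictions; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def gem_weight_two_models(pred_a: Sequence[float], pred_b: Sequence[float],
                          y_true: Sequence[float]) -> float:
    """Weight w in [0, 1] minimizing the MSE of w * pred_a + (1 - w) * pred_b.

    Hypothetical helper: assumes a convex combination of two base models,
    e.g. CNN and CNN-LSTM predictions on a validation set.
    """
    diff = [a - b for a, b in zip(pred_a, pred_b)]   # pred_a - pred_b
    resid = [y - b for y, b in zip(y_true, pred_b)]  # y_true - pred_b
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical base predictions: every weight gives the same ensemble
    # Unconstrained minimizer of sum((w * diff - resid) ** 2) ...
    w = sum(d * r for d, r in zip(diff, resid)) / denom
    # ... clipped to [0, 1]; the objective is a convex quadratic in w,
    # so clipping yields the constrained optimum.
    return min(1.0, max(0.0, w))


def gem_combine(pred_a: Sequence[float], pred_b: Sequence[float], w: float) -> List[float]:
    """Weighted ensemble prediction."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    y = [3.0, 4.0, 5.0, 6.0]
    cnn = [3.2, 3.9, 5.3, 5.8]
    cnn_lstm = [2.6, 4.5, 4.6, 6.5]
    w = gem_weight_two_models(cnn, cnn_lstm, y)
    print(w, rmse(y, gem_combine(cnn, cnn_lstm, w)))
```

Because w = 0 and w = 1 are both feasible, the fitted ensemble can be no worse than the better base model on the data used to fit w; that guarantee does not carry over to unseen data. With more than two base models, the same constrained least-squares problem would need a general-purpose solver.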
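The genotype-selection step predicts yield for every candidate genotype in each setting and keeps the best. A minimal sketch, assuming the trained model is wrapped as a hypothetical `predict_yield(genotype, setting)` function, where a "setting" stands for a location together with its weather and year:

```python
from typing import Callable, Dict, Hashable, List, Sequence

Genotype = Hashable
Setting = Hashable


def select_best_genotypes(predict_yield: Callable[[Genotype, Setting], float],
                          settings: Sequence[Setting],
                          genotypes: Sequence[Genotype],
                          top_k: int = 1) -> Dict[Setting, List[Genotype]]:
    """For each setting, return the top_k genotypes ranked by predicted yield."""
    best: Dict[Setting, List[Genotype]] = {}
    for setting in settings:
        # Exhaustive search: score every candidate genotype in this setting.
        # sorted() is stable, so ties keep the input order of `genotypes`.
        ranked = sorted(genotypes, key=lambda g: predict_yield(g, setting), reverse=True)
        best[setting] = list(ranked[:top_k])
    return best
```

The cost grows with the number of genotypes times the number of settings (the dataset described here has 5,838 distinct genotypes), so in practice one would likely batch the predictions rather than call the model once per pair.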
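Evaluating on unseen genotype-location combinations means that no (genotype, location) pair in the test set appears in training. The record does not describe how the split was drawn; the sketch below assumes each record carries hypothetical `genotype` and `location` keys and holds out a random fraction of the distinct pairs.

```python
import random
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_unseen_pairs(records: List[Record], test_fraction: float = 0.2,
                       seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Split records so that test (genotype, location) pairs never occur in training."""
    # Sort first so the shuffle is reproducible regardless of set iteration order.
    pairs = sorted({(r["genotype"], r["location"]) for r in records})
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_test = max(1, round(test_fraction * len(pairs)))
    test_pairs = set(pairs[:n_test])
    train = [r for r in records if (r["genotype"], r["location"]) not in test_pairs]
    test = [r for r in records if (r["genotype"], r["location"]) in test_pairs]
    return train, test
```

Holding out whole pairs (rather than individual records) prevents the model from seeing other years of the same genotype at the same location, which would otherwise make test accuracy look optimistic.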
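The feature importance analysis ranks predictors by RMSE change. This record does not say how each feature was perturbed; one common variant is permutation importance, in which a single feature's values are shuffled across records and the resulting increase in RMSE is recorded. The sketch below assumes that variant and a model exposed as a plain function over rows of named features; all names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Model = Callable[[List[Row]], List[float]]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def permutation_rmse_change(model: Model, rows: List[Row], y_true: Sequence[float],
                            features: Sequence[str], n_repeats: int = 5,
                            seed: int = 0) -> Dict[str, float]:
    """Mean increase in RMSE when each feature is shuffled across rows.

    Larger values suggest the model relies more heavily on that feature.
    """
    rng = random.Random(seed)
    baseline = rmse(y_true, model(rows))
    changes: Dict[str, float] = {}
    for feature in features:
        total = 0.0
        for _ in range(n_repeats):
            values = [row[feature] for row in rows]
            rng.shuffle(values)  # break the link between this feature and the target
            permuted = [{**row, feature: v} for row, v in zip(rows, values)]
            total += rmse(y_true, model(permuted)) - baseline
        changes[feature] = total / n_repeats
    return changes
```

For example, with a toy model that uses only a `location` feature, shuffling an unused `noise` feature leaves the RMSE unchanged (a change of 0.0), while shuffling `location` increases it.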
eu_rights_str_mv openAccess
id Manara_b910ee3e351ec7ee143f6deae3a357fb
network_acronym_str Manara
network_name_str ManaraRepo
oai_identifier_str oai:figshare.com:article/28005761
publishDate 2024
status_str publishedVersion