Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks
The seamless and resilient operation of power grids is crucial for ensuring a reliable electricity supply. However, maintaining high operational stability is increasingly challenging due to evolving grid complexities and potential adversarial threats. This paper proposes...
| Main Author: | Mohamed Massaoudi |
|---|---|
| Other Authors: | Katherine R. Davis |
| Published: | 2025 |
| Subjects: | |
| _version_ | 1864513534163943424 |
|---|---|
| author | Mohamed Massaoudi (16888710) |
| author2 | Katherine R. Davis (20462726) |
| author2_role | author |
| author_facet | Mohamed Massaoudi (16888710) Katherine R. Davis (20462726) |
| author_role | author |
| dc.creator.none.fl_str_mv | Mohamed Massaoudi (16888710) Katherine R. Davis (20462726) |
| dc.date.none.fl_str_mv | 2025-05-02T06:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1109/access.2025.3563419 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/Adaptive_PPO_With_Multi-Armed_Bandit_Clipping_and_Meta-Control_for_Robust_Power_Grid_Operation_Under_Adversarial_Attacks/30406078 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Engineering Electrical engineering Electronics, sensors and digital hardware Information and computing sciences Artificial intelligence Cybersecurity and privacy Machine learning Adversarial training deep reinforcement learning dynamic epsilon-clipping power grid control power system security proximal policy optimization Training Perturbation methods Optimization Vectors Robustness Adaptation models Stability criteria Resilience Tuning |
| dc.title.none.fl_str_mv | Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks |
| dc.type.none.fl_str_mv | Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal |
| description | <p dir="ltr">The seamless and resilient operation of power grids is crucial for ensuring a reliable electricity supply. However, maintaining high operational stability is increasingly challenging due to evolving grid complexities and potential adversarial threats. This paper proposes a novel composite enhanced proximal policy optimization (CePPO) algorithm to improve power grid operation under adversarial conditions. Specifically, our approach introduces three key innovations: 1) a multi-armed bandit (MAB) mechanism for dynamic epsilon-clipping that adaptively adjusts exploration-exploitation trade-offs; 2) a meta-controller framework that automatically tunes hyperparameters, including the activation learning rate (ALR) penalties and exploration factors; and 3) an integrated gradient-based optimization approach that combines policy gradients with environmental feedback. Experiments on the IEEE 14-bus system show that CePPO achieves approximately 50% higher average rewards and 51% longer stability periods than standard PPO while reducing computational overhead by 35%. CePPO also outperforms baseline approaches under adversarial attacks. The simulation results validate that CePPO’s adaptive parameter tuning and enhanced exploration strategies make it particularly well-suited for the dynamic nature of power grid control. To foster further research and reproducibility, the code is available upon request at https://github.com/Dr-Kate-Davis-s-Research-Team/DRL-CP.S</p><h2>Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/deed.en" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2025.3563419" target="_blank">https://dx.doi.org/10.1109/access.2025.3563419</a></p> |
| eu_rights_str_mv | openAccess |
| id | Manara2_545fd2754af59a86b63dc381d749ab71 |
| identifier_str_mv | 10.1109/access.2025.3563419 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/30406078 |
| publishDate | 2025 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| spellingShingle | Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks Mohamed Massaoudi (16888710) Engineering Electrical engineering Electronics, sensors and digital hardware Information and computing sciences Artificial intelligence Cybersecurity and privacy Machine learning Adversarial training deep reinforcement learning dynamic epsilon-clipping power grid control power system security proximal policy optimization Training Perturbation methods Optimization Vectors Robustness Adaptation models Stability criteria Resilience Tuning |
| status_str | publishedVersion |
| title | Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks |
| title_full | Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks |
| title_fullStr | Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks |
| title_full_unstemmed | Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks |
| title_short | Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks |
| title_sort | Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks |
| topic | Engineering Electrical engineering Electronics, sensors and digital hardware Information and computing sciences Artificial intelligence Cybersecurity and privacy Machine learning Adversarial training deep reinforcement learning dynamic epsilon-clipping power grid control power system security proximal policy optimization Training Perturbation methods Optimization Vectors Robustness Adaptation models Stability criteria Resilience Tuning |
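
The description above mentions a multi-armed bandit that selects the PPO clipping range dynamically. The record does not give the authors' formulation, so the following is only a minimal sketch of the general idea, assuming a UCB1 bandit over a hypothetical set of candidate epsilon values and the standard PPO clipped surrogate for a single sample. The class name `ClipBandit`, the arm values, and the exploration constant are illustrative assumptions, not taken from the CePPO paper or its code.

```python
import math


class ClipBandit:
    """UCB1 bandit over candidate PPO clip ranges (arm values are hypothetical)."""

    def __init__(self, arms=(0.1, 0.2, 0.3), c=2.0):
        self.arms = list(arms)                  # candidate epsilon values
        self.c = c                              # exploration constant (assumed)
        self.counts = [0] * len(self.arms)      # pulls per arm
        self.means = [0.0] * len(self.arms)     # running mean reward per arm

    def select(self):
        """Return the index of the arm to use for the next policy update."""
        # Try every arm once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        scores = [
            m + math.sqrt(self.c * math.log(total) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(self.arms)), key=scores.__getitem__)

    def update(self, i, reward):
        """Fold an observed reward (e.g. an episode return) into arm i's mean."""
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]


def clipped_surrogate(ratio, advantage, eps):
    """Standard PPO clipped objective for one sample."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    bandit = ClipBandit()
    i = bandit.select()
    eps = bandit.arms[i]
    print("epsilon:", eps, "objective:", clipped_surrogate(1.5, 1.0, eps))
    bandit.update(i, reward=1.0)
```

In a training loop under these assumptions, `select` would be called once per policy update to pick epsilon, and `update` would be fed a scalar performance signal afterwards. How CePPO defines that signal is not stated in this record.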