Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks

Bibliographic Details
Main Author: Mohamed Massaoudi (16888710) (author)
Other Authors: Katherine R. Davis (20462726) (author)
Published: 2025
Date: 2025-05-02T06:00:00Z
DOI: 10.1109/access.2025.3563419
Related URL: https://figshare.com/articles/journal_contribution/Adaptive_PPO_With_Multi-Armed_Bandit_Clipping_and_Meta-Control_for_Robust_Power_Grid_Operation_Under_Adversarial_Attacks/30406078
Rights: CC BY 4.0 (info:eu-repo/semantics/openAccess)
Subjects:
Engineering
Electrical engineering
Electronics, sensors and digital hardware
Information and computing sciences
Artificial intelligence
Cybersecurity and privacy
Machine learning
Adversarial training
deep reinforcement learning
dynamic epsilon-clipping
power grid control
power system security
proximal policy optimization
Training
Perturbation methods
Optimization
Vectors
Robustness
Adaptation models
Stability criteria
Resilience
Tuning
Type: Text; Journal contribution (info:eu-repo/semantics/publishedVersion)
Description:
The seamless and resilient operation of power grids is crucial for ensuring a reliable electricity supply. However, maintaining high operational stability is increasingly challenging due to evolving grid complexities and potential adversarial threats. This paper proposes a novel composite enhanced proximal policy optimization (CePPO) algorithm to improve power grid operation under adversarial conditions. Specifically, our approach introduces three key innovations: 1) a multi-armed bandit (MAB) mechanism for dynamic epsilon-clipping that adaptively adjusts the exploration-exploitation trade-off; 2) a meta-controller framework that automatically tunes hyperparameters, including the activation learning rate (ALR) penalties and exploration factors; and 3) an integrated gradient-based optimization approach that combines policy gradients with environmental feedback. Results on the IEEE 14-bus system demonstrate the effectiveness of the proposed model: CePPO achieves approximately 50% higher average rewards and 51% longer stability periods than standard PPO while reducing computational overhead by 35%. Under adversarial attacks, CePPO also outperforms the baseline approaches. The simulation results validate that CePPO’s adaptive parameter tuning and enhanced exploration strategies make it particularly well suited to the dynamic nature of power grid control. To foster further research and reproducibility, the code is available upon request at https://github.com/Dr-Kate-Davis-s-Research-Team/DRL-CP.S

Other Information
Published in: IEEE Access
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1109/access.2025.3563419
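The abstract describes the MAB-based dynamic epsilon-clipping only at a high level. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: it assumes a UCB1 bandit whose arms are a fixed set of candidate clipping ranges, and a scalar reward per PPO iteration (for example, the change in episode return after an update). The candidate epsilon values, the reward signal, and the names `UCB1EpsilonSelector` and `ppo_clipped_surrogate` are all hypothetical. The surrogate itself is the standard PPO clipped objective, L^CLIP = E[min(r_t A_t, clip(r_t, 1 - eps, 1 + eps) A_t)].

```python
import math


def ppo_clipped_surrogate(ratios, advantages, eps):
    """Batch mean of the standard PPO clipped surrogate L^CLIP (to be maximized).

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate A_i
    eps           = clipping range chosen for this update
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)
        # Pessimistic (lower) bound of the clipped and unclipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


class UCB1EpsilonSelector:
    """UCB1 bandit over candidate clipping ranges (hypothetical arm set)."""

    def __init__(self, arms, c=2.0):
        self.arms = list(arms)          # candidate epsilon values
        self.c = c                      # c=2 gives the classic UCB1 bonus
        self.counts = [0] * len(self.arms)
        self.values = [0.0] * len(self.arms)  # running mean reward per arm
        self.t = 0

    def select(self):
        """Return the index of the arm (epsilon) to use next."""
        self.t += 1
        # Try every arm once before relying on the confidence bound.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        scores = [
            v + math.sqrt(self.c * math.log(self.t) / n)
            for v, n in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        """Incrementally update the mean reward of the arm that was played."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    selector = UCB1EpsilonSelector([0.1, 0.2, 0.3])
    # Toy, deterministic stand-in for the per-iteration reward; in practice
    # this would come from rolling out the grid environment after an update.
    toy_reward = {0: 0.2, 1: 0.8, 2: 0.5}
    for _ in range(200):
        arm = selector.select()
        eps = selector.arms[arm]
        # A real loop would run a PPO update using ppo_clipped_surrogate(..., eps).
        selector.update(arm, toy_reward[arm])
    print(selector.counts)
```

With these toy rewards the bandit concentrates its pulls on epsilon = 0.2 while still occasionally revisiting the other ranges; a UCB rule is used here only because it is a simple, well-known MAB strategy, and the paper may use a different one.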
OAI identifier: oai:figshare.com:article/30406078