Adaptive PPO With Multi-Armed Bandit Clipping and Meta-Control for Robust Power Grid Operation Under Adversarial Attacks

Bibliographic Details
Main author:	Mohamed Massaoudi (16888710)
Other authors:	Katherine R. Davis (20462726)
Published:	2025
Subjects:	Engineering; Electrical engineering; Electronics, sensors and digital hardware; Information and computing sciences; Artificial intelligence; Cybersecurity and privacy; Machine learning; Adversarial training; deep reinforcement learning; dynamic epsilon-clipping; power grid control; power system security; proximal policy optimization; Training; Perturbation methods; Optimization; Vectors; Robustness; Adaptation models; Stability criteria; Resilience; Tuning

Description
Abstract:	The seamless and resilient operation of power grids is crucial for ensuring a reliable electricity supply. However, maintaining high operational stability is increasingly challenging due to evolving grid complexities and potential adversarial threats. This paper proposes a novel composite enhanced proximal policy optimization (CePPO) algorithm to improve power grid operation under adversarial conditions. Specifically, our approach introduces three key innovations: 1) a multi-armed bandit (MAB) mechanism for dynamic epsilon-clipping that adaptively adjusts the exploration-exploitation trade-off; 2) a meta-controller framework that automatically tunes hyperparameters, including the activation learning rate (ALR) penalties and exploration factors; and 3) an integrated gradient-based optimization approach that combines policy gradients with environmental feedback. Experiments on the IEEE 14-bus system show that CePPO achieves approximately 50% higher average rewards and 51% longer stability periods than standard PPO while reducing computational overhead by 35%. Under adversarial attacks, CePPO also outperforms the baseline approaches. The simulation results show that CePPO's adaptive parameter tuning and enhanced exploration strategies make it particularly well suited to the dynamic nature of power grid control. To foster further research and reproducibility, the code is available upon request at https://github.com/Dr-Kate-Davis-s-Research-Team/DRL-CP.S

Other Information
Published in:	IEEE Access
License:	https://creativecommons.org/licenses/by/4.0/
See article on publisher's website:	https://dx.doi.org/10.1109/access.2025.3563419
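The abstract names a multi-armed bandit that chooses the PPO clipping range epsilon. It does not state which bandit rule is used, what the arms are, or what reward the bandit receives. The sketch below is a generic illustration under three assumptions. First, it uses a UCB1 bandit. Second, the candidate epsilon values in `EPSILON_ARMS` are hypothetical. Third, the bandit reward is taken to be the return observed after an update with the chosen epsilon. The bandit is paired with the standard PPO clipped surrogate objective. `UCB1EpsilonSelector` and `ppo_clip_objective` are hypothetical names, and this is not the authors' CePPO implementation.

```python
import math
import random

# Hypothetical candidate clip ranges ("arms"); the paper's actual arm set is not given.
EPSILON_ARMS = [0.1, 0.2, 0.3]


class UCB1EpsilonSelector:
    """UCB1 bandit that picks a PPO clip range per update (illustrative assumption)."""

    def __init__(self, arms, c=2.0):
        self.arms = list(arms)
        self.c = c                            # c=2 gives the classic UCB1 bonus sqrt(2 ln t / n)
        self.counts = [0] * len(self.arms)    # times each arm was pulled
        self.values = [0.0] * len(self.arms)  # running mean reward per arm
        self.total = 0                        # total pulls so far

    def select(self):
        # Pull every arm once before relying on the UCB score.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        scores = [
            v + math.sqrt(self.c * math.log(self.total) / n)
            for v, n in zip(self.values, self.counts)
        ]
        return max(range(len(self.arms)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the arm's mean reward.
        self.counts[arm] += 1
        self.total += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def ppo_clip_objective(ratios, advantages, epsilon):
    """Mean PPO clipped surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Toy driver: the "return" below is a made-up function of epsilon, NOT a grid simulation.
    random.seed(0)
    selector = UCB1EpsilonSelector(EPSILON_ARMS)
    for _ in range(200):
        arm = selector.select()
        eps = selector.arms[arm]
        fake_return = 1.0 - 5.0 * abs(eps - 0.2) + random.gauss(0.0, 0.1)
        selector.update(arm, fake_return)
    print("pulls per epsilon:", dict(zip(selector.arms, selector.counts)))
```

UCB1 is used here only because it is a simple, well-known bandit rule. The paper may use a different selection strategy and a different reward signal for the bandit. In a real training loop, the chosen epsilon would be passed to the clipped objective for one policy update. The resulting performance would then be fed back through `update`.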
