Time(s) and GFLOPs savings of single-agent tasks.

Bibliographic details
Main author:	Hongjie Zhang (136127) (author)
Other authors:	Zhenyu Chen (2359471) (author), Hourui Deng (20685396) (author), Chaosheng Feng (20685399) (author)
Published:	2025
Subjects:	Medicine; Biotechnology; Sociology; Developmental Biology; Science Policy; Environmental Sciences not elsewhere classified; Biological Sciences not elsewhere classified; Information Systems not elsewhere classified; state skipping branch; establish optimization objectives; deep reinforcement learning; achieved significant success; high computational cost; algorithm significantly reduces; computational cost; utilize pre; tuning techniques; practical application; policies based; minimal impact; mappo frameworks; making tasks; making patterns; linear increase; lazy actor; involve reasoning; human decision; flops required; decision made; continuous decision; complex decision; complete tasks; approximately 80; actor network

Description
Summary:	<div><p>Deep reinforcement learning has achieved significant success in complex decision-making tasks. However, the high computational cost of policies based on deep neural networks restricts their practical application. Specifically, each decision made by an agent requires a complete neural network computation, so the computational cost grows linearly with the number of interactions and agents. Inspired by human decision-making patterns, which in continuous decision-making tasks involve reasoning only on critical states rather than on all states, we introduce the <i>LazyAct</i> algorithm. This algorithm significantly reduces the number of inferences while preserving the quality of the policy. First, we incorporate a state skipping branch into the actor network to bypass states with minimal impact. Next, we establish optimization objectives for single-agent and multi-agent inference, incorporating cost constraints based on the IMPALA and MAPPO frameworks, respectively. Finally, we use pre-training and fine-tuning techniques to train the policy network. Extensive experimental results indicate that <i>LazyAct</i> reduces the number of inferences by approximately 80% in single-agent scenarios and 40% in multi-agent scenarios while sustaining comparable policy performance. This reduction in inferences significantly decreases the time and FLOPs that the <i>LazyAct</i> algorithm requires to complete tasks. Code is available at <a href="https://www.dropbox.com/scl/fo/wyoqo6q9gyt86zobfgbvx/h?\rlkey=0moyxsnoiisfs9y4h89hsou1l&dl=0" target="_blank">https://www.dropbox.com/scl/fo/wyoqo6q9gyt86zobfgbvx/h?\rlkey=0moyxsnoiisfs9y4h89hsou1l&dl=0</a>.</p></div>
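The state-skipping idea in the summary can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes, purely for illustration, that the skip branch outputs how many of the following states to bypass and that the previous action is reused on those states. The names `toy_policy` and `rollout`, the one-dimensional state, and the skip rule are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical policy signature: state -> (action, number of following states to skip).
Policy = Callable[[float], Tuple[int, int]]


def toy_policy(state: float) -> Tuple[int, int]:
    """Stand-in for an actor network with an extra skip output (assumed design)."""
    action = 1 if state < 0.0 else -1      # push the state toward zero
    skip = 4 if abs(state) > 5.0 else 0    # far from the target: next states treated as non-critical
    return action, skip


def rollout(policy: Policy, state: float, steps: int) -> Tuple[List[int], int]:
    """Run `steps` environment steps, calling the policy only on non-skipped states."""
    actions: List[int] = []
    inferences = 0
    remaining_skip = 0
    action = 0
    for _ in range(steps):
        if remaining_skip > 0:
            remaining_skip -= 1            # reuse the previous action, no inference
        else:
            action, remaining_skip = policy(state)
            inferences += 1                # one full policy evaluation
        actions.append(action)
        state += action                    # toy dynamics: the action shifts the state
    return actions, inferences


actions, inferences = rollout(toy_policy, state=-20.0, steps=20)
saved = 1.0 - inferences / len(actions)
print(f"{inferences} inferences for {len(actions)} steps ({saved:.0%} saved)")
```

In this sketch the saving comes only from the hand-written skip rule; in the record's description the skipping behaviour is learned under cost constraints, which the toy example does not attempt to model.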
