[Figure: the average cumulative reward of the compared algorithms]
Published: 2025
Summary: <div><p>The H-beam riveting and welding work cell is an automated unit for processing H-beams. By coordinating the gripping and welding robots, the work cell performs processes such as riveting and welding stiffener plates, transforming a plain H-beam into a stiffened H-beam. In the context of intelligent manufacturing, there is still significant potential for improving the productivity of riveting and welding tasks in existing H-beam riveting and welding work cells. For the multi-agent system of the H-beam riveting and welding work cell, a recurrent multi-agent proximal policy optimization algorithm (rMAPPO) is proposed to address the multi-agent scheduling problem in H-beam processing. The algorithm employs recurrent neural networks to capture and process historical information. Action masking filters out invalid states and actions, while a shared reward mechanism balances cooperation efficiency among the agents. In addition, value function normalization and adaptive learning rate strategies are applied to accelerate convergence. This paper first analyzes the H-beam processing flow and appropriately simplifies it, develops a reinforcement learning environment for multi-agent scheduling, and applies the rMAPPO algorithm to make scheduling decisions. The effectiveness of the proposed method is verified on both the physical riveting and welding work cell and its digital twin platform, and it is compared with baseline multi-agent reinforcement learning methods (MAPPO, MADDPG, and MASAC). Experimental results show that, compared with the baselines, the rMAPPO-based scheduling method reduces robot waiting times more effectively, adapts better to different riveting and welding tasks, and significantly improves the manufacturing efficiency of stiffened H-beams.</p></div>
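The abstract does not give implementation details, but the action-masking idea it mentions is commonly realized by suppressing invalid actions before sampling from the policy. A minimal sketch (the function name `masked_softmax` and the example logits/mask values are illustrative assumptions, not from the paper):

```python
import math

def masked_softmax(logits, mask):
    """Turn action logits into probabilities, zeroing out invalid actions.

    logits: raw policy scores, one per candidate action.
    mask:   1 for actions valid in the current state, 0 for invalid ones.
    Invalid actions get a logit of -inf, so exp(-inf) = 0 and they can
    never be sampled by the scheduling agent.
    """
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    mx = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: 4 candidate robot actions, of which
# actions 1 and 3 are invalid in the current work-cell state.
logits = [0.5, 2.0, 1.0, -0.3]
mask = [1, 0, 1, 0]
probs = masked_softmax(logits, mask)  # probs[1] and probs[3] are 0.0
```

In an actor-critic method such as rMAPPO, this masking would be applied to the actor's output before both sampling and the log-probability computation, so the policy gradient never reinforces infeasible scheduling choices.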