Training results for the grid world environment.
<p>a) Evolution of the length of the trajectories during the training, for different scaling parameters ranging from −3 to 3, and different preference distributions: the agent can either learn to complete the task from the start (“task”), or first explore the grid (“explore”). We represent the...
Saved in:
| Main Author: | |
|---|---|
| Other Authors: | , , , |
| Published: |
2025
|
| Subjects: | |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|