Training results for the grid world environment.

a) Evolution of trajectory length during training, for scaling parameters ranging from −3 to 3 and for different preference distributions: the agent can either learn to complete the task from the start (“task”), or first explore the grid (“explore”). We represent the...

Bibliographic Details
Main Author: Joséphine Pazem (22184363) (author)
Other Authors: Marius Krumm (22184366) (author), Alexander Q. Vining (11320591) (author), Lukas J. Fiderer (4865587) (author), Hans J. Briegel (6383642) (author)
Published: 2025