Architecture and training of an FEPS agent.
Published: 2025
Abstract: a) Architecture of a FEPS agent, with four sensory states (squares) and two possible actions (diamonds). The agent has two main components: the world model and the policy. The world model is composed of vertices representing observations (squares), while clone clips represent all values a belief state can take (circles). As in a clone-structured graph, each clone clip <i>b</i> relates to exactly one observation <i>s</i>, and the emission function is deterministic. The clone clips, together with the set of edges between them, form an ECM. A belief state, circled in purple, is designated by an excited clone clip. The weighted edges in the ECM encode the transition function and are trainable with reinforcement: there is one set of edges per action (light and dark turquoise arrows). The belief state in the ECM is an input to the policy, where the probability of sampling an action is a function of the EFE. In turn, the selected action determines the edge set to sample from in the world model in order to predict the next belief state and observation.

b) Training of the world model of a FEPS agent. The agent interacts with the environment by receiving observations and implementing actions. When an action <i>a</i><sub><i>t</i></sub> is chosen, a corresponding edge is sampled in the world model, from the current to the next belief state, conditioned on the action. The observation <i>s</i><sub><i>t</i> + 1</sub> associated with the next belief state is the prediction for the next sensory state. Simultaneously, the action is applied to the environment and causes a transition in the hidden states of the environment (bottom, green rectangle). The agent perceives this transition through the observation it receives at the next time step. Finally, the weights of the edges are updated. The reinforcement of an edge is proportional to the number of correct predictions it enabled in a row, as depicted by the thickness of the arrows in the world model. When the agent makes an incorrect prediction (the purple arrow), the reinforcements are applied to the edges that contributed to the trajectory. The last, incorrect edge is not reinforced.
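To make panel (a) concrete, the following is a minimal sketch of how the clone-structured world model could be represented: one trainable edge-weight matrix per action and a deterministic clone-to-observation emission map. The names (`WorldModel`, `n_clones_per_obs`, the uniform initialisation) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class WorldModel:
    """Clone-structured ECM sketch: each clone clip maps to exactly one
    observation, and there is one set of trainable edges per action."""

    def __init__(self, n_observations, n_clones_per_obs, n_actions, rng=None):
        self.rng = rng or np.random.default_rng()
        # Deterministic emission: clone clip index -> observation index.
        self.emission = np.repeat(np.arange(n_observations), n_clones_per_obs)
        n_clips = n_observations * n_clones_per_obs
        # One edge-weight matrix per action (assumed uniform initialisation).
        self.h = np.ones((n_actions, n_clips, n_clips))

    def predict(self, belief, action):
        """Sample the next belief state conditioned on the chosen action and
        return it together with the predicted observation."""
        weights = self.h[action, belief]
        probs = weights / weights.sum()
        next_belief = self.rng.choice(len(probs), p=probs)
        return next_belief, self.emission[next_belief]
```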
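The update rule in panel (b) could then look like the sketch below, which builds on the `WorldModel` above. The environment interface (`env.reset`, `env.step`), the EFE-based `policy`, the learning rate `eta`, and the helper `random_clip_for` are placeholders, and the exact proportionality of the reinforcement is an assumption: the caption only states that an edge's reinforcement grows with the number of correct predictions it enabled in a row.

```python
import numpy as np

def random_clip_for(model, obs, rng):
    # Any clone clip whose deterministic emission matches the observation can
    # serve as the belief state (an assumption of this sketch).
    return rng.choice(np.flatnonzero(model.emission == obs))

def train_episode(model, env, policy, n_steps, eta=0.1, rng=None):
    rng = rng or np.random.default_rng()
    obs = env.reset()                              # placeholder environment API
    belief = random_clip_for(model, obs, rng)
    run = []                                       # edges used since the last wrong prediction
    for _ in range(n_steps):
        action = policy(belief)                    # sample from the EFE-based policy (placeholder)
        next_belief, predicted_obs = model.predict(belief, action)
        obs = env.step(action)                     # actual observation from the environment
        run.append((action, belief, next_belief))
        if predicted_obs == obs:
            belief = next_belief                   # prediction confirmed; extend the run
        else:
            run.pop()                              # the last, incorrect edge is not reinforced
            # Each remaining edge is reinforced in proportion to the number of
            # correct predictions it enabled in a row (earlier edges enabled more).
            for rank, (a, b, b_next) in enumerate(run):
                model.h[a, b, b_next] += eta * (len(run) - rank)
            run = []
            belief = random_clip_for(model, obs, rng)  # re-anchor on the actual observation
    return model
```

This is a single-episode sketch; a run of correct predictions still open when the episode ends is simply discarded here, which is a simplification rather than something stated in the caption.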