The behaviour of different reinforcement-learning models in a task environment in which unexpected and expected uncertainties were independently manipulated.
| Published: | 2025 |
|---|---|
| Summary: | <p>All models converge reasonably well to the actual mean of the variable rewards. The learning rate for the Rescorla-Wagner model (η, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e001" target="_blank">Eq 1</a>) is 0.32. For the hybrid Pearce-Hall model, ω (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e002" target="_blank">Eq 2</a>) is 0.48 and λ (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e004" target="_blank">Eq 3</a>) is 1.56. For the cubic model, κ is 0.11 (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e005" target="_blank">Eqs 4</a>-<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e006" target="_blank">5</a>). For the exponential-logarithmic model, the parameters δ and λ are 0.83 and 1.45, respectively (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e009" target="_blank">Eq 7</a>). Because the models perform so comparably, their differences are illustrated in <b>Fig B in</b> <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.s001" target="_blank">S1 Text</a>, which shows the average prediction error values relative to the simulated outcomes in the task environment. Note that the simulation environment shown was generated only once, covering many possible levels of environmental volatility and noise and their interaction, whereas the models were fitted iteratively until the parameters minimising the average magnitude of the prediction error relative to the actual outcome sequence were identified.</p> |
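
The fitting criterion in the summary (choose parameters that minimise the average magnitude of the prediction error over a fixed outcome sequence) can be illustrated for the Rescorla-Wagner case. The sketch below is not the authors' code. Eq 1 is not reproduced in this record, so it assumes the standard delta rule, V ← V + η(r − V). The outcome sequence, the block means and noise levels, the function names, and the grid search (swapped in for the unspecified iterative fitting procedure) are all hypothetical.

```python
import random


def rescorla_wagner(outcomes, eta, v0=0.0):
    """Run a delta-rule learner; return (prediction errors, final estimate).

    Assumes the standard Rescorla-Wagner update V <- V + eta * (r - V).
    """
    v = v0
    errors = []
    for r in outcomes:
        delta = r - v        # prediction error on this trial
        errors.append(delta)
        v += eta * delta     # move the estimate towards the outcome
    return errors, v


def mean_abs_error(outcomes, eta):
    """Average magnitude of the prediction error for a given learning rate."""
    errors, _ = rescorla_wagner(outcomes, eta)
    return sum(abs(e) for e in errors) / len(errors)


def fit_eta(outcomes, grid=None):
    """Grid search for the learning rate with the smallest mean |error|."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda eta: mean_abs_error(outcomes, eta))


def make_outcomes(seed=0):
    """Hypothetical environment, generated once from a fixed seed.

    Jumps in the block mean stand in for volatility; the block standard
    deviation stands in for noise. The values are illustrative only.
    """
    rng = random.Random(seed)
    blocks = [(20, 2, 40), (60, 2, 40), (60, 10, 40), (30, 10, 40)]
    outcomes = []
    for mean, sd, n_trials in blocks:
        outcomes.extend(rng.gauss(mean, sd) for _ in range(n_trials))
    return outcomes


if __name__ == "__main__":
    outcomes = make_outcomes()
    best = fit_eta(outcomes)
    print("best eta:", best)
    print("mean |prediction error|:", mean_abs_error(outcomes, best))
```

The fitted learning rate depends entirely on the assumed outcome sequence, so this sketch should not be expected to reproduce the reported value of 0.32.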