The behaviour of different reinforcement-learning models in a task environment in which unexpected and expected uncertainties were independently manipulated.


Bibliographic details
Main author: Boluwatife Ikwunne (22238697) (author)
Other authors: Jolie Parham (22238700) (author), Erdem Pulcu (517414) (author)
Published: 2025
Description
Summary:<p>All models converge reasonably well to the actual mean of the variable rewards. The learning rate for the Rescorla-Wagner model (η, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e001" target="_blank">Eq 1</a>) is 0.32. For the hybrid Pearce-Hall model, ω (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e002" target="_blank">Eq 2</a>) is 0.48 and λ (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e004" target="_blank">Eq 3</a>) is 1.56. For the cubic model, κ is 0.11 (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e005" target="_blank">Eqs 4</a>-<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e006" target="_blank">5</a>). For the exponential-logarithmic model, the parameters δ and λ are 0.83 and 1.45, respectively (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.e009" target="_blank">Eq 7</a>). Because the models perform very comparably, their differences are illustrated in <b>Fig B in</b> <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1013445#pcbi.1013445.s001" target="_blank">S1 Text</a>, which shows the average prediction error values relative to the simulated outcomes in the task environment. Note that the simulation environment shown was generated only once, covering many possibilities of environmental volatility and noise, and their interaction, whereas the models were fitted iteratively until the parameters minimising the average magnitude of the prediction error relative to the actual outcome sequence were identified.</p>
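<p>The fitting idea described above, choosing parameters that minimise the average magnitude of the prediction error, can be sketched for the simplest of the models. This is a minimal illustration under stated assumptions, not the authors' code: Eq 1 is not reproduced in this record, so the standard Rescorla-Wagner delta rule (estimate plus η times prediction error) is assumed; the function names, the toy reward environment, and the grid search standing in for the iterative fitting procedure are all hypothetical.</p>

```python
import random


def rescorla_wagner(outcomes, eta, v0=0.0):
    """Run the standard delta rule (assumed form of Eq 1) over a sequence.

    Returns the estimate held before each outcome and the prediction error
    observed on each trial.
    """
    v = v0
    estimates, errors = [], []
    for r in outcomes:
        delta = r - v          # prediction error on this trial
        estimates.append(v)
        errors.append(delta)
        v += eta * delta       # move the estimate towards the outcome
    return estimates, errors


def mean_abs_error(outcomes, eta):
    """Average magnitude of the prediction error for a given learning rate."""
    _, errors = rescorla_wagner(outcomes, eta)
    return sum(abs(e) for e in errors) / len(errors)


def fit_eta(outcomes, grid=None):
    """Grid search (a stand-in for iterative fitting) over learning rates."""
    if grid is None:
        grid = [i / 100 for i in range(1, 101)]  # 0.01 ... 1.00
    return min(grid, key=lambda eta: mean_abs_error(outcomes, eta))


def simulate_outcomes(n_trials=400, noise_sd=0.1, jump_every=50, seed=0):
    """Hypothetical toy environment: the reward mean jumps at fixed intervals
    (a crude form of volatility) and Gaussian noise is added on every trial."""
    rng = random.Random(seed)
    mean = rng.random()
    outcomes = []
    for t in range(n_trials):
        if t > 0 and t % jump_every == 0:
            mean = rng.random()
        outcomes.append(mean + rng.gauss(0.0, noise_sd))
    return outcomes


if __name__ == "__main__":
    outcomes = simulate_outcomes()
    eta_hat = fit_eta(outcomes)
    print(f"fitted eta = {eta_hat:.2f}, "
          f"mean |PE| = {mean_abs_error(outcomes, eta_hat):.3f}")
```

<p>The fitted value from this toy environment is not expected to match the reported η of 0.32, which depends on the specific outcome sequence used in the study.</p>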