A distributional code for value in dopamine-based reinforcement learning
It has become well established that dopamine release reflects a reward prediction error, a surprise signal that drives learning of reward predictions and shapes future behavior. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. In the present work, we propose a significant modification of the standard reward prediction error theory. Inspired by recent artificial intelligence research on distributional reinforcement learning, we hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea leads immediately to a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
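To make the contrast between the two hypotheses concrete, here is a minimal sketch (not the speaker's code; all names and parameters are illustrative assumptions). It compares classical scalar TD learning, in which a single prediction tracks the mean reward, with a distributional variant in which a population of predictors learns with asymmetric sensitivity to positive and negative prediction errors, so that each converges to a different point (an expectile-like statistic) of the reward distribution.

```python
# Minimal sketch, assuming a stationary stochastic reward and tabular updates.
# Not the speaker's implementation; parameter names and values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sample_reward():
    # Bimodal stochastic outcome: its mean alone hides the two possible rewards.
    return 1.0 if rng.random() < 0.3 else 0.2

# --- Classical TD: one scalar prediction converges to the mean reward ---
v = 0.0
alpha = 0.05
for _ in range(20000):
    r = sample_reward()
    delta = r - v                 # reward prediction error
    v += alpha * delta

# --- Distributional TD: many predictors, each with its own asymmetry ---
# Positive and negative prediction errors are scaled differently per unit
# (tau vs. 1 - tau), so the population spans the reward distribution.
n_units = 9
taus = np.linspace(0.1, 0.9, n_units)   # per-unit asymmetry
values = np.zeros(n_units)
base_lr = 0.05
for _ in range(20000):
    r = sample_reward()
    deltas = r - values
    lr = np.where(deltas > 0, base_lr * taus, base_lr * (1 - taus))
    values += lr * deltas

print(f"scalar TD estimate (mean): {v:.3f}")
print("distributional estimates (expectile-like population code):")
print(np.round(values, 3))
```

In this sketch, units with large tau weight positive errors more heavily and settle on optimistic estimates, while units with small tau settle on pessimistic ones; the spread of the resulting estimates encodes the shape of the reward distribution rather than only its mean.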
Date: 23 October 2019, 13:30 (Wednesday, 2nd week, Michaelmas 2019)
Venue: Le Gros Clark Building, off South Parks Road, OX1 3QX
Venue Details: Lecture Theatre
Speaker: Dr Zeb Kurth-Nelson (Google DeepMind)
Organiser: Moritz Moeller (University of Oxford)
Booking required?: Not required
Audience: Members of the University only
Editor: Chaitanya Chintaluri