Effective economic decision-making depends on the brain’s capacity to integrate uncertainty about the state of the world with reward/loss outcomes. Various theories, such as expected utility and prospect theory, have been proposed to model the economic preferences of agents. However, a full characterization of the specific preference measures agents employ has yet to be achieved. Moreover, while progress in neuroeconomics has been substantial, critical gaps persist in understanding how the dopaminergic system computes such preference measures. Here, we present a framework motivated by an aspiration to bridge the analytical gap between systems-level and cognitive-level perspectives. We focus on multi-armed bandit tasks, where we assume that the reward distribution of each arm is represented by the expected values of the encoding functions of dopamine neurons, a representational scheme referred to as distributed distributional coding (DDC). We extend temporal difference (TD) learning to a distributional DDC-TD learning rule and propose that the responses of dopamine neurons can be characterized by the DDC-TD error. Using dopaminergic recordings from the ventral tegmental area (VTA) in mice, we show that unit activity broadly accords with the predictions of the DDC-TD model. Building on these empirical observations, we go on to detail a neural network that takes as input the DDC values associated with alternative choice options. In conjunction with a simple learning rule, this DDC network can infer a large class of preference/risk measures that an agent may adopt, including expected utility, conditional value at risk (expected tail gain), and compound utility-risk measures. A core feature of the network is that a dopaminergic preference computation emerges intrinsically within the network’s configuration, enabling any inferred preference measure to be computed from a mapping of the linear readouts of dopamine neurons.
Thus, our framework provides a robust approach for inferring agents’ preference measures, while also detailing how the dopaminergic system might compute these probabilistic measures.
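The DDC-TD idea sketched above can be illustrated with a minimal simulation. The sketch below is an assumption-laden toy, not the authors' implementation: the Gaussian-tuned encoding kernels `phi`, the discount factor, the learning rate, and the one-armed-bandit setup are all hypothetical choices made for illustration. It shows the core mechanics only: each dopamine neuron carries one component of a vector-valued TD error on its encoding function, so the learned per-state vector converges to the expected encoding-function responses under the reward distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Gaussian-tuned encoding kernels phi_i (an assumption; the
# abstract does not specify the actual encoding functions).
centers = np.linspace(-2.0, 2.0, 8)

def phi(r, width=0.5):
    """Population response of the encoding functions to reward r."""
    return np.exp(-(r - centers) ** 2 / (2 * width ** 2))

gamma, alpha = 0.9, 0.05                  # discount factor, learning rate
n_states = 2
V = np.zeros((n_states, len(centers)))    # DDC value: E[phi_i(return)] per state

def ddc_td_update(s, r, s_next, terminal):
    """One DDC-TD step: a vector-valued TD error, one component per neuron."""
    target = phi(r) + (0.0 if terminal else gamma * V[s_next])
    delta = target - V[s]                 # vector of per-neuron DDC-TD errors
    V[s] += alpha * delta
    return delta

# One-armed bandit: state 0 yields a stochastic reward and terminates, so
# V[0] should converge to E[phi(r)] under the reward distribution.
for _ in range(5000):
    r = rng.normal(1.0, 0.5)
    ddc_td_update(0, r, 1, terminal=True)
```

Because the episode terminates after one step, the learned `V[0]` is simply an exponentially weighted estimate of the expected kernel responses; in a multi-step task the bootstrapped `gamma * V[s_next]` term would propagate these distributional summaries backward exactly as scalar TD propagates expected value.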